WO2017141173A2 - Compositions and methods for modifying genomes - Google Patents

Compositions and methods for modifying genomes

Info

Publication number
WO2017141173A2
Authority
WO
WIPO (PCT)
Prior art keywords
cpf1
csm1
sequence
dna
polypeptide
Application number
PCT/IB2017/050845
Other languages
French (fr)
Other versions
WO2017141173A3 (en)
Inventor
Matthew Begemann
Benjamin Neil GRAY
Original Assignee
Benson Hill Biosystems, Inc.
Priority to IL308791A
Priority to KR1020237040585A (KR20230165368A)
Application filed by Benson Hill Biosystems, Inc.
Priority to ES17707411T (ES2973207T3)
Priority to CN201780014661.1A (CN109312316B)
Priority to CN202211143483.1A (CN115927440A)
Priority to BR112018016408A (BR112018016408A2)
Priority to MX2018009761A
Priority to MYPI2018001434A (MY197523A)
Priority to KR1020187023481A (KR20180107155A)
Priority to EP21212642.9A (EP4063501A1)
Priority to AU2017220789A (AU2017220789B2)
Priority to EP23207984.8A (EP4306642A3)
Priority to EP17707411.9A (EP3307884B1)
Priority to JP2018561102A (JP2019504649A)
Priority to CA3014988A (CA3014988A1)
Publication of WO2017141173A2
Publication of WO2017141173A3
Priority to IL261082A
Priority to PH12018501722A (PH12018501722A1)
Priority to JP2022142420A (JP2022184892A)
Priority to IL304398A
Priority to AU2023226754A (AU2023226754A1)
Priority to JP2023199358A (JP2024028753A)
Priority to AU2023270322A (AU2023270322A1)


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8201 Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation
    • C12N15/8202 Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation by biological means, e.g. cell mediated or natural vector
    • C12N15/8213 Targeted insertion of genes into the plant genome by homologous recombination
    • C12N15/8241 Phenotypically and genetically modified plants via recombinant DNA technology
    • C12N15/8261 Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield
    • C12N15/8271 Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield for stress resistance, e.g. heavy metal resistance
    • C12N15/8274 Phenotypically and genetically modified plants via recombinant DNA technology with agronomic (input) traits, e.g. crop yield for stress resistance, e.g. heavy metal resistance for herbicide resistance
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/102 Mutagenizing nucleic acids
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/113 Non-coding nucleic acids modulating the expression of genes, e.g. antisense oligonucleotides; Antisense DNA or RNA; Triplex-forming oligonucleotides; Catalytic nucleic acids, e.g. ribozymes; Nucleic acids used in co-suppression or gene silencing
    • C12N15/62 DNA sequences coding for fusion proteins
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/16 Hydrolases (3) acting on ester bonds (3.1)
    • C12N9/22 Ribonucleases RNAses, DNAses
    • C12N2310/00 Structure or type of the nucleic acid
    • C12N2310/10 Type of nucleic acid
    • C12N2310/20 Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPRs]
    • C12N2800/00 Nucleic acids vectors
    • C12N2800/22 Vectors comprising a coding region that has been codon optimised for expression in a respective host

Definitions

  • compositions comprise DNA constructs comprising nucleotide sequences that encode a Cpf1 or Csm1 protein operably linked to a promoter that is operable in the cells of interest.
  • the DNA constructs can be used to direct the modification of genomic DNA at pre-determined genomic loci. Methods to use these DNA constructs to modify genomic DNA sequences are described herein. Modified plants, plant cells, plant parts and seeds are also encompassed. Compositions and methods for modulating the expression of genes are also provided.
  • compositions comprise DNA constructs comprising nucleotide sequences that encode a modified Cpf1 or Csm1 protein with diminished or abolished nuclease activity, optionally fused to a transcriptional activation or repression domain. Methods to use these DNA constructs to modify gene expression are described herein.
  • Figure 2 shows sequence data obtained from rice calli generated during Experiment 1.
  • Figure 2A shows the results of an insertion of an hph cassette at the CAO1 locus.
  • the PAM sequence is boxed and the sequence targeted by the guide RNA is underlined.
  • the ellipsis indicates that a large insertion existed, but the full sequence data is not shown here.
  • Figures 2B, 2C, and 2D show data obtained from rice calli in which an FnCpf1-mediated deletion event occurred in Experiment 01 (Table 7).
  • the lanes depict callus pieces #1-16, from left to right, followed by a molecular weight ladder lane.
  • Figure 5 shows the sequence of the upstream region of callus piece #46-161 from Experiment 46 (Table 7).
  • the PAM site is boxed, showing the expected mutation of this site in the transformed rice callus, and the sequence data indicates successful insertion of the vector 131633 insert at the rice CAO1 genomic locus.
  • the Cpf1 or Csm1 polypeptide is universal and can be used with different guide RNAs to target different genomic sequences.
  • Cpf1 and Csm1 endonucleases have certain advantages over the Cas nucleases (e.g., Cas9) traditionally used with CRISPR arrays.
  • Cpf1-crRNA complexes can cleave target DNA preceded by a short protospacer-adjacent motif (PAM) that is often T-rich, in contrast to the G-rich PAM following the target DNA for many Cas9 systems.
  • PAM: protospacer-adjacent motif
  • the Cpf1 or Csm1 polypeptide disclosed herein can further comprise at least one plastid targeting signal peptide, at least one mitochondrial targeting signal peptide, or a signal peptide targeting the Cpf1 or Csm1 polypeptide to both plastids and mitochondria.
  • Plastid, mitochondrial, and dual-targeting signal peptide localization signals are known in the art (see, e.g., Nassoury and Morse (2005) Biochim Biophys Acta 1743:5-19; Kunze and Berger (2015) Front Physiol
  • a DNA-targeting RNA, or a DNA polynucleotide encoding a DNA-targeting RNA, can comprise: a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA, and a second segment that interacts with a Cpf1 or Csm1 polypeptide.
  • In codon optimization, one or more codons are altered at the nucleic acid level such that the encoded amino acids are unchanged but expression in a particular host organism is increased.
  • codon tables and other references providing preference information for a wide range of organisms are available in the art (see, e.g., Zhang et al. (1991) Gene 105:61-72; Murray et al. (1989) Nucl. Acids Res. 17:477-508).
  • Methodology for optimizing a nucleotide sequence for expression in a plant is provided, for example, in U.S. Pat. No. 6,015,891, and the references cited therein.
  • Examples of codon-optimized polynucleotides for expression in a plant are set forth in SEQ ID NOs: 5, 8, 11, 14, 17, 19, 22, 25, and 174-206.
  • the Cpf1 or Csm1 polypeptide of the fusion protein can be derived from a wild-type Cpf1 or Csm1 protein.
  • the Cpf1-derived or Csm1-derived protein can be a modified variant or a fragment.
  • the Cpf1 or Csm1 polypeptide can be modified to contain a nuclease domain (e.g. a RuvC-like domain) with reduced or eliminated nuclease activity.
  • the Cpf1-derived or Csm1-derived polypeptide can be modified such that the nuclease domain is deleted or mutated such that it is no longer functional (i.e., the nuclease activity is absent).
  • the effector domain of the fusion protein can be an epigenetic modification domain.
  • epigenetic modification domains alter histone structure and/or chromosomal structure without altering the DNA sequence. Changes in histone and/or chromatin structure can lead to changes in gene expression. Examples of epigenetic modification include, without limit, acetylation or methylation of lysine residues in histone proteins, and methylation of cytosine residues in DNA.
  • Non-limiting examples of suitable epigenetic modification domains include histone acetyltransferase domains, histone deacetylase domains, histone methyltransferase domains, histone demethylase domains, DNA methyltransferase domains, and DNA demethylase domains.
  • the effector domain of the fusion protein can be a transcriptional activation domain.
  • a transcriptional activation domain interacts with transcriptional control elements and/or transcriptional regulatory proteins (i.e., transcription factors, RNA polymerases, etc.) to increase and/or activate transcription of one or more genes.
  • the fusion protein further comprises at least one additional domain.
  • GRMZM2G138727 Zea mays CLAVATA1, Zea mays MRP1, Oryza sativa PR602, Oryza sativa
  • phosphotransferase II showed similar characteristics. Additional root-preferred promoters include the VfENOD-GRP3 gene promoter (Kuster et al. (1995) Plant Mol. Biol. 29(4):759-772) and the rolB promoter (Capana et al. (1994) Plant Mol. Biol. 25(4):681-691). See also U.S. Pat. Nos. 5,837,876; 5,750,386; 5,633,363; 5,459,252; 5,401,836; 5,110,732; and 5,023,179. The phaseolin gene (Murai et al. (1983) Science 23:476-482 and Sengupta-Gopalan et al.
  • the expression vector comprising the sequence encoding the Cpf1 or Csm1 polypeptide or fusion protein can further comprise a sequence encoding a guide RNA.
  • the sequence encoding the guide RNA can be operably linked to at least one transcriptional control sequence for expression of the guide RNA in the plant or plant cell of interest.
  • DNA encoding the guide RNA can be operably linked to a promoter sequence that is recognized by RNA polymerase III (Pol III).
  • suitable Pol III promoters include, but are not limited to, mammalian U6, U3, H1, and 7SL RNA promoters and rice U6 and U3 promoters.
  • the methods comprise introducing into a plant cell, organelle, or embryo a DNA-targeting RNA or a DNA polynucleotide encoding a DNA-targeting RNA, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a Cpf1 or Csm1 polypeptide; and also introducing to the plant cell a Cpf1 or Csm1 polypeptide, or a polynucleotide encoding a Cpf1 or Csm1 polypeptide, wherein the Cpf1 or Csm1 polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity.
  • Sterility genes can also be modified and provide an alternative to physical detasseling.
  • the methods disclosed herein further encompass modification of a nucleotide sequence or regulating expression of a nucleotide sequence in a plant cell, plant organelle, or plant embryo.
  • the methods can comprise introducing into the plant cell or plant embryo (a) at least one fusion protein or nucleic acid encoding at least one fusion protein, wherein the fusion protein comprises a Cpf1 or Csm1 polypeptide or a fragment or variant thereof and an effector domain, and (b) at least one guide RNA or DNA encoding the guide RNA, wherein the guide RNA guides the Cpf1 or Csm1 polypeptide of the fusion protein to a target site in the chromosomal sequence and the effector domain of the fusion protein modifies the chromosomal sequence or regulates expression of the chromosomal sequence.
  • the method can comprise introducing one Cpf1 or Csm1 polypeptide (or encoding nucleic acid) and at least one guide RNA (or encoding DNA) into a non-plant eukaryotic cell or organelle, wherein the Cpf1 or Csm1 polypeptide introduces more than one double-stranded break (i.e., two, three, or more than three double-stranded breaks) in the target nucleotide sequence of the nuclear or organellar chromosomal DNA.
  • the double-stranded break in the nucleotide sequence can be repaired by a non-homologous end-joining (NHEJ) repair process.
  • NHEJ: non-homologous end-joining
  • the double-stranded breaks caused by the action of the Cpf1 or Csm1 nuclease or nucleases are repaired in such a way that DNA is deleted from the chromosome of the non-plant eukaryotic cell or organelle.
  • one base, a few bases (i.e., 2, 3, 4, 5, 6, 7, 8, 9, or 10 bases), or a large section of DNA (i.e., more than 10, more than 50, more than 100, or more than 500 bases) is deleted from the chromosome of the non-plant eukaryotic cell or organelle.
  • a eukaryotic cell comprising mutations in its nuclear and/or organellar chromosomal DNA caused by the action of a Cpf1 or Csm1 nuclease or nucleases is cultured to produce a eukaryotic organism.
  • a eukaryotic cell in which gene expression is modulated as a result of one or more Cpf1 or Csm1 nucleases, or one or more variant Cpf1 or Csm1 nucleases, is cultured to produce a eukaryotic organism.
  • Methods for culturing non-plant eukaryotic cells to produce eukaryotic organisms are known in the art, for instance in U.S. Patent Applications 2016/0208243 and 2016/0138008, herein incorporated by reference.
  • the methods comprise introducing into a target cell a DNA-targeting RNA or a DNA polynucleotide encoding a DNA-targeting RNA, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a Cpf1 or Csm1 polypeptide; and also introducing to the target cell a Cpf1 or Csm1 polypeptide, or a polynucleotide encoding a Cpf1 or Csm1 polypeptide, wherein the Cpf1 or Csm1 polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity.
  • a nucleic acid molecule comprising a polynucleotide sequence encoding a Cpf1 or Csm1 polypeptide, wherein said polynucleotide sequence has been codon optimized for expression in a plant cell.
  • a nucleic acid molecule comprising a polynucleotide sequence encoding a Cpf1 or Csm1 polypeptide, wherein said polynucleotide sequence has been codon optimized for expression in a prokaryotic cell, wherein said prokaryotic cell is not the natural host of said Cpf1 or Csm1 polypeptide.
  • the nucleic acid molecule of any one of embodiments 39-41, wherein said polynucleotide sequence is selected from the group consisting of: SEQ ID NOs: 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 21, 22, 24, 25, and 174-206 or a fragment or variant thereof, or wherein said polynucleotide sequence encodes a Cpf1 or Csm1 polypeptide selected from the group consisting of SEQ ID NOs: 3, 6, 9, 12, 15, 18, 20, 23, 106-173, and 230-236, and wherein said polynucleotide sequence encoding a Cpf1 or Csm1 polypeptide is operably linked to a promoter that is heterologous to the polynucleotide sequence encoding a Cpf1 or Csm1 polypeptide.
  • a plant cell, eukaryotic cell, or prokaryotic cell comprising the nucleic acid molecule of any one of embodiments 39-55.
  • a plant cell, eukaryotic cell, or prokaryotic cell comprising the fusion protein or polypeptide of any one of embodiments 56-59.
  • modified nucleotide sequence comprises insertion of a polynucleotide that encodes a protein conferring antibiotic or herbicide tolerance to transformed cells.
  • RNAs targeted to a region of DNA spanning the junction between the promoter and the 5' end of the GFP coding region were synthesized by Integrated DNA Technologies (Coralville, IA) as complete cassettes.
  • Each cassette included a rice U3 promoter (SEQ ID NO:42) operationally linked to the appropriate gRNA (SEQ ID NOs:47-53) that was operationally linked to the rice U3 terminator (SEQ ID NO:44). While each gRNA was targeted to the same region of the promoter and GFP gene, each gRNA was designed to ensure that it included the appropriate scaffold to interact correctly with its respective Cpf1 enzyme.
  • gRNAs: guide RNAs
  • Cassettes containing the gRNA(s) of interest, operably linked to promoter(s) operable in plant cells, and containing the gene(s) encoding Cpf1 fusion protein(s) fused to activation and/or repression domain(s), are cloned into a vector suitable for plant transformation. This vector is transformed into a plant cell, resulting in production of the gRNA(s) and the Cpf1 fusion protein(s) in the plant cell.
  • the fusion protein containing the deactivated Cpf1 protein and the activator or repressor domain effects a modulation of the expression of nearby genes in the plant genome.
  • a first gRNA is designed to anneal with a first desired site in the genome of a plant of interest and to allow for interaction with one or more Cpf1 or Csm1 proteins.
  • a second gRNA is designed to anneal with a second desired site in the genome of a plant of interest and to allow for interaction with one or more Cpf1 or Csm1 proteins.
  • Each of these gRNAs is operably linked to a promoter that is operable in a plant cell and is subsequently cloned into a vector that is suitable for plant
  • a gRNA is designed to anneal with a desired site in the genome of a plant of interest and to allow for interaction with one or more Cpf1 or Csm1 proteins.
  • the gRNA is operably linked to a promoter that is operable in a plant cell and is subsequently cloned into a vector that is suitable for plant transformation.
  • One or more genes encoding a Cpf1 or Csm1 protein are cloned in a vector such that they are operably linked to a promoter that is operable in a plant cell (the “cpf1 cassette” or “csm1 cassette”).
  • the cpf1 cassette or csm1 cassette and the gRNA cassette are both cloned into a single plant transformation vector that is subsequently transformed into Agrobacterium cells.
  • Table 3: Summary of cpf1 and csm1 vectors used for biolistic experiments. Vector 131272 (SEQ ID NO:81): 2X35S promoter (SEQ ID NO:43), Francisella tularensis cpf1 coding sequence (SEQ ID NO:5), and 35S polyA terminator (SEQ ID NO:54).
  • the macro-carriers containing the DNA-coated gold particles were assembled into a macro-carrier holder.
  • the rupture disk (1,100 psi), stopping screen, and macro-carrier holder were assembled according to the manufacturer's instructions.
  • the plate containing the rice callus to be bombarded was placed 6 cm beneath the stopping screen and the callus pieces were bombarded after the vacuum chamber reached 25-28 in. Hg.
  • the callus was left on osmotic medium for 16-20 hours, then the callus pieces were transferred to selection medium (CIM supplemented with 50 mg/L hygromycin and 100 mg/L timentin). The plates were transferred to an incubator and held at 28°C in the dark to begin the recovery of transformed cells. Every two weeks, the callus was sub-cultured onto fresh selection medium. Hygromycin-resistant callus pieces began to appear after approximately five to six weeks on selection medium. Individual hygromycin-resistant callus pieces were transferred to new selection plates to allow the cells to divide and grow to produce sufficient tissue to be sampled for molecular analysis. Table 7 summarizes the combinations of DNA vectors that were used for these rice bombardment experiments. Table 7: Summary of rice particle bombardment experiments for hygromycin resistance gene insertion at the CAO1 locus
  • CAO1 genomic locus mediated by the Lachnospiraceae bacterium ND2006 Cpf1 enzyme (SEQ ID NO: 18, encoded by SEQ ID NO: 19).
  • PCR analysis of the region of the intended insertion site at the CAO1 locus resulted in amplification of a band that is diagnostic of an insertion in callus piece #46-161.
  • This genomic region was subjected to sequence analysis to confirm the presence of the intended DNA insertion at the rice CAO1 locus.
  • Figure 5 shows the results of this sequence analysis, with the expected insertion from the 131633 vector present in the rice DNA at the expected site.
  • the mutated PAM site (TTTC>TAGC) present in the 131633 vector was also detected in the rice DNA from callus piece #46-161, further supporting HDR-mediated insertion of the 131633 vector insert at the rice CAO1 locus as mediated by the site-specific DSB induction by the Lachnospiraceae bacterium ND2006 Cpf1 enzyme.
  • DNA was extracted from sixteen hygromycin-resistant callus pieces produced in Experiment 01 (Table 7) and PCR was performed using primers with the sequences of SEQ ID NOs: 100 and 101 to test for the presence of the cpf1 cassette. This PCR reaction showed that DNA extracted from callus pieces numbered 1, 2, 4, 6, 7, and 15 produced the expected 853 base pair amplicon consistent with insertion of the cpf1 cassette in the rice genome (Figure 2B). PCR was also performed with DNA extracted from these hygromycin-resistant rice callus pieces using primers with the sequences of SEQ ID NOs:98 and 99 to amplify a region of the rice CAO1 genomic locus that was targeted by the gRNA in vector 131608.
  • Experiment 01 (Table 7) was repeated with additional pieces of rice callus to confirm the reproducibility of the results obtained initially.
  • the repeat of Experiment 01 resulted in the
  • Experiments 31 and 46 tested the ability of LbCpf1 (SEQ ID NO: 18, encoded by SEQ ID NO: 19) to effect DSBs at two different locations in the rice CAO1 locus.
  • Experiment 31 used plasmid 132033 as the gRNA source, while experiment 46 used plasmid 132054 as the gRNA source.
  • DNA was extracted from hygromycin-resistant rice callus pieces and subjected to T7EI assays.
  • T7EI assays identified one callus piece from experiment 31 and five callus pieces from experiment 46 that appeared to contain indels at the expected site.
  • callus pieces 46-38 and 46-77 showed two different indels, indicating that multiple indel production events had occurred in independent cells within these callus pieces. All of the indels from these experiments were located at the predicted site in the CAO1 locus targeted by the respective guide RNA, indicating faithful production of DSBs at this site by the LbCpf1 enzyme.
  • Experiment 80 tested the ability of the Moraxella caprae Cpf1 enzyme (SEQ ID NO: 133, encoded by SEQ ID NO: 175) to effect DSBs at the rice CAO1 locus.
  • Figure 3A shows the results of these sequencing assays, with a forty-two base pair deletion present in callus piece #93-47 at the predicted site in the CAO1 locus targeted by the respective guide RNA, indicating faithful DSB production at this site by the Sulfuricurvum sp. Csm1 enzyme.
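
The codon-optimization definition in the list above (codons altered at the nucleic acid level without changing the encoded amino acids) can be illustrated with a short Python sketch. It assumes a simple "most-preferred codon" strategy and uses a small, hypothetical preference table (`PREFERRED_CODON`) that covers only a few amino acids; it is not real plant codon-usage data and does not reproduce the methodology of U.S. Pat. No. 6,015,891. The helper names are illustrative only. The final assertion checks the defining property: the optimized sequence translates to the same amino acids as the original.

```python
# Minimal sketch of synonymous codon replacement ("codon optimization").
# PREFERRED_CODON is a HYPOTHETICAL preference table for illustration only;
# it is NOT real codon-usage data for any plant or other host.

# Standard genetic code, restricted to the codons used in this example.
CODON_TO_AA = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L", "TTA": "L", "TTG": "L",
    "GAT": "D", "GAC": "D",
    "TAA": "*", "TAG": "*", "TGA": "*",
}

# Hypothetical "preferred" codon for each amino acid in a hypothetical host.
PREFERRED_CODON = {"M": "ATG", "K": "AAG", "L": "CTC", "D": "GAC", "*": "TGA"}


def codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into codons."""
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    """Translate with the partial table above (KeyError on unknown codons)."""
    return "".join(CODON_TO_AA[c] for c in codons(seq.upper()))


def optimize(seq: str) -> str:
    """Replace every codon with the preferred synonymous codon."""
    return "".join(PREFERRED_CODON[CODON_TO_AA[c]] for c in codons(seq.upper()))


native = "ATGAAACTTGATTTAAAATAA"  # toy sequence, not from the document
optimized = optimize(native)
assert translate(optimized) == translate(native)  # encoded protein unchanged
print(native, "->", optimized, translate(optimized))
```

Real optimization methods usually weigh codon frequencies, GC content, and motifs to avoid rather than always choosing a single codon; the one-codon-per-amino-acid rule is used here only to keep the synonymous-substitution idea visible.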

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Chemical & Material Sciences (AREA)
  • Biotechnology (AREA)
  • Zoology (AREA)
  • Organic Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Wood Science & Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Biophysics (AREA)
  • Plant Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Cell Biology (AREA)
  • Medicinal Chemistry (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Breeding Of Plants And Reproduction By Means Of Culturing (AREA)
  • Peptides Or Proteins (AREA)
  • Enzymes And Modification Thereof (AREA)
  • Saccharide Compounds (AREA)
  • Compositions Of Macromolecular Compounds (AREA)
  • Preparation Of Compounds By Using Micro-Organisms (AREA)

Abstract

Compositions and methods for modifying genomic DNA sequences are provided. The methods produce double-stranded breaks (DSBs) at pre-determined target sites in a genomic DNA sequence, resulting in mutation, insertion, and/or deletion of DNA sequences at the target site(s) in a genome. Compositions comprise DNA constructs comprising nucleotide sequences that encode a Cpf1 or Csm1 protein operably linked to a promoter that is operable in the cells of interest. The DNA constructs can be used to direct the modification of genomic DNA at pre-determined genomic loci. Methods to use these DNA constructs to modify genomic DNA sequences are described herein. Additionally, compositions and methods for modulating the expression of genes are provided. Compositions comprise DNA constructs comprising a promoter that is operable in the cells of interest operably linked to nucleotide sequences that encode a mutated Cpf1 or Csm1 protein with an abolished ability to produce DSBs, optionally linked to a domain that regulates transcriptional activity. The methods can be used to up- or down-regulate the expression of genes at predetermined genomic loci.
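
The abstract refers to DSBs at pre-determined target sites. For Cpf1, such a site is defined by a guide RNA spacer matching the sequence immediately 3' of a short, T-rich PAM on the same strand, in contrast to the G-rich PAM that follows the target for many Cas9 systems. The Python sketch below scans both strands of a DNA sequence for candidate sites under two assumptions that this document does not state: a 5'-TTTV PAM (V = A, C, or G) and a 20-nt protospacer. `find_sites`, `pam_matches`, and `crrna` are hypothetical helpers, and the `handle` argument stands in for the enzyme-specific scaffold segment of the guide RNA, for which no real sequence is assumed. The same PAM check is applied to the TTTC>TAGC change carried by vector 131633: under the TTTV assumption, the mutated PAM no longer matches.

```python
import re
from collections.abc import Iterator

# ASSUMPTIONS (not from the source document): 5'-TTTV PAM, 20-nt spacer.
PAM_REGEX = "TTT[ACG]"
PAM_LEN = 4
SPACER_LEN = 20

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def pam_matches(pam: str) -> bool:
    """True if a 4-nt PAM matches the assumed TTTV motif."""
    return re.fullmatch(PAM_REGEX, pam.upper()) is not None


def find_sites(seq: str) -> Iterator[tuple[str, int, str, str]]:
    """Yield (strand, pam_start, pam, protospacer) for PAM-adjacent sites.

    On each strand the protospacer is the SPACER_LEN bases immediately 3'
    of the PAM. pam_start is a 0-based index into that strand's sequence
    (for "-", an index into the reverse complement).
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # Zero-width lookahead so overlapping PAMs (e.g. TTTTA) are all found.
        for m in re.finditer(f"(?=({PAM_REGEX}))", s):
            start = m.start()
            proto = s[start + PAM_LEN:start + PAM_LEN + SPACER_LEN]
            if len(proto) == SPACER_LEN:  # skip sites truncated at the end
                yield strand, start, m.group(1), proto


def crrna(protospacer: str, handle: str) -> str:
    """Assemble a crRNA as 5' handle + spacer, in the RNA alphabet.

    The spacer equals the PAM-strand protospacer, so it base-pairs with the
    complementary (target) strand. `handle` is a HYPOTHETICAL placeholder.
    """
    return (handle + protospacer).upper().replace("T", "U")


if __name__ == "__main__":
    # Toy sequence, hypothetical; not a CAO1 sequence from the document.
    seq = "GC" + "TTTC" + "AGGATCCGATGCTAGCAAGG" + "TCA"
    for site in find_sites(seq):
        print(site)
    mutated = seq.replace("TTTC", "TAGC", 1)  # PAM change as in vector 131633
    print(pam_matches("TTTC"), pam_matches("TAGC"))
    print(list(find_sites(mutated)))  # no site left under the TTTV assumption
```

A practical design workflow would also screen each candidate spacer for off-target matches elsewhere in the genome and would use the PAM and spacer length appropriate to the specific Cpf1 or Csm1 enzyme; both are left out of this sketch.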

Description

COMPOSITIONS AND METHODS FOR MODIFYING GENOMES
FIELD OF THE INVENTION
The present invention relates to compositions and methods for editing genomic sequences at pre-selected locations and for modulating gene expression.
REFERENCE TO A SEQUENCE LISTING SUBMITTED AS A TEXT FILE VIA EFS-WEB
The official copy of the sequence listing is submitted concurrently with the specification as a text file via EFS-Web, in compliance with the American Standard Code for Information Interchange (ASCII), with a file name of B88552_1060WO_0057_l_Seq_List.txt, a creation date of February 14, 2017, and a size of 1.62 MB. The sequence listing filed via EFS-Web is part of the specification and is hereby incorporated in its entirety by reference herein.
BACKGROUND OF THE INVENTION
Modification of genomic DNA is of immense importance for basic and applied research. Genomic modifications have the potential to elucidate, and in some cases to cure, the causes of disease and to provide desirable traits in the cells and/or individuals comprising said modifications. Genomic modification may include, for example, modification of plant, animal, fungal, and/or prokaryotic genomes. One area in which genomic modification is practiced is the modification of plant genomic DNA.
Modification of plant genomic DNA is of immense importance to both basic and applied plant research. Stable modification of genomic DNA can impart new traits to transgenic plants, such as herbicide tolerance, insect resistance, and/or accumulation of valuable proteins, including pharmaceutical proteins and industrial enzymes. The expression of native plant genes may be up- or down-regulated or otherwise altered (e.g., by changing the tissue(s) in which native plant genes are expressed), or abolished entirely; native DNA sequences may be altered (e.g., through point mutations, insertions, or deletions); or new non-native genes may be inserted into a plant genome to impart new traits to the plant.
The most common methods for modifying plant genomic DNA tend to modify the DNA at random sites within the genome. Such methods include, for example, Agrobacterium-mediated plant transformation and biolistic transformation, also referred to as particle bombardment. In many cases, however, it is desirable to modify the genomic DNA at a pre-determined target site in the plant genome of interest, e.g., to avoid disruption of native plant genes or to insert a transgene cassette at a genomic locus that is known to provide robust gene expression. Only recently have technologies for targeted modification of plant genomic DNA become available. Such technologies rely on the creation of a double-stranded break (DSB) at the desired site, which recruits the plant's native DNA-repair machinery. This DNA-repair machinery may be harnessed to insert heterologous DNA at a pre-determined site, to delete native plant genomic DNA, or to produce point mutations, insertions, or deletions at a desired site.
SUMMARY OF THE INVENTION
Compositions and methods for modifying genomic DNA sequences are provided. As used herein, genomic DNA refers to linear and/or chromosomal DNA and/or to plasmid or other extrachromosomal DNA sequences present in the cell or cells of interest. The methods produce double-stranded breaks (DSBs) at pre-determined target sites in a genomic DNA sequence, resulting in mutation, insertion, and/or deletion of DNA sequences at the target site(s) in a genome. Compositions comprise DNA constructs comprising nucleotide sequences that encode a Cpf1 or Csm1 protein operably linked to a promoter that is operable in the cells of interest. The DNA constructs can be used to direct the modification of genomic DNA at pre-determined genomic loci. Methods to use these DNA constructs to modify genomic DNA sequences are described herein. Modified plants, plant cells, plant parts, and seeds are also encompassed. Compositions and methods for modulating the expression of genes are also provided. The methods target protein(s) to pre-determined sites in a genome to effect an up- or down-regulation of a gene or genes whose expression is regulated by the targeted site in the genome. Compositions comprise DNA constructs comprising nucleotide sequences that encode a modified Cpf1 or Csm1 protein with diminished or abolished nuclease activity, optionally fused to a transcriptional activation or repression domain. Methods to use these DNA constructs to modify gene expression are described herein.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows a schematic depiction of the insertion of a hygromycin resistance gene cassette in the rice CAO1 genomic locus. The star indicates the site of the intended Cpf1-mediated double-stranded break in the wild-type DNA. Dashed lines indicate homology between the repair donor cassette and wild-type DNA. Small arrows indicate the primer binding sites for the PCR reactions used to verify insertion at the intended genomic locus. 35S Term., CaMV 35S terminator; hph, hygromycin resistance gene; ZmUbi, maize ubiquitin promoter.
Figure 2 shows sequence data obtained from rice calli generated during Experiment 1. Figure 2A shows the results of an insertion of an hph cassette at the CAO1 locus. The PAM sequence is boxed and the sequence targeted by the guide RNA is underlined. The ellipsis indicates that a large insertion existed, but the full sequence data is not shown here. Figures 2B, 2C, and 2D show data obtained from rice calli in which an FnCpf1-mediated deletion event occurred in Experiment 01 (Table 7). In Figures 2B and 2C, the lanes depict callus pieces #1-16, from left to right, followed by a molecular weight ladder lane. Figure 2B shows PCR amplification of the FnCpf1 gene cassette, indicating insertion of this cassette in the rice genome in callus pieces 1, 2, 4, 6, 7, and 15. Figure 2C shows the results of a T7EI assay with DNA extracted from these same callus pieces, with the double-band pattern for callus #15 indicating a possible insertion or deletion. Similar T7EI assay results were obtained for additional calli in a repeat of Experiment 01, which resulted in the production of callus pieces 01-20, 01-21, 01-30, and 01-31. Figure 2D shows an alignment of sequence data obtained from callus #15 (01-15), along with the sequence data from callus pieces 01-20, 01-21, 01-30, and 01-31. The PAM sequence is boxed and the sequence targeted by the guide RNA is underlined.
Figure 3 shows sequence data from Experiments 31, 46, 80, 81, 91, 93, 97, and 119, verifying Cpf1-mediated and Csm1-mediated indels in the rice CAO1 genomic locus. Figure 3A shows an alignment of the wild-type rice CAO1 locus with sequence data from callus piece #21 from Experiment 31 (31-21), callus piece #33 from Experiment 80 (80-33), callus pieces 9, 30, and 46 from Experiment 81 (81-09, 81-30, and 81-46, respectively), callus piece #47 from Experiment 93 (93-47), callus piece #4 from Experiment 91 (91-04), callus pieces #112 and 141 from Experiment 97 (97-112 and 97-141), and callus pieces #4 and 11 from Experiment 119 (119-04 and 119-11). Figure 3B shows sequence data from callus pieces 46-38, 46-77, 46-86, 46-88, and 46-90 from Experiment 46. In both Figures 3A and 3B, the PAM site is boxed and the region targeted by the guide RNA is underlined.
Figure 4 shows an overview of the unexpected recombination events recovered from Experiments 70 and 75. Figure 4A shows a schematic overview of a portion of the 131633 plasmid including the homologous regions of the 35S terminator and the downstream arm that led to the recombination events recovered from Experiment 70. Regions of homology that appear to have mediated the unintended HDR events are underlined. Figure 4B shows the sequencing data from callus piece 70-15. WT, wild-type sequence; GE70, callus piece 70-15 sequence; 131633_upstream, upstream arm and 35S Term sequence from plasmid 131633; 131633_downstream, downstream arm sequence from plasmid 131633. Figure 4C shows a schematic overview of a portion of the 131633 plasmid including the homologous regions of the 35S terminator and the downstream arm that led to the recombination events recovered from Experiment 75. Regions of homology that appear to have mediated the unintended HDR events are underlined. Figure 4D shows the sequencing data from callus piece 75-46. WT, wild-type sequence; GE75, callus piece 75-46 sequence; 131633_upstream, upstream arm and 35S Term sequence from plasmid 131633; 131633_downstream, downstream arm sequence from plasmid 131633. 35S Term, CaMV 35S terminator; hph, hygromycin phosphotransferase coding region; pZmUbi, maize ubiquitin promoter. In Figures 4B and 4D, the PAM site is boxed.
Figure 5 shows the sequence of the upstream region of callus piece #46-161 from Experiment 46 (Table 7). The PAM site is boxed, showing the expected mutation of this site in the transformed rice callus, and the sequence data indicate successful insertion of the vector 131633 insert at the rice CAO1 genomic locus.
DETAILED DESCRIPTION OF THE INVENTION
Methods and compositions are provided herein for the control of gene expression involving sequence targeting, such as genome perturbation or gene editing, that relate to the CRISPR-Cpf or CRISPR-Csm system and components thereof. In certain embodiments, the CRISPR enzyme is a Cpf enzyme, e.g., a Cpf1 ortholog. In certain embodiments, the CRISPR enzyme is a Csm enzyme, e.g., a Csm1 ortholog. The methods and compositions include nucleic acids to bind target DNA sequences. This is advantageous because nucleic acids are much easier and less expensive to produce than, for example, peptides, and the specificity can be varied according to the length of the stretch where homology is sought. Complex 3-D positioning of multiple zinc fingers, for example, is not required.
Also provided are nucleic acids encoding the Cpf1 and Csm1 polypeptides, as well as methods of using Cpf1 and Csm1 polypeptides to modify chromosomal (i.e., genomic) or organellar DNA sequences of host cells, including plant cells. The Cpf1 and Csm1 polypeptides interact with specific guide RNAs (gRNAs), which direct the Cpf1 or Csm1 endonuclease to a specific target site, at which site the Cpf1 or Csm1 endonuclease introduces a double-stranded break that can be repaired by a DNA repair process such that the DNA sequence is modified. Since the specificity is provided by the guide RNA, the Cpf1 or Csm1 polypeptide is universal and can be used with different guide RNAs to target different genomic sequences. Cpf1 and Csm1 endonucleases have certain advantages over the Cas nucleases (e.g., Cas9) traditionally used with CRISPR arrays. For example, Cpf1-associated CRISPR arrays are processed into mature crRNAs without the requirement of an additional trans-activating crRNA (tracrRNA). Also, Cpf1-crRNA complexes can cleave target DNA preceded by a short protospacer-adjacent motif (PAM) that is often T-rich, in contrast to the G-rich PAM following the target DNA for many Cas9 systems. Further, Cpf1 can introduce a staggered DNA double-stranded break with a 4- or 5-nucleotide (nt) 5' overhang. Without being limited by theory, it is likely that Csm1 proteins similarly process their CRISPR arrays into mature crRNAs without the requirement of an additional tracrRNA and produce staggered rather than blunt cuts. The methods disclosed herein can be used to target and modify specific chromosomal sequences and/or introduce exogenous sequences at targeted locations in the genome of plant cells or plant embryos. The methods can further be used to introduce sequences or modify regions within organelles (e.g., chloroplasts and/or mitochondria). Furthermore, the targeting is specific, with limited off-target effects.
I. Cpf1 and Csm1 endonucleases
Provided herein are Cpf1 and Csm1 endonucleases, and fragments and variants thereof, for use in modifying genomes, including plant genomes. As used herein, the term Cpf1 endonucleases or Cpf1 polypeptides refers to homologs and orthologs of the Cpf1 polypeptides disclosed in Zetsche et al. (2015) Cell 163:759-771 and of the Cpf1 polypeptides disclosed in U.S. Patent Application 2016/0208243, and fragments and variants thereof. Examples of Cpf1 polypeptides are set forth in SEQ ID NOs: 3, 6, 9, 12, 15, 18, 20, 23, 106-133, 135-146, 148-158, 161-173, and 231-236. As used herein, the term Csm1 endonucleases or Csm1 polypeptides refers to homologs and orthologs of SEQ ID NOs: 134, 147, 159, 160, and 230. Typically, Cpf1 and Csm1 endonucleases can act without the use of tracrRNAs and can introduce a staggered DNA double-strand break. In general, Cpf1 and Csm1 polypeptides comprise at least one RNA recognition and/or RNA binding domain. RNA recognition and/or RNA binding domains interact with guide RNAs. Cpf1 and Csm1 polypeptides can also comprise nuclease domains (i.e., DNase or RNase domains), DNA binding domains, helicase domains, protein-protein interaction domains, and dimerization domains, as well as other domains. In specific embodiments, a Cpf1 or Csm1 polypeptide, or a polynucleotide encoding a Cpf1 or Csm1 polypeptide, comprises an RNA-binding portion that interacts with the DNA-targeting RNA and an activity portion that exhibits site-directed enzymatic activity, such as a RuvC endonuclease domain. Cpf1 or Csm1 polypeptides can be wild-type Cpf1 or Csm1 polypeptides, modified Cpf1 or Csm1 polypeptides, or fragments of a wild-type or modified Cpf1 or Csm1 polypeptide. The Cpf1 or Csm1 polypeptide can be modified to increase nucleic acid binding affinity and/or specificity, alter an enzymatic activity, and/or change another property of the protein. For example, nuclease (i.e., DNase, RNase) domains of the Cpf1 or Csm1 polypeptide can be modified, deleted, or inactivated.
Alternatively, the Cpf1 or Csm1 polypeptide can be truncated to remove domains that are not essential for the function of the protein. In specific embodiments, the Cpf1 or Csm1 polypeptide forms a homodimer or a heterodimer.
In some embodiments, the Cpf1 or Csm1 polypeptide can be derived from a wild-type Cpf1 or Csm1 polypeptide or fragment thereof. In other embodiments, the Cpf1 or Csm1 polypeptide can be derived from a modified Cpf1 or Csm1 polypeptide. For example, the amino acid sequence of the Cpf1 or Csm1 polypeptide can be modified to alter one or more properties (e.g., nuclease activity, affinity, stability) of the protein. Alternatively, domains of the Cpf1 or Csm1 polypeptide not involved in RNA-guided cleavage can be eliminated from the protein such that the modified Cpf1 or Csm1 polypeptide is smaller than the wild-type Cpf1 or Csm1 polypeptide.
In general, a Cpf1 or Csm1 polypeptide comprises at least one nuclease (i.e., DNase) domain, but need not contain an HNH domain such as the one found in Cas9 proteins. For example, a Cpf1 or Csm1 polypeptide can comprise a RuvC-like nuclease domain. In some embodiments, the Cpf1 or Csm1 polypeptide can be modified to inactivate the nuclease domain so that it is no longer functional. In some embodiments in which one of the nuclease domains is inactive, the Cpf1 or Csm1 polypeptide does not cleave double-stranded DNA. In specific embodiments, the mutated Cpf1 or Csm1 polypeptide comprises a mutation that reduces or eliminates nuclease activity at a position corresponding to position 917 or 1006 of FnCpf1 (SEQ ID NO: 3) or to position 701 or 922 of SmCsm1 (SEQ ID NO: 160) when aligned for maximum identity. For example, aspartate-to-alanine (D917A) and glutamate-to-alanine (E1006A) substitutions in a RuvC-like domain completely inactivated the DNA cleavage activity of FnCpf1 (SEQ ID NO: 3), while an aspartate-to-alanine substitution at position 1255 (D1255A) significantly reduced cleavage activity (Zetsche et al. (2015) Cell 163:759-771). The nuclease domain can be modified using well-known methods, such as site-directed mutagenesis, PCR-mediated mutagenesis, and total gene synthesis, as well as other methods known in the art. Cpf1 or Csm1 proteins with inactivated nuclease domains (dCpf1 or dCsm1 proteins) can be used to modulate gene expression without modifying DNA sequences. In certain embodiments, a dCpf1 or dCsm1 protein may be targeted to particular regions of a genome, such as the promoters of a gene or genes of interest, through the use of appropriate gRNAs. The dCpf1 or dCsm1 protein can bind to the desired region of DNA and may interfere with the binding of RNA polymerase and/or transcription factors to this region of DNA. This technique may be used to up- or down-regulate the expression of one or more genes of interest.
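Substitutions such as D917A (aspartate at position 917 replaced by alanine) follow a standard wild-type/position/replacement notation that can be applied programmatically. The following is a minimal Python sketch, assuming 1-based positions and one-letter amino-acid codes; the function name `apply_substitutions` is hypothetical, and the toy peptide in the example stands in for FnCpf1 (SEQ ID NO: 3), which is not reproduced here. The sketch checks the stated wild-type residue before substituting, so a position that does not line up with the sequence fails loudly rather than silently producing the wrong variant.

```python
import re

# Substitution notation such as "D917A": wild-type residue, 1-based position, new residue.
MUTATION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(protein: str, mutations: list[str]) -> str:
    """Return a copy of `protein` with each point substitution applied.

    Raises ValueError if the notation is malformed, the position is out of range,
    or the residue at that position differs from the stated wild-type residue.
    """
    residues = list(protein)
    for mutation in mutations:
        match = MUTATION.match(mutation)
        if match is None:
            raise ValueError(f"malformed substitution: {mutation!r}")
        wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{mutation}: position outside sequence of length {len(residues)}")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{mutation}: found {residues[position - 1]!r} at position {position}")
        residues[position - 1] = replacement
    return "".join(residues)


# Toy example on a hypothetical six-residue peptide.
print(apply_substitutions("MDKLEV", ["D2A", "E5A"]))  # MAKLAV
```

Applied to the full FnCpf1 sequence, the list `["D917A", "E1006A"]` would introduce the two RuvC-domain substitutions discussed above; the wild-type check guards against using positions from a differently numbered or truncated sequence.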
In certain other embodiments, the dCpf1 or dCsm1 protein may be fused to a repressor domain to further downregulate the expression of a gene or genes whose expression is regulated by interactions of RNA polymerase, transcription factors, or other transcriptional regulators with the region of chromosomal DNA targeted by the gRNA. In certain other embodiments, the dCpf1 or dCsm1 protein may be fused to an activation domain to effect an upregulation of a gene or genes whose expression is regulated by interactions of RNA polymerase, transcription factors, or other transcriptional regulators with the region of chromosomal DNA targeted by the gRNA.
The Cpf1 and Csm1 polypeptides disclosed herein can further comprise at least one nuclear localization signal (NLS). In general, an NLS comprises a stretch of basic amino acids. Nuclear localization signals are known in the art (see, e.g., Lange et al., J. Biol. Chem. (2007) 282:5101-5105). The NLS can be located at the N-terminus, the C-terminus, or in an internal location of the Cpf1 or Csm1 polypeptide. In some embodiments, the Cpf1 or Csm1 polypeptide can further comprise at least one cell-penetrating domain. The cell-penetrating domain can be located at the N-terminus, the C-terminus, or in an internal location of the protein.
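Because an NLS generally comprises a stretch of basic amino acids, a crude first-pass scan for candidate basic stretches can be sketched as follows. This is an illustrative heuristic only, not a validated NLS predictor: the window size, the minimum count of lysine/arginine residues, and the function name `basic_stretches` are all assumptions. The example embeds the SV40 large T-antigen NLS core (PKKKRKV), a widely used NLS, in a short made-up peptide.

```python
BASIC = set("KR")  # lysine and arginine; histidine is ignored in this sketch


def basic_stretches(protein: str, window: int = 5, min_basic: int = 4) -> list[str]:
    """Return merged regions in which a sliding window holds >= min_basic K/R residues."""
    regions: list[list[int]] = []  # [start, end) pairs, 0-based
    for start in range(len(protein) - window + 1):
        count = sum(residue in BASIC for residue in protein[start:start + window])
        if count >= min_basic:
            end = start + window
            if regions and start <= regions[-1][1]:
                # Overlapping or adjacent hit: extend the current region.
                regions[-1][1] = max(regions[-1][1], end)
            else:
                regions.append([start, end])
    return [protein[start:end] for start, end in regions]


print(basic_stretches("MAPKKKRKVEDLSG"))  # ['PKKKRKV']
```

A window-based count tolerates an occasional non-basic residue inside the stretch; real NLS identification would instead rely on experimentally characterized signals such as those reviewed in the reference cited above.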
The Cpf1 or Csm1 polypeptide disclosed herein can further comprise at least one plastid targeting signal peptide, at least one mitochondrial targeting signal peptide, or a signal peptide targeting the Cpf1 or Csm1 polypeptide to both plastids and mitochondria. Plastid, mitochondrial, and dual-targeting signal peptides are known in the art (see, e.g., Nassoury and Morse (2005) Biochim Biophys Acta 1743:5-19; Kunze and Berger (2015) Front Physiol dx.doi.org/10.3389/fphys.2015.00259; Herrmann and Neupert (2003) IUBMB Life 55:219-225; Soll (2002) Curr Opin Plant Biol 5:529-535; Carrie and Small (2013) Biochim Biophys Acta 1833:253-259; Carrie et al. (2009) FEBS J 276:1187-1195; Silva-Filho (2003) Curr Opin Plant Biol 6:589-595; Peeters and Small (2001) Biochim Biophys Acta 1541:54-63; Murcha et al. (2014) J Exp Bot 65:6301-6335; Mackenzie (2005) Trends Cell Biol 15:548-554; Glaser et al. (1998) Plant Mol Biol 38:311-338). The plastid, mitochondrial, or dual-targeting signal peptide can be located at the N-terminus, the C-terminus, or in an internal location of the Cpf1 or Csm1 polypeptide. In still other embodiments, the Cpf1 or Csm1 polypeptide can also comprise at least one marker domain. Non-limiting examples of marker domains include fluorescent proteins, purification tags, and epitope tags. In certain embodiments, the marker domain can be a fluorescent protein. Non-limiting examples of suitable fluorescent proteins include green fluorescent proteins (e.g., GFP, GFP-2, tagGFP, turboGFP, EGFP, Emerald, Azami Green, Monomeric Azami Green, CopGFP, AceGFP, ZsGreen1), yellow fluorescent proteins (e.g., YFP, EYFP, Citrine, Venus, YPet, PhiYFP, ZsYellow1), blue fluorescent proteins (e.g., EBFP, EBFP2, Azurite, mKalama1, GFPuv, Sapphire, T-Sapphire), cyan fluorescent proteins (e.g., ECFP, Cerulean, CyPet, AmCyan1, Midoriishi-Cyan), red fluorescent proteins (e.g., mKate, mKate2, mPlum, DsRed monomer, mCherry, mRFP1, DsRed-Express, DsRed2, DsRed-Monomer, HcRed-Tandem, HcRed1, AsRed2, eqFP611, mRaspberry, mStrawberry, Jred), and orange fluorescent proteins (e.g., mOrange, mKO, Kusabira-Orange, Monomeric Kusabira-Orange, mTangerine, tdTomato), or any other suitable fluorescent protein. In other embodiments, the marker domain can be a purification tag and/or an epitope tag. Exemplary tags include, but are not limited to, glutathione-S-transferase (GST), chitin binding protein (CBP), maltose binding protein, thioredoxin (TRX), poly(NANP), tandem affinity purification (TAP) tag, myc, AcV5, AU1, AU5, E, ECS, E2, FLAG, HA, nus, Softag 1, Softag 3, Strep, SBP, Glu-Glu, HSV, KT3, S, S1, T7, V5, VSV-G, 6xHis, biotin carboxyl carrier protein (BCCP), and calmodulin.
In certain embodiments, the Cpf1 or Csm1 polypeptide may be part of a protein-RNA complex comprising a guide RNA. The guide RNA interacts with the Cpf1 or Csm1 polypeptide to direct the Cpf1 or Csm1 polypeptide to a specific target site, wherein the 5' end of the guide RNA can base pair with a specific protospacer sequence of the nucleotide sequence of interest in the plant genome, whether part of the nuclear, plastid, and/or mitochondrial genome. As used herein, the term "DNA-targeting RNA" refers to a guide RNA that interacts with the Cpf1 or Csm1 polypeptide and the target site of the nucleotide sequence of interest in the genome of a plant cell. A DNA-targeting RNA, or a DNA polynucleotide encoding a DNA-targeting RNA, can comprise a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA, and a second segment that interacts with a Cpf1 or Csm1 polypeptide.
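Because Cpf1 target DNA is preceded by a short, often T-rich PAM, candidate target sites in a sequence of interest can be enumerated by scanning both strands for the PAM and taking the protospacer immediately 3' of it. The Python sketch below assumes a 5'-TTTV PAM (V = A, C, or G), as is often used for AsCpf1 and LbCpf1, and an assumed 23-nt protospacer length; both are parameters because PAM requirements and spacer lengths differ among Cpf1 and Csm1 orthologs. The function names and the example sequence are hypothetical.

```python
import re
from collections.abc import Iterator

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def find_targets(
    dna: str, pam: str = "TTT[ACG]", protospacer_len: int = 23
) -> Iterator[tuple[str, int, str, str]]:
    """Yield (strand, pam_start, pam_seq, protospacer) for 5'-PAM-protospacer-3' sites.

    `pam` is a regular expression. pam_start is a 0-based index on the scanned
    strand; for "-" hits that strand is the reverse complement of `dna`.
    """
    dna = dna.upper()
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        # A zero-width lookahead reports overlapping PAMs (e.g. both sites in TTTTA).
        for match in re.finditer(f"(?=({pam}))", seq):
            pam_start = match.start()
            pam_seq = match.group(1)
            spacer_start = pam_start + len(pam_seq)
            protospacer = seq[spacer_start:spacer_start + protospacer_len]
            if len(protospacer) == protospacer_len:  # skip sites truncated by the sequence end
                yield strand, pam_start, pam_seq, protospacer


example = "GGTTTACGATCGATCGATCGATCGATCGACC"  # hypothetical sequence
print(list(find_targets(example)))
# [('+', 2, 'TTTA', 'CGATCGATCGATCGATCGATCGA')]
```

The reported protospacer is written on the PAM-containing strand, so the corresponding guide segment would have the same sequence (with U for T) and base pair with the complementary target strand.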
The polynucleotides encoding Cpf1 and Csm1 polypeptides disclosed herein can be used to isolate corresponding sequences from other prokaryotic or eukaryotic organisms. In this manner, methods such as PCR, hybridization, and the like can be used to identify such sequences based on their sequence homology or identity to the sequences set forth herein. Sequences isolated based on their sequence identity to the entire Cpf1 or Csm1 sequences set forth herein or to variants and fragments thereof are encompassed by the present invention. Such sequences include sequences that are orthologs of the disclosed Cpf1 and Csm1 sequences. "Orthologs" is intended to mean genes derived from a common ancestral gene that are found in different species as a result of speciation. Genes found in different species are considered orthologs when their nucleotide sequences and/or their encoded protein sequences share at least about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, about 99%, or greater sequence identity. Functions of orthologs are often highly conserved among species. Thus, isolated polynucleotides that encode polypeptides having Cpf1 or Csm1 endonuclease activity and that share at least about 75% sequence identity to the sequences disclosed herein are encompassed by the present invention. As used herein, Cpf1 or Csm1 endonuclease activity refers to CRISPR endonuclease activity wherein a guide RNA (gRNA) associated with a Cpf1 or Csm1 polypeptide causes the Cpf1-gRNA or Csm1-gRNA complex to bind to a pre-determined nucleotide sequence that is complementary to the gRNA, and wherein the Cpf1 or Csm1 polypeptide can introduce a double-stranded break at or near the site targeted by the gRNA. In certain embodiments, this double-stranded break may be a staggered DNA double-stranded break.
As used herein, a "staggered DNA double-stranded break" is a double-strand break that leaves about 1, about 2, about 3, about 4, about 5, about 6, about 7, about 8, about 9, or about 10 nucleotides of overhang on either the 3' or 5' ends following cleavage. In specific embodiments, the Cpf1 or Csm1 polypeptide introduces a staggered DNA double-stranded break with a 4- or 5-nt 5' overhang. The double-strand break can occur at or near the sequence to which the DNA-targeting RNA (e.g., guide RNA) is targeted.
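Whether a staggered break leaves 5' or 3' overhangs, and how long they are, follows directly from where each strand is cut. The sketch below assumes both cut positions are given in top-strand (5' to 3') coordinates, with a cut at position k falling between bases k-1 and k. If the bottom-strand cut lies downstream of the top-strand cut, both fragments carry 5' overhangs; if it lies upstream, both carry 3' overhangs; equal positions give blunt ends. The usage example assumes the cut sites reported for FnCpf1 by Zetsche et al. (after the 18th nucleotide 3' of the PAM on the PAM-containing strand and after the 23rd on the complementary strand), which yields a 5-nt 5' overhang consistent with the 4- or 5-nt overhangs described above. The function name `break_ends` is hypothetical.

```python
def break_ends(top_cut: int, bottom_cut: int) -> tuple[int, str]:
    """Classify the ends left by cutting each strand of a duplex.

    Both positions are in top-strand coordinates; a cut at k separates base k-1
    from base k. Returns (overhang_length, kind) with kind "5'", "3'", or "blunt".
    """
    offset = bottom_cut - top_cut
    if offset > 0:
        # The bottom strand (3'->5' left to right) extends past the top-strand cut,
        # so a 5' end protrudes on each fragment.
        return offset, "5'"
    if offset < 0:
        # The top strand extends past the bottom-strand cut: 3' ends protrude.
        return -offset, "3'"
    return 0, "blunt"


# Assumed Cpf1-like geometry: a 4-nt PAM (e.g. TTTV) occupies top-strand positions 0-3.
pam_end = 4
print(break_ends(pam_end + 18, pam_end + 23))  # (5, "5'")
```

Working in a single coordinate system keeps the strand bookkeeping explicit: the sign of the offset alone decides the overhang type.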
Fragments and variants of the Cpf1 and Csm1 polynucleotides and the Cpf1 and Csm1 amino acid sequences encoded thereby are encompassed herein. By "fragment" is intended a portion of the polynucleotide or a portion of the amino acid sequence. "Variants" is intended to mean substantially similar sequences. For polynucleotides, a variant comprises a polynucleotide having deletions (i.e., truncations) at the 5' and/or 3' end; deletion and/or addition of one or more nucleotides at one or more internal sites in the native polynucleotide; and/or substitution of one or more nucleotides at one or more sites in the native polynucleotide. As used herein, a "native" polynucleotide or polypeptide comprises a naturally occurring nucleotide sequence or amino acid sequence, respectively. Generally, variants of a particular polynucleotide of the invention will have at least about 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to that particular polynucleotide as determined by sequence alignment programs and parameters as described elsewhere herein.
"Variant" amino acid or protein is intended to mean an amino acid or protein derived from the native amino acid or protein by deletion (so-called truncation) of one or more amino acids at the N-terminal and/or C-terminal end of the native protein; deletion and/or addition of one or more amino acids at one or more internal sites in the native protein; or substitution of one or more amino acids at one or more sites in the native protein. Variant proteins encompassed by the present invention are biologically active; that is, they continue to possess the desired biological activity of the native protein. Biologically active variants of a native polypeptide will have at least about 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to the amino acid sequence for the native sequence as determined by sequence alignment programs and parameters described herein. A biologically active variant of a protein of the invention may differ from that protein by as few as 1-15 amino acid residues, as few as 1-10, such as 6-10, as few as 5, as few as 4, 3, 2, or even 1 amino acid residue.
Variant sequences may also be identified by analysis of existing databases of sequenced genomes. In this manner, corresponding sequences can be identified and used in the methods of the invention.
Methods of alignment of sequences for comparison are well known in the art. Thus, the determination of percent sequence identity between any two sequences can be accomplished using a mathematical algorithm. Non-limiting examples of such mathematical algorithms are the algorithm of Myers and Miller (1988) CABIOS 4:11-17; the local alignment algorithm of Smith et al. (1981) Adv. Appl. Math. 2:482; the global alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48:443-453; the search-for-local alignment method of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. 85:2444-2448; and the algorithm of Karlin and Altschul (1990) Proc. Natl. Acad. Sci. USA 87:2264-2268, modified as in Karlin and Altschul (1993) Proc. Natl. Acad. Sci. USA 90:5873-5877.
Computer implementations of these mathematical algorithms can be utilized for comparison of sequences to determine sequence identity. Such implementations include, but are not limited to: CLUSTAL in the PC/Gene program (available from Intelligenetics, Mountain View, California); the ALIGN program (Version 2.0); and GAP, BESTFIT, BLAST, FASTA, and TFASTA in the GCG Wisconsin Genetics Software Package, Version 10 (available from Accelrys Inc., 9685 Scranton Road, San Diego, California, USA). Alignments using these programs can be performed using the default parameters. The CLUSTAL program is well described by Higgins et al. (1988) Gene 73:237-244; Higgins et al. (1989) CABIOS 5:151-153; Corpet et al. (1988) Nucleic Acids Res. 16:10881-90; Huang et al. (1992) CABIOS 8:155-65; and Pearson et al. (1994) Meth. Mol. Biol. 24:307-331. The ALIGN program is based on the algorithm of Myers and Miller (1988) supra. A PAM120 weight residue table, a gap length penalty of 12, and a gap penalty of 4 can be used with the ALIGN program when comparing amino acid sequences. The BLAST programs of Altschul et al. (1990) J. Mol. Biol. 215:403 are based on the algorithm of Karlin and Altschul (1990) supra. BLAST nucleotide searches can be performed with the BLASTN program, score = 100, wordlength = 12, to obtain nucleotide sequences homologous to a nucleotide sequence encoding a protein of the invention. BLAST protein searches can be performed with the BLASTX program, score = 50, wordlength = 3, to obtain amino acid sequences homologous to a protein or polypeptide of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST (in BLAST 2.0) can be utilized as described in Altschul et al. (1997) Nucleic Acids Res. 25:3389. Alternatively, PSI-BLAST (in BLAST 2.0) can be used to perform an iterated search that detects distant relationships between molecules. See Altschul et al. (1997) supra. When utilizing BLAST, Gapped BLAST, or PSI-BLAST, the default parameters of the respective programs (e.g., BLASTN for nucleotide sequences, BLASTX for proteins) can be used. See the website at www.ncbi.nlm.nih.gov. Alignment may also be performed manually by inspection.
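To make the percent-identity calculation concrete, the sketch below implements a basic Needleman-Wunsch global alignment (the global alignment algorithm cited above) and reports identity as the number of identical aligned positions divided by the alignment length, gaps included. The scoring (match +1, mismatch -1, linear gap -1) and the choice of denominator are illustrative assumptions; they are not the defaults of ALIGN, GAP, or BLAST, and other conventions can give different percentages for the same pair of sequences.

```python
def needleman_wunsch(
    a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1
) -> tuple[str, str, int]:
    """Globally align a and b with a linear gap penalty; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diagonal, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    aligned_a: list[str] = []
    aligned_b: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            aligned_a.append(a[i - 1])
            aligned_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b)), score[n][m]


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions / alignment length (including gap columns) x 100."""
    if not a and not b:
        raise ValueError("cannot compare two empty sequences")
    aligned_a, aligned_b, _ = needleman_wunsch(a.upper(), b.upper())
    identical = sum(x == y for x, y in zip(aligned_a, aligned_b) if x != "-")
    return 100.0 * identical / len(aligned_a)


print(percent_identity("ACGT", "ACGA"))  # 75.0
```

A threshold such as the "at least about 75%" used above would then be a simple comparison on this value, but only under whatever alignment program and parameters are actually specified.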
The nucleic acid molecules encoding Cpf1 and Csm1 polypeptides, or fragments or variants thereof, can be codon optimized for expression in a plant of interest or other cell or organism of interest. A "codon-optimized gene" is a gene whose frequency of codon usage is designed to mimic the frequency of preferred codon usage of the host cell. Nucleic acid molecules can be codon optimized, either wholly or in part. Because any one amino acid (except for methionine and tryptophan) is encoded by a number of codons, the sequence of the nucleic acid molecule may be changed without changing the encoded amino acid. In codon optimization, one or more codons are altered at the nucleic acid level such that the amino acids are not changed but expression in a particular host organism is increased. Those having ordinary skill in the art will recognize that codon tables and other references providing preference information for a wide range of organisms are available in the art (see, e.g., Zhang et al. (1991) Gene 105:61-72; Murray et al. (1989) Nucl. Acids Res. 17:477-508). Methodology for optimizing a nucleotide sequence for expression in a plant is provided, for example, in U.S. Pat. No. 6,015,891 and the references cited therein. Examples of codon-optimized polynucleotides for expression in a plant are set forth in SEQ ID NOs: 5, 8, 11, 14, 17, 19, 22, 25, and 174-206.
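The principle above (change codons, not amino acids) can be sketched with the simplest possible strategy: replace each codon with the synonymous codon that has the highest usage frequency in the host, then translate both sequences to confirm the protein is unchanged. This is one naive strategy among several and is not necessarily the methodology of U.S. Pat. No. 6,015,891; the function names are hypothetical, and the usage frequencies in the example are made-up placeholders, not real codon-usage data for any plant.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codons grouped by amino acid ("*" = stop).
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in GENETIC_CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def codons(dna: str) -> list[str]:
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna: str) -> str:
    return "".join(GENETIC_CODE[c] for c in codons(dna))


def optimize_codons(dna: str, usage: dict[str, float]) -> str:
    """Replace each codon with its most-used synonym according to `usage`.

    Codons absent from `usage` count as 0; on ties the original codon is kept.
    """
    optimized = []
    for codon in codons(dna):
        candidates = SYNONYMS[GENETIC_CODE[codon]]
        optimized.append(max(candidates, key=lambda c: (usage.get(c, 0.0), c == codon)))
    return "".join(optimized)


toy_usage = {"CTG": 0.5, "CTT": 0.1, "AAG": 0.6, "AAA": 0.4}  # hypothetical values
original = "ATGCTTAAATAA"
optimized = optimize_codons(original, toy_usage)
print(optimized)  # ATGCTGAAGTAA
assert translate(optimized) == translate(original) == "MLK*"
```

Always choosing the single most frequent codon is easy to reason about but can over-use one codon; practical tools often sample codons in proportion to host usage or also account for features such as GC content and sequence motifs.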
II. Fusion proteins
Fusion proteins are provided herein comprising a Cpf1 or Csm1 polypeptide, or a fragment or variant thereof, and an effector domain. The Cpf1 or Csm1 polypeptide can be directed to a target site by a guide RNA, at which site the effector domain can modify or affect the targeted nucleic acid sequence. The effector domain can be a cleavage domain, an epigenetic modification domain, a transcriptional activation domain, or a transcriptional repressor domain. The fusion protein can further comprise at least one additional domain chosen from a nuclear localization signal, a plastid signal peptide, a mitochondrial signal peptide, a signal peptide capable of protein trafficking to multiple subcellular locations, a cell-penetrating domain, or a marker domain, any of which can be located at the N-terminus, the C-terminus, or an internal location of the fusion protein. The Cpf1 or Csm1 polypeptide can be located at the N-terminus, the C-terminus, or in an internal location of the fusion protein. The Cpf1 or Csm1 polypeptide can be fused directly to the effector domain or can be fused via a linker. In specific embodiments, the linker sequence fusing the Cpf1 or Csm1 polypeptide with the effector domain can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, or 50 amino acids in length. For example, the linker can range from 1-5, 1-10, 1-20, 1-50, 2-3, 3-10, 3-20, 5-20, or 10-50 amino acids in length.
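The modular arrangement described above (optional localization or marker domains, the Cpf1 or Csm1 polypeptide, an optional linker, and an effector domain, in a chosen N- to C-terminal order) can be modeled as an ordered list of parts. The sketch below concatenates parts from N- to C-terminus and checks that each linker falls within the 1 to 50 amino-acid range given above. All names are hypothetical. The SV40 NLS (PKKKRKV) and a (GGGGS)n linker are widely used examples, while the dCpf1 and effector entries are runs of "X" (unspecified residue) used purely as placeholders, not real sequences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    sequence: str  # one-letter amino-acid codes
    is_linker: bool = False


def assemble_fusion(parts: list[Part], min_linker: int = 1, max_linker: int = 50) -> str:
    """Concatenate parts from N- to C-terminus after validating linker lengths."""
    for part in parts:
        if part.is_linker and not min_linker <= len(part.sequence) <= max_linker:
            raise ValueError(
                f"linker {part.name!r} has {len(part.sequence)} aa; "
                f"expected {min_linker}-{max_linker}"
            )
    return "".join(part.sequence for part in parts)


fusion = assemble_fusion([
    Part("SV40 NLS", "PKKKRKV"),
    Part("dCpf1 (placeholder)", "X" * 12),
    Part("(GGGGS)3 linker", "GGGGS" * 3, is_linker=True),
    Part("effector domain (placeholder)", "X" * 8),
])
print(len(fusion))  # 42
```

Keeping each domain as a named part makes it straightforward to move the NLS or effector to the other terminus, which mirrors the N-terminal, C-terminal, or internal placements contemplated above.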
In some embodiments, the Cpfl or Csml polypeptide of the fusion protein can be derived from a wild type Cpfl or Csml protein. The Cpfl-derived or Csml-derived protein can be a modified variant or a fragment. In some embodiments, the Cpfl or Csml polypeptide can be modified to contain a nuclease domain (e.g., a RuvC-like domain) with reduced or eliminated nuclease activity. For example, the Cpfl-derived or Csml-derived polypeptide can be modified by deleting or mutating the nuclease domain so that it is no longer functional (i.e., the nuclease activity is absent). Particularly, a Cpfl or Csml polypeptide can have a mutation in a position corresponding to position 917 or 1006 of FnCpfl (SEQ ID NO:3) or to position 701 or 922 of SmCsml (SEQ ID NO:160) when aligned for maximum identity. For example, aspartate-to-alanine (D917A) and glutamate-to-alanine (E1006A) substitutions in the RuvC-like domain completely inactivated the DNA cleavage activity of FnCpfl, while an aspartate-to-alanine substitution at position 1255 (D1255A) significantly reduced cleavage activity (Zetsche et al. (2015) Cell 163:759-771). Examples of Cpfl polypeptides having mutations in the RuvC domain are set forth in SEQ ID NOs: 26-41 and 63-70. The nuclease domain can be inactivated by one or more deletion mutations, insertion mutations, and/or substitution mutations using known methods, such as site-directed mutagenesis, PCR-mediated mutagenesis, and total gene synthesis, as well as other methods known in the art. In an exemplary embodiment, the Cpfl or Csml polypeptide of the fusion protein is modified by mutating the RuvC-like domain such that the Cpfl or Csml polypeptide has no nuclease activity.
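The phrase "a position corresponding to ... when aligned for maximum identity" can be made concrete with a small sketch. Assuming a pairwise alignment has already been produced by some alignment tool (not shown), with "-" marking gaps, the hypothetical helper `map_position` converts a residue number in a reference (such as FnCpfl) into the aligned residue of another sequence, and `apply_substitution` then introduces a substitution written in the D917A style after checking the wild-type residue. The aligned sequences here are invented toy strings, far shorter than any real Cpfl or Csml protein.

```python
def map_position(ref_aln: str, query_aln: str, ref_pos: int):
    """Return (query_position, query_residue) aligned to 1-based reference residue
    ref_pos, or None if the query has a gap in that alignment column."""
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_i = query_i = 0
    for ref_char, query_char in zip(ref_aln, query_aln):
        if ref_char != "-":
            ref_i += 1
        if query_char != "-":
            query_i += 1
        if ref_char != "-" and ref_i == ref_pos:
            return None if query_char == "-" else (query_i, query_char)
    raise ValueError("ref_pos lies beyond the end of the reference sequence")


def apply_substitution(seq: str, mutation: str) -> str:
    """Apply a substitution such as 'D917A' (1-based), verifying the wild-type residue."""
    wild_type, position, replacement = mutation[0], int(mutation[1:-1]), mutation[-1]
    if seq[position - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at {position}, found {seq[position - 1]}")
    return seq[:position - 1] + replacement + seq[position:]


# Toy alignment: reference residue 3 (D) corresponds to query residue 3 (D).
ref_aln, query_aln = "MKD-LE", "MRDQL-"
pos, residue = map_position(ref_aln, query_aln, 3)
print(apply_substitution(query_aln.replace("-", ""), f"{residue}{pos}A"))
```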
The fusion protein also comprises an effector domain located at the N-terminus, the C-terminus, or in an internal location of the fusion protein. In some embodiments, the effector domain is a cleavage domain. As used herein, a "cleavage domain" refers to a domain that cleaves DNA. The cleavage domain can be obtained from any endonuclease or exonuclease. Non-limiting examples of endonucleases from which a cleavage domain can be derived include restriction endonucleases and homing endonucleases. See, for example, New England Biolabs Catalog or Belfort et al. (1997) Nucleic Acids Res. 25:3379-3388. Additional enzymes that cleave DNA are known (e.g., S1 nuclease, mung bean nuclease, pancreatic DNase I, micrococcal nuclease, and yeast HO endonuclease). See also Linn et al. (eds.) Nucleases, Cold Spring Harbor Laboratory Press, 1993. One or more of these enzymes (or functional fragments thereof) can be used as a source of cleavage domains.
In some embodiments, the cleavage domain can be derived from a type II-S endonuclease. Type II-S endonucleases cleave DNA at sites that are typically several base pairs away from the recognition site and, as such, have separable recognition and cleavage domains. These enzymes generally are monomers that transiently associate to form dimers to cleave each strand of DNA at staggered locations. Non-limiting examples of suitable type II-S endonucleases include BfiI, BpmI, BsaI, BsgI, BsmBI, BsmI, BspMI, FokI, MboII, and SapI.
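Cleavage at a distance from the recognition site can be illustrated with FokI, which recognizes 5'-GGATG-3' and cuts 9 nucleotides downstream of the site on the top strand and 13 nucleotides downstream on the bottom strand, leaving a 4-nucleotide 5' overhang. The function name and coordinate convention below are illustrative assumptions, and for brevity only the top strand is scanned; a complete search would also look for the reverse complement, CATCC.

```python
def foki_cut_sites(seq: str, site: str = "GGATG",
                   top_offset: int = 9, bottom_offset: int = 13):
    """Return (top_cut, bottom_cut) for each top-strand FokI recognition site.

    Cuts are 0-based boundaries in top-strand coordinates: a value k denotes the
    bond between positions k-1 and k. Sites too close to the 3' end are skipped.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        site_end = start + len(site)
        top_cut, bottom_cut = site_end + top_offset, site_end + bottom_offset
        if bottom_cut <= len(seq):
            cuts.append((top_cut, bottom_cut))
        start = seq.find(site, start + 1)
    return cuts


for top, bottom in foki_cut_sites("AAGGATG" + "C" * 20):
    print(top, bottom, "overhang length:", bottom - top)
```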
In certain embodiments, the type II-S cleavage domain can be modified to facilitate dimerization of two different cleavage domains (each of which is attached to a Cpfl or Csml polypeptide or fragment thereof). In embodiments wherein the effector domain is a cleavage domain, the Cpfl or Csml polypeptide can be modified as discussed herein such that its endonuclease activity is eliminated. For example, the Cpfl or Csml polypeptide can be modified by mutating the RuvC-like domain such that the polypeptide no longer exhibits endonuclease activity.
In other embodiments, the effector domain of the fusion protein can be an epigenetic modification domain. In general, epigenetic modification domains alter histone structure and/or chromosomal structure without altering the DNA sequence. Changes in histone and/or chromatin structure can lead to changes in gene expression. Examples of epigenetic modification include, without limit, acetylation or methylation of lysine residues in histone proteins, and methylation of cytosine residues in DNA. Non-limiting examples of suitable epigenetic modification domains include histone acetyltransferase domains, histone deacetylase domains, histone methyltransferase domains, histone demethylase domains, DNA methyltransferase domains, and DNA demethylase domains.
In embodiments in which the effector domain is a histone acetyltransferase (HAT) domain, the HAT domain can be derived from EP300 (i.e., E1A binding protein p300), CREBBP (i.e., CREB-binding protein), CDY1, CDY2, CDYL1, CLOCK, ELP3, ESA1, GCN5 (KAT2A), HAT1, KAT2B, KAT5, MYST1, MYST2, MYST3, MYST4, NCOA1, NCOA2, NCOA3, NCOAT, P/CAF, Tip60, TAFII250, or TF3C4. In embodiments wherein the effector domain is an epigenetic modification domain, the Cpfl or Csml polypeptide can be modified as discussed herein such that its endonuclease activity is eliminated. For example, the Cpfl or Csml polypeptide can be modified by mutating the RuvC-like domain such that the polypeptide no longer possesses nuclease activity.
In some embodiments, the effector domain of the fusion protein can be a transcriptional activation domain. In general, a transcriptional activation domain interacts with transcriptional control elements and/or transcriptional regulatory proteins (e.g., transcription factors, RNA polymerases, etc.) to increase and/or activate transcription of one or more genes. In some embodiments, the transcriptional activation domain can be, without limit, a herpes simplex virus VP16 activation domain, VP64 (which is a tetrameric derivative of VP16), an NF-κB p65 activation domain, p53 activation domains 1 and 2, a CREB (cAMP response element binding protein) activation domain, an E2A activation domain, or an NFAT (nuclear factor of activated T-cells) activation domain. In other embodiments, the transcriptional activation domain can be a Gal4, Gcn4, MLL, Rtg3, Gln3, Oaf1, Pip2, Pdr1, Pdr3, Pho4, or Leu3 activation domain. The transcriptional activation domain may be wild type, or it may be a modified version of the original transcriptional activation domain. In some embodiments, the effector domain of the fusion protein is a VP16 or VP64 transcriptional activation domain. In embodiments wherein the effector domain is a transcriptional activation domain, the Cpfl or Csml polypeptide can be modified as discussed herein such that its endonuclease activity is eliminated. For example, the Cpfl or Csml polypeptide can be modified by mutating the RuvC-like domain such that the polypeptide no longer possesses nuclease activity.
In still other embodiments, the effector domain of the fusion protein can be a transcriptional repressor domain. In general, a transcriptional repressor domain interacts with transcriptional control elements and/or transcriptional regulatory proteins (e.g., transcription factors, RNA polymerases, etc.) to decrease and/or terminate transcription of one or more genes. Non-limiting examples of suitable transcriptional repressor domains include inducible cAMP early repressor (ICER) domains, Krüppel-associated box A (KRAB-A) repressor domains, YY1 glycine-rich repressor domains, Sp1-like repressors, E(spl) repressors, IκB repressor, and MeCP2. In embodiments wherein the effector domain is a transcriptional repressor domain, the Cpfl or Csml polypeptide can be modified as discussed herein such that its endonuclease activity is eliminated. For example, the Cpfl or Csml polypeptide can be modified by mutating the RuvC-like domain such that the polypeptide no longer possesses nuclease activity.
In some embodiments, the fusion protein further comprises at least one additional domain. Non-limiting examples of suitable additional domains include nuclear localization signals, cell-penetrating or translocation domains, and marker domains.
When the effector domain of the fusion protein is a cleavage domain, a dimer comprising at least one fusion protein can form. The dimer can be a homodimer or a heterodimer. In some embodiments, the heterodimer comprises two different fusion proteins. In other embodiments, the heterodimer comprises one fusion protein and an additional protein.
The dimer can be a homodimer in which the two fusion protein monomers are identical with respect to the primary amino acid sequence. In one embodiment where the dimer is a homodimer, the Cpfl or Csml polypeptide can be modified such that the endonuclease activity is eliminated. In certain embodiments wherein the Cpfl or Csml polypeptide is modified such that endonuclease activity is eliminated, each fusion protein monomer can comprise an identical Cpfl or Csml polypeptide and an identical cleavage domain. The cleavage domain can be any cleavage domain, such as any of the exemplary cleavage domains provided herein. In such embodiments, specific guide RNAs would direct the fusion protein monomers to different but closely adjacent sites such that, upon dimer formation, the nuclease domains of the two monomers would create a double stranded break in the target DNA.
The dimer can also be a heterodimer of two different fusion proteins. For example, the Cpfl or Csml polypeptide of each fusion protein can be derived from a different Cpfl or Csml polypeptide, such as an orthologous Cpfl or Csml polypeptide from a different bacterial species. In these embodiments, each fusion protein would recognize a different target site (i.e., specified by the protospacer and/or PAM sequence). For example, the guide RNAs could position the heterodimer at different but closely adjacent sites such that their nuclease domains produce an effective double-stranded break in the target DNA.
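As a rough illustration of positioning two fusion proteins at "different but closely adjacent sites", the sketch below scans both strands for a 5'-TTTV-3' PAM (V = A, C, or G), a T-rich PAM of the kind reported for several Cpf1 orthologs, takes the protospacer immediately 3' of each PAM, and reports top-strand/bottom-strand pairs separated by at most `max_gap` nucleotides. The PAM pattern, the default protospacer length of 23, and the spacing rule are all assumptions; the sketch does not model the orientation or spacer constraints a real dimeric nuclease design would impose.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, pam: str = "TTT[ACG]", pam_len: int = 4, spacer_len: int = 23):
    """Return (strand, start, end) protospacers in 0-based, end-exclusive
    top-strand coordinates; each protospacer lies immediately 3' of its PAM."""
    seq = seq.upper()
    n = len(seq)
    sites = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # A lookahead reports overlapping PAM matches as well.
        for m in re.finditer(f"(?={pam})", s):
            start = m.start() + pam_len
            end = start + spacer_len
            if end > n:
                continue
            if strand == "+":
                sites.append(("+", start, end))
            else:
                # Map reverse-complement coordinates back onto the top strand.
                sites.append(("-", n - end, n - start))
    return sites


def adjacent_pairs(sites, max_gap: int):
    """Pair top-strand and bottom-strand sites lying within max_gap nucleotides."""
    pairs = []
    for a in sites:
        for b in sites:
            if a[0] == "+" and b[0] == "-":
                gap = max(a[1], b[1]) - min(a[2], b[2])  # negative if the sites overlap
                if 0 <= gap <= max_gap:
                    pairs.append((a, b, gap))
    return pairs


toy = "TTTACACACGGGCGCGTAAA"  # hypothetical sequence with one site on each strand
print(adjacent_pairs(find_sites(toy, spacer_len=5), max_gap=5))
```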
Alternatively, two fusion proteins of a heterodimer can have different effector domains. In embodiments in which the effector domain is a cleavage domain, each fusion protein can contain a different modified cleavage domain. In these embodiments, the Cpfl or Csml polypeptides can be modified such that their endonuclease activities are eliminated. The two fusion proteins forming a heterodimer can differ in both the Cpfl or Csml polypeptide domain and the effector domain.
In any of the above-described embodiments, the homodimer or heterodimer can comprise at least one additional domain chosen from nuclear localization signals (NLSs), plastid signal peptides, mitochondrial signal peptides, signal peptides capable of trafficking proteins to multiple subcellular locations, cell-penetrating or translocation domains, and marker domains, as detailed above. In any of the above-described embodiments, one or both of the Cpfl or Csml polypeptides can be modified such that endonuclease activity of the polypeptide is eliminated or modified.
The heterodimer can also comprise one fusion protein and an additional protein. For example, the additional protein can be a nuclease. In one embodiment, the nuclease is a zinc finger nuclease. A zinc finger nuclease comprises a zinc finger DNA-binding domain and a cleavage domain. Each zinc finger typically recognizes and binds three nucleotides, and a zinc finger DNA-binding domain can comprise from about three to about seven zinc fingers. The zinc finger DNA-binding domain can be derived from a naturally occurring protein or it can be engineered. See, for example, Beerli et al. (2002) Nat. Biotechnol. 20:135-141; Pabo et al. (2001) Ann. Rev. Biochem. 70:313-340; Isalan et al. (2001) Nat. Biotechnol. 19:656-660; Segal et al. (2001) Curr. Opin. Biotechnol. 12:632-637; Choo et al. (2000) Curr. Opin. Struct. Biol. 10:411-416; Zhang et al. (2000) J. Biol. Chem. 275(43):33850-33860; Doyon et al. (2008) Nat. Biotechnol. 26:702-708; and Santiago et al. (2008) Proc. Natl. Acad. Sci. USA 105:5809-5814. The cleavage domain of the zinc finger nuclease can be any cleavage domain detailed herein. In some embodiments, the zinc finger nuclease can comprise at least one additional domain chosen from nuclear localization signals, plastid signal peptides, mitochondrial signal peptides, signal peptides capable of trafficking proteins to multiple subcellular locations, and cell-penetrating or translocation domains, which are detailed herein.
In certain embodiments, any of the fusion proteins detailed above or a dimer comprising at least one fusion protein may be part of a protein-RNA complex comprising at least one guide RNA. A guide RNA interacts with the Cpfl or Csml polypeptide of the fusion protein to direct the fusion protein to a specific target site, wherein the 5' end of the guide RNA base pairs with a specific protospacer sequence.
III. Nucleic Acids Encoding Cpfl or Csml Polypeptides or Fusion Proteins
Nucleic acids encoding any of the Cpfl and Csml polypeptides or fusion proteins described herein are provided. The nucleic acid can be RNA or DNA. Examples of polynucleotides that encode Cpfl polypeptides are set forth in SEQ ID NOs: 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 21, 22, 24, 25, 174-184, 187-192, 194-201, and 203-206. Examples of polynucleotides that encode Csml polypeptides are set forth in SEQ ID NOs: 185, 186, 193, and 202. In one embodiment, the nucleic acid encoding the Cpfl or Csml polypeptide or fusion protein is mRNA. The mRNA can be 5' capped and/or 3' polyadenylated. In another embodiment, the nucleic acid encoding the Cpfl or Csml polypeptide or fusion protein is DNA. The DNA can be present in a vector.
Nucleic acids encoding the Cpfl or Csml polypeptide or fusion proteins can be codon optimized for efficient translation into protein in the plant cell of interest. Programs for codon optimization are available in the art (e.g., OPTIMIZER at genomes.urv.es/OPTIMIZER; OptimumGene™ from GenScript at www.genscript.com/codon_opt.html).
In certain embodiments, DNA encoding the Cpfl or Csml polypeptide or fusion protein can be operably linked to at least one promoter sequence. The DNA coding sequence can be operably linked to a promoter control sequence for expression in a host cell of interest. In some embodiments, the host cell is a plant cell. "Operably linked" is intended to mean a functional linkage between two or more elements. For example, an operable linkage between a promoter and a coding region of interest (e.g., region coding for a Cpfl or Csml polypeptide or guide RNA) is a functional link that allows for expression of the coding region of interest. Operably linked elements may be contiguous or noncontiguous. When used to refer to the joining of two protein coding regions, by operably linked is intended that the coding regions are in the same reading frame.
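The reading-frame requirement for operably linked protein coding regions can be checked mechanically. The hypothetical helper below is a sketch that assumes the standard stop codons, an upstream region whose own stop codon has been removed, and a downstream region that ends with its stop codon; it confirms that the upstream length is a multiple of three and that no stop codon appears before the end of the fused sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame_fusion(upstream_cds: str, downstream_cds: str) -> bool:
    """Check that two coding regions joined end to end share one reading frame."""
    fused = (upstream_cds + downstream_cds).upper()
    if len(upstream_cds) % 3 or len(fused) % 3:
        return False
    codons = [fused[i:i + 3] for i in range(0, len(fused), 3)]
    # Only the final codon may be a stop; an earlier stop would truncate the fusion.
    return all(codon not in STOP_CODONS for codon in codons[:-1])


print(is_in_frame_fusion("ATGAAA", "GGCTAA"))     # in frame, no internal stop
print(is_in_frame_fusion("ATGAAATAA", "GGCTAA"))  # upstream stop codon not removed
```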
The promoter sequence can be constitutive, regulated, growth stage-specific, or tissue-specific. It is recognized that different applications can be enhanced by the use of different promoters in the nucleic acid molecules to modulate the timing, location and/or level of expression of the Cpfl or Csml polypeptide and/or guide RNA. Such nucleic acid molecules may also contain, if desired, a promoter regulatory region (e.g., one conferring inducible, constitutive, environmentally- or developmentally-regulated, or cell- or tissue-specific/selective expression), a transcription initiation start site, a ribosome binding site, an RNA processing signal, a transcription termination site, and/or a polyadenylation signal.
In some embodiments, the nucleic acid molecules provided herein can be combined with constitutive, tissue-preferred, developmentally-preferred, or other promoters for expression in plants. Examples of constitutive promoters functional in plant cells include the cauliflower mosaic virus (CaMV) 35S transcription initiation region, the 1'- or 2'-promoter derived from T-DNA of Agrobacterium tumefaciens, the ubiquitin 1 promoter, the Smas promoter, the cinnamyl alcohol dehydrogenase promoter (U.S. Pat. No. 5,683,439), the Nos promoter, the pEmu promoter, the rubisco promoter, the GRP1-8 promoter, and other transcription initiation regions from various plant genes known to those of skill in the art. If low-level expression is desired, weak promoter(s) may be used. Weak constitutive promoters include, for example, the core promoter of the Rsyn7 promoter (WO 99/43838 and U.S. Pat. No. 6,072,050), the core 35S CaMV promoter, and the like. Other constitutive promoters are described in, for example, U.S. Pat. Nos. 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; and 5,608,142. See also U.S. Pat. No. 6,177,611, herein incorporated by reference.
Examples of inducible promoters are the Adh1 promoter, which is inducible by hypoxia or cold stress; the Hsp70 promoter, which is inducible by heat stress; and the PPDK promoter and the pepcarboxylase promoter, which are both inducible by light. Also useful are chemically inducible promoters, such as the In2-2 promoter, which is safener induced (U.S. Pat. No. 5,364,780); the ERE promoter, which is estrogen induced; and the Axig1 promoter, which is auxin induced and tapetum specific but also active in callus (PCT US01/22169).
Examples of promoters under developmental control in plants include promoters that initiate transcription preferentially in certain tissues, such as leaves, roots, fruit, seeds, or flowers. A "tissue specific" promoter is a promoter that initiates transcription only in certain tissues. Unlike constitutive expression of genes, tissue-specific expression is the result of several interacting levels of gene regulation. As such, promoters from homologous or closely related plant species can be preferable to use to achieve efficient and reliable expression of transgenes in particular tissues. In some embodiments, the nucleic acid molecule comprises a tissue-preferred promoter. A "tissue preferred" promoter is a promoter that initiates transcription preferentially, but not necessarily entirely or solely, in certain tissues. In some embodiments, the nucleic acid molecules encoding a Cpfl or Csml polypeptide and/or guide RNA comprise a cell type specific promoter. A "cell type specific" promoter is a promoter that primarily drives expression in certain cell types in one or more organs. Some examples of plant cells in which cell type specific promoters functional in plants may be primarily active include BETL cells, vascular cells in roots, leaves, stalk cells, and stem cells. The nucleic acid molecules can also include cell type preferred promoters. A "cell type preferred" promoter is a promoter that drives expression mostly, but not necessarily entirely or solely, in certain cell types in one or more organs. Some examples of plant cells in which cell type preferred promoters functional in plants may be preferentially active include BETL cells, vascular cells in roots, leaves, stalk cells, and stem cells. The nucleic acid molecules described herein can also comprise seed-preferred promoters. In some embodiments, the seed-preferred promoters have expression in the embryo sac, early embryo, early endosperm, aleurone, and/or basal endosperm transfer cell layer (BETL).
Examples of seed-preferred promoters include, but are not limited to, 27 kD gamma zein promoter and waxy promoter, Boronat, A. et al. (1986) Plant Sci. 47:95-102; Reina, M. et al. Nucl. Acids Res. 18(21):6426; and Kloesgen, R. B. et al. (1986) Mol. Gen. Genet. 203:237-244. Promoters that express in the embryo, pericarp, and endosperm are disclosed in U.S. Pat. No. 6,225,529 and PCT publication WO 00/12733. The disclosures for each of these are incorporated herein by reference in their entirety.
Promoters that can drive gene expression in a plant seed-preferred manner with expression in the embryo sac, early embryo, early endosperm, aleurone, and/or basal endosperm transfer cell layer (BETL) can be used in the compositions and methods disclosed herein. Such promoters include, but are not limited to, promoters that are naturally linked to Zea mays early endosperm 5 gene, Zea mays early endosperm 1 gene, Zea mays early endosperm 2 gene, GRMZM2G124663, GRMZM2G006585, GRMZM2G120008, GRMZM2G157806, GRMZM2G176390, GRMZM2G472234, GRMZM2G138727, Zea mays CLAVATA1, Zea mays MRP1, Oryza sativa PR602, Oryza sativa PR9a, Zea mays BET1, Zea mays BETL-2, Zea mays BETL-3, Zea mays BETL-4, Zea mays BETL-9, Zea mays BETL-10, Zea mays MEG1, Zea mays TCCR1, Zea mays ASP1, Oryza sativa ASP1, Triticum durum PR60, Triticum durum PR91, Triticum durum GL7, AT3G10590, AT4G18870, AT4G21080, AT5G23650, AT3G05860, AT5G42910, AT2G26320, AT3G03260, AT5G26630, AtIPT4, AtIPT8, AtLEC2, and LFAH12. Additional such promoters are described in U.S. Patent Nos. 7803990, 8049000, 7745697, 7119251, 7964770, 7847160, and 7700836; U.S. Patent Application Publication Nos. 20100313301, 20090049571, 20090089897, 20100281569, 20100281570, 20120066795, and 20040003427; and PCT Publication Nos. WO/1999/050427, WO/2010/129999, WO/2009/094704, WO/2010/019996, and WO/2010/147825, each of which is herein incorporated by reference in its entirety for all purposes. Functional variants or functional fragments of the promoters described herein can also be operably linked to the nucleic acids disclosed herein.
Chemical-regulated promoters can be used to modulate the expression of a gene through the application of an exogenous chemical regulator. Depending upon the objective, the promoter may be a chemical-inducible promoter, where application of the chemical induces gene expression, or a chemical-repressible promoter, where application of the chemical represses gene expression.
Chemical-inducible promoters are known in the art and include, but are not limited to, the maize In2-2 promoter, which is activated by benzenesulfonamide herbicide safeners; the maize GST promoter, which is activated by hydrophobic electrophilic compounds that are used as pre-emergent herbicides; and the tobacco PR-1a promoter, which is activated by salicylic acid. Other chemical-regulated promoters of interest include steroid-responsive promoters (see, for example, the glucocorticoid-inducible promoter in Schena et al. (1991) Proc. Natl. Acad. Sci. USA 88:10421-10425 and McNellis et al. (1998) Plant J. 14(2):247-257) and tetracycline-inducible and tetracycline-repressible promoters (see, for example, Gatz et al. (1991) Mol. Gen. Genet. 227:229-237, and U.S. Pat. Nos. 5,814,618 and 5,789,156), herein incorporated by reference.
Tissue-preferred promoters can be utilized to target enhanced expression of an expression construct within a particular tissue. In certain embodiments, the tissue-preferred promoters may be active in plant tissue. Tissue-preferred promoters are known in the art. See, for example, Yamamoto et al. (1997) Plant J. 12(2):255-265; Kawamata et al. (1997) Plant Cell Physiol. 38(7):792-803; Hansen et al. (1997) Mol. Gen. Genet. 254(3):337-343; Russell et al. (1997) Transgenic Res. 6(2):157-168; Rinehart et al. (1996) Plant Physiol. 112(3):1331-1341; Van Camp et al. (1996) Plant Physiol. 112(2):525-535; Canevascini et al. (1996) Plant Physiol. 112(2):513-524; Yamamoto et al. (1994) Plant Cell Physiol. 35(5):773-778; Lam (1994) Results Probl. Cell Differ. 20:181-196; Orozco et al. (1993) Plant Mol. Biol. 23(6):1129-1138; Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90(20):9586-9590; and Guevara-Garcia et al. (1993) Plant J. 4(3):495-505. Such promoters can be modified, if necessary, for weak expression.
Leaf-preferred promoters are known in the art. See, for example, Yamamoto et al. (1997) Plant J. 12(2):255-265; Kwon et al. (1994) Plant Physiol. 105:357-67; Yamamoto et al. (1994) Plant Cell Physiol. 35(5):773-778; Gotor et al. (1993) Plant J. 3:509-18; Orozco et al. (1993) Plant Mol. Biol. 23(6):1129-1138; and Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90(20):9586-9590. In addition, the promoters of cab and rubisco can also be used. See, for example, Simpson et al. (1985) EMBO J. 4:2723-2729 and Timko et al. (1988) Nature 318:57-58.
Root-preferred promoters are known and can be selected from the many available from the literature or isolated de novo from various compatible species. See, for example, Hire et al. (1992) Plant Mol. Biol. 20(2):207-218 (soybean root-specific glutamine synthetase gene); Keller and Baumgartner (1991) Plant Cell 3(10):1051-1061 (root-specific control element in the GRP 1.8 gene of French bean); Sanger et al. (1990) Plant Mol. Biol. 14(3):433-443 (root-specific promoter of the mannopine synthase (MAS) gene of Agrobacterium tumefaciens); and Miao et al. (1991) Plant Cell 3(1):11-22 (full-length cDNA clone encoding cytosolic glutamine synthetase (GS), which is expressed in roots and root nodules of soybean). See also Bogusz et al. (1990) Plant Cell 2(7):633-641, where two root-specific promoters isolated from hemoglobin genes from the nitrogen-fixing nonlegume Parasponia andersonii and the related non-nitrogen-fixing nonlegume Trema tomentosa are described. The promoters of these genes were linked to a β-glucuronidase reporter gene and introduced into both the nonlegume Nicotiana tabacum and the legume Lotus corniculatus, and in both instances root-specific promoter activity was preserved. Leach and Aoyagi (1991) Plant Science (Limerick) 79(1):69-76 analyzed the promoters of the highly expressed rolC and rolD root-inducing genes of Agrobacterium rhizogenes and concluded that enhancer and tissue-preferred DNA determinants are dissociated in those promoters. Teeri et al. (1989) EMBO J. 8(2):343-350 used gene fusion to lacZ to show that the Agrobacterium T-DNA gene encoding octopine synthase is especially active in the epidermis of the root tip and that the TR2' gene is root specific in the intact plant and stimulated by wounding in leaf tissue, an especially desirable combination of characteristics for use with an insecticidal or larvicidal gene. The TR1' gene, fused to nptII (neomycin phosphotransferase II), showed similar characteristics. Additional root-preferred promoters include the VfENOD-GRP3 gene promoter (Kuster et al. (1995) Plant Mol. Biol. 29(4):759-772) and the rolB promoter (Capana et al. (1994) Plant Mol. Biol. 25(4):681-691). See also U.S. Pat. Nos. 5,837,876; 5,750,386; 5,633,363; 5,459,252; 5,401,836; 5,110,732; and 5,023,179. The promoter of the phaseolin gene can also be used (Murai et al. (1983) Science 222:476-482; Sengupta-Gopalan et al. (1985) Proc. Natl. Acad. Sci. USA 82:3320-3324).

The promoter sequence can be wild type or it can be modified for more efficient or efficacious expression. The nucleic acid sequences encoding the Cpfl or Csml polypeptide or fusion protein can be operably linked to a promoter sequence that is recognized by a phage RNA polymerase for in vitro mRNA synthesis. In such embodiments, the in vitro-transcribed RNA can be purified for use in the methods of genome modification described herein. For example, the promoter sequence can be a T7, T3, or SP6 promoter sequence or a variation of a T7, T3, or SP6 promoter sequence. In some embodiments, the sequence encoding the Cpfl or Csml polypeptide or fusion protein can be operably linked to a promoter sequence for in vitro expression of the Cpfl or Csml polypeptide or fusion protein in plant cells. In such embodiments, the expressed protein can be purified for use in the methods of genome modification described herein.
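For in vitro mRNA synthesis with a phage promoter, the expected run-off transcript can be predicted from a linearized template. The sketch below assumes the consensus T7 class III promoter 5'-TAATACGACTCACTATAG-3', in which transcription begins at the final G (+1), and assumes transcription runs to the end of the linear template; it does not model T3 or SP6 promoters, capping, or polyadenylation, and the template is a hypothetical toy sequence.

```python
T7_PROMOTER = "TAATACGACTCACTATAG"  # consensus T7 promoter; the final G is the +1 base


def t7_runoff_transcript(template: str) -> str:
    """Predict the run-off RNA from a linear top-strand DNA template."""
    template = template.upper()
    idx = template.find(T7_PROMOTER)
    if idx == -1:
        raise ValueError("no T7 promoter found in template")
    start = idx + len(T7_PROMOTER) - 1  # index of the +1 G
    return template[start:].replace("T", "U")


print(t7_runoff_transcript("GC" + T7_PROMOTER + "GGATGCCC"))
```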
In certain embodiments, the DNA encoding the Cpfl or Csml polypeptide or fusion protein also can be linked to a polyadenylation signal (e.g., SV40 polyA signal and other signals functional in plants) and/or at least one transcriptional termination sequence. Additionally, the sequence encoding the Cpfl or Csml polypeptide or fusion protein also can be linked to sequence encoding at least one nuclear localization signal, at least one plastid signal peptide, at least one mitochondrial signal peptide, at least one signal peptide capable of trafficking proteins to multiple subcellular locations, at least one cell-penetrating domain, and/or at least one marker domain, described elsewhere herein.
The DNA encoding the Cpfl or Csml polypeptide or fusion protein can be present in a vector. Suitable vectors include plasmid vectors, phagemids, cosmids, artificial/mini-chromosomes, transposons, and viral vectors (e.g., lentiviral vectors, adeno-associated viral vectors, etc.). In one embodiment, the DNA encoding the Cpfl or Csml polypeptide or fusion protein is present in a plasmid vector. Non-limiting examples of suitable plasmid vectors include pUC, pBR322, pET, pBluescript, pCAMBIA, and variants thereof. The vector can comprise additional expression control sequences (e.g., enhancer sequences, Kozak sequences, polyadenylation sequences, transcriptional termination sequences, etc.), selectable marker sequences (e.g., antibiotic resistance genes), origins of replication, and the like. Additional information can be found in "Current Protocols in Molecular Biology," Ausubel et al., John Wiley & Sons, New York, 2003, or "Molecular Cloning: A Laboratory Manual," Sambrook & Russell, Cold Spring Harbor Press, Cold Spring Harbor, N.Y., 3rd edition, 2001.
In some embodiments, the expression vector comprising the sequence encoding the Cpfl or Csml polypeptide or fusion protein can further comprise a sequence encoding a guide RNA. The sequence encoding the guide RNA can be operably linked to at least one transcriptional control sequence for expression of the guide RNA in the plant or plant cell of interest. For example, DNA encoding the guide RNA can be operably linked to a promoter sequence that is recognized by RNA polymerase III (Pol III). Examples of suitable Pol III promoters include, but are not limited to, mammalian U6, U3, H1, and 7SL RNA promoters and rice U6 and U3 promoters.
IV. Methods for Modifying a Nucleotide Sequence in a Plant Genome
Methods are provided herein for modifying a nucleotide sequence of a plant cell, plant organelle, or plant embryo. The methods comprise introducing into a plant cell, organelle, or embryo a DNA-targeting RNA, or a DNA polynucleotide encoding a DNA-targeting RNA, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a Cpfl or Csml polypeptide. The methods further comprise introducing into the plant cell a Cpfl or Csml polypeptide, or a polynucleotide encoding a Cpfl or Csml polypeptide, wherein the Cpfl or Csml polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity. The plant cell or plant embryo can then be cultured under conditions in which the Cpfl or Csml polypeptide is expressed and cleaves the nucleotide sequence. Notably, the system described herein does not require the addition of exogenous Mg2+ or any other ions. Finally, a plant cell or organelle comprising the modified nucleotide sequence can be selected.
In some embodiments, the method can comprise introducing one Cpfl or Csml polypeptide (or encoding nucleic acid) and one guide RNA (or encoding DNA) into a plant cell, organelle, or embryo, wherein the Cpfl or Csml polypeptide introduces one double-stranded break in the target nucleotide sequence of the plant chromosomal DNA. In embodiments in which an optional donor polynucleotide is not present, the double-stranded break in the nucleotide sequence can be repaired by a nonhomologous end-joining (NHEJ) repair process. Because NHEJ is error-prone, deletions of at least one nucleotide, insertions of at least one nucleotide, substitutions of at least one nucleotide, or combinations thereof can occur during the repair of the break. Accordingly, the targeted nucleotide sequence can be modified or inactivated. For example, a single nucleotide change (SNP) can give rise to an altered protein product, or a shift in the reading frame of a coding sequence can inactivate or "knock out" the sequence such that no functional protein product is made. In embodiments in which the optional donor polynucleotide is present, the donor sequence in the donor polynucleotide can be exchanged with or integrated into the nucleotide sequence at the targeted site during repair of the double-stranded break. For example, in embodiments in which the donor sequence is flanked by upstream and downstream sequences having substantial sequence identity with upstream and downstream sequences, respectively, of the targeted site in the nucleotide sequence of the plant, the donor sequence can be exchanged with or integrated into the nucleotide sequence at the targeted site during repair mediated by a homology-directed repair process. Alternatively, in embodiments in which the donor sequence is flanked by compatible overhangs (or the compatible overhangs are generated in situ by the Cpfl or Csml polypeptide), the donor sequence can be ligated directly with the cleaved nucleotide sequence by a non-homologous repair process during repair of the double-stranded break. Exchange or integration of the donor sequence into the nucleotide sequence modifies the targeted nucleotide sequence of the plant or introduces an exogenous sequence into the nucleotide sequence of the plant cell, plant organelle, or plant embryo.
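Whether an NHEJ-induced indel shifts the reading frame follows from codon arithmetic: a net insertion or deletion that is not a multiple of three shifts every downstream codon. The hypothetical helpers below apply an edit at a cut position in a toy coding sequence and classify the result; they ignore splicing and any in-frame stop codons the edit might create.

```python
def apply_edit(cds: str, cut: int, deleted: int = 0, inserted: str = "") -> str:
    """Delete `deleted` bases starting at the cut position and insert `inserted` there."""
    return cds[:cut] + inserted + cds[cut + deleted:]


def frame_effect(original: str, edited: str) -> str:
    """Classify an edit by its net length change relative to the original sequence."""
    delta = len(edited) - len(original)
    if delta == 0:
        return "no net length change"
    return "in-frame indel" if delta % 3 == 0 else "frameshift"


cds = "ATGGCCAAGCTGTGGTAA"  # toy coding sequence
for deleted, inserted in [(1, ""), (3, ""), (0, "A"), (1, "G")]:
    edited = apply_edit(cds, 6, deleted, inserted)
    print(deleted, repr(inserted), frame_effect(cds, edited))
```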
The methods disclosed herein can also comprise introducing two Cpf1 or Csm1 polypeptides (or encoding nucleic acids) and two guide RNAs (or encoding DNAs) into a plant cell, organelle, or plant embryo, wherein the Cpf1 or Csm1 polypeptides introduce two double-stranded breaks in the nucleotide sequence of the nuclear and/or organellar chromosomal DNA. The two breaks can be within several base pairs or tens of base pairs of each other, or they can be separated by many thousands of base pairs. In embodiments in which an optional donor polynucleotide is not present, the resultant double-stranded breaks can be repaired by a non-homologous repair process such that the sequence between the two cleavage sites is lost and/or deletions of at least one nucleotide, insertions of at least one nucleotide, substitutions of at least one nucleotide, or combinations thereof occur during the repair of the break(s). In embodiments in which an optional donor polynucleotide is present, the donor sequence in the donor polynucleotide can be exchanged with or integrated into the nucleotide sequence of the plant during repair of the double-stranded breaks by either a homology-based repair process (e.g., in embodiments in which the donor sequence is flanked by upstream and downstream sequences having substantial sequence identity with upstream and downstream sequences, respectively, of the targeted sites in the nucleotide sequence) or a non-homologous repair process (e.g., in embodiments in which the donor sequence is flanked by compatible overhangs).
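The two-break outcome in which the intervening sequence is lost can be pictured with simple string slicing. This sketch assumes blunt ends that are rejoined perfectly. Real NHEJ junctions may carry the additional indels described above, and Cpf1 is generally reported to leave staggered rather than blunt ends. The helper name and the example locus are hypothetical.

```python
def excise_between_cuts(seq: str, cut1: int, cut2: int) -> tuple[str, str]:
    """Model the simplest outcome of two double-stranded breaks.

    A cut at k falls between seq[k-1] and seq[k] (0-based). The two outer
    ends are assumed to be rejoined directly, with no extra indels, so the
    intervening segment is lost. Returns (repaired_sequence, lost_segment).
    """
    left, right = sorted((cut1, cut2))
    if not 0 <= left <= right <= len(seq):
        raise ValueError("cut positions must lie within the sequence")
    return seq[:left] + seq[right:], seq[left:right]


if __name__ == "__main__":
    locus = "AAAACCCCGGGGTTTT"                      # hypothetical locus
    repaired, lost = excise_between_cuts(locus, 12, 4)
    print(repaired, lost)                           # AAAATTTT CCCCGGGG
```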
"Altering" or "modulating" the expression level of a gene means that the expression of the gene is upregulated or downregulated. It is recognized that in some instances, plant growth and yield are increased by increasing the expression levels of one or more genes encoding proteins involved in photosynthesis, i.e., by upregulating expression. Likewise, in some instances, plant growth and yield may be increased by decreasing the expression levels of one or more genes encoding proteins involved in photosynthesis, i.e., by downregulating expression. Thus, the invention encompasses the upregulation or downregulation of one or more genes encoding proteins involved in photosynthesis using the Cpf1 or Csm1 polypeptides disclosed herein. Further, the methods include the upregulation of at least one gene encoding a protein involved in photosynthesis and the downregulation of at least one gene encoding a protein involved in photosynthesis in a plant of interest. Modulating the concentration and/or activity of at least one of the genes encoding a protein involved in photosynthesis in a transgenic plant means that the concentration and/or activity is increased or decreased by at least about 1%, about 5%, about 10%, about 20%, about 30%, about 40%, about 50%, about 60%, about 70%, about 80%, or about 90% or greater relative to a native control plant, plant part, or cell into which the sequence of the invention was not introduced.
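The "at least about X%" language above describes a relative change against a native control. A short sketch of that arithmetic follows. The expression values are hypothetical and given in arbitrary units, and the sketch does not cover how the underlying concentration or activity is measured. The function names are illustrative only.

```python
def percent_change(treated: float, control: float) -> float:
    """Percent change of a measured level relative to a native control."""
    if control == 0:
        raise ValueError("control level must be non-zero")
    return (treated - control) * 100.0 / control


def is_modulated(treated: float, control: float, threshold_pct: float) -> bool:
    """True if the level is up- or downregulated by at least threshold_pct."""
    return abs(percent_change(treated, control)) >= threshold_pct


if __name__ == "__main__":
    print(percent_change(260.0, 200.0))     # 30.0  (upregulated)
    print(percent_change(100.0, 200.0))     # -50.0 (downregulated)
    print(is_modulated(203.0, 200.0, 5.0))  # False (only a 1.5% change)
```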
Plant cells possess nuclear, plastid, and mitochondrial genomes. The compositions and methods of the present invention may be used to modify the sequence of the nuclear, plastid, and/or mitochondrial genome, or may be used to modulate the expression of a gene or genes encoded by the nuclear, plastid, and/or mitochondrial genome. Accordingly, "chromosome" or "chromosomal" refers to the nuclear, plastid, or mitochondrial genomic DNA. "Genome" as it applies to plant cells encompasses not only chromosomal DNA found within the nucleus, but also organelle DNA found within subcellular components (e.g., mitochondria or plastids) of the cell. Any nucleotide sequence of interest in a plant cell, organelle, or embryo can be modified using the methods described herein. In specific embodiments, the methods disclosed herein are used to modify a nucleotide sequence encoding an agronomically important trait, such as a plant hormone, a plant defense protein, a nutrient transport protein, a biotic association protein, a desirable input trait, a desirable output trait, a stress resistance gene, a disease/pathogen resistance gene, male sterility, a developmental gene, a regulatory gene, a gene involved in photosynthesis, a DNA repair gene, a transcriptional regulatory gene, or any other polynucleotide and/or polypeptide of interest. Agronomically important traits such as oil, starch, and protein content can also be modified. Modifications include increasing content of oleic acid, saturated and unsaturated oils, increasing levels of lysine and sulfur, providing essential amino acids, and modifying starch. Hordothionin protein modifications are described in U.S. Patent Nos. 5,703,049, 5,885,801, 5,885,802, and 5,990,389, herein incorporated by reference. Other examples are the lysine- and/or sulfur-rich seed protein encoded by the soybean 2S albumin described in U.S. Patent No. 5,850,016, and the chymotrypsin inhibitor from barley described in Williamson et al. (1987) Eur. J. Biochem. 165:99-106, the disclosures of which are herein incorporated by reference. Derivatives of coding sequences can be made using the methods disclosed herein to increase the level of preselected amino acids in the encoded polypeptide. For example, the gene encoding the barley high lysine polypeptide (BHL) is derived from barley chymotrypsin inhibitor (U.S. Application Serial No. 08/740,682, filed November 1, 1996, and WO 98/20133, the disclosures of which are herein incorporated by reference). Other proteins include methionine-rich plant proteins such as those from sunflower seed (Lilley et al. (1989) Proceedings of the World Congress on Vegetable Protein Utilization in Human Foods and Animal Feedstuffs, ed. Applewhite (American Oil Chemists Society, Champaign, Illinois), pp. 497-502; herein incorporated by reference), corn (Pedersen et al. (1986) J. Biol. Chem. 261:6279; Kirihara et al. (1988) Gene 71:359; both of which are herein incorporated by reference), and rice (Musumura et al. (1989) Plant Mol. Biol. 12:123, herein incorporated by reference). Other agronomically important genes encode latex, Floury 2, growth factors, seed storage factors, and transcription factors.
The methods disclosed herein can be used to modify herbicide resistance traits, including genes coding for resistance to herbicides that inhibit the action of acetolactate synthase (ALS), in particular the sulfonylurea-type herbicides (e.g., the ALS gene containing mutations leading to such resistance, in particular the S4 and/or Hra mutations); genes coding for resistance to herbicides that inhibit the action of glutamine synthetase, such as phosphinothricin or Basta (e.g., the bar gene); genes coding for resistance to glyphosate (e.g., the EPSPS gene and the GAT gene; see, for example, U.S. Publication No. 20040082770 and WO 03/092360); or other such genes known in the art. The bar gene encodes resistance to the herbicide Basta, the nptII gene encodes resistance to the antibiotics kanamycin and geneticin, and the ALS gene mutants encode resistance to the herbicide chlorsulfuron. Additional herbicide resistance traits are described, for example, in U.S. Patent Application 2016/0208243, herein incorporated by reference.
Sterility genes can also be modified and provide an alternative to physical detasseling. Examples of genes used in such ways include male tissue-preferred genes and genes with male sterility phenotypes such as QM, described in U.S. Patent No. 5,583,210. Other genes include kinases and those encoding compounds toxic to either male or female gametophytic development. Additional sterility traits are described, for example, in U.S. Patent Application 2016/0208243, herein incorporated by reference.
The quality of grain can be altered by modifying genes encoding traits such as levels and types of oils, saturated and unsaturated, quality and quantity of essential amino acids, and levels of cellulose. In corn, modified hordothionin proteins are described in U.S. Patent Nos. 5,703,049, 5,885,801, 5,885,802, and 5,990,389.
Commercial traits can also be altered by modifying a gene or genes that could, for example, increase starch for ethanol production or provide expression of proteins. Another important commercial use of modified plants is the production of polymers and bioplastics, such as described in U.S. Patent No. 5,602,321. Genes such as β-ketothiolase, PHBase (polyhydroxybutyrate synthase), and acetoacetyl-CoA reductase (see Schubert et al. (1988) J. Bacteriol. 170:5837-5847) facilitate expression of polyhydroxyalkanoates (PHAs).
Exogenous products include plant enzymes and products as well as those from other sources including prokaryotes and other eukaryotes. Such products include enzymes, cofactors, hormones, and the like. The level of proteins, particularly modified proteins having improved amino acid distribution to improve the nutrient value of the plant, can be increased. This is achieved by the expression of such proteins having enhanced amino acid content.
The methods disclosed herein can also be used for insertion of heterologous genes and/or modification of native plant gene expression to achieve desirable plant traits. Such traits include, for example, disease resistance, herbicide tolerance, drought tolerance, salt tolerance, insect resistance, resistance against parasitic weeds, improved plant nutritional value, improved forage digestibility, increased grain yield, cytoplasmic male sterility, altered fruit ripening, increased storage life of plants or plant parts, reduced allergen production, and increased or decreased lignin content. Genes capable of conferring these desirable traits are disclosed in U.S. Patent Application 2016/0208243, herein incorporated by reference.
(a) Cpf1 or Csm1 Polypeptide
The methods disclosed herein comprise introducing into a plant cell, plant organelle, or plant embryo at least one Cpf1 or Csm1 polypeptide or a nucleic acid encoding at least one Cpf1 or Csm1 polypeptide, as described herein. In some embodiments, the Cpf1 or Csm1 polypeptide can be introduced into the plant cell, organelle, or plant embryo as an isolated protein. In such embodiments, the Cpf1 or Csm1 polypeptide can further comprise at least one cell-penetrating domain, which facilitates cellular uptake of the protein. In some embodiments, the Cpf1 or Csm1 polypeptide can be introduced into the plant cell, organelle, or plant embryo as a ribonucleoprotein in complex with a guide RNA. In other embodiments, the Cpf1 or Csm1 polypeptide can be introduced into the plant cell, organelle, or plant embryo as an mRNA molecule. In still other embodiments, the Cpf1 or Csm1 polypeptide can be introduced into the plant cell, organelle, or plant embryo as a DNA molecule. In general, DNA sequences encoding the Cpf1 or Csm1 polypeptide or fusion protein described herein are operably linked to a promoter sequence that will function in the plant cell, organelle, or plant embryo of interest. The DNA sequence can be linear, or it can be part of a vector. In still other embodiments, the Cpf1 or Csm1 polypeptide or fusion protein can be introduced into the plant cell, organelle, or embryo as an RNA-protein complex comprising the Cpf1 or Csm1 polypeptide (or fusion protein) and the guide RNA.
In certain embodiments, mRNA encoding the Cpf1 or Csm1 polypeptide may be targeted to an organelle (e.g., plastid or mitochondria). In certain embodiments, one or more guide RNAs may be targeted to an organelle (e.g., plastid or mitochondria). In certain embodiments, both the mRNA encoding the Cpf1 or Csm1 polypeptide and one or more guide RNAs may be targeted to an organelle (e.g., plastid or mitochondria). Methods for targeting RNA to organelles are known in the art (see, e.g., U.S. Patent Application 2011/0296551; U.S. Patent Application 2011/0321187; Gomez and Pallas (2010) PLoS One 5:e12269) and are incorporated herein by reference.
In certain embodiments, DNA encoding the Cpf1 or Csm1 polypeptide can further comprise a sequence encoding a guide RNA. In general, each of the sequences encoding the Cpf1 or Csm1 polypeptide and the guide RNA is operably linked to one or more appropriate promoter control sequences that allow expression of the Cpf1 or Csm1 polypeptide and the guide RNA, respectively, in the plant cell, organelle, or plant embryo. The DNA sequence encoding the Cpf1 or Csm1 polypeptide and the guide RNA can further comprise additional expression control, regulatory, and/or processing sequence(s). The DNA sequence encoding the Cpf1 or Csm1 polypeptide and the guide RNA can be linear or can be part of a vector.
(b) Guide RNA
Methods described herein can further comprise introducing into a plant cell, organelle, or plant embryo at least one guide RNA or DNA encoding at least one guide RNA. A guide RNA interacts with the Cpf1 or Csm1 polypeptide to direct the Cpf1 or Csm1 polypeptide to a specific target site, at which site the target-complementary (spacer) region of the guide RNA base pairs with a specific protospacer sequence in the plant nucleotide sequence. Guide RNAs can comprise three regions: a first region that is complementary to the target site in the targeted chromosomal sequence, a second region that forms a stem-loop structure, and a third region that remains essentially single-stranded. The first region differs between guide RNAs, so that each guide RNA directs a Cpf1 or Csm1 polypeptide to a specific target site. The second and third regions can be the same in all guide RNAs.
The first region of the guide RNA is complementary to a sequence (i.e., a protospacer sequence) at the target site in the plant genome, including the nuclear chromosomal sequence as well as plastid or mitochondrial sequences, such that the first region of the guide RNA can base pair with the target site. In various embodiments, the first region of the guide RNA can comprise from about 8 nucleotides to more than about 30 nucleotides. For example, the region of base pairing between the first region of the guide RNA and the target site in the nucleotide sequence can be about 8, about 9, about 10, about 11, about 12, about 13, about 14, about 15, about 16, about 17, about 18, about 19, about 20, about 22, about 23, about 24, about 25, about 27, about 30, or more than 30 nucleotides in length. In an exemplary embodiment, the first region of the guide RNA is about 23, 24, or 25 nucleotides in length. The guide RNA can also comprise a second region that forms a secondary structure. In some embodiments, the secondary structure comprises a stem or hairpin. The length of the stem can vary. For example, the stem can range from about 6 to about 25 base pairs in length (e.g., about 6, about 10, about 15, about 20, or about 25 base pairs). The stem can comprise one or more bulges of 1 to about 10 nucleotides. The overall length of the second region can range from about 16 to about 25 nucleotides. In certain embodiments, the loop is about 5 nucleotides in length and the stem comprises about 10 base pairs.
The guide RNA can also comprise a third region that remains essentially single-stranded. That is, the third region has no complementarity to any nucleotide sequence in the cell of interest and no complementarity to the rest of the guide RNA. The length of the third region can vary. In general, the third region is more than about 4 nucleotides in length. For example, the length of the third region can range from about 5 to about 60 nucleotides. The combined length of the second and third regions (also called the universal or scaffold region) of the guide RNA can range from about 30 to about 120 nucleotides. In one aspect, the combined length of the second and third regions of the guide RNA ranges from about 40 to about 45 nucleotides.
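The approximate length ranges above can be encoded as a simple screening check. This is a sketch only. The thresholds are copied from the "about" ranges in the text and treated as hard bounds for illustration. No upper limit is applied to the first region, because the text allows more than about 30 nucleotides. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GuideLayout:
    first: int   # target-complementary region length (nt)
    second: int  # stem-loop region length (nt)
    third: int   # essentially single-stranded region length (nt)


def layout_warnings(g: GuideLayout) -> list[str]:
    """Flag region lengths outside the approximate ranges given in the text."""
    warnings = []
    if g.first < 8:
        warnings.append("first region shorter than about 8 nt")
    if not 16 <= g.second <= 25:
        warnings.append("second region outside about 16-25 nt")
    if not 5 <= g.third <= 60:
        warnings.append("third region outside about 5-60 nt")
    if not 30 <= g.second + g.third <= 120:
        warnings.append("second + third regions outside about 30-120 nt")
    return warnings


if __name__ == "__main__":
    print(layout_warnings(GuideLayout(first=24, second=20, third=22)))  # []
    print(layout_warnings(GuideLayout(first=6, second=20, third=4)))   # three warnings
```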
In some embodiments, the guide RNA comprises a single molecule comprising all three regions. In other embodiments, the guide RNA can comprise two separate molecules. The first RNA molecule can comprise the first region of the guide RNA and one half of the "stem" of the second region of the guide RNA. The second RNA molecule can comprise the other half of the "stem" of the second region of the guide RNA and the third region of the guide RNA. Thus, in this embodiment, the first and second RNA molecules each contain a sequence of nucleotides that is complementary to a sequence in the other molecule. For example, in one embodiment, the first and second RNA molecules each comprise a sequence (of about 6 to about 25 nucleotides) that base pairs with the corresponding sequence in the other molecule to form a functional guide RNA. In specific embodiments, the guide RNA is a single molecule (i.e., a crRNA) that interacts with the target site in the chromosome and the Cpf1 polypeptide without the need for a second guide RNA (i.e., a tracrRNA).
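In the two-molecule embodiment, the two halves of the stem must base pair with each other. The following is a minimal check that uses the 6-25 nucleotide pairing length mentioned above. It assumes a perfect, unbulged duplex of Watson-Crick pairs only. The text also allows bulges, and G-U wobble pairs can form in RNA, but this sketch models neither. The function names and example sequences are hypothetical.

```python
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Reverse complement of an RNA sequence (A/U/G/C only)."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def stems_anneal(half1: str, half2: str, min_len: int = 6, max_len: int = 25) -> bool:
    """True if two stem halves, each written 5'->3', form a perfect duplex.

    Only Watson-Crick pairs are counted; G-U wobble pairs and bulges are not modeled.
    """
    if len(half1) != len(half2) or not min_len <= len(half1) <= max_len:
        return False
    return half1.upper() == reverse_complement_rna(half2)


if __name__ == "__main__":
    print(stems_anneal("GAGCUACG", "CGUAGCUC"))  # True
    print(stems_anneal("GAGCUACG", "CGUAGCUA"))  # False (one mismatch)
```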
In certain embodiments, the guide RNA can be introduced into the plant cell, organelle, or plant embryo as an RNA molecule. The RNA molecule can be transcribed in vitro. Alternatively, the RNA molecule can be chemically synthesized. In other embodiments, the guide RNA can be introduced into the plant cell, organelle, or embryo as a DNA molecule. In such cases, the DNA encoding the guide RNA can be operably linked to a promoter control sequence for expression of the guide RNA in the plant cell, organelle, or plant embryo of interest. For example, the RNA coding sequence can be operably linked to a promoter sequence that is recognized by RNA polymerase III (Pol III). In exemplary embodiments, the RNA coding sequence is linked to a plant-specific promoter.
The DNA molecule encoding the guide RNA can be linear or circular. In some embodiments, the DNA sequence encoding the guide RNA can be part of a vector. Suitable vectors include plasmid vectors, phagemids, cosmids, artificial/mini-chromosomes, transposons, and viral vectors. In an exemplary embodiment, the DNA encoding the Cpf1 or Csm1 polypeptide is present in a plasmid vector. Non-limiting examples of suitable plasmid vectors include pUC, pBR322, pET, pBluescript, pCAMBIA, and variants thereof. The vector can comprise additional expression control sequences (e.g., enhancer sequences, Kozak sequences, polyadenylation sequences, transcriptional termination sequences, etc.), selectable marker sequences (e.g., antibiotic resistance genes), origins of replication, and the like.
In embodiments in which both the Cpf1 or Csm1 polypeptide and the guide RNA are introduced into the plant cell, organelle, or embryo as DNA molecules, each can be part of a separate molecule (e.g., one vector containing the Cpf1 or Csm1 polypeptide or fusion protein coding sequence and a second vector containing the guide RNA coding sequence), or both can be part of the same molecule (e.g., one vector containing the coding (and regulatory) sequences for both the Cpf1 or Csm1 polypeptide or fusion protein and the guide RNA).
(c) Target Site
A Cpf1 or Csm1 polypeptide in conjunction with a guide RNA is directed to a target site in a plant, including the chromosomal sequence of a plant, plant cell, plant organelle (e.g., plastid or mitochondria), or plant embryo, wherein the Cpf1 or Csm1 polypeptide introduces a double-stranded break in the chromosomal sequence. The target site has no sequence limitation except that the sequence is immediately preceded (upstream) by a consensus sequence, known as a protospacer adjacent motif (PAM). Examples of PAM sequences include, but are not limited to, TTN, CTN, TCN, CCN, TTTN, TCTN, TTCN, CTTN, ATTN, TCCN, TTGN, GTTN, CCCN, CCTN, TTAN, TCGN, CTCN, ACTN, GCTN, TCAN, GCCN, and CCGN (wherein N is any nucleotide). It is known in the art that the PAM sequence specificity of a given nuclease is affected by enzyme concentration (Karvelis et al. (2015) Genome Biol 16:253).
Thus, modulating the concentration of Cpf1 or Csm1 protein delivered to the cell or in vitro system of interest provides a way to alter the PAM site or sites associated with that Cpf1 or Csm1 enzyme. Cpf1 or Csm1 protein concentration in the system of interest may be modulated, for instance, by altering the promoter used to express the Cpf1-encoding or Csm1-encoding gene, by altering the concentration of ribonucleoprotein delivered to the cell or in vitro system, or by adding or removing introns that may play a role in modulating gene expression levels. As detailed herein, the first region of the guide RNA is complementary to the protospacer of the target sequence. Typically, the first region of the guide RNA is about 19 to 21 nucleotides in length.
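Because the only sequence requirement for a target site is an adjacent PAM, candidate target sites can be enumerated by scanning for a PAM pattern and taking the protospacer that immediately follows it. The sketch below makes several assumptions: it uses the TTTN motif from the PAM list above, a 20-nucleotide protospacer (within the typical 19-21 nucleotide range), and a scan of one strand only. A practical design tool would also scan the reverse complement, and this simple pattern match does not capture the concentration-dependent PAM behavior described above. The function names are hypothetical.

```python
def pam_matches(pattern: str, kmer: str) -> bool:
    """Compare a PAM pattern to a k-mer; 'N' in the pattern matches any base."""
    return len(pattern) == len(kmer) and all(
        p == "N" or p == b for p, b in zip(pattern, kmer)
    )


def find_targets(seq: str, pam: str = "TTTN", spacer_len: int = 20) -> list[tuple[int, str]]:
    """Return (pam_start, protospacer) pairs on the given strand only.

    The protospacer is the spacer_len nucleotides immediately 3' of the PAM,
    i.e. the target site is immediately preceded (upstream) by the PAM.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(pam) - spacer_len + 1):
        if pam_matches(pam, seq[i:i + len(pam)]):
            start = i + len(pam)
            hits.append((i, seq[start:start + spacer_len]))
    return hits


if __name__ == "__main__":
    locus = "GG" + "TTTA" + "ACGTACGTACGTACGTACGT" + "CC"  # hypothetical
    print(find_targets(locus))  # [(2, 'ACGTACGTACGTACGTACGT')]
```

Passing a different pattern from the list (for example `pam="TTN"`) changes only the motif being matched. The protospacer is still read immediately downstream of the PAM.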
The target site can be in the coding region of a gene, in an intron of a gene, in a control region of a gene, in a non-coding region between genes, etc. The gene can be a protein coding gene or an RNA coding gene. The gene can be any gene of interest as described herein.
(d) Donor Polynucleotide
In some embodiments, the methods disclosed herein further comprise introducing at least one donor polynucleotide into a plant cell, organelle, or plant embryo. A donor polynucleotide comprises at least one donor sequence. In some aspects, a donor sequence of the donor polynucleotide corresponds to an endogenous or native plant genomic sequence found in the cell nucleus or in an organelle of interest (e.g., plastid or mitochondria). For example, the donor sequence can be essentially identical to a portion of the chromosomal sequence at or near the targeted site but comprise at least one nucleotide change. Thus, the donor sequence can comprise a modified version of the wild-type sequence at the targeted site such that, upon integration or exchange with the native sequence, the sequence at the targeted location comprises at least one nucleotide change. For example, the change can be an insertion of one or more nucleotides, a deletion of one or more nucleotides, a substitution of one or more nucleotides, or combinations thereof. As a consequence of the integration of the modified sequence, the plant, plant cell, or plant embryo can produce a modified gene product from the targeted chromosomal sequence.
The donor sequence of the donor polynucleotide can alternatively correspond to an exogenous sequence. As used herein, an "exogenous" sequence refers to a sequence that is not native to the plant cell, organelle, or embryo, or a sequence whose native location in the genome of the cell, organelle, or embryo is different. For example, the exogenous sequence can comprise a protein coding sequence, which can be operably linked to an exogenous promoter control sequence such that, upon integration into the genome, the plant cell or organelle is able to express the protein encoded by the integrated sequence. For example, the donor sequence can be any gene of interest, such as those encoding agronomically important traits as described elsewhere herein. Alternatively, the exogenous sequence can be integrated into the nuclear, plastid, and/or mitochondrial chromosomal sequence such that its expression is regulated by an endogenous promoter control sequence. In other iterations, the exogenous sequence can be a transcriptional control sequence, another expression control sequence, or an RNA coding sequence. Integration of an exogenous sequence into a chromosomal sequence is termed a "knock-in." The donor sequence can vary in length from several nucleotides to hundreds of nucleotides to hundreds of thousands of nucleotides.
In some embodiments, the donor sequence in the donor polynucleotide is flanked by an upstream sequence and a downstream sequence, which have substantial sequence identity to sequences located upstream and downstream, respectively, of the targeted site in the plant nuclear, plastid, and/or mitochondrial genomic sequence. Because of these sequence similarities, the upstream and downstream sequences of the donor polynucleotide permit homologous recombination between the donor polynucleotide and the targeted sequence such that the donor sequence can be integrated into (or exchanged with) the targeted plant sequence.
The upstream sequence, as used herein, refers to a nucleic acid sequence that shares substantial sequence identity with a chromosomal sequence upstream of the targeted site. Similarly, the downstream sequence refers to a nucleic acid sequence that shares substantial sequence identity with a chromosomal sequence downstream of the targeted site. As used herein, the phrase "substantial sequence identity" refers to sequences having at least about 75% sequence identity. Thus, the upstream and downstream sequences in the donor polynucleotide can have about 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% sequence identity with sequences upstream or downstream of the targeted site. In an exemplary embodiment, the upstream and downstream sequences in the donor polynucleotide can have about 95% or 100% sequence identity with nucleotide sequences upstream or downstream of the targeted site. In one embodiment, the upstream sequence shares substantial sequence identity with a nucleotide sequence located immediately upstream of the targeted site (i.e., adjacent to the targeted site). In other embodiments, the upstream sequence shares substantial sequence identity with a nucleotide sequence that is located within about one hundred (100) nucleotides upstream from the targeted site. Thus, for example, the upstream sequence can share substantial sequence identity with a nucleotide sequence that is located about 1 to about 20, about 21 to about 40, about 41 to about 60, about 61 to about 80, or about 81 to about 100 nucleotides upstream from the targeted site. In one embodiment, the downstream sequence shares substantial sequence identity with a nucleotide sequence located immediately downstream of the targeted site (i.e., adjacent to the targeted site). In other embodiments, the downstream sequence shares substantial sequence identity with a nucleotide sequence that is located within about one hundred (100) nucleotides downstream from the targeted site. Thus, for example, the downstream sequence can share substantial sequence identity with a nucleotide sequence that is located about 1 to about 20, about 21 to about 40, about 41 to about 60, about 61 to about 80, or about 81 to about 100 nucleotides downstream from the targeted site.
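The 75% threshold for "substantial sequence identity" can be illustrated with a simple identity count. This sketch assumes the homology arm and the genomic sequence are already aligned without gaps. Identity values reported by a gapped alignment tool could differ. The helper names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    same = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * same / len(a)


def has_substantial_identity(arm: str, genomic: str, threshold: float = 75.0) -> bool:
    """Apply the 'at least about 75%' definition of substantial sequence identity."""
    return percent_identity(arm, genomic) >= threshold


if __name__ == "__main__":
    print(percent_identity("ACGTACGTAC", "ACGTACGTTT"))          # 80.0
    print(has_substantial_identity("ACGTACGTAC", "ACGTACGTTT"))  # True
```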
Each upstream or downstream sequence can range in length from about 20 nucleotides to about 5000 nucleotides. In some embodiments, upstream and downstream sequences can comprise about 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 2100, 2200, 2300, 2400, 2500, 2600, 2800, 3000, 3200, 3400, 3600, 3800, 4000, 4200, 4400, 4600, 4800, or 5000 nucleotides. In exemplary embodiments, upstream and downstream sequences can range in length from about 50 to about 1500 nucleotides.
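Assembling a donor polynucleotide with homology arms taken immediately upstream and downstream of the targeted site (the adjacent-arm embodiment described above) is essentially sequence concatenation. The sketch below simplifies under stated assumptions. The arms are exact copies of the genomic flanks, the arm-length bounds come from the 20-5000 nucleotide range above, and the function name and example locus are hypothetical.

```python
def build_donor(genome: str, cut: int, insert: str, arm_len: int = 500) -> str:
    """Return upstream arm + donor sequence + downstream arm.

    The arms are taken immediately adjacent to a break at 0-based position
    `cut` (between genome[cut-1] and genome[cut]).
    """
    if not 20 <= arm_len <= 5000:
        raise ValueError("arm length outside the ~20-5000 nt range described")
    if cut - arm_len < 0 or cut + arm_len > len(genome):
        raise ValueError("not enough flanking sequence for the requested arms")
    upstream = genome[cut - arm_len:cut]
    downstream = genome[cut:cut + arm_len]
    return upstream + insert + downstream


if __name__ == "__main__":
    locus = "A" * 30 + "C" * 30                 # hypothetical locus
    donor = build_donor(locus, cut=30, insert="GGG", arm_len=20)
    print(len(donor), donor[18:25])             # 43 AAGGGCC
```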
Donor polynucleotides comprising the upstream and downstream sequences with sequence similarity to the targeted nucleotide sequence can be linear or circular. In embodiments in which the donor polynucleotide is circular, it can be part of a vector. For example, the vector can be a plasmid vector.
In certain embodiments, the donor polynucleotide can additionally comprise at least one targeted cleavage site that is recognized by the Cpf1 or Csm1 polypeptide. The targeted cleavage site added to the donor polynucleotide can be placed upstream or downstream or both upstream and downstream of the donor sequence. For example, the donor sequence can be flanked by targeted cleavage sites such that, upon cleavage by the Cpf1 or Csm1 polypeptide, the donor sequence is flanked by overhangs that are compatible with those generated in the nucleotide sequence upon cleavage by the Cpf1 or Csm1 polypeptide. Accordingly, the donor sequence can be ligated with the cleaved nucleotide sequence during repair of the double-stranded break by a non-homologous repair process. Generally, donor polynucleotides comprising the targeted cleavage site(s) will be circular (e.g., can be part of a plasmid vector).
The donor polynucleotide can be a linear molecule comprising a short donor sequence with optional short overhangs that are compatible with the overhangs generated by the Cpf1 or Csm1 polypeptide. In such embodiments, the donor sequence can be ligated directly with the cleaved chromosomal sequence during repair of the double-stranded break. In some instances, the donor sequence can be less than about 1,000, less than about 500, less than about 250, or less than about 100 nucleotides. In certain cases, the donor polynucleotide can be a linear molecule comprising a short donor sequence with blunt ends. In other iterations, the donor polynucleotide can be a linear molecule comprising a short donor sequence with 5' and/or 3' overhangs. The overhangs can comprise 1, 2, 3, 4, or 5 nucleotides.
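Whether a short donor can be ligated directly depends on its overhangs annealing with those left at the break. The following minimal check compares one overhang with the reverse complement of the other. It assumes both ends carry 5' overhangs, since Cpf1 is generally reported to leave short 5' overhangs. A 5' overhang cannot anneal with a 3' overhang, and this sketch does not model that mixed case. The function names and sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def overhangs_compatible(end_a: str, end_b: str) -> bool:
    """True if two 5' overhangs can anneal for direct ligation.

    Each argument is a 5' overhang written 5'->3' on its own strand;
    use "" for a blunt end (two blunt ends are treated as compatible).
    """
    return len(end_a) == len(end_b) and end_a.upper() == revcomp(end_b)


if __name__ == "__main__":
    print(overhangs_compatible("GGAT", "ATCC"))  # True
    print(overhangs_compatible("GGAT", "ATCG"))  # False
    print(overhangs_compatible("", ""))          # True (blunt to blunt)
```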
In some embodiments, the donor polynucleotide will be DNA. The DNA may be single-stranded or double-stranded and/or linear or circular. The donor polynucleotide may be a DNA plasmid, a bacterial artificial chromosome (BAC), a yeast artificial chromosome (YAC), a viral vector, a linear piece of DNA, a PCR fragment, a naked nucleic acid, or a nucleic acid complexed with a delivery vehicle such as a liposome or poloxamer. In certain embodiments, the donor polynucleotide comprising the donor sequence can be part of a plasmid vector. In any of these situations, the donor polynucleotide comprising the donor sequence can further comprise at least one additional sequence.
(e) Introducing into the Plant Cell
The Cpf1 or Csm1 polypeptide (or encoding nucleic acid), the guide RNA(s) (or encoding DNA), and the optional donor polynucleotide(s) can be introduced into a plant cell, organelle, or plant embryo by a variety of means, including transformation. Transformation protocols, as well as protocols for introducing polypeptides or polynucleotide sequences into plants, may vary depending on the type of plant or plant cell (i.e., monocot or dicot) targeted for transformation. Suitable methods of introducing polypeptides and polynucleotides into plant cells include microinjection (Crossway et al. (1986) Biotechniques 4:320-334), electroporation (Riggs et al. (1986) Proc. Natl. Acad. Sci. USA 83:5602-5606), Agrobacterium-mediated transformation (U.S. Patent No. 5,563,055 and U.S. Patent No. 5,981,840), direct gene transfer (Paszkowski et al. (1984) EMBO J. 3:2717-2722), ballistic particle acceleration (see, for example, U.S. Patent Nos. 4,945,050; 5,879,918; 5,886,244; and 5,932,782; Tomes et al. (1995) in Plant Cell, Tissue, and Organ Culture: Fundamental Methods, ed. Gamborg and Phillips (Springer-Verlag, Berlin); McCabe et al. (1988) Biotechnology 6:923-926), and Lec1 transformation (WO 00/28058). Also see Weissinger et al. (1988) Ann. Rev. Genet. 22:421-477; Sanford et al. (1987) Particulate Science and Technology 5:27-37 (onion); Christou et al. (1988) Plant Physiol. 87:671-674 (soybean); McCabe et al. (1988) Bio/Technology 6:923-926 (soybean); Finer and McMullen (1991) In Vitro Cell Dev. Biol. 27P:175-182 (soybean); Singh et al. (1998) Theor. Appl. Genet. 96:319-324 (soybean); Datta et al. (1990) Biotechnology 8:736-740 (rice); Klein et al. (1988) Proc. Natl. Acad. Sci. USA 85:4305-4309 (maize); Klein et al. (1988) Biotechnology 6:559-563 (maize); U.S. Patent Nos. 5,240,855; 5,322,783; and 5,324,646; Klein et al. (1988) Plant Physiol. 91:440-444 (maize); Fromm et al. (1990) Biotechnology 8:833-839 (maize); Hooykaas-Van Slogteren et al. (1984) Nature (London) 311:763-764; U.S. Patent No. 5,736,369 (cereals); Bytebier et al. (1987) Proc. Natl. Acad. Sci. USA 84:5345-5349 (Liliaceae); De Wet et al. (1985) in The Experimental Manipulation of Ovule Tissues, ed. Chapman et al. (Longman, New York), pp. 197-209 (pollen); Kaeppler et al. (1990) Plant Cell Reports 9:415-418 and Kaeppler et al. (1992) Theor. Appl. Genet. 84:560-566 (whisker-mediated transformation); D'Halluin et al. (1992) Plant Cell 4:1495-1505 (electroporation); Li et al. (1993) Plant Cell Reports 12:250-255 and Christou and Ford (1995) Annals of Botany 75:407-413 (rice); Osjoda et al. (1996) Nature Biotechnology 14:745-750 (maize via Agrobacterium tumefaciens); all of which are herein incorporated by reference. Site-specific genome editing of plant cells by biolistic introduction of a ribonucleoprotein comprising a nuclease and a suitable guide RNA has been demonstrated (Svitashev et al. (2016) Nat. Commun. doi:10.1038/ncomms13274); these methods are herein incorporated by reference.
"Stable transformation" is intended to mean that the nucleotide construct introduced into a plant integrates into the genome of the plant and is capable of being inherited by its progeny. The nucleotide construct may be integrated into the nuclear, plastid, or mitochondrial genome of the plant. Methods for plastid transformation are known in the art (see, e.g., Chloroplast Biotechnology: Methods and Protocols (2014) Pal Maliga, ed., and U.S. Patent Application 2011/0321187), and methods for plant mitochondrial transformation have been described in the art (see, e.g., U.S. Patent Application 2011/0296551), herein incorporated by reference. The cells that have been transformed may be grown into plants (i.e., cultured) in accordance with conventional methods. See, for example, McCormick et al. (1986) Plant Cell Reports 5:81-84. In this manner, the present invention provides transformed seed (also referred to as "transgenic seed") having a nucleic acid modification stably incorporated into its genome.
"Introduced" in the context of inserting a nucleic acid fragment (e.g., a recombinant DNA construct) into a cell, means "transfection" or "transformation" or "transduction" and includes reference to the incorporation of a nucleic acid fragment into a plant cell where the nucleic acid fragment may be incorporated into the genome of the cell (e.g., nuclear chromosome, plasmid, plastid chromosome or mitochondrial chromosome), converted into an autonomous replicon, or transiently expressed (e.g., transfected mRNA).
The present invention may be used for transformation of any plant species, including, but not limited to, monocots and dicots (i.e., monocotyledonous and dicotyledonous, respectively). Examples of plant species of interest include, but are not limited to, corn (Zea mays), Brassica sp. (e.g., B. napus, B. rapa, B. juncea), particularly those Brassica species useful as sources of seed oil, alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), sunflower (Helianthus annuus), safflower (Carthamus tinctorius), wheat (Triticum aestivum), soybean (Glycine max), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium barbadense, Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia (Macadamia integrifolia), almond (Prunus amygdalus), sugar beets (Beta vulgaris), sugarcane (Saccharum spp.), oil palm (Elaeis guineensis), poplar (Populus spp.), eucalyptus (Eucalyptus spp.), oats (Avena sativa), barley (Hordeum vulgare), vegetables, ornamentals, and conifers.
The Cpf1 or Csm1 polypeptides (or encoding nucleic acid), the guide RNA(s) (or DNAs encoding the guide RNA), and the optional donor polynucleotide(s) can be introduced into the plant cell, organelle, or plant embryo simultaneously or sequentially. The ratio of the Cpf1 polypeptides (or encoding nucleic acid) to the guide RNA(s) (or encoding DNA) generally will be about stoichiometric, such that the two components can form an RNA-protein complex with the target DNA. In one embodiment, DNA encoding a Cpf1 or Csm1 polypeptide and DNA encoding a guide RNA are delivered together within a single plasmid vector.
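Because the ratio described above is a molar one, amounts of protein and guide RNA measured by mass must be converted using molecular weights before they can be matched. The following is a minimal sketch of that arithmetic. The function names are hypothetical, and the example values (a nuclease of about 150 kDa, a 43-nt guide, and an average of about 330 g/mol per ribonucleotide) are illustrative assumptions, not figures taken from this disclosure.

```python
# Illustrative sketch; function names and default values are hypothetical.


def moles_from_mass(mass_ug: float, mw_g_per_mol: float) -> float:
    """Convert a mass in micrograms to picomoles."""
    # 1 ug = 1e-6 g and 1 mol = 1e12 pmol, so the net factor is 1e6.
    return mass_ug * 1e6 / mw_g_per_mol


def protein_to_guide_ratio(protein_ug: float, protein_mw: float,
                           guide_ug: float, guide_len_nt: int,
                           mw_per_nt: float = 330.0) -> float:
    """Molar ratio of nuclease protein to guide RNA."""
    protein_pmol = moles_from_mass(protein_ug, protein_mw)
    # Guide molecular weight is approximated as length x average nucleotide weight.
    guide_pmol = moles_from_mass(guide_ug, guide_len_nt * mw_per_nt)
    return protein_pmol / guide_pmol


def guide_mass_for_ratio(protein_ug: float, protein_mw: float,
                         guide_len_nt: int, target_ratio: float = 1.0,
                         mw_per_nt: float = 330.0) -> float:
    """Micrograms of guide RNA needed to reach a target protein:guide molar ratio."""
    protein_pmol = moles_from_mass(protein_ug, protein_mw)
    guide_pmol = protein_pmol / target_ratio
    return guide_pmol * guide_len_nt * mw_per_nt / 1e6


# Assumed example: 10 ug of a ~150 kDa nuclease with a 43-nt guide at 1:1.
guide_ug = guide_mass_for_ratio(10.0, 150_000, 43)  # about 0.95 ug
```

In practice, an excess of one component may be used deliberately. The `target_ratio` parameter makes that choice explicit rather than fixing it at exactly 1:1.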
The compositions and methods disclosed herein can be used to alter expression of genes of interest in a plant, such as genes involved in photosynthesis. For example, the expression of a gene encoding a protein involved in photosynthesis may be modulated as compared to a control plant. A "subject plant or plant cell" is one in which a genetic alteration, such as a mutation, has been effected as to a gene of interest, or is a plant or plant cell that is descended from a plant or cell so altered and that comprises the alteration. A "control" or "control plant" or "control plant cell" provides a reference point for measuring changes in phenotype of the subject plant or plant cell. Depending on the method used, expression levels in the subject plant may be higher or lower than those in the control plant.
A control plant or plant cell may comprise, for example: (a) a wild-type plant or cell, i.e., of the same genotype as the starting material for the genetic alteration which resulted in the subject plant or cell; (b) a plant or plant cell of the same genotype as the starting material but which has been transformed with a null construct (i.e., with a construct which has no known effect on the trait of interest, such as a construct comprising a marker gene); (c) a plant or plant cell which is a non-transformed segregant among progeny of a subject plant or plant cell; (d) a plant or plant cell genetically identical to the subject plant or plant cell but which is not exposed to conditions or stimuli that would induce expression of the gene of interest; or (e) the subject plant or plant cell itself, under conditions in which the gene of interest is not expressed.
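Because every change in expression is measured against one of the controls listed above, the change is naturally expressed relative to that control. The following is a minimal sketch, not a method from this disclosure, that computes the log2 fold change of mean expression in subject plants over control plants. The twofold cutoff (|log2| >= 1) used to call a change "higher" or "lower" is an arbitrary illustrative assumption. A real analysis would also require biological replication and a statistical test.

```python
# Illustrative sketch; the function names and the twofold cutoff are assumptions.
import math
from statistics import mean


def log2_fold_change(subject, control):
    """log2 of mean subject expression divided by mean control expression."""
    s, c = mean(subject), mean(control)
    if s <= 0 or c <= 0:
        raise ValueError("mean expression values must be positive")
    return math.log2(s / c)


def direction(lfc: float, threshold: float = 1.0) -> str:
    """Label a log2 fold change as higher, lower, or unchanged versus control."""
    if lfc >= threshold:
        return "higher"
    if lfc <= -threshold:
        return "lower"
    return "unchanged"
```

For example, `direction(log2_fold_change([4, 4], [1, 1]))` labels a fourfold increase over the control as "higher".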
While the invention is described in terms of transformed plants, it is recognized that transformed organisms of the invention also include plant cells, plant protoplasts, plant cell tissue cultures from which plants can be regenerated, plant calli, plant clumps, and plant cells that are intact in plants or parts of plants such as embryos, pollen, ovules, seeds, leaves, flowers, branches, fruit, kernels, ears, cobs, husks, stalks, roots, root tips, anthers, and the like. Grain is intended to mean the mature seed produced by commercial growers for purposes other than growing or reproducing the species. Progeny, variants, and mutants of the regenerated plants are also included within the scope of the invention, provided that these parts comprise the introduced polynucleotides.
(f) Method for Using a Fusion Protein to Modify a Plant Sequence or Regulate Expression of a Plant Sequence
The methods disclosed herein further encompass modifying a nucleotide sequence, or regulating expression of a nucleotide sequence, in a plant cell, plant organelle, or plant embryo. The methods can comprise introducing into the plant cell or plant embryo (a) at least one fusion protein or nucleic acid encoding at least one fusion protein, wherein the fusion protein comprises a Cpf1 or Csm1 polypeptide or a fragment or variant thereof and an effector domain, and (b) at least one guide RNA or DNA encoding the guide RNA, wherein the guide RNA guides the Cpf1 or Csm1 polypeptide of the fusion protein to a target site in the chromosomal sequence and the effector domain of the fusion protein modifies the chromosomal sequence or regulates expression of the chromosomal sequence.
Fusion proteins comprising a Cpf1 or Csm1 polypeptide or a fragment or variant thereof and an effector domain are described herein. In general, the fusion proteins disclosed herein can further comprise at least one nuclear localization signal, plastid signal peptide, mitochondrial signal peptide, or signal peptide capable of trafficking proteins to multiple subcellular locations. Nucleic acids encoding fusion proteins are described herein. In some embodiments, the fusion protein can be introduced into the cell or embryo as an isolated protein (which can further comprise a cell-penetrating domain). Furthermore, the isolated fusion protein can be part of a protein-RNA complex comprising the guide RNA. In other embodiments, the fusion protein can be introduced into the cell or embryo as an RNA molecule (which can be capped and/or polyadenylated). In still other embodiments, the fusion protein can be introduced into the cell or embryo as a DNA molecule. For example, the fusion protein and the guide RNA can be introduced into the cell or embryo as discrete DNA molecules or as part of the same DNA molecule. Such DNA molecules can be plasmid vectors.
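When a fusion protein is delivered as DNA or RNA, its parts (for example, a localization signal, the Cpf1 or Csm1 coding sequence, and an effector domain) must be joined in a single reading frame. The following is a minimal sketch of that check. It assembles hypothetical coding parts into one open reading frame and rejects any part that would shift the frame or end translation early. Part names and sequences are placeholders. Linkers, codon optimization, and placement rules for signal peptides are not modeled.

```python
# Illustrative sketch; part names and toy sequences are hypothetical.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble_fusion_orf(parts, add_start=True, stop="TAA"):
    """Join (name, coding_sequence) parts into one open reading frame.

    Each part must be a multiple of 3 nt and free of in-frame stop codons;
    an ATG is prepended (optionally) and a single stop codon is appended.
    """
    pieces = []
    for name, seq in parts:
        seq = seq.upper()
        if len(seq) % 3:
            raise ValueError(f"{name}: length {len(seq)} is not a multiple of 3")
        codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
        if any(codon in STOP_CODONS for codon in codons):
            raise ValueError(f"{name}: contains an in-frame stop codon")
        pieces.append(seq)
    return ("ATG" if add_start else "") + "".join(pieces) + stop


orf = assemble_fusion_orf([
    ("localization_signal", "CCAAAGAAG"),  # toy placeholder sequence
    ("nuclease", "GCTGCT"),                # stands in for a Cpf1/Csm1 CDS
    ("effector", "GGTGGT"),                # stands in for an effector domain
])
```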
In some embodiments, the method further comprises introducing into the cell, organelle, or embryo at least one donor polynucleotide as described elsewhere herein. Means for introducing molecules into plant cells, organelles, or plant embryos, as well as means for culturing cells (including cells comprising organelles) or embryos are described herein.
In certain embodiments in which the effector domain of the fusion protein is a cleavage domain, the method can comprise introducing into the plant cell, organelle, or plant embryo one fusion protein (or nucleic acid encoding one fusion protein) and two guide RNAs (or DNA encoding two guide RNAs). The two guide RNAs direct the fusion protein to two different target sites in the chromosomal sequence, wherein the fusion protein dimerizes (e.g., forms a homodimer) such that the two cleavage domains can introduce a double-stranded break into the chromosomal sequence. In embodiments in which the optional donor polynucleotide is not present, the double-stranded break in the chromosomal sequence can be repaired by a non-homologous end-joining (NHEJ) repair process. Because NHEJ is error-prone, deletions of at least one nucleotide, insertions of at least one nucleotide, substitutions of at least one nucleotide, or combinations thereof can occur during the repair of the break. Accordingly, the targeted chromosomal sequence can be modified or inactivated. For example, a single nucleotide change (SNP) can give rise to an altered protein product, or a shift in the reading frame of a coding sequence can inactivate or "knock out" the sequence such that no protein product is made. In embodiments in which the optional donor polynucleotide is present, the donor sequence in the donor polynucleotide can be exchanged with or integrated into the chromosomal sequence at the targeted site during repair of the double-stranded break. For example, in embodiments in which the donor sequence is flanked by upstream and downstream sequences having substantial sequence identity with upstream and downstream sequences, respectively, of the targeted site in the chromosomal sequence, the donor sequence can be exchanged with or integrated into the chromosomal sequence at the targeted site during repair mediated by a homology-directed repair process. Alternatively, in embodiments in which the donor sequence is flanked by compatible overhangs (or the compatible overhangs are generated in situ by the Cpf1 or Csm1 polypeptide), the donor sequence can be ligated directly with the cleaved chromosomal sequence by a non-homologous repair process during repair of the double-stranded break. Exchange or integration of the donor sequence into the chromosomal sequence modifies the targeted chromosomal sequence or introduces an exogenous sequence into the chromosomal sequence of the plant cell, organelle, or embryo.
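For the homology-directed case, the donor can be pictured as the new sequence placed between copies of the sequences immediately upstream and downstream of the targeted site. The following is a minimal sketch, assuming a known reference sequence and a 0-based edit position. It builds such a donor for either a pure insertion or an exchange of `replace_len` bases, and it returns the allele expected after repair. The function names and the 500-nt default arm length are hypothetical choices, not parameters from this disclosure. Cut-site placement and guide-site disruption are not modeled.

```python
# Illustrative sketch; function names and the default arm length are assumptions.


def build_hdr_donor(reference: str, start: int, insert: str,
                    replace_len: int = 0, arm_len: int = 500) -> str:
    """Return upstream homology arm + insert + downstream homology arm.

    `start` is the 0-based position of the edit in `reference`; `replace_len`
    bases beginning at `start` are exchanged for `insert` (0 = pure insertion).
    """
    end = start + replace_len
    if start - arm_len < 0 or end + arm_len > len(reference):
        raise ValueError("reference is too short for the requested arm length")
    upstream_arm = reference[start - arm_len:start]
    downstream_arm = reference[end:end + arm_len]
    return upstream_arm + insert + downstream_arm


def expected_edited_allele(reference: str, start: int, insert: str,
                           replace_len: int = 0) -> str:
    """Sequence expected once the donor has been copied in by homology-directed repair."""
    return reference[:start] + insert + reference[start + replace_len:]
```

As a consistency check, the whole donor should appear as a contiguous stretch of the expected edited allele. This holds because both arms are copied directly from the reference on either side of the edit.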
In other embodiments in which the effector domain of the fusion protein is a cleavage domain, the method can comprise introducing into the plant cell, organelle, or plant embryo two different fusion proteins (or nucleic acid encoding two different fusion proteins) and two guide RNAs (or DNA encoding two guide RNAs). The fusion proteins can differ as detailed elsewhere herein. Each guide RNA directs a fusion protein to a specific target site in the chromosomal sequence, wherein the fusion proteins can dimerize (e.g., form a heterodimer) such that the two cleavage domains can introduce a double-stranded break into the chromosomal sequence. In embodiments in which the optional donor polynucleotide is not present, the resulting double-stranded break can be repaired by a non-homologous repair process such that deletions of at least one nucleotide, insertions of at least one nucleotide, substitutions of at least one nucleotide, or combinations thereof can occur during the repair of the break. In embodiments in which the optional donor polynucleotide is present, the donor sequence in the donor polynucleotide can be exchanged with or integrated into the chromosomal sequence during repair of the double-stranded break by either a homology-based repair process (e.g., in embodiments in which the donor sequence is flanked by upstream and downstream sequences having substantial sequence identity with upstream and downstream sequences, respectively, of the targeted sites in the chromosomal sequence) or a non-homologous repair process (e.g., in embodiments in which the donor sequence is flanked by compatible overhangs).
In certain embodiments in which the effector domain of the fusion protein is a transcriptional activation domain or a transcriptional repressor domain, the method can comprise introducing into the plant cell, organelle, or plant embryo one fusion protein (or nucleic acid encoding one fusion protein) and one guide RNA (or DNA encoding one guide RNA). The guide RNA directs the fusion protein to a specific chromosomal sequence, wherein the transcriptional activation domain or transcriptional repressor domain activates or represses expression, respectively, of a gene or genes located near the targeted chromosomal sequence. Transcription may be affected for genes in close proximity to the targeted chromosomal sequence or for genes located at a greater distance from it. It is well known in the art that gene transcription can be regulated by distantly located sequences that may lie thousands of bases away from the transcription start site or even on a separate chromosome (Harmston and Lenhard (2013) Nucleic Acids Res 41:7185-7199).
In alternate embodiments in which the effector domain of the fusion protein is an epigenetic modification domain, the method can comprise introducing into the plant cell, organelle, or plant embryo one fusion protein (or nucleic acid encoding one fusion protein) and one guide RNA (or DNA encoding one guide RNA). The guide RNA directs the fusion protein to a specific chromosomal sequence, wherein the epigenetic modification domain modifies the structure of the targeted chromosomal sequence. Epigenetic modifications include acetylation or methylation of histone proteins and/or methylation of nucleotides. In some instances, structural modification of the chromosomal sequence leads to changes in expression of the chromosomal sequence.
V. Plants and Plant Cells Comprising a Genetic Modification
Provided herein are plants, plant cells, plant organelles, and plant embryos comprising at least one nucleotide sequence that has been modified using a Cpf1 or Csm1 polypeptide-mediated or fusion protein-mediated process as described herein. Also provided are plant cells, organelles, and plant embryos comprising at least one DNA or RNA molecule encoding a Cpf1 or Csm1 polypeptide or a fusion protein targeted to a chromosomal sequence of interest, at least one guide RNA, and optionally one or more donor polynucleotide(s). The genetically modified plants disclosed herein can be heterozygous or homozygous for the modified nucleotide sequence. Plant cells comprising one or more genetic modifications in organellar DNA may be heteroplasmic or homoplasmic.
The modified chromosomal sequence of the plant, plant organelle, or plant cell may be modified such that it is inactivated, has up-regulated or down-regulated expression, or produces an altered protein product, or comprises an integrated sequence. The modified chromosomal sequence may be inactivated such that the sequence is not transcribed and/or a functional protein product is not produced. Thus, a genetically modified plant comprising an inactivated chromosomal sequence may be termed a "knock out" or a "conditional knock out." The inactivated chromosomal sequence can include a deletion mutation (i.e., deletion of one or more nucleotides), an insertion mutation (i.e., insertion of one or more nucleotides), or a nonsense mutation (i.e., substitution of a single nucleotide for another nucleotide such that a stop codon is introduced). As a consequence of the mutation, the targeted chromosomal sequence is inactivated and a functional protein is not produced. The inactivated chromosomal sequence comprises no exogenously introduced sequence. Also included herein are genetically modified plants in which two, three, four, five, six, seven, eight, nine, or ten or more chromosomal sequences are inactivated.
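Whether a deletion or insertion within a coding sequence is likely to inactivate it depends largely on whether its net length is a multiple of three. If it is not, the downstream reading frame shifts. The following is a minimal, sequence-level sketch for sorting edited alleles, such as those recovered by amplicon sequencing, into frameshift and in-frame classes. The function names are hypothetical. The sketch assumes the indel lies within coding sequence and ignores splicing. An in-frame or same-length allele can still be disruptive, for example by creating a stop codon.

```python
# Illustrative sketch; function names are hypothetical.
from collections import Counter


def frame_effect(ref_allele: str, edited_allele: str) -> str:
    """Classify an edit by the net length change it causes in a coding sequence."""
    net = len(edited_allele) - len(ref_allele)
    if net == 0:
        return "no net length change"
    # Python's % is non-negative for a positive divisor, so -4 % 3 == 2.
    return "in-frame" if net % 3 == 0 else "frameshift"


def summarize_alleles(ref_allele: str, alleles) -> Counter:
    """Count how many recovered alleles fall into each class."""
    return Counter(frame_effect(ref_allele, a) for a in alleles)
```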
The modified chromosomal sequence can also be altered such that it codes for a variant protein product. For example, a genetically modified plant comprising a modified chromosomal sequence can comprise a targeted point mutation(s) or other modification such that an altered protein product is produced. In one embodiment, the chromosomal sequence can be modified such that at least one nucleotide is changed and the expressed protein comprises one changed amino acid residue (missense mutation). In another embodiment, the chromosomal sequence can be modified to comprise more than one missense mutation such that more than one amino acid is changed. Additionally, the chromosomal sequence can be modified to have a three nucleotide deletion or insertion such that the expressed protein comprises a single amino acid deletion or insertion. The altered or variant protein can have altered properties or activities compared to the wild type protein, such as altered substrate specificity, altered enzyme activity, altered kinetic rates, etc.
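The distinctions drawn above between silent, missense, and nonsense changes, and between a three-nucleotide deletion and the loss of a single residue, can be checked by translating the edited coding sequence. The following is a minimal sketch. It assumes an in-frame coding sequence read with the standard genetic code, and its function names are hypothetical. It classifies a single-base substitution and shows that deleting one whole codon removes exactly one amino acid.

```python
# Illustrative sketch; assumes the standard genetic code and an in-frame CDS.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(cds: str) -> str:
    """Translate codon by codon, stopping after the first stop codon ('*')."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")  # 'X' = ambiguous codon
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


def classify_substitution(cds: str, pos: int, new_base: str) -> str:
    """Classify a single-base change at 0-based position `pos` of the CDS."""
    start = pos - pos % 3
    offset = pos % 3
    old_codon = cds[start:start + 3].upper()
    new_codon = old_codon[:offset] + new_base.upper() + old_codon[offset + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    if old_aa == new_aa:
        return "silent"
    if new_aa == "*":
        return "nonsense"
    return "missense"


def delete_bases(cds: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 0-based position `start`."""
    return cds[:start] + cds[start + length:]
```

With the toy sequence `ATGGCTTGGAAATAA` (translated as MAWK*), changing GCT to GAT is a missense change (Ala to Asp), and changing TGG to TAG is a nonsense change. Deleting the GCT codon yields MWK*, a single amino acid deletion.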
In some embodiments, the genetically modified plant can comprise at least one chromosomally integrated nucleotide sequence. A genetically modified plant comprising an integrated sequence may be termed a "knock in" or a "conditional knock in." The integrated nucleotide sequence can, for example, encode an orthologous protein, an endogenous protein, or combinations of both. In one embodiment, a sequence encoding an orthologous protein or an endogenous protein can be integrated into a nuclear or organellar chromosomal sequence encoding a protein such that the chromosomal sequence is inactivated, but the exogenous sequence is expressed. In such a case, the sequence encoding the orthologous protein or endogenous protein may be operably linked to a promoter control sequence. Alternatively, a sequence encoding an orthologous protein or an endogenous protein may be integrated into a nuclear or organellar chromosomal sequence without affecting expression of a chromosomal sequence. For example, a sequence encoding a protein can be integrated into a "safe harbor" locus. The present disclosure also encompasses genetically modified plants in which two, three, four, five, six, seven, eight, nine, or ten or more sequences, including sequences encoding protein(s), are integrated into the genome. Any gene of interest as disclosed herein can be integrated into the chromosomal sequence of the plant nucleus or organelle. In particular embodiments, genes that increase plant growth or yield are integrated into the chromosome.
The chromosomally integrated sequence encoding a protein can encode the wild type form of a protein of interest or can encode a protein comprising at least one modification such that an altered version of the protein is produced. For example, a chromosomally integrated sequence encoding a protein related to a disease or disorder can comprise at least one modification such that the altered version of the protein produced causes or potentiates the associated disorder. Alternatively, the chromosomally integrated sequence encoding a protein related to a disease or disorder can comprise at least one modification such that the altered version of the protein protects the plant against the development of the associated disease or disorder.
In certain embodiments, the genetically modified plant can comprise at least one modified chromosomal sequence encoding a protein such that the expression pattern of the protein is altered. For example, regulatory regions controlling the expression of the protein, such as a promoter or a transcription factor binding site, can be altered such that the protein is over-expressed, or the tissue-specific or temporal expression of the protein is altered, or a combination thereof. Alternatively, the expression pattern of the protein can be altered using a conditional knockout system. A non-limiting example of a conditional knockout system includes a Cre-lox recombination system. A Cre-lox recombination system comprises a Cre recombinase enzyme, a site-specific DNA recombinase that can catalyze the recombination of a nucleic acid sequence between specific sites (lox sites) in a nucleic acid molecule. Methods of using this system to produce temporal and tissue-specific expression are known in the art.
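The outcome of Cre-mediated recombination between two lox sites in the same orientation can be modeled as string editing. The DNA between the sites is excised, and a single lox site remains. The following is a minimal sketch, assuming the wild-type 34-bp loxP sequence and directly repeated sites. Inverted sites, which Cre would invert rather than excise, and variant lox sites are not handled. The function name is hypothetical.

```python
# Illustrative sketch; handles only directly repeated wild-type loxP sites.
# Wild-type loxP: two 13-bp inverted repeats flanking an 8-bp spacer.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def cre_excise(seq: str, site: str = LOXP) -> str:
    """Excise the DNA between the first two directly repeated lox sites."""
    first = seq.find(site)
    if first == -1:
        return seq
    second = seq.find(site, first + len(site))
    if second == -1:
        return seq
    # Keep everything before the first site, then resume at the second site,
    # so exactly one lox site is left in the product.
    return seq[:first] + seq[second:]
```

For a floxed cassette (`"AAA" + LOXP + cassette + LOXP + "TTT"`), the product is `"AAA" + LOXP + "TTT"`. A sequence with fewer than two sites is returned unchanged.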
VI. Methods for Modifying a Nucleotide Sequence in a Non-Plant Eukaryotic Genome and Non-Plant Eukaryotic Cells Comprising a Genetic Modification
Methods are provided herein for modifying a nucleotide sequence of a non-plant eukaryotic cell or non-plant eukaryotic organelle. The methods comprise introducing into a target cell or organelle a DNA-targeting RNA or a DNA polynucleotide encoding a DNA-targeting RNA, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a Cpf1 or Csm1 polypeptide; and also introducing into the target cell or organelle a Cpf1 or Csm1 polypeptide, or a polynucleotide encoding a Cpf1 or Csm1 polypeptide, wherein the Cpf1 or Csm1 polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity. The target cell or organelle can then be cultured under conditions in which the Cpf1 or Csm1 polypeptide is expressed and cleaves the nucleotide sequence. It is noted that the system described herein does not require the addition of exogenous Mg2+ or any other ions. Finally, a non-plant eukaryotic cell or organelle comprising the modified nucleotide sequence can be selected.
In some embodiments, the method can comprise introducing one Cpf1 or Csm1 polypeptide (or encoding nucleic acid) and one guide RNA (or encoding DNA) into a non-plant eukaryotic cell or organelle, wherein the Cpf1 or Csm1 polypeptide introduces one double-stranded break in the target nucleotide sequence of the nuclear or organellar chromosomal DNA. In some embodiments, the method can comprise introducing one Cpf1 or Csm1 polypeptide (or encoding nucleic acid) and at least one guide RNA (or encoding DNA) into a non-plant eukaryotic cell or organelle, wherein the Cpf1 or Csm1 polypeptide introduces more than one double-stranded break (i.e., two, three, or more than three double-stranded breaks) in the target nucleotide sequence of the nuclear or organellar chromosomal DNA. In embodiments in which an optional donor polynucleotide is not present, the double-stranded break in the nucleotide sequence can be repaired by a non-homologous end-joining (NHEJ) repair process. Because NHEJ is error-prone, deletions of at least one nucleotide, insertions of at least one nucleotide, substitutions of at least one nucleotide, or combinations thereof can occur during the repair of the break. Accordingly, the targeted nucleotide sequence can be modified or inactivated. For example, a single nucleotide change (SNP) can give rise to an altered protein product, or a shift in the reading frame of a coding sequence can inactivate or "knock out" the sequence such that no protein product is made. In embodiments in which the optional donor polynucleotide is present, the donor sequence in the donor polynucleotide can be exchanged with or integrated into the nucleotide sequence at the targeted site during repair of the double-stranded break. For example, in embodiments in which the donor sequence is flanked by upstream and downstream sequences having substantial sequence identity with upstream and downstream sequences, respectively, of the targeted site in the nucleotide sequence of the non-plant eukaryotic cell or organelle, the donor sequence can be exchanged with or integrated into the nucleotide sequence at the targeted site during repair mediated by a homology-directed repair process. Alternatively, in embodiments in which the donor sequence is flanked by compatible overhangs (or the compatible overhangs are generated in situ by the Cpf1 or Csm1 polypeptide), the donor sequence can be ligated directly with the cleaved nucleotide sequence by a non-homologous repair process during repair of the double-stranded break. Exchange or integration of the donor sequence into the nucleotide sequence modifies the targeted nucleotide sequence or introduces an exogenous sequence into the targeted nucleotide sequence of the non-plant eukaryotic cell or organelle.
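"Compatible overhangs" can be made concrete. Two sticky ends of the same polarity can anneal when their single-stranded overhangs, each read 5'->3', are reverse complements of each other. The following is a minimal sketch, assuming a staggered break that leaves 5' overhangs, with the positions of the two strand breaks supplied by the user. The sketch does not predict where Cpf1 or Csm1 cuts. Function names are hypothetical.

```python
# Illustrative sketch; cut positions are inputs, not predictions of nuclease behavior.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def staggered_cut_overhangs(top: str, top_cut: int, bottom_cut: int):
    """Return (left_end, right_end) 5' overhangs, each read 5'->3'.

    Positions are top-strand coordinates: the top strand is broken before
    index `top_cut` and the bottom strand before index `bottom_cut`.
    """
    if not 0 < top_cut < bottom_cut <= len(top):
        raise ValueError("expected 0 < top_cut < bottom_cut <= len(top)")
    segment = top[top_cut:bottom_cut].upper()
    left_end = revcomp(segment)  # bottom-strand overhang on the left fragment
    right_end = segment          # top-strand overhang on the right fragment
    return left_end, right_end


def ends_compatible(overhang_a: str, overhang_b: str) -> bool:
    """Two 5' overhangs (each read 5'->3') can anneal if they are reverse complements."""
    return len(overhang_a) == len(overhang_b) and overhang_a.upper() == revcomp(overhang_b)
```

A donor intended for direct ligation into the break would carry, at each end, an overhang for which `ends_compatible` is true against the matching chromosomal end. In this toy example, the two original ends are also compatible with each other and can simply rejoin.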
In some embodiments, the double-stranded breaks caused by the action of the Cpf1 or Csm1 nuclease or nucleases are repaired in such a way that DNA is deleted from the chromosome of the non-plant eukaryotic cell or organelle. In some embodiments, one base, a few bases (i.e., 2, 3, 4, 5, 6, 7, 8, 9, or 10 bases), or a large section of DNA (i.e., more than 10, more than 50, more than 100, or more than 500 bases) is deleted from the chromosome of the non-plant eukaryotic cell or organelle.
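When two breaks are introduced in the same molecule and the outer ends are joined, the intervening DNA is lost. The following is a minimal sketch of that outcome, together with a helper that bins the deletion length using the size ranges named above. It assumes perfect joining of the two break points with no additional insertions or deletions at the junction, which real end-joining does not guarantee. Function names are hypothetical.

```python
# Illustrative sketch; assumes clean joining of the two breaks.


def excise_between_cuts(seq: str, cut_a: int, cut_b: int):
    """Join the outer ends of two breaks, deleting the DNA between them.

    A cut at index i lies between seq[i - 1] and seq[i].
    Returns (repaired_sequence, deleted_length).
    """
    lo, hi = sorted((cut_a, cut_b))
    return seq[:lo] + seq[hi:], hi - lo


def deletion_size_class(n: int) -> str:
    """Bin a deletion length using the ranges described in the text."""
    if n <= 0:
        return "no deletion"
    if n == 1:
        return "single base"
    if n <= 10:
        return "a few bases (2-10)"
    for threshold in (500, 100, 50):
        if n > threshold:
            return f"large (>{threshold})"
    return "large (>10)"
```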
In some embodiments, the expression of non-plant eukaryotic genes may be modulated as a result of the double-stranded breaks caused by the Cpf1 or Csm1 nuclease or nucleases. In some embodiments, the expression of non-plant eukaryotic genes may be modulated by variant Cpf1 or Csm1 enzymes comprising a mutation that renders the Cpf1 or Csm1 nuclease incapable of producing a double-stranded break. In some preferred embodiments, the variant Cpf1 or Csm1 nuclease comprising a mutation that renders the Cpf1 or Csm1 nuclease incapable of producing a double-stranded break may be fused to a transcriptional activation or transcriptional repression domain.
In some embodiments, a eukaryotic cell comprising mutations in its nuclear and/or organellar chromosomal DNA caused by the action of a Cpf1 or Csm1 nuclease or nucleases is cultured to produce a eukaryotic organism. In some embodiments, a eukaryotic cell in which gene expression is modulated as a result of one or more Cpf1 or Csm1 nucleases, or one or more variant Cpf1 or Csm1 nucleases, is cultured to produce a eukaryotic organism. Methods for culturing non-plant eukaryotic cells to produce eukaryotic organisms are known in the art, for instance in U.S. Patent Applications 2016/0208243 and 2016/0138008, herein incorporated by reference.
The present invention may be used for transformation of any eukaryotic species, including, but not limited to, animals (including but not limited to mammals, insects, fish, birds, and reptiles), fungi, amoebae, and yeast.
Methods for the introduction of nuclease proteins, DNA or RNA molecules encoding nuclease proteins, guide RNAs or DNA molecules encoding guide RNAs, and optional donor sequence DNA molecules into non-plant eukaryotic cells or organelles are known in the art, for instance in U.S. Patent Application 2016/0208243, herein incorporated by reference. Exemplary genetic modifications to non-plant eukaryotic cells or organelles that may be of particular value for industrial applications are also known in the art, for instance in U.S. Patent Application 2016/0208243, herein incorporated by reference.
VII. Methods for Modifying a Nucleotide Sequence in a Prokaryotic Genome and Prokaryotic Cells Comprising a Genetic Modification
Methods are provided herein for modifying a nucleotide sequence of a prokaryotic (e.g., bacterial or archaeal) cell. The methods comprise introducing into a target cell a DNA-targeting RNA or a DNA polynucleotide encoding a DNA-targeting RNA, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a Cpf1 or Csm1 polypeptide; and also introducing into the target cell a Cpf1 or Csm1 polypeptide, or a polynucleotide encoding a Cpf1 or Csm1 polypeptide, wherein the Cpf1 or Csm1 polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity. The target cell can then be cultured under conditions in which the Cpf1 or Csm1 polypeptide is expressed and cleaves the nucleotide sequence. It is noted that the system described herein does not require the addition of exogenous Mg2+ or any other ions. Finally, prokaryotic cells comprising the modified nucleotide sequence can be selected. It is further noted that the prokaryotic cells comprising the modified nucleotide sequence or sequences are not the natural host cells of the polynucleotides encoding the Cpf1 or Csm1 polypeptide of interest, and that a non-naturally occurring guide RNA is used to effect the desired changes in the prokaryotic nucleotide sequence or sequences. It is further noted that the targeted DNA may be present as part of the prokaryotic chromosome(s) or may be present on one or more plasmids or other non-chromosomal DNA molecules in the prokaryotic cell.
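The first segment of the DNA-targeting RNA base-pairs with the target strand. Its matching site on the opposite (non-target) strand therefore has the same sequence as the guide segment, with T in place of U. The following is a minimal sketch that scans both strands of a sequence, such as a chromosome or plasmid, for such sites, optionally tolerating mismatches. It is a plain Hamming-distance scan. Cpf1-type nucleases also require a protospacer-adjacent motif (PAM) next to the target, and this sketch deliberately does not model that requirement. The function names are hypothetical.

```python
# Illustrative sketch; PAM requirements and RNA-DNA pairing rules are not modeled.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (non-ACGT characters pass through)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_target_sites(dna: str, spacer_rna: str, max_mismatches: int = 0):
    """Return (strand, start, mismatches) for every site matching the guide segment.

    `start` is always reported in forward-strand (0-based) coordinates.
    """
    spacer = spacer_rna.upper().replace("U", "T")
    dna = dna.upper()
    k = len(spacer)
    hits = []
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        for i in range(len(seq) - k + 1):
            mm = sum(1 for a, b in zip(seq[i:i + k], spacer) if a != b)
            if mm <= max_mismatches:
                start = i if strand == "+" else len(dna) - i - k
                hits.append((strand, start, mm))
    return hits
```

Raising `max_mismatches` turns the same scan into a crude screen for closely related sites elsewhere in the prokaryotic chromosome or plasmids.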
In some embodiments, the method can comprise introducing one Cpf1 or Csm1 polypeptide (or encoding nucleic acid) and one guide RNA (or encoding DNA) into a prokaryotic cell, wherein the Cpf1 or Csm1 polypeptide introduces one double-stranded break in the target nucleotide sequence of the prokaryotic cellular DNA. In some embodiments, the method can comprise introducing one Cpf1 or Csm1 polypeptide (or encoding nucleic acid) and at least one guide RNA (or encoding DNA) into a prokaryotic cell, wherein the Cpf1 or Csm1 polypeptide introduces more than one double-stranded break (i.e., two, three, or more than three double-stranded breaks) in the target nucleotide sequence of the prokaryotic cellular DNA. In embodiments in which an optional donor polynucleotide is not present, the double-stranded break in the nucleotide sequence can be repaired by a non-homologous end-joining (NHEJ) repair process. Because NHEJ is error-prone, deletions of at least one nucleotide, insertions of at least one nucleotide, substitutions of at least one nucleotide, or combinations thereof can occur during the repair of the break. Accordingly, the targeted nucleotide sequence can be modified or inactivated. For example, a single nucleotide change (SNP) can give rise to an altered protein product, or a shift in the reading frame of a coding sequence can inactivate or "knock out" the sequence such that no protein product is made. In embodiments in which the optional donor polynucleotide is present, the donor sequence in the donor polynucleotide can be exchanged with or integrated into the nucleotide sequence at the targeted site during repair of the double-stranded break. For example, in embodiments in which the donor sequence is flanked by upstream and downstream sequences having substantial sequence identity with upstream and downstream sequences, respectively, of the targeted site in the nucleotide sequence of the prokaryotic cell, the donor sequence can be exchanged with or integrated into the nucleotide sequence at the targeted site during repair mediated by a homology-directed repair process. Alternatively, in embodiments in which the donor sequence is flanked by compatible overhangs (or the compatible overhangs are generated in situ by the Cpf1 or Csm1 polypeptide), the donor sequence can be ligated directly with the cleaved nucleotide sequence by a non-homologous repair process during repair of the double-stranded break. Exchange or integration of the donor sequence into the nucleotide sequence modifies the targeted nucleotide sequence or introduces an exogenous sequence into the targeted nucleotide sequence of the prokaryotic cellular DNA. In some embodiments, the double-stranded breaks caused by the action of the Cpf1 or Csm1 nuclease or nucleases are repaired in such a way that DNA is deleted from the prokaryotic cellular DNA. In some embodiments, one base, a few bases (i.e., 2, 3, 4, 5, 6, 7, 8, 9, or 10 bases), or a large section of DNA (i.e., more than 10, more than 50, more than 100, or more than 500 bases) is deleted from the prokaryotic cellular DNA.
In some embodiments, the expression of prokaryotic genes may be modulated as a result of the double-stranded breaks caused by the Cpfl or Csml nuclease or nucleases. In some embodiments, the expression of prokaryotic genes may be modulated by variant Cpfl or Csml nucleases comprising a mutation that renders the Cpfl or Csml nuclease incapable of producing a double-stranded break. In some preferred embodiments, the variant Cpfl or Csml nuclease comprising a mutation that renders the Cpfl or Csml nuclease incapable of producing a double-stranded break may be fused to a transcriptional activation or transcriptional repression domain.
The present invention may be used for transformation of any prokaryotic species, including, but not limited to, cyanobacteria, Corynebacterium sp., Bifidobacterium sp., Mycobacterium sp., Streptomyces sp., Thermobifida sp., Chlamydia sp., Prochlorococcus sp., Synechococcus sp., Thermosynechococcus sp., Thermus sp., Bacillus sp., Clostridium sp., Geobacillus sp., Lactobacillus sp., Listeria sp., Staphylococcus sp., Streptococcus sp., Fusobacterium sp., Agrobacterium sp., Bradyrhizobium sp., Ehrlichia sp., Mesorhizobium sp., Nitrobacter sp., Rickettsia sp., Wolbachia sp., Zymomonas sp., Burkholderia sp., Neisseria sp., Ralstonia sp., Acinetobacter sp., Erwinia sp., Escherichia sp., Haemophilus sp., Legionella sp., Pasteurella sp., Pseudomonas sp., Psychrobacter sp., Salmonella sp., Shewanella sp., Shigella sp., Vibrio sp., Xanthomonas sp., Xylella sp., Yersinia sp., Campylobacter sp., Desulfovibrio sp., Helicobacter sp., Geobacter sp., Leptospira sp., Treponema sp., Mycoplasma sp., and Thermotoga sp.
Methods for the introduction of nuclease proteins, DNA or RNA molecules encoding nuclease proteins, guide RNAs or DNA molecules encoding guide RNAs, and optional donor sequence DNA molecules into prokaryotic cells or organelles are known in the art, for instance in U.S. Patent Application 2016/0208243, herein incorporated by reference. Exemplary genetic modifications to prokaryotic cells that may be of particular value for industrial applications are also known in the art, for instance in U.S. Patent Application 2016/0208243, herein incorporated by reference.
All publications and patent applications mentioned in the specification are indicative of the level of skill of those skilled in the art to which this invention pertains. All publications and patent applications are herein incorporated by reference to the same extent as if each individual publication or patent application was specifically and individually indicated to be incorporated by reference.
Although the foregoing invention has been described in some detail by way of illustration and example for purposes of clarity of understanding, it will be obvious that certain changes and modifications may be practiced within the scope of the appended claims.
Embodiments of the invention include:
1. A method of modifying a nucleotide sequence at a target site in the genome of a eukaryotic cell comprising:
introducing into said eukaryotic cell
(i) a DNA-targeting RNA, or a DNA polynucleotide encoding a DNA-targeting RNA, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a Cpfl or Csml polypeptide; and
(ii) a Cpfl or Csml polypeptide, or a polynucleotide encoding a Cpfl or Csml polypeptide, wherein the Cpfl or Csml polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity.
2. A method of modifying a nucleotide sequence at a target site in the genome of a prokaryotic cell comprising:
introducing into said prokaryotic cell
(i) a DNA-targeting RNA, or a DNA polynucleotide encoding a DNA-targeting RNA, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a Cpfl or Csml polypeptide; and
(ii) a Cpfl or Csml polypeptide, or a polynucleotide encoding a Cpfl or Csml polypeptide, wherein the Cpfl or Csml polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity,
wherein said prokaryotic cell is not the native host of a gene encoding said Cpfl or Csml polypeptide.
3. A method of modifying a nucleotide sequence at a target site in the genome of a plant cell comprising:
introducing into said plant cell
(i) a DNA-targeting RNA, or a DNA polynucleotide encoding a DNA-targeting RNA, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a Cpfl or Csml polypeptide; and
(ii) a Cpfl or Csml polypeptide, or a polynucleotide encoding a Cpfl or Csml polypeptide, wherein the Cpfl or Csml polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity.
4. The method of any one of embodiments 1-3, further comprising:
culturing the plant under conditions in which the Cpfl or Csml polypeptide is expressed and cleaves the nucleotide sequence at the target site to produce a modified nucleotide sequence; and
selecting a plant comprising said modified nucleotide sequence.
5. The method of any one of embodiments 1-4, wherein cleaving of the nucleotide sequence at the target site comprises a double strand break at or near the sequence to which the DNA-targeting RNA sequence is targeted.
6. The method of embodiment 5, wherein said double strand break is a staggered double strand break.
7. The method of embodiment 6, wherein said staggered double strand break creates a 5' overhang of 3-6 nucleotides.
8. The method of any one of embodiments 1-7, wherein said DNA-targeting RNA is a guide RNA (gRNA).
9. The method of any one of embodiments 1-8, wherein said modified nucleotide sequence comprises insertion of heterologous DNA into the genome of the cell, deletion of a nucleotide sequence from the genome of the cell, or mutation of at least one nucleotide in the genome of the cell.
10. The method of any one of embodiments 1-9, wherein said Cpfl or Csml polypeptide is selected from the group consisting of: SEQ ID NOs: 3, 6, 9, 12, 15, 18, 20, 23, 106-173, and 230-236.
11. The method of any one of embodiments 1-10, wherein said polynucleotide encoding a Cpfl or Csml polypeptide is selected from the group of SEQ ID NOs: 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 21, 22, 24, 25, and 174-206.
12. The method of any one of embodiments 1-11, wherein said Cpfl or Csml polypeptide has at least 80% identity with one or more polypeptide sequences selected from the group of SEQ ID NOs: 3, 6, 9, 12, 15, 18, 20, 23, 106-173, and 230-236.
13. The method of any one of embodiments 1-12, wherein said polynucleotide encoding a Cpfl or Csml polypeptide has at least 70% identity with one or more nucleic acid sequences selected from the group of SEQ ID NOs: 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 21, 22, 24, 25, and 174-206.
14. The method of any one of embodiments 1-13, wherein the Cpfl or Csml polypeptide forms a homodimer or heterodimer.
15. The method of any one of embodiments 1-14, wherein said plant cell is from a monocotyledonous species.
16. The method of any one of embodiments 1-14, wherein said plant cell is from a dicotyledonous species.
17. The method of any one of embodiments 1-16, wherein the expression of the Cpfl or Csml polypeptide is under the control of an inducible or constitutive promoter.
18. The method of any one of embodiments 1-17, wherein the expression of the Cpfl or Csml polypeptide is under the control of a cell type-specific or developmentally-preferred promoter.
19. The method of any one of embodiments 1-18, wherein the PAM sequence comprises 5'-TTN, wherein N can be any nucleotide.
20. The method of any one of embodiments 1-19, wherein said nucleotide sequence at a target site in the genome of a cell encodes an SBPase, FBPase, FBP aldolase, AGPase large subunit, AGPase small subunit, sucrose phosphate synthase, starch synthase, PEP carboxylase, pyruvate phosphate dikinase, transketolase, rubisco small subunit, or rubisco activase protein, or encodes a transcription factor that regulates the expression of one or more genes encoding an SBPase, FBPase, FBP aldolase, AGPase large subunit, AGPase small subunit, sucrose phosphate synthase, starch synthase, PEP carboxylase, pyruvate phosphate dikinase, transketolase, rubisco small subunit, or rubisco activase protein.
21. The method of any one of embodiments 1-20, the method further comprising contacting the target site with a donor polynucleotide, wherein the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of a copy of the donor polynucleotide integrates into the target DNA.
22. The method of any one of embodiments 1-21, wherein the target DNA is modified such that nucleotides within the target DNA are deleted.
23. The method of any one of embodiments 1-22, wherein said polynucleotide encoding a Cpfl or Csml polypeptide is codon optimized for expression in a plant cell.
24. The method of any one of embodiments 1-23, wherein the expression of said nucleotide sequence is increased or decreased.
25. The method of any one of embodiments 1-24, wherein the polynucleotide encoding a Cpfl or Csml polypeptide is operably linked to a promoter that is constitutive, cell specific, inducible, or activated by alternative splicing of a suicide exon.
26. The method of any one of embodiments 1-25, wherein said Cpfl or Csml polypeptide comprises one or more mutations that reduce or eliminate the nuclease activity of said Cpfl or Csml polypeptide.
27. The method of embodiment 26, wherein said mutated Cpfl or Csml polypeptide comprises a mutation in a position corresponding to positions 917 or 1006 of FnCpfl (SEQ ID NO:3) or to positions 701 or 922 of SmCsml (SEQ ID NO: 160) when aligned for maximum identity, or wherein said mutated Cpfl or Csml polypeptide comprises a mutation at positions 917 and 1006 of FnCpfl (SEQ ID NO:3) or positions 701 and 922 of SmCsml (SEQ ID NO: 160) when aligned for maximum identity.
28. The method of embodiment 27, wherein said mutations in positions corresponding to positions 917 or 1006 of FnCpfl (SEQ ID NO: 3) are D917A and E1006A, respectively, or wherein said mutations in positions corresponding to positions 701 or 922 of SmCsml (SEQ ID NO: 160) are D701A and E922A, respectively.
29. The method of any one of embodiments 26-28, wherein said mutated Cpfl or Csml polypeptide comprises the amino acid sequence set forth in the group of SEQ ID NOs: 26-41 and 63-70.
30. The method of any one of embodiments 26-29, wherein the mutated Cpfl or Csml polypeptide is fused to a transcriptional activation domain.
31. The method of embodiment 30, wherein the mutated Cpfl or Csml polypeptide is directly fused to a transcriptional activation domain or fused to a transcriptional activation domain with a linker.
32. The method of any one of embodiments 26-29, wherein the mutated Cpfl or Csml polypeptide is fused to a transcriptional repressor domain.
33. The method of embodiment 32, wherein the mutated Cpfl or Csml polypeptide is fused to a transcriptional repressor domain with a linker.
34. The method of any one of embodiments 1-33 wherein said Cpfl or Csml polypeptide further comprises a nuclear localization signal.
35. The method of embodiment 34 wherein said nuclear localization signal comprises SEQ ID NO: 1, or is encoded by SEQ ID NO:2.
36. The method of any one of embodiments 1-33 wherein said Cpfl or Csml polypeptide further comprises a chloroplast signal peptide.
37. The method of any one of embodiments 1-33 wherein said Cpfl or Csml polypeptide further comprises a mitochondrial signal peptide.
38. The method of any one of embodiments 1-33 wherein said Cpfl or Csml polypeptide further comprises a signal peptide that targets said Cpfl or Csml polypeptide to multiple subcellular locations.
39. A nucleic acid molecule comprising a polynucleotide sequence encoding a Cpfl or Csml polypeptide, wherein said polynucleotide sequence has been codon optimized for expression in a plant cell.
40. A nucleic acid molecule comprising a polynucleotide sequence encoding a Cpfl or Csml polypeptide, wherein said polynucleotide sequence has been codon optimized for expression in a eukaryotic cell.
41. A nucleic acid molecule comprising a polynucleotide sequence encoding a Cpfl or Csml polypeptide, wherein said polynucleotide sequence has been codon optimized for expression in a prokaryotic cell, wherein said prokaryotic cell is not the natural host of said Cpfl or Csml polypeptide.
42. The nucleic acid molecule of any one of embodiments 39-41, wherein said polynucleotide sequence is selected from the group consisting of: SEQ ID NOs: 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 21, 22, 24, 25, and 174-206 or a fragment or variant thereof, or wherein said polynucleotide sequence encodes a Cpfl or Csml polypeptide selected from the group consisting of SEQ ID NOs: 3, 6, 9, 12, 15, 18, 20, 23, 106-173, and 230-236, and wherein said polynucleotide sequence encoding a Cpfl or Csml polypeptide is operably linked to a promoter that is heterologous to the polynucleotide sequence encoding a Cpfl or Csml polypeptide.
43. The nucleic acid molecule of any one of embodiments 39-41, wherein said variant polynucleotide sequence has at least 70% sequence identity to a polynucleotide sequence selected from the group consisting of: SEQ ID NOs: 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 21, 22, 24, 25, and 174-206, or wherein said polynucleotide sequence encodes a Cpfl or Csml polypeptide that has at least 80% sequence identity to a polypeptide selected from the group consisting of SEQ ID NOs: 3, 6, 9, 12, 15, 18, 20, 23, 106-173, and 230-236, and wherein said polynucleotide sequence encoding a Cpfl or Csml polypeptide is operably linked to a promoter that is heterologous to the polynucleotide sequence encoding a Cpfl or Csml polypeptide.
44. The nucleic acid molecule of any one of embodiments 39-41, wherein said Cpfl or Csml polypeptide comprises an amino acid sequence selected from the group consisting of: SEQ ID NOs: 3, 6, 9, 12, 15, 18, 20, 23, 106-173, and 230-236, or a fragment or variant thereof.
45. The nucleic acid molecule of embodiment 44, wherein said variant polypeptide sequence has at least 70% sequence identity to a polypeptide sequence selected from the group consisting of: SEQ ID NOs: 3, 6, 9, 12, 15, 18, 20, 23, 106-173, and 230-236.
46. The nucleic acid molecule of any one of embodiments 39-45, wherein said polynucleotide sequence encoding a Cpfl or Csml polypeptide is operably linked to a promoter that is active in a plant cell.
47. The nucleic acid molecule of any one of embodiments 39-45, wherein said polynucleotide sequence encoding a Cpfl or Csml polypeptide is operably linked to a promoter that is active in a eukaryotic cell.
48. The nucleic acid molecule of any one of embodiments 39-45, wherein said polynucleotide sequence encoding a Cpfl or Csml polypeptide is operably linked to a promoter that is active in a prokaryotic cell.
49. The nucleic acid molecule of any one of embodiments 39-45, wherein said polynucleotide sequence encoding a Cpfl or Csml polypeptide is operably linked to a constitutive promoter, inducible promoter, cell type-specific promoter, or developmentally-preferred promoter.
50. The nucleic acid molecule of any one of embodiments 39-45, wherein said nucleic acid molecule encodes a fusion protein comprising said Cpfl or Csml polypeptide and an effector domain.
51. The nucleic acid molecule of embodiment 50, wherein said effector domain is selected from the group consisting of: transcriptional activator, transcriptional repressor, nuclear localization signal, and cell penetrating signal.
52. The nucleic acid molecule of embodiment 51, wherein said Cpfl or Csml polypeptide is mutated to reduce or eliminate nuclease activity.
53. The nucleic acid molecule of embodiment 52, wherein said mutated Cpfl or Csml polypeptide comprises a mutation in a position corresponding to positions 917 or 1006 of FnCpfl (SEQ ID NO:3) or to positions 701 or 922 of SmCsml (SEQ ID NO: 160) when aligned for maximum identity, or wherein said mutated Cpfl or Csml polypeptide comprises a mutation at positions corresponding to positions 917 and 1006 of FnCpfl (SEQ ID NO:3) or positions 701 and 922 of SmCsml (SEQ ID NO: 160) when aligned for maximum identity.
54. The nucleic acid molecule of any one of embodiments 50-53, wherein said Cpfl or Csml polypeptide is fused to said effector domain with a linker.
55. The nucleic acid molecule of any one of embodiments 39-54, wherein said Cpfl or Csml polypeptide forms a dimer.
56. A fusion protein encoded by the nucleic acid molecule of any one of embodiments 50-55.
57. A Cpfl or Csml polypeptide encoded by the nucleic acid molecule of any one of embodiments 39-45.
58. A Cpfl or Csml polypeptide mutated to reduce or eliminate nuclease activity.
59. The Cpfl or Csml polypeptide of embodiment 58, wherein said mutated Cpfl or Csml polypeptide comprises a mutation in a position corresponding to positions 917 or 1006 of FnCpfl (SEQ ID NO:3) or to positions 701 or 922 of SmCsml (SEQ ID NO: 160) when aligned for maximum identity or wherein said mutated Cpfl or Csml polypeptide comprises mutations at positions corresponding to positions 917 and 1006 of FnCpfl (SEQ ID NO:3) or positions 701 and 922 of SmCsml (SEQ ID NO: 160) when aligned for maximum identity.
60. A plant cell, eukaryotic cell, or prokaryotic cell comprising the nucleic acid molecule of any one of embodiments 39-55.
61. A plant cell, eukaryotic cell, or prokaryotic cell comprising the fusion protein or polypeptide of any one of embodiments 56-59.
62. A plant cell produced by the method of any one of embodiments 1 and 3-38.
63. A plant comprising the nucleic acid molecule of any one of embodiments 39-55.
64. A plant comprising the fusion protein or polypeptide of any one of embodiments 56-59.
65. A plant produced by the method of any one of embodiments 1 and 3-38.
66. The seed of the plant of any one of embodiments 63-65.
67. The method of any one of embodiments 1 and 3-38 wherein said modified nucleotide sequence comprises insertion of a polynucleotide that encodes a protein conferring antibiotic or herbicide tolerance to transformed cells.
68. The method of embodiment 67 wherein said polynucleotide that encodes a protein conferring antibiotic or herbicide tolerance comprises SEQ ID NO: 76, or encodes a protein that comprises SEQ ID NO:77.
69. The method of any one of embodiments 3-38 wherein said target site in the genome of a plant cell comprises SEQ ID NO:71, or shares at least 80% identity with a portion or fragment of SEQ ID NO:71.
70. The method of any one of embodiments 1-38 wherein said DNA polynucleotide encoding a DNA-targeting RNA comprises SEQ ID NO:73, SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, or SEQ ID NO:95.
71. The nucleic acid molecule of any one of embodiments 39-55 wherein said polynucleotide sequence encoding a Cpfl or Csml polypeptide further comprises a polynucleotide sequence encoding a nuclear localization signal.
72. The nucleic acid molecule of embodiment 71 wherein said nuclear localization signal comprises SEQ ID NO: 1 or is encoded by SEQ ID NO:2.
73. The nucleic acid molecule of any one of embodiments 39-55 wherein said polynucleotide sequence encoding a Cpfl or Csml polypeptide further comprises a polynucleotide sequence encoding a chloroplast signal peptide.
74. The nucleic acid molecule of any one of embodiments 39-55 wherein said polynucleotide sequence encoding a Cpfl or Csml polypeptide further comprises a polynucleotide sequence encoding a mitochondrial signal peptide.
75. The nucleic acid molecule of any one of embodiments 39-55 wherein said polynucleotide sequence encoding a Cpfl or Csml polypeptide further comprises a polynucleotide sequence encoding a signal peptide that targets said Cpfl or Csml polypeptide to multiple subcellular locations.
76. The fusion protein of embodiment 56 wherein said fusion protein further comprises a nuclear localization signal, chloroplast signal peptide, mitochondrial signal peptide, or signal peptide that targets said Cpfl or Csml polypeptide to multiple subcellular locations.
77. The Cpfl or Csml polypeptide of any one of embodiments 57-59 wherein said Cpfl or Csml polypeptide further comprises a nuclear localization signal, chloroplast signal peptide, mitochondrial signal peptide, or signal peptide that targets said Cpfl or Csml polypeptide to multiple subcellular locations.
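Embodiment 19 recites a PAM comprising 5'-TTN. As a rough illustration of how candidate target sites might be located in a sequence of interest, the Python sketch below scans both strands for a 5'-TTN motif followed by a fixed-length protospacer. The 23-nt default protospacer length, the function names, and the assumption that the protospacer lies immediately 3' of the PAM on the PAM-bearing strand are all assumptions made for this sketch; it does not score specificity or off-target risk.

```python
from typing import List, Tuple

# Hypothetical scanner for 5'-TTN PAM sites. Assumes the protospacer lies immediately
# 3' of the PAM on the PAM-bearing strand; the spacer length is an assumed parameter.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_ttn_sites(seq: str, spacer_len: int = 23) -> List[Tuple[str, int, str]]:
    """Return (strand, pam_start, protospacer) tuples.

    `pam_start` is 0-based on the strand being scanned: '+' is `seq` as given,
    '-' is its reverse complement. Only sites with a full-length protospacer are kept.
    """
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for i in range(len(s) - 3 - spacer_len + 1):
            if s[i:i + 2] == "TT":  # the third PAM position (N) may be any base
                hits.append((strand, i, s[i + 3:i + 3 + spacer_len]))
    return hits


print(find_ttn_sites("ATTGACGTCA", spacer_len=4))  # [('+', 1, 'ACGT')]
```

A short spacer is used in the usage line only so the result is easy to check by eye; a real scan would use the spacer length appropriate to the chosen Cpfl or Csml enzyme and its gRNA design.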
The following examples are offered by way of illustration and not by way of limitation.
EXPERIMENTAL
Example 1 - Cloning cpfl constructs
Cpfl-containing constructs (Construct numbers 131306-131311 and 131313) are summarized in Table 1. Briefly, the cpfl genes were de novo synthesized by GenScript (Piscataway, NJ) and amplified by PCR to add an N-terminal SV40 nuclear localization tag (SEQ ID NO: 2) in frame with the cpfl coding sequence of interest as well as restriction enzyme sites for cloning. Using the appropriate restriction enzyme sites, each individual cpfl gene was cloned downstream of the 2x35S promoter (SEQ ID NO: 43).
Guide RNAs targeted to a region of DNA spanning the junction between the promoter and the 5' end of the GFP coding region were synthesized by Integrated DNA Technologies (Coralville, IA) as complete cassettes. Each cassette included a rice U3 promoter (SEQ ID NO:42) operationally linked to the appropriate gRNA (SEQ ID NOs:47-53) that was operationally linked to the rice U3 terminator (SEQ ID NO:44). While each gRNA was targeted to the same region of the promoter and GFP gene, each gRNA was designed to ensure that it included the appropriate scaffold to interact correctly with its respective Cpfl enzyme.
Constructs were assembled and cloned into a modified pSB11 vector backbone containing the hptII gene, which can confer hygromycin B resistance in plants (SEQ ID NO:45). The hptII gene was situated downstream of the maize ubiquitin promoter and 5'UTR (pZmUbi; SEQ ID NO:46).
Table 1: Cpfl vectors
Construct Number | Cpfl promoter | Cpfl gene¹ | Cpfl terminator | gRNA promoter | gRNA sequence | gRNA terminator
131306 | 2X 35S (SEQ ID NO: 43) | Francisella tularensis subsp. novicida U112 (SEQ ID NO: 5) | 35S poly A (SEQ ID NO: 54) | rice U3 (SEQ ID NO: 42) | Francisella GFP gRNA (SEQ ID NO: 47) | rice U3 (SEQ ID NO: 44)
131307 | 2X 35S (SEQ ID NO: 43) | Acidaminococcus sp. BV3L6 (SEQ ID NO: 8) | 35S poly A (SEQ ID NO: 54) | rice U3 (SEQ ID NO: 42) | Acidaminococcus GFP gRNA (SEQ ID NO: 48) | rice U3 (SEQ ID NO: 44)
131308 | 2X 35S (SEQ ID NO: 43) | Lachnospiraceae bacterium MA2020 (SEQ ID NO: 11) | 35S poly A (SEQ ID NO: 54) | rice U3 (SEQ ID NO: 42) | LachnosMA2020 GFP gRNA (SEQ ID NO: 49) | rice U3 (SEQ ID NO: 44)
131309 | 2X 35S (SEQ ID NO: 43) | Candidatus Methanoplasma termitum (SEQ ID NO: 14) | 35S poly A (SEQ ID NO: 54) | rice U3 (SEQ ID NO: 42) | Candidatus GFP gRNA (SEQ ID NO: 50) | rice U3 (SEQ ID NO: 44)
131310 | 2X 35S (SEQ ID NO: 43) | Moraxella bovoculi 237 (SEQ ID NO: 17) | 35S poly A (SEQ ID NO: 54) | rice U3 (SEQ ID NO: 42) | Moraxella GFP gRNA (SEQ ID NO: 51) | rice U3 (SEQ ID NO: 44)
131311 | 2X 35S (SEQ ID NO: 43) | Lachnospiraceae bacterium ND2006 (SEQ ID NO: 19) | 35S poly A (SEQ ID NO: 54) | rice U3 (SEQ ID NO: 42) | LachnosND2006 GFP gRNA (SEQ ID NO: 52) | rice U3 (SEQ ID NO: 44)
131313 | 2X 35S (SEQ ID NO: 43) | Prevotella disiens (SEQ ID NO: 25) | 35S poly A (SEQ ID NO: 54) | rice U3 (SEQ ID NO: 42) | Prevo GFP gRNA (SEQ ID NO: 53) | rice U3 (SEQ ID NO: 44)
¹ Each cpfl gene included a sequence encoding the SV40 nuclear localization signal (SEQ ID NO: 2, encoding the amino acid sequence of SEQ ID NO: 1) at its 5' end.
Example 2 - Agrobacterium-mediated Rice transformation
Rice (Oryza sativa cv. Kitaake) calli were infected with Agrobacterium cells harboring a super binary plasmid that contained a gene encoding green fluorescent protein (GFP; SEQ ID NO: 55 encoding SEQ ID NO: 56) operably linked to a constitutive promoter. Three infected calli showing high levels of GFP-derived fluorescence based on visual inspection were selected and divided into multiple sections. These sections were allowed to propagate on selection media. After the callus pieces had recovered and grown larger, they were re-infected with Agrobacterium cells harboring genes encoding Cpfl enzymes and their respective guide RNAs (gRNAs). Following infection with the cpfl-containing vectors, the calli were propagated on selection medium containing hygromycin B. Callus pieces that putatively expressed functional Cpfl proteins were selected visually by inspecting the callus pieces for regions that were no longer visibly fluorescent. This loss of fluorescence likely resulted from successful Cpfl-mediated editing of the GFP-encoding sequence, which rendered the GFP gene non-functional. For example, rice callus transformed first with a GFP construct and then with construct 131307, containing a gene encoding the Cpfl protein from Acidaminococcus sp. BV3L6 (SEQ ID NO: 8, encoding SEQ ID NO: 6), showed an apparent loss of GFP-derived fluorescence in parts of the callus. Those rice callus pieces that contained clusters of cells that did not exhibit GFP-derived fluorescence were prioritized for more in-depth molecular characterization.
Example 3 - T7EI Assay
The T7 endonuclease I (T7EI) assay is used to identify samples with insertions and/or deletions at the desired location and to assess the targeting efficiency of genome editing enzymes. The assay protocol is modified from Shan et al (2014) Nature Protocols 9: 2395-2410. The basis of the assay is that T7EI recognizes and cleaves mismatched (heteroduplex) DNA. Briefly, PCR is performed to amplify a region of DNA that contains the DNA sequence targeted by the gRNA. Because both edited and unedited DNA are expected to be present in the sample, a mixture of PCR products is obtained. The PCR products are melted and then allowed to re-anneal. When a strand of an unedited PCR product re-anneals with a strand of an edited PCR product, a DNA mismatch results. These mismatches are cleaved by T7EI, and the cleavage products can be identified by gel-based assays. DNA is extracted from rice callus that appears to exhibit a loss of GFP-derived fluorescence. PCR is performed with this DNA as a template using primers designed to amplify a region of DNA spanning the junction between the promoter and the GFP open reading frame. PCR products are melted and re-annealed, then digested with T7EI (New England Biolabs, Ipswich, MA) according to the manufacturer's protocol. The resulting DNA is electrophoresed on a 2% agarose gel. In samples in which Cpfl produced an insertion or deletion at the desired location, a portion of the full-length PCR product is cleaved into two smaller fragments, which appear as two additional, smaller bands.
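The gel read-out described above can be reasoned about numerically. The Python sketch below is a hypothetical aid, not part of the cited protocol: it predicts approximate cleavage-product sizes from the amplicon length and the mismatch position, and applies a widely used approximation, indel fraction = 1 - sqrt(1 - fraction cleaved), to background-corrected band intensities. That formula assumes random re-annealing and that only wild-type/wild-type duplexes escape cleavage; the function names and example numbers are illustrative assumptions.

```python
import math
from typing import Tuple

# Hypothetical helpers for interpreting a T7EI gel (not taken from the cited protocol).


def expected_t7ei_fragments(amplicon_len: int, mismatch_pos: int) -> Tuple[int, int]:
    """Approximate sizes (bp) of the two cleavage products when T7EI cuts a heteroduplex
    near `mismatch_pos` (bp from one end of the amplicon)."""
    if not 0 < mismatch_pos < amplicon_len:
        raise ValueError("mismatch must lie inside the amplicon")
    return mismatch_pos, amplicon_len - mismatch_pos


def estimated_indel_fraction(uncut: float, cut_a: float, cut_b: float) -> float:
    """Commonly used estimate: indel fraction = 1 - sqrt(1 - fraction cleaved).

    Assumes random re-annealing and that only wild-type/wild-type duplexes escape
    cleavage; inputs are background-corrected band intensities in arbitrary units."""
    fraction_cleaved = (cut_a + cut_b) / (uncut + cut_a + cut_b)
    return 1.0 - math.sqrt(1.0 - fraction_cleaved)


print(expected_t7ei_fragments(500, 180))               # (180, 320)
print(round(estimated_indel_fraction(50, 25, 25), 3))  # 0.293
```

Because cleavage occurs only in heteroduplexes, a gel in which half of the signal is cleaved corresponds, under these assumptions, to roughly 29% edited alleles rather than 50%.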
Example 4 - Sequencing of DNA from Rice Callus
DNA extracted from rice callus that appeared, based on visual inspection for loss of fluorescence and/or on the results of T7EI assays, to comprise genomic DNA edited as a result of the accumulation of functional Cpfl enzyme is selected for sequence-based analysis. DNA is extracted from the appropriate rice callus pieces, and primers are used to PCR-amplify the GFP coding sequence from this DNA. The resulting PCR products are cloned into plasmids that are subsequently transformed into E. coli cells. These plasmids are recovered, and Sanger sequencing is used to analyze the DNA to identify insertions, deletions, and/or point mutations in the GFP-encoding DNA.
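Comparing each sequenced clone with the unedited GFP reference amounts to finding and labelling the differences between two sequences. The Python sketch below uses the standard-library `difflib.SequenceMatcher` as a simple stand-in for a proper sequence aligner; that substitution, the function name `classify_edits`, and the toy sequences are assumptions of this sketch. Where an indel falls in a repeated run, its reported position is one of several equally valid placements.

```python
import difflib
from typing import List, Tuple

# Hypothetical sketch: compare a Sanger read from one cloned PCR product with the
# unedited reference. Real analyses would use a proper aligner and quality trimming.
KINDS = {"delete": "deletion", "insert": "insertion", "replace": "substitution"}


def classify_edits(reference: str, clone: str) -> List[Tuple[str, int, str, str]]:
    """Return (kind, ref_pos, ref_bases, clone_bases) for each difference (0-based ref_pos)."""
    matcher = difflib.SequenceMatcher(a=reference, b=clone, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((KINDS[tag], i1, reference[i1:i2], clone[j1:j2]))
    return edits


print(classify_edits("ATGGCTAAA", "ATGGGTAAA"))  # [('substitution', 4, 'C', 'G')]
```

`autojunk=False` is set so that frequently repeated bases in longer reads are not silently treated as junk by the matcher's heuristic.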
Example 5 - Using deactivated Cpfl proteins to modulate gene expression
The RuvC-like domain of Cpfl has been shown to mediate DNA cleavage (Zetsche et al (2015) Cell 163: 759-771), with specific residues identified in the Cpfl enzyme from Francisella tularensis subsp. novicida U112 (i.e., D917 and E1006) whose mutation from the native amino acid to alanine completely inactivated DNA cleavage activity. Amino acid-based alignments of the eight Cpfl enzymes investigated here were performed using ClustalW multiple alignment (Thompson et al (1994) Nucleic Acids Research 22: 4673-4680) to identify the corresponding amino acid residues in the other enzymes. Table 2 lists these amino acid residues. The amino acid sequences of deactivated Cpfl proteins corresponding to point mutations at each of the amino acid residues listed in Table 2 are found in SEQ ID NOs: 26-41. The amino acid sequences of the double mutant deactivated Cpfl proteins comprising mutations at both residues listed for each Cpfl protein in Table 2 are found in SEQ ID NOs: 63-70.
Table 2: amino acid residues mutated to generate deactivated Cpfl enzymes
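Identifying the residue in another enzyme that corresponds to, for example, D917 of FnCpfl reduces to walking along two rows of an alignment and counting non-gap residues in each. The Python sketch below assumes an alignment has already been produced (for instance by ClustalW) and exported as gapped strings with '-' for gaps; the helper names and the toy sequences are hypothetical and are not the sequences behind Table 2.

```python
from typing import Optional

# Hypothetical helper: given two rows of an existing alignment ('-' marks gaps),
# find the query residue that corresponds to a numbered reference residue.


def map_position(ref_aligned: str, query_aligned: str, ref_pos: int) -> Optional[int]:
    """Map 1-based residue `ref_pos` of the reference to a 1-based query position.

    Returns None when the query has a gap in that alignment column."""
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned rows must have equal length")
    ref_count = query_count = 0
    for r, q in zip(ref_aligned, query_aligned):
        ref_count += int(r != "-")
        query_count += int(q != "-")
        if r != "-" and ref_count == ref_pos:
            return query_count if q != "-" else None
    raise ValueError("ref_pos lies beyond the end of the reference")


def mutation_label(sequence: str, pos: int, new_residue: str = "A") -> str:
    """Name a point mutation in standard notation, e.g. 'D917A' (1-based pos)."""
    return f"{sequence[pos - 1]}{pos}{new_residue}"


# Toy alignment, not real Cpfl sequences: reference residue D3 aligns to query D4.
ref_row, query_row = "MK-DLE", "MKQDAE"
query_pos = map_position(ref_row, query_row, 3)
print(query_pos, mutation_label(query_row.replace("-", ""), query_pos))  # 4 D4A
```

Returning None for a gapped column makes it explicit when an enzyme has no residue aligned to the catalytic position, rather than silently picking a neighbour.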
Appropriate primers are designed so that QuikChange PCR (Agilent Technologies, Santa Clara, CA) can be performed to produce genes that encode the deactivated Cpfl sequences listed in SEQ ID NOs: 26-41 and to produce genes that encode the deactivated Cpfl sequences listed in SEQ ID NOs: 63-70. PCR is performed to produce genes that encode a fusion protein containing a deactivated Cpfl protein fused to a gene expression activation or repression domain, such as the EDLL or TAL activation domains or the SRDX repressor domain, with the SV40 nuclear localization signal (SEQ ID NO:2, encoding SEQ ID NO: 1) fused in frame at the 5' end of the gene. Guide RNAs (gRNAs) are designed to allow the gRNA to interact with the deactivated Cpfl protein and to guide the deactivated Cpfl protein to a desired location in a plant genome. Cassettes containing the gRNA(s) of interest, operably linked to promoter(s) operable in plant cells, and containing the gene(s) encoding Cpfl fusion protein(s) fused to activation and/or repression domain(s), are cloned into a vector suitable for plant transformation. This vector is transformed into a plant cell, resulting in production of the gRNA(s) and the Cpfl fusion protein(s) in the plant cell. The fusion protein containing the deactivated Cpfl protein and the activator or repressor domain modulates the expression of nearby genes in the plant genome.
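Building such a fusion gene requires that each coding part (nuclear localization signal, deactivated nuclease, optional linker, effector domain) preserves the reading frame and carries no internal stop codon. The Python sketch below is a hypothetical in silico check of those two conditions only; it does not model the PCR steps, the part sequences are toy placeholders rather than the SEQ ID NOs cited above, and the default TAA stop codon is an arbitrary choice.

```python
from typing import List

# Hypothetical in-frame check for a fusion ORF such as
# NLS + deactivated Cpfl + linker + activation/repression domain.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def assemble_fusion(parts: List[str], stop: str = "TAA") -> str:
    """Concatenate coding parts 5'->3' and append a single stop codon.

    Raises ValueError if a part would shift the frame (length not a multiple of 3)
    or already contains an in-frame stop codon that would truncate the fusion."""
    cleaned = [p.upper() for p in parts]
    for idx, part in enumerate(cleaned):
        if len(part) % 3:
            raise ValueError(f"part {idx} ({len(part)} bp) would shift the reading frame")
        if any(part[i:i + 3] in STOP_CODONS for i in range(0, len(part), 3)):
            raise ValueError(f"part {idx} contains an in-frame stop codon")
    return "".join(cleaned) + stop


# Toy sequences only; real parts would be the NLS, nuclease, and effector coding sequences.
print(assemble_fusion(["ATGCCA", "AAAGGG", "GGTTCT"]))  # ATGCCAAAAGGGGGTTCTTAA
```

Checking each part separately, rather than only the final string, pinpoints which component (for example, a nuclease coding sequence that still ends in its native stop codon) would break the fusion.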
Example 6 - Editing pre-determined genomic loci in maize (Zea mays)
One or more gRNAs are designed to anneal with a desired site in the maize genome and to allow for interaction with one or more Cpfl or Csml proteins. These gRNAs are cloned in a vector such that they are operably linked to a promoter that is operable in a plant cell (the "gRNA cassette"). One or more genes encoding a Cpfl or Csml protein are cloned in a vector such that they are operably linked to a promoter that is operable in a plant cell (the "cpfl cassette" or "csml cassette"). The gRNA cassette and the cpfl cassette or csml cassette are each cloned into a vector that is suitable for plant transformation, and this vector is subsequently transformed into Agrobacterium cells. These cells are brought into contact with maize tissue that is suitable for transformation. Following this incubation with the Agrobacterium cells, the maize cells are cultured on a tissue culture medium that is suitable for regeneration of intact plants. Maize plants are regenerated from the cells that were brought into contact with Agrobacterium cells harboring the vector that contained the cpfl or csml cassette and gRNA cassette. Following regeneration of the maize plants, plant tissue is harvested and DNA is extracted from the tissue. T7EI assays and/or sequencing assays are performed, as appropriate, to determine whether a change in the DNA sequence has occurred at the desired genomic location.
Alternatively, particle bombardment is used to introduce the cpfl or csml cassette and gRNA cassette into maize cells. Vectors containing a cpfl or csml cassette and a gRNA cassette are coated onto gold beads or titanium beads that are then used to bombard maize tissue that is suitable for regeneration. Following bombardment, the maize tissue is transferred to tissue culture medium for regeneration of maize plants. Following regeneration of the maize plants, plant tissue is harvested and DNA is extracted from the tissue. T7EI assays and/or sequencing assays are performed, as appropriate, to determine whether a change in the DNA sequence has occurred at the desired genomic location.
Example 7 - Editing pre-determined genomic loci in Setaria viridis
One or more gRNAs are designed to anneal with a desired site in the Setaria viridis genome and to allow for interaction with one or more Cpfl or Csml proteins. These gRNAs are cloned in a vector such that they are operably linked to a promoter that is operable in a plant cell (the "gRNA cassette"). One or more genes encoding a Cpfl or Csml protein are cloned in a vector such that they are operably linked to a promoter that is operable in a plant cell (the "cpfl cassette" or "csml cassette"). The gRNA cassette and the cpfl cassette or csml cassette are each cloned into a vector that is suitable for plant transformation, and this vector is subsequently transformed into Agrobacterium cells. These cells are brought into contact with Setaria viridis tissue that is suitable for transformation. Following this incubation with the Agrobacterium cells, the Setaria viridis cells are cultured on a tissue culture medium that is suitable for regeneration of intact plants. Setaria viridis plants are regenerated from the cells that were brought into contact with Agrobacterium cells harboring the vector that contained the cpfl cassette or csml cassette and gRNA cassette. Following regeneration of the Setaria viridis plants, plant tissue is harvested and DNA is extracted from the tissue. T7EI assays and/or sequencing assays are performed, as appropriate, to determine whether a change in the DNA sequence has occurred at the desired genomic location.
Alternatively, particle bombardment is used to introduce the cpf1 cassette or csm1 cassette and gRNA cassette into S. viridis cells. Vectors containing a cpf1 cassette or csm1 cassette and a gRNA cassette are coated onto gold beads or titanium beads that are then used to bombard S. viridis tissue that is suitable for regeneration. Following bombardment, the S. viridis tissue is transferred to tissue culture medium for regeneration of intact plants. Following regeneration of the plants, plant tissue is harvested and DNA is extracted from this tissue. T7EI assays and/or sequencing assays are performed, as appropriate, to determine whether a change in the DNA sequence has occurred at the desired genomic location.
Example 8 - Deleting DNA from a pre-determined genomic locus
A first gRNA is designed to anneal with a first desired site in the genome of a plant of interest and to allow for interaction with one or more Cpf1 or Csm1 proteins. A second gRNA is designed to anneal with a second desired site in the genome of the same plant and to allow for interaction with one or more Cpf1 or Csm1 proteins. Each of these gRNAs is operably linked to a promoter that is operable in a plant cell (the "gRNA cassettes") and is subsequently cloned into a vector that is suitable for plant transformation. One or more genes encoding a Cpf1 or Csm1 protein are cloned in a vector such that they are operably linked to a promoter that is operable in a plant cell (the "cpf1 cassette" or "csm1 cassette"). The cpf1 cassette or csm1 cassette and the gRNA cassettes are cloned into a single plant transformation vector that is subsequently transformed into Agrobacterium cells. These cells are brought into contact with plant tissue that is suitable for transformation. Following this incubation with the Agrobacterium cells, the plant cells are cultured on a tissue culture medium that is suitable for regeneration of intact plants. Alternatively, the vector containing the cpf1 cassette or csm1 cassette and the gRNA cassettes is coated onto gold or titanium beads suitable for bombardment of plant cells. The cells are bombarded and are then transferred to tissue culture medium that is suitable for the regeneration of intact plants. The gRNA-Cpf1 or gRNA-Csm1 complexes effect double-stranded breaks at the desired genomic loci, and in some cases the DNA repair machinery repairs the DNA in such a way that the native DNA sequence located between the two targeted genomic loci is deleted. Plants are regenerated from the cells that are brought into contact with Agrobacterium cells harboring the vector that contains the cpf1 cassette or csm1 cassette and gRNA cassettes, or that are bombarded with beads coated with this vector. Following regeneration of the plants, plant tissue is harvested and DNA is extracted from the tissue. T7EI assays and/or sequencing assays are performed, as appropriate, to determine whether DNA has been deleted from the desired genomic location or locations.
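Choosing the two gRNA sites and anticipating the deleted segment can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used in these examples: it assumes a 5'-TTTV PAM (V = A, C or G, as reported for AsCpf1 and LbCpf1; FnCpf1 is more permissive), a 23-nt protospacer located 3' of the PAM, and the staggered Cpf1 cut after protospacer nucleotide 18 on the PAM-containing strand and nucleotide 23 on the complementary strand. Only the top strand is scanned, and the function names `find_cpf1_targets` and `predicted_deletion` are hypothetical.

```python
import re

# Assumed PAM: 5'-TTTV (V = A, C or G). The zero-width lookahead lets the
# regex report overlapping PAM candidates.
PAM_RE = re.compile(r"(?=(TTT[ACG]))")


def find_cpf1_targets(seq: str, spacer_len: int = 23):
    """Scan the top strand for TTTV PAMs and return candidate target sites.

    Each hit is (pam_start, pam, protospacer, top_cut, bottom_cut). Cut
    coordinates are 0-based positions *between* bases on the top strand:
    after protospacer nucleotide 18 (top strand) and 23 (bottom strand),
    which leaves a 5-nt 5' overhang.
    """
    seq = seq.upper()
    hits = []
    for m in PAM_RE.finditer(seq):
        pam_start = m.start()
        proto_start = pam_start + 4  # protospacer lies 3' of the 4-nt PAM
        protospacer = seq[proto_start:proto_start + spacer_len]
        if len(protospacer) < spacer_len:
            continue  # PAM too close to the 3' end of the supplied sequence
        hits.append((pam_start, m.group(1), protospacer,
                     proto_start + 18, proto_start + 23))
    return hits


def predicted_deletion(seq: str, cut_a: int, cut_b: int):
    """Return (length, removed segment, joined product) if the two top-strand
    cut ends are ligated directly. Overhang processing and the small indels
    often seen at real junctions are ignored."""
    left, right = sorted((cut_a, cut_b))
    removed = seq[left:right]
    return len(removed), removed, seq[:left] + seq[right:]
```

In practice the top-strand cut coordinates of two hits would be passed to `predicted_deletion`; the actual junction recovered in plant cells can differ by a few base pairs, which is why the text above relies on T7EI and sequencing assays to confirm the deletion.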
Example 9 - Inserting DNA at a pre-determined genomic locus
A gRNA is designed to anneal with a desired site in the genome of a plant of interest and to allow for interaction with one or more Cpf1 or Csm1 proteins. The gRNA is operably linked to a promoter that is operable in a plant cell (the "gRNA cassette") and is subsequently cloned into a vector that is suitable for plant transformation. One or more genes encoding a Cpf1 or Csm1 protein are cloned in a vector such that they are operably linked to a promoter that is operable in a plant cell (the "cpf1 cassette" or "csm1 cassette"). The cpf1 cassette or csm1 cassette and the gRNA cassette are both cloned into a single plant transformation vector that is subsequently transformed into Agrobacterium cells. These cells are brought into contact with plant tissue that is suitable for transformation. Concurrently, donor DNA is introduced into these same plant cells. Said donor DNA includes a DNA molecule that is to be inserted at the desired site in the plant genome, flanked by upstream and downstream flanking regions. The upstream flanking region is homologous to the region of genomic DNA upstream of the genomic locus targeted by the gRNA, and the downstream flanking region is homologous to the region of genomic DNA downstream of the genomic locus targeted by the gRNA. The upstream and downstream flanking regions mediate the insertion of DNA into the desired site of the plant genome. Following this incubation with the Agrobacterium cells and introduction of the donor DNA, the plant cells are cultured on a tissue culture medium that is suitable for regeneration of intact plants. Plants are regenerated from the cells that were brought into contact with Agrobacterium cells harboring the vector that contained the cpf1 cassette or csm1 cassette and gRNA cassette. Following regeneration of the plants, plant tissue is harvested and DNA is extracted from the tissue.
T7EI assays and/or sequencing assays are performed, as appropriate, to determine whether DNA has been inserted at the desired genomic location or locations.
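The donor layout described above (insert flanked by upstream and downstream regions homologous to the sequence on either side of the targeted site) can be sketched as simple string assembly. This is an illustrative sketch only: `build_donor` is a hypothetical helper, the default arm length of 1,000 bp mirrors the approximately 1,000-base pair arms used in Example 10, and the sketch does not alter the PAM or protospacer in the donor, which a real design may need to do (the 131633 vector in Example 10 carries a mutated PAM).

```python
def build_donor(genome: str, cut: int, insert: str, arm_len: int = 1000) -> str:
    """Return upstream arm + insert + downstream arm flanking a cut coordinate.

    `cut` is a 0-based position between bases in `genome`; the arms are the
    `arm_len` bases immediately 5' and 3' of that position.
    """
    if arm_len <= 0:
        raise ValueError("arm_len must be positive")
    if cut < arm_len or cut + arm_len > len(genome):
        raise ValueError("not enough sequence on both sides of the cut for the arms")
    upstream_arm = genome[cut - arm_len:cut]
    downstream_arm = genome[cut:cut + arm_len]
    return upstream_arm + insert + downstream_arm
```

Keeping the arms as exact copies of the flanking genome is what allows homologous recombination to place the insert precisely at the cut site.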
Example 10 - Biolistically Inserting DNA at the rice CAO1 genomic locus
For biolistic insertion of DNA at a pre-determined genomic locus, vectors were designed with cpf1 cassettes or csm1 cassettes. These vectors contained a 2X35S promoter (SEQ ID NO:43) upstream of the cpf1 or csm1 ORF and a 35S polyA terminator sequence (SEQ ID NO:54) downstream of the cpf1 or csm1 ORF. Table 3 summarizes these cpf1 and csm1 vectors.
Table 3: Summary of cpf1 and csm1 vectors used for biolistic experiments
Plasmid | Promoter | Cpf1 or Csm1 ORF (source organism) | Terminator
131272 (SEQ ID NO:81) | 2X35S (SEQ ID NO:43) | Francisella tularensis (SEQ ID NO:5) | 35S polyA (SEQ ID NO:54)
131273 (SEQ ID NO:82) | 2X35S (SEQ ID NO:43) | Acidaminococcus sp. (SEQ ID NO:8) | 35S polyA (SEQ ID NO:54)
131274 (SEQ ID NO:83) | 2X35S (SEQ ID NO:43) | Lachnospiraceae bacterium MA2020 (SEQ ID NO:11) | 35S polyA (SEQ ID NO:54)
131275 (SEQ ID NO:84) | 2X35S (SEQ ID NO:43) | Candidatus Methanoplasma termitum (SEQ ID NO:14) | 35S polyA (SEQ ID NO:54)
131276 (SEQ ID NO:85) | 2X35S (SEQ ID NO:43) | Moraxella bovoculi 237 (SEQ ID NO:17) | 35S polyA (SEQ ID NO:54)
131277 (SEQ ID NO:86) | 2X35S (SEQ ID NO:43) | Lachnospiraceae bacterium ND2006 (SEQ ID NO:19) | 35S polyA (SEQ ID NO:54)
131278 (SEQ ID NO:87) | 2X35S (SEQ ID NO:43) | Porphyromonas crevioricanis (SEQ ID NO:22) | 35S polyA (SEQ ID NO:54)
131279 (SEQ ID NO:88) | 2X35S (SEQ ID NO:43) | Prevotella disiens (SEQ ID NO:25) | 35S polyA (SEQ ID NO:54)
132058 | 2X35S (SEQ ID NO:43) | Anaerovibrio sp. RM50 (SEQ ID NO:176) | 35S polyA (SEQ ID NO:54)
132059 | 2X35S (SEQ ID NO:43) | Lachnospiraceae bacterium MC2017 (SEQ ID NO:174) | 35S polyA (SEQ ID NO:54)
132065 | 2X35S (SEQ ID NO:43) | Moraxella caprae DSM 19149 (SEQ ID NO:175) | 35S polyA (SEQ ID NO:54)
132066 | 2X35S (SEQ ID NO:43) | Succinivibrio dextrinosolvens H5 (SEQ ID NO:177) | 35S polyA (SEQ ID NO:54)
132067 | 2X35S (SEQ ID NO:43) | Prevotella bryantii B14 (SEQ ID NO:179) | 35S polyA (SEQ ID NO:54)
132068 | 2X35S (SEQ ID NO:43) | Flavobacterium branchiophilum FL-15 (SEQ ID NO:178) | 35S polyA (SEQ ID NO:54)
132075 | 2X35S (SEQ ID NO:43) | Lachnospiraceae bacterium NC2008 (SEQ ID NO:180) | 35S polyA (SEQ ID NO:54)
132082 | 2X35S (SEQ ID NO:43) | Pseudobutyrivibrio ruminis (SEQ ID NO:181) | 35S polyA (SEQ ID NO:54)
132083 | 2X35S (SEQ ID NO:43) | Helcococcus kunzii ATCC 51366 (SEQ ID NO:183) | 35S polyA (SEQ ID NO:54)
132084 | 2X35S (SEQ ID NO:43) | Smithella sp. SCADC (SEQ ID NO:185) | 35S polyA (SEQ ID NO:54)
132095 | 2X35S (SEQ ID NO:43) | Uncultured bacterium (gcode 4) ACD_3C00058 (SEQ ID NO:187) | 35S polyA (SEQ ID NO:54)
132096 | 2X35S (SEQ ID NO:43) | Proteocatella sphenisci (SEQ ID NO:191) | 35S polyA (SEQ ID NO:54)
132098 | 2X35S (SEQ ID NO:43) | Candidate division WS6 bacterium GW2011_GWA2_37_6 US52_C0007 (SEQ ID NO:182) | 35S polyA (SEQ ID NO:54)
132099 | 2X35S (SEQ ID NO:43) | Butyrivibrio sp. NC3005 (SEQ ID NO:190) | 35S polyA (SEQ ID NO:54)
132105 | 2X35S (SEQ ID NO:43) | Flavobacterium sp. 316 (SEQ ID NO:196) | 35S polyA (SEQ ID NO:54)
132100 | 2X35S (SEQ ID NO:43) | Butyrivibrio fibrisolvens (SEQ ID NO:192) | 35S polyA (SEQ ID NO:54)
132094 | 2X35S (SEQ ID NO:43) | Bacteroidetes oral taxon 274 str. F0058 (SEQ ID NO:188) | 35S polyA (SEQ ID NO:54)
132093 | 2X35S (SEQ ID NO:43) | Lachnospiraceae bacterium COE1 (SEQ ID NO:189) | 35S polyA (SEQ ID NO:54)
132111 | 2X35S (SEQ ID NO:43) | Parcubacteria bacterium GW2011 (SEQ ID NO:197) | 35S polyA (SEQ ID NO:54)
132092 | 2X35S (SEQ ID NO:43) | Sulfuricurvum sp. PC08-66 (SEQ ID NO:186) | 35S polyA (SEQ ID NO:54)
132097 | 2X35S (SEQ ID NO:43) | Candidatus Methanomethylophilus alvus Mx1201 (SEQ ID NO:184) | 35S polyA (SEQ ID NO:54)
132106 | 2X35S (SEQ ID NO:43) | Eubacterium sp. (SEQ ID NO:200) | 35S polyA (SEQ ID NO:54)
132107 | 2X35S (SEQ ID NO:43) | Microgenomates (Roizmanbacteria) bacterium GW2011_GWA2_37_7 (SEQ ID NO:201) | 35S polyA (SEQ ID NO:54)
132102 | 2X35S (SEQ ID NO:43) | Microgenomates (Roizmanbacteria) bacterium GW2011_GWA2_37_7 (SEQ ID NO:193) | 35S polyA (SEQ ID NO:54)
132104 | 2X35S (SEQ ID NO:43) | Prevotella brevis ATCC 19188 (SEQ ID NO:199) | 35S polyA (SEQ ID NO:54)
132109 | 2X35S (SEQ ID NO:43) | Smithella sp. SCADC (SEQ ID NO:203) | 35S polyA (SEQ ID NO:54)
132101 | 2X35S (SEQ ID NO:43) | Oribacterium sp. NK2B42 (SEQ ID NO:194) | 35S polyA (SEQ ID NO:54)
132103 | 2X35S (SEQ ID NO:43) | Synergistes jonesii strain 78-1 (SEQ ID NO:195) | 35S polyA (SEQ ID NO:54)
132108 | 2X35S (SEQ ID NO:43) | Smithella sp. SC_K08D17 (SEQ ID NO:202) | 35S polyA (SEQ ID NO:54)
132110 | 2X35S (SEQ ID NO:43) | Prevotella albensis (SEQ ID NO:198) | 35S polyA (SEQ ID NO:54)
132143 | 2X35S (SEQ ID NO:43) | Moraxella lacunata (SEQ ID NO:206) | 35S polyA (SEQ ID NO:54)
132144 | 2X35S (SEQ ID NO:43) | Eubacterium coprostanoligenes (SEQ ID NO:205) | 35S polyA (SEQ ID NO:54)
132145 | 2X35S (SEQ ID NO:43) | Succiniclasticum ruminis (SEQ ID NO:204) | 35S polyA (SEQ ID NO:54)
In addition to the cpf1 and csm1 vectors described in Table 3, vectors with gRNA cassettes were designed such that the gRNA would anneal with a region of the CAO1 gene locus in the rice (Oryza sativa) genome (SEQ ID NO:71) and would also allow for interaction with the appropriate Cpf1 or Csm1 protein. In these vectors, the gRNA was operably linked to the rice U6 promoter (SEQ ID NO:72) and terminator (SEQ ID NO:74). Table 4 summarizes these gRNA vectors.
Table 4: Summary of gRNA vectors used for biolistic experiments at the rice CAO1 genomic locus
To facilitate insertion of a hygromycin gene cassette into the rice CAO1 genomic locus, donor cassettes were designed with approximately 1,000 base pairs of homology upstream and downstream of the site of the intended double-strand break to be caused by the action of the Cpf1 or Csm1 enzyme coupled with the gRNA targeting this locus. Figure 1 provides a schematic view of the CAO1 genomic locus and the homology arms that were used to guide homologous recombination and insertion of the hygromycin gene cassette in the CAO1 locus. The hygromycin gene cassette that was inserted into the rice CAO1 genomic locus included the maize ubiquitin promoter (SEQ ID NO:46) driving the expression of the hygromycin resistance gene (SEQ ID NO:76, encoding SEQ ID NO:77), which was flanked at its 3' end by the Cauliflower Mosaic Virus 35S polyA sequence (SEQ ID NO:54). Table 5 summarizes the repair donor cassette vectors that were constructed for hygromycin insertion in the rice CAO1 genomic locus.
Table 5: Rice CAO1 repair donor cassettes for hygromycin resistance gene insertion
For introduction of the cpf1 cassette or csm1 cassette, the gRNA-containing plasmid, and the repair donor cassette into rice cells, particle bombardment was used. For bombardment, 2 mg of 0.6 μm gold particles were weighed out and transferred to sterile 1.5-mL tubes. 500 μL of 100% ethanol was added, and the tubes were sonicated for 10-15 seconds. Following centrifugation, the ethanol was removed. One milliliter of sterile, double-distilled water was then added to the tube containing the gold beads. The bead pellet was briefly vortexed and then re-formed by centrifugation, after which the water was removed from the tube. In a sterile laminar flow hood, DNA was coated onto the beads. Table 6 shows the amounts of DNA added to the beads. The plasmid containing the Cpf1 cassette or Csm1 cassette, the gRNA-containing plasmid, and the repair donor cassette were added to the beads, and sterile, double-distilled water was added to bring the total volume to 50 μL. To this, 20 μL of spermidine (1 M) was added, followed by 50 μL of CaCl2 (2.5 M). The gold particles were allowed to pellet by gravity for several minutes and were then pelleted by centrifugation. The supernatant was removed, and 800 μL of 100% ethanol was added. Following a brief sonication, the gold particles were allowed to pellet by gravity for 3-5 minutes, and the tube was then centrifuged to form a pellet. The supernatant was removed and 30 μL of 100% ethanol was added to the tube. The DNA-coated gold particles were resuspended in this ethanol by vortexing, and 10 μL of the resuspended gold particles was added to each of three macro-carriers (Bio-Rad, Hercules, CA). The macro-carriers were allowed to air-dry for 5-10 minutes in the laminar flow hood to allow the ethanol to evaporate.
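Because the 2 mg of coated gold is resuspended in 30 μL of ethanol and 10 μL is loaded onto each of three macro-carriers, each shot receives about one third of the preparation: roughly 0.67 mg of gold and one third of each DNA amount listed in Table 6. The sketch below makes that arithmetic explicit; it assumes a uniformly resuspended slurry with no handling losses, and `per_macrocarrier` is a hypothetical helper.

```python
def per_macrocarrier(gold_mg: float = 2.0, resuspension_ul: float = 30.0,
                     load_ul: float = 10.0, n_carriers: int = 3):
    """Return (fraction of the prep, mg of gold) loaded on each macro-carrier.

    Assumes a uniform suspension and no loss during pipetting or drying.
    Multiply a Table 6 DNA amount by the same fraction to get DNA per shot.
    """
    if load_ul * n_carriers > resuspension_ul:
        raise ValueError("loading volume exceeds the resuspension volume")
    fraction = load_ul / resuspension_ul
    return fraction, gold_mg * fraction
```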
Table 6: Amounts of DNA used for particle bombardment experiments (all amounts are per 2 mg of gold particles)
Rice callus tissue was used for bombardment. The rice callus was maintained on callus induction medium (CIM; 3.99 g/L N6 salts and vitamins, 0.3 g/L casein hydrolysate, 30 g/L sucrose, 2.8 g/L L-proline, 2 mg/L 2,4-D, 8 g/L agar, adjusted to pH 5.8) for 4-7 days at 28°C in the dark prior to bombardment. Approximately 80-100 callus pieces, each 0.2-0.3 cm in size and totaling 1-1.5 g by weight, were arranged in the center of a Petri dish containing osmotic solid medium (CIM supplemented with 0.4 M sorbitol and 0.4 M mannitol) for a 4-hour osmotic pretreatment prior to particle bombardment. For bombardment, the macro-carriers containing the DNA-coated gold particles were assembled into a macro-carrier holder. The rupture disk (1,100 psi), stopping screen, and macro-carrier holder were assembled according to the manufacturer's instructions. The plate containing the rice callus to be bombarded was placed 6 cm beneath the stopping screen, and the callus pieces were bombarded after the vacuum chamber reached 25-28 in. Hg. Following bombardment, the callus was left on osmotic medium for 16-20 hours, and the callus pieces were then transferred to selection medium (CIM supplemented with 50 mg/L hygromycin and 100 mg/L timentin). The plates were transferred to an incubator and held at 28°C in the dark to begin the recovery of transformed cells. Every two weeks, the callus was sub-cultured onto fresh selection medium. Hygromycin-resistant callus pieces began to appear after approximately five to six weeks on selection medium. Individual hygromycin-resistant callus pieces were transferred to new selection plates to allow the cells to divide and grow to produce sufficient tissue to be sampled for molecular analysis. Table 7 summarizes the combinations of DNA vectors that were used for these rice bombardment experiments.
Table 7: Summary of rice particle bombardment experiments for hygromycin resistance gene insertion at the CAO1 locus
Experiment | Cpf1 or Csm1 Plasmid | gRNA Plasmid | Repair Donor Plasmid
1 131272 131608 131760
2 131272 131608 131632
3 131273 131610 131632
4 131276 131612 131632
5 131272 131609 131633
6 131273 131611 131633
7 131276 131613 131633
13 131279 131912 131632
14 131274 131914 131632
15 131275 131913 131632
31 131277 132033 131632
32 131272 131985 131987
33 131272 131986 131988
43 131279 132051 131633
44 131274 132052 131633
45 131275 132053 131633
46 131277 132054 131633
53 131272 131982 131992
54 131272 131983 131993
55 131272 131980 131990
56 131272 131981 131991
57 131272 131984 131994
58 132058 131609 131633
59 132059 131609 131633
66 132068 131609 131633
67 132066 131609 131633
68 132065 131609 131633
69 132075 131609 131633
70 132067 131609 131633
71 132082 131609 131633
75 132096 131609 131633
76 132098 131609 131633
78 132083 131608 131632
79 132066 131608 131632
80 132065 131608 131632
81 132084 131608 131632
85 132075 131608 131632
86 132095 131608 131632
87 132099 131608 131632
88 132105 131608 131632
89 132100 131608 131632
90 132094 131608 131632
91 132093 131608 131632
92 132111 131608 131632
93 132092 131608 131632
94 132097 131608 131632
95 132106 131608 131632
96 132107 131608 131632
97 132102 131608 131632
98 132104 131608 131632
99 132109 132164 131632
100 132101 132164 131632
101 132103 132164 131632
102 132108 132164 131632
104 132143 132164 131632
105 132145 132164 131632
106 132059 131608 131632
107 132067 131608 131632
108 132096 131608 131632
109 132058 131608 131632
118 132110 131608 131632
119 132144 131608 131632
After the individual hygromycin-resistant callus pieces from each experiment were transferred to new plates, they grew to a size that was sufficient for sampling. A small amount of tissue was harvested from each individual piece of hygromycin-resistant rice callus, and DNA was extracted from these tissue samples for PCR and DNA sequencing analysis. For those experiments that used repair donor plasmids 131760 or 131632, PCR was performed on these DNA extracts using primers with the sequences of SEQ ID NOs:78 and 79, designed to amplify a region of DNA spanning from the ZmUbi promoter into a region of the rice genome that falls outside of the downstream repair donor arm, as depicted schematically in Figure 1. For those experiments that used repair donor plasmid 131633, primers with the sequences of SEQ ID NOs:102 and 103 were used to amplify a region of DNA spanning from the CaMV 35S terminator into a region of the rice genome that falls outside of the upstream repair donor arm, as depicted schematically in Figure 1. Because one primer of each pair binds within the inserted cassette and the other binds genomic sequence outside the homology arm, these PCR reactions do not produce an amplicon from wild-type rice DNA or from the repair donor plasmid; an amplicon is thus indicative of an insertion event at the rice CAO1 locus. Table 8 summarizes the number of hygromycin-resistant callus pieces produced from each experiment described in Table 7, as well as the number of PCR-positive callus pieces in which a putative insertion event occurred. The number of callus pieces used for each bombardment experiment was estimated by weight based on a survey of ten plates, with 159 ± 11.1 callus pieces per plate.
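Every value in the "Callus Pieces Bombarded" column of Table 8 is a whole multiple of 159 (for example, 4134 = 26 × 159 and 795 = 5 × 159), consistent with multiplying a plate count by the survey mean. The sketch below reproduces that estimate and, as an added assumption not stated in the text, treats the reported ± 11.1 as a per-plate standard deviation with plates varying independently, so the spread of a total grows with the square root of the plate count. The function and constant names are hypothetical.

```python
import math

PIECES_PER_PLATE = 159.0  # mean from the ten-plate survey
SPREAD_PER_PLATE = 11.1   # reported "±" value, assumed here to be a standard deviation


def estimate_pieces(n_plates: int):
    """Estimated total callus pieces and its spread, assuming independent plates."""
    total = n_plates * PIECES_PER_PLATE
    spread = math.sqrt(n_plates) * SPREAD_PER_PLATE
    return total, spread


def plates_from_estimate(total: float) -> int:
    """Invert the estimate: number of plates implied by a Table 8 value."""
    return round(total / PIECES_PER_PLATE)
```

For example, `plates_from_estimate(4134)` recovers the 26 plates behind Experiment 01.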
Table 8: Summary of rice callus bombardment experiments
Experiment | # Callus Pieces Bombarded (approximate) | # Hygromycin-Resistant Callus Pieces | # Events PCR-Positive for Insertion
01 4134 290 12
02 795 20 0
03 1749 39 1
04 954 24 0
05 2067 46 3
06 1908 57 5
07 3339 49 3
13 3339 90 0
14 3180 81 0
15 4134 138 0
31 2067 55 0
43 1908 68 0
44 2544 117 0
45 1908 143 0
46 1908 192 3
58 1908 192 1
59 1431 192 1
66 1431 192 0
67 477 192 0
68 477 192 0
70 1431 192 1
71 1431 192 0
75 1431 192 1
76 1431 192 0
78 1908 192 0
79 954 192 0
80 954 160 0
81 1431 192 0
85 1113 96 0
86 1113 133 0
87 1113 155 0
88 1272 192 0
89 954 192 0
90 954 192 0
91 954 192 0
92 954 192 0
93 954 192 0
94 954 192 0
95 954 192 0
97 954 192 0
98 1431 192 0
99 1272 192 0
100 1272 192 0
101 1272 192 0
102 1272 192 0
104 1272 192 0
105 1113 192 0
106 1113 192 0
107 1113 192 0
108 1272 96 0
109 1272 96 0
118 954 96 0
119 954 96 0
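Table 8 lends itself to two simple efficiency measures: PCR-positive events per hygromycin-resistant piece and per bombarded piece. The sketch below computes both for the rows that reported at least one PCR-positive event (values copied from the table). It is an illustrative calculation, not an analysis reported in the text; note that many rows list exactly 96 or 192 resistant pieces, and the text does not state whether those figures are totals or the number of pieces sampled, so the per-resistant-piece rate should be read with that caveat.

```python
# (experiment, callus pieces bombarded, hygromycin-resistant pieces, PCR-positive events),
# copied from the rows of Table 8 with at least one PCR-positive event.
ROWS = [
    ("01", 4134, 290, 12),
    ("03", 1749, 39, 1),
    ("05", 2067, 46, 3),
    ("06", 1908, 57, 5),
    ("07", 3339, 49, 3),
    ("46", 1908, 192, 3),
    ("58", 1908, 192, 1),
    ("59", 1431, 192, 1),
    ("70", 1431, 192, 1),
    ("75", 1431, 192, 1),
]


def insertion_rates(bombarded: int, resistant: int, positive: int):
    """Return (events per resistant piece, events per bombarded piece)."""
    return positive / resistant, positive / bombarded


for exp, bombarded, resistant, positive in ROWS:
    per_res, per_bomb = insertion_rates(bombarded, resistant, positive)
    print(f"Exp {exp}: {100 * per_res:.1f}% of resistant pieces, "
          f"{100 * per_bomb:.2f}% of bombarded pieces")
```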
For the PCR-positive callus pieces listed in Table 8, additional PCR analysis was done to amplify across the junctions between each homology arm and the rice genome. Primers with the sequences of SEQ ID NOs:96 and 97 were used to amplify the upstream region for those experiments that used repair donor plasmids 131760 or 131632. The locations of these primer binding sites are shown schematically in Figure 1.
Sanger sequencing of the PCR amplicons produced using the primer pairs described above to amplify the downstream region of the insertion events showed that the expected sequence was present in the transformed rice callus, confirming insertion of the hygromycin gene cassette at the expected genomic locus mediated by the double-stranded break produced by the Cpf1 or Csm1 enzyme. Sanger sequencing of the PCR amplicons produced using the primer pairs described above to amplify the upstream region of the insertion events also showed that the expected sequence was present in the transformed rice callus, further confirming insertion of the hygromycin gene cassette at the expected genomic locus mediated by the double-stranded break produced by the Cpf1 or Csm1 enzyme. Importantly, a deletion of five base pairs (GCCTT) from the rice genomic sequence was predicted to occur at the upstream insertion site following Cpf1-mediated DSB formation, and this deletion was confirmed from the sequencing data, further verifying that the observed insertion events were mediated by the action of Cpf1. Figure 2A shows an alignment summarizing the sequencing data that confirmed the insertion events at the rice CAO1 locus targeted in Experiment 1 (see Table 7).
Sequencing of the PCR products used to confirm the presence of a targeted insertion at the rice CAO1 locus in Experiments 5 and 7 (see Table 7) was performed. Primers with the sequences of SEQ ID NOs:104 and 105 were used to amplify the downstream region of these insertion events. These PCR products were sequenced, and the expected sequences were observed for insertion events mediated by DSB production by FnCpf1 (Experiment 5) and MbCpf1 (Experiment 7). The hph cassette was inserted in the CAO1 locus at the targeted site with no base changes in the downstream arm.
Experiment 70 (Table 7) resulted in an insertion of a portion of the 35S terminator present in plasmid 131633, rather than the entire hph cassette, at the intended insertion site in the rice CAO1 genomic locus. Sequence analysis showed that the 35S terminator contained an eleven base pair region that shared ten bases with the downstream arm (Figure 4A). It appears that this region in the 35S terminator mediated an unintended homologous recombination event with the downstream arm in rice callus piece #70-15, while the upstream arm in plasmid 131633 mediated the intended recombination event between this plasmid and the sequence upstream of the locus in the rice CAO1 gene targeted by the guide RNA and Cpf1 enzyme, resulting in the insertion sequence shown in Figure 4B. The resulting insertion led to a 179 base pair deletion and a 133 base pair insertion at the rice CAO1 locus. While the insertion event uncovered in Experiment 70 included only a portion of the 35S terminator rather than the full hph cassette that was intended for insertion, the event recovered was at the intended site in the CAO1 locus targeted by the Prevotella bryantii Cpf1 enzyme (SEQ ID NO:138, encoded by SEQ ID NO:179), indicating that this Cpf1 enzyme was effective at producing the intended DSB in the CAO1 genomic locus.
Experiment 75 (Table 7) resulted in an insertion of a portion of the 35S terminator present in plasmid 131633, rather than the entire hph cassette, at the intended insertion site in the rice CAO1 genomic locus. Sequence analysis showed that the 35S terminator contained a twelve base pair region that shared eight bases with the downstream arm (Figure 4C). It appears that this region in the 35S terminator mediated an unintended homologous recombination event with the downstream arm in rice callus piece #75-46, while the upstream arm in plasmid 131633 mediated the intended recombination event between this plasmid and the sequence upstream of the locus in the rice CAO1 gene targeted by the guide RNA and Cpf1 enzyme, resulting in the insertion sequence shown in Figure 4D. The resulting insertion led to a 47 base pair deletion and a 24 base pair insertion at the rice CAO1 locus. While the insertion event uncovered in Experiment 75 included only a portion of the 35S terminator rather than the full hph cassette that was intended for insertion, the event recovered was at the intended site in the CAO1 locus targeted by the Proteocatella sphenisci Cpf1 enzyme (SEQ ID NO:142, encoded by SEQ ID NO:191), indicating that this Cpf1 enzyme was effective at producing the intended DSB in the CAO1 genomic locus.
Experiment 46 (Table 7) resulted in an insertion at the intended insertion site in the rice CAO1 genomic locus, mediated by the Lachnospiraceae bacterium ND2006 Cpf1 enzyme (SEQ ID NO:18, encoded by SEQ ID NO:19). PCR analysis of the region of the intended insertion site at the CAO1 locus resulted in amplification of a band that is diagnostic of an insertion in callus piece #46-161. This genomic region was subjected to sequence analysis to confirm the presence of the intended DNA insertion at the rice CAO1 locus. Figure 5 shows the results of this sequence analysis, with the expected insertion from the 131633 vector present in the rice DNA at the expected site. The mutated PAM site (TTTC>TAGC) present in the 131633 vector was also detected in the rice DNA from callus piece #46-161, further supporting HDR-mediated insertion of the 131633 vector insert at the rice CAO1 locus as mediated by the site-specific DSB induction by the Lachnospiraceae bacterium ND2006 Cpf1 enzyme.
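The TTTC>TAGC change in the donor is useful in two ways that are commonly exploited in HDR designs: the donor-derived allele no longer matches a TTTV PAM, so the repaired site is expected to resist re-cleavage, and the altered bases mark an allele as donor-derived in sequencing data. The sketch below classifies the 4-nt PAM read from a sequenced amplicon on that basis. It assumes a TTTV PAM for the nuclease in question; the function names are hypothetical and extracting the PAM window from the read is left out.

```python
def is_tttv_pam(site: str) -> bool:
    """True for a 5'-TTTV PAM (V = A, C or G)."""
    site = site.upper()
    return len(site) == 4 and site.startswith("TTT") and site[3] in "ACG"


def classify_pam_allele(observed: str, donor_pam: str = "TAGC") -> str:
    """Label the 4-nt PAM observed at the target site in a sequenced amplicon."""
    observed = observed.upper()
    if observed == donor_pam.upper():
        return "donor-derived (mutated PAM)"
    if is_tttv_pam(observed):
        return "native-type TTTV PAM"
    return "other"
```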
Experiment 58 (Table 7) resulted in an insertion at the intended insertion site in the rice CAO1 genomic locus, mediated by the Anaerovibrio sp. RM50 Cpf1 enzyme (SEQ ID NO:143, encoded by SEQ ID NO:176). PCR analysis of the region of the intended insertion site at the CAO1 locus resulted in amplification of a band that is diagnostic of an insertion in callus piece #58-169. This genomic region is subjected to sequence analysis to confirm the presence of the intended DNA insertion at the rice CAO1 locus.
Example 11 - Cpf1-mediated genomic DNA modification at the rice CAO1 locus
Rice callus was bombarded as described above with gold beads that were coated with a cpf1 vector and gRNA vector. Rice callus that was bombarded as described for Experiment 01 (Table 7) was left on osmotic medium for 16-20 hours following bombardment, and the callus pieces were then transferred to selection medium (CIM supplemented with 50 mg/L hygromycin and 100 mg/L timentin). The plates were transferred to an incubator and held at 28°C in the dark to begin the recovery of transformed cells. Every two weeks, the callus was sub-cultured onto fresh selection medium. Hygromycin-resistant callus pieces began to appear after approximately five to six weeks on selection medium. Individual hygromycin-resistant callus pieces were transferred to new selection plates to allow the cells to divide and grow to produce sufficient tissue to be sampled for molecular analysis.
DNA was extracted from sixteen hygromycin-resistant callus pieces produced in Experiment 01 (Table 7), and PCR was performed using primers with the sequences of SEQ ID NOs:100 and 101 to test for the presence of the cpf1 cassette. This PCR reaction showed that DNA extracted from callus pieces numbered 1, 2, 4, 6, 7, and 15 produced the expected 853 base pair amplicon, consistent with insertion of the cpf1 cassette in the rice genome (Figure 2B). PCR was also performed with DNA extracted from these hygromycin-resistant rice callus pieces using primers with the sequences of SEQ ID NOs:98 and 99 to amplify a region of the rice CAO1 genomic locus that was targeted by the gRNA in vector 131608. This PCR reaction produced a 595-base pair amplicon when using wild-type rice DNA as the template. Following the PCR reaction with SEQ ID NOs:98 and 99 as the primers, a T7 endonuclease I (T7EI) assay was performed with the resulting PCR product to test for small insertions and/or deletions at this locus. DNA from callus piece number 15 showed a band pattern consistent with a small insertion or deletion (Figure 2C). The PCR products produced from the reaction using primers with SEQ ID NOs:98 and 99 were cloned into E. coli cells using the pGEM® system (Promega, Madison, WI) according to the manufacturer's instructions. DNA was extracted from eight single E. coli colonies for sequencing. Five of the eight colonies showed the same seven base pair deletion at the predicted Cpf1-mediated double-strand break site in the CAO1 locus (Figure 2D). Without being limited by theory, a likely explanation for this deletion is that the rice cell DNA repair machinery produced the deletion during repair of the double-stranded break caused by FnCpf1 at the CAO1 locus.
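The T7EI readout above can be reasoned about with simple arithmetic. A mismatch in a heteroduplex of the 595-bp amplicon is cleaved into two fragments whose sizes sum to roughly 595 bp, and band intensities are often converted to an approximate indel frequency with the widely used formula 100 × (1 − √(1 − fraction cleaved)), which assumes random reannealing of wild-type and edited strands. The text does not report either the position of the cut site within the amplicon or any band intensities, so `site_bp` and the intensity inputs below are hypothetical, as are the function names; this is a sketch of the general technique, not the authors' analysis.

```python
import math


def t7ei_band_sizes(amplicon_bp: int, site_bp: int):
    """Approximate T7EI cleavage products (bp) for a mismatch at site_bp.

    Real fragments differ by a few bp depending on the size of the indel.
    """
    if not 0 < site_bp < amplicon_bp:
        raise ValueError("site must fall inside the amplicon")
    return site_bp, amplicon_bp - site_bp


def percent_modified(uncut: float, cut_a: float, cut_b: float) -> float:
    """Estimate indel frequency (%) from band intensities:
    100 * (1 - sqrt(1 - fraction_cleaved))."""
    fraction_cleaved = (cut_a + cut_b) / (uncut + cut_a + cut_b)
    return 100 * (1 - math.sqrt(1 - fraction_cleaved))
```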
Experiment 01 (Table 7) was repeated with additional pieces of rice callus to confirm the reproducibility of the results obtained initially. The repeat of Experiment 01 resulted in the identification of four additional callus pieces that appeared to be positive for indel production based on T7EI assay results. DNA was extracted from these callus pieces for sequence analysis. PCR was performed to amplify the region of the rice genome surrounding the targeted site in the CAO1 gene, and Sanger sequencing was performed. The sequencing results confirmed the T7EI assay results. Figure 2D shows the resulting sequence data. These four callus pieces showed deletions ranging in size from three to seventy-five base pairs, all located at the expected site targeted by FnCpf1 (SEQ ID NO:3, encoded by SEQ ID NO:5).
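Once a clean, single-allele sequence is available (for example from an individual E. coli clone, as in the pGEM cloning step above), locating an indel relative to the reference is a pairwise comparison. The sketch below uses Python's standard `difflib.SequenceMatcher` to report non-matching blocks; it is a simple illustration rather than a substitute for trace-aware tools when Sanger reads come from mixed tissue. The example context sequence in the test is hypothetical, with only the GCCTT deletion borrowed from Example 10, and `find_indels` is a hypothetical helper. Indels inside repeats can be placed at any equivalent position, so reported coordinates may shift within such repeats.

```python
import difflib


def find_indels(reference: str, read: str):
    """Return (kind, reference position, reference bases, read bases) for each
    block where `read` differs from `reference`.

    kind is "delete" (bases missing from the read), "insert" (extra bases in
    the read) or "replace" (deletion accompanied by different inserted bases).
    """
    matcher = difflib.SequenceMatcher(None, reference, read, autojunk=False)
    events = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            events.append((tag, i1, reference[i1:i2], read[j1:j2]))
    return events
```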
Experiments 31 and 46 (Table 7) tested the ability of LbCpf1 (SEQ ID NO:18, encoded by SEQ ID NO:19) to effect DSBs at two different locations in the rice CAO1 locus. Experiment 31 used plasmid 132033 as the gRNA source, while Experiment 46 used plasmid 132054 as the gRNA source. Following bombardment of rice callus with the plasmids used for these experiments, DNA was extracted from hygromycin-resistant rice callus pieces and subjected to T7EI assays. Following PCR amplification of the rice CAO1 genomic locus, T7EI assays identified one callus piece from Experiment 31 and five callus pieces from Experiment 46 that appeared to contain indels at the expected site. PCR products from these rice callus pieces were analyzed by Sanger sequencing to identify the sequence(s) present at the CAO1 locus in these callus pieces. Figure 3 shows the results of the Sanger sequencing analyses, confirming the presence of indels at the expected locations in the rice CAO1 locus. Figure 3A shows the results from Experiment 31 and Figure 3B shows the results from Experiment 46. As Figure 3A shows, callus piece 31-21 showed a deletion of fifty-six base pairs along with a ten base pair insertion. The calli from Experiment 46 (data presented in Figure 3B) showed deletions with sizes ranging from three to fifteen base pairs. It should be noted that callus pieces 46-38 and 46-77 each showed two different indels, indicating that multiple indel production events had occurred in independent cells within these callus pieces. All of the indels from these experiments were located at the predicted site in the CAO1 locus targeted by the respective guide RNA, indicating faithful production of DSBs at this site by the LbCpf1 enzyme.
Experiment 80 (Table 7) tested the ability of the Moraxella caprae Cpf1 enzyme (SEQ ID NO:133, encoded by SEQ ID NO:175) to effect DSBs at the rice CAO1 locus. Following bombardment of rice callus with the plasmids used for this experiment, DNA was extracted from hygromycin-resistant rice callus pieces and subjected to T7EI assays. Following PCR amplification of the rice CAO1 genomic locus, T7EI assays identified one callus piece from the experiment that contained an indel at the expected site. A PCR product from this rice callus piece was analyzed by Sanger sequencing to identify the sequence present at the CAO1 locus in this callus piece. Figure 3A shows the results of these sequencing assays, with an eight base pair deletion present in callus piece #80-33 at the predicted site in the CAO1 locus targeted by the respective guide RNA, indicating faithful DSB production at this site by the Moraxella caprae Cpf1 enzyme.
Experiment 91 (Table 7) tested the ability of the Lachnospiraceae bacterium COE1 Cpf1 enzyme (SEQ ID NO:125, encoded by SEQ ID NO:189) to effect DSBs at the rice CAO1 locus. Following bombardment of rice callus with the plasmids used for this experiment, DNA was extracted from hygromycin-resistant rice callus pieces and subjected to T7EI assays. Following PCR amplification of the rice CAO1 genomic locus, T7EI assays identified one callus piece from the experiment that contained an indel at the expected site. A PCR product from this rice callus piece was analyzed by Sanger sequencing to identify the sequence present at the CAO1 locus in this callus piece. Figure 3A shows the results of these sequencing assays, with a nine base pair deletion present in callus piece #91-4 at the predicted site in the CAO1 locus targeted by the respective guide RNA, indicating faithful DSB production at this site by the Lachnospiraceae bacterium COE1 Cpf1 enzyme.
Experiment 119 (Table 7) tested the ability of the Eubacterium coprostanoligenes Cpf1 enzyme (SEQ ID NO: 173, encoded by SEQ ID NO: 205) to effect DSBs at the rice CAO1 locus. Following bombardment of rice callus with the plasmids used for this experiment, DNA was extracted from hygromycin-resistant rice callus pieces and subjected to T7EI assays. Following PCR amplification of the rice CAO1 genomic locus, T7EI assays identified two callus pieces from the experiment that contained an indel at the expected site. PCR products from these rice callus pieces were analyzed by Sanger sequencing to identify the sequence present at the CAO1 locus in these calli. Figure 3A shows the results of these sequencing assays, with an identical eight base pair deletion present in both callus pieces #119-4 and #119-11 at the predicted site in the CAO1 locus targeted by the respective guide RNA, indicating faithful DSB production at this site by the Eubacterium coprostanoligenes Cpf1 enzyme.
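The T7EI screen used in these experiments relies on T7 endonuclease I cutting PCR heteroduplexes (formed when edited and unedited amplicons are denatured and reannealed) at or near the mismatch. As a rough sketch, the bands expected on a gel can be estimated from the amplicon length and the position of the target site. The amplicon length and site position below are hypothetical, since the text does not give them, and the sketch ignores the small size difference introduced by the indel itself.

```python
def expected_t7ei_bands(amplicon_bp: int, site_from_left_bp: int) -> list[int]:
    """Approximate band sizes (bp), largest first, for a T7EI digest of a
    mixed edited/unedited amplicon.

    Homoduplexes stay uncut (full-length band); heteroduplexes are cut
    near the mismatch, giving two fragments whose sizes sum to roughly
    the amplicon length.
    """
    if not 0 < site_from_left_bp < amplicon_bp:
        raise ValueError("the target site must lie inside the amplicon")
    fragments = [site_from_left_bp, amplicon_bp - site_from_left_bp]
    return sorted([amplicon_bp, *fragments], reverse=True)


if __name__ == "__main__":
    # Hypothetical 600 bp CAO1 amplicon with the target site 250 bp from one end.
    print(expected_t7ei_bands(600, 250))  # [600, 350, 250]
```

Bands near the two predicted fragment sizes, in addition to the full-length band, are what would flag a callus piece as a candidate for Sanger sequencing.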
Example 12 - Regeneration of Rice Plants with an Insertion at the CAO1 Locus
Rice callus transformed with an hph cassette targeted to the CAO1 locus by an FnCpf1-mediated DSB in Experiment 1 (see Tables 7 and 8) was cultured on tissue culture medium to produce shoots. These shoots were subsequently transferred to rooting medium, and the rooted plants were transferred to soil for cultivation in a greenhouse. The rooted plants appeared phenotypically normal in soil. DNA was extracted from the rooted plants for PCR analysis. PCR amplification of the upstream and downstream arms confirmed that the hph cassette was present in the rice CAO1 genomic locus. T0-generation rice plants generated in Experiment 1 with the hph cassette insertion at the CAO1 locus were cultivated and self-pollinated to produce T1-generation seed. This seed was planted, and the resulting T1-generation plants were genotyped to identify homozygous, hemizygous, and null plants. The T1 plants segregated as expected, with approximately 25% of the T1 plants being homozygous for the hph insertion, 50% being hemizygous, and 25% being null segregants.
Homozygous plants displayed the expected yellow leaf phenotype associated with knockout of the CAO1 gene (Lee et al. (2005) Plant Mol Biol 57:805-818).
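The expected T1 segregation can be derived by enumerating gametes from a self-pollinated T0 plant. The sketch below assumes the T0 plant carries a single hph insertion at CAO1 on one homolog (hemizygous), with equal transmission of both alleles and no selection; the allele symbols "I" (insertion) and "+" (wild type) are hypothetical labels. Under these assumptions the expected ratio is 1 homozygous : 2 hemizygous : 1 null.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Hypothetical allele symbols: "I" = hph insertion at CAO1, "+" = wild-type allele.
CLASS_BY_INSERT_COUNT = {2: "homozygous", 1: "hemizygous", 0: "null"}


def selfed_class_ratios(parent: tuple[str, str] = ("I", "+")) -> dict[str, Fraction]:
    """Expected offspring class frequencies from self-pollinating `parent`.

    Every female gamete is paired with every male gamete (a Punnett square),
    assuming equal gamete transmission and no selection.
    """
    counts = Counter()
    for female, male in product(parent, repeat=2):
        n_insert = (female, male).count("I")
        counts[CLASS_BY_INSERT_COUNT[n_insert]] += 1
    total = sum(counts.values())
    return {cls: Fraction(n, total) for cls, n in counts.items()}


if __name__ == "__main__":
    for cls, freq in selfed_class_ratios().items():
        print(f"{cls}: {freq} ({float(freq):.0%})")
```

Only the homozygous quarter of the progeny lacks a functional CAO1 allele, which is why the yellow leaf phenotype is scored in homozygous plants.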
T0-generation plants were regenerated from GE0046 callus pieces 46-33, 46-40, 46-62, and 46-90, which had shown positive results for indels in T7EI assays and, for callus piece #46-90, in sequence verification (Figure 3B). Regenerated plants derived from callus pieces 46-33, 46-40, 46-62, and 46-90 were positive for the presence of an indel at the CAO1 locus based on T7EI assays using DNA extracted from the regenerated plant tissue. Plants were also regenerated from GE0046 callus pieces 46-96 and 46-161, which had previously been shown to have an insertion of the hygromycin marker at the CAO1 locus. Plants derived from callus pieces 46-96 and 46-161 were all positive for the insertion as detected by a PCR screen. Sequence data obtained from DNA extracted from two plants regenerated from callus piece #46-90 showed the same eight base pair deletion detected in the callus (Figure 3B), indicating that this deletion was stable through the regeneration process. Sequence data obtained from DNA extracted from plants derived from callus pieces #46-40 and #46-62 showed 8-, 9-, 10-, and 11-base pair deletions (data not shown).
Example 13 - Identification of a putative new class of Cpf1-like proteins
Examination of phylogenetic trees of putative Cpf1 proteins (Zetsche et al. (2015) Cell 163: 759-771 and data not shown), along with sequence analyses of Cpf1 proteins and Cpf1-like proteins identified through BLAST searches, uncovered a small group of proteins that appeared to be related to Cpf1 proteins but with significantly altered sequences relative to known Cpf1 proteins. As two of these proteins are found in Smithella sp. SCADC and in Microgenomates, this putative new class of proteins has been named Csm1 (CRISPR-associated proteins from Smithella and Microgenomates). Like Cpf1 proteins, these Csm1 proteins comprise RuvCI, RuvCII, and RuvCIII domains, but importantly the amino acid sequences of these domains are often quite divergent from those found in Cpf1 protein amino acid sequences, particularly for the RuvCIII domain. Additionally, the RuvCI-RuvCII and RuvCII-RuvCIII spacing is significantly altered in Csm1 proteins relative to Cpf1 proteins.
Alignment of the Smithella sp. SCADC Csm1 protein (SmCsm1; SEQ ID NO: 160) with known Cpf1 proteins using the BLASTP algorithm with default parameters (blast.ncbi.nlm.nih.gov/Blast.cgi) showed very little apparent sequence identity between these proteins. In particular, while the RuvCI domain in the SmCsm1 protein appeared to be present and well aligned with the corresponding sequences in Cpf1 proteins, the RuvCII and RuvCIII regions, which are well conserved in Cpf1 proteins (Shmakov et al. (2015) Mol Cell 60:385-397), did not initially appear to be present in the putative Csm1 protein. Additional analyses using HHPred (toolkit.tuebingen.mpg.de/hhpred; Söding et al. (2006) Nucleic Acids Res 34:W374-W378) uncovered putative RuvCII and RuvCIII domains in this SmCsm1 protein. Table 9 shows the putative RuvCII domains in several Cpf1 and putative Csm1 proteins and in a representative C2c1 protein, together with the amino acid residue numbers in each sequence listing corresponding to the RuvCII sequence listed. The putative active residue is underlined for each protein listed.
Table 9: RuvCII sequences from Cpf1 and Csm1 proteins [table image not reproduced]
Table 10 shows the putative RuvCIII domains in several Cpf1 and putative Csm1 proteins and in a representative C2c1 protein, together with the amino acid residue numbers in each sequence listing corresponding to the RuvCIII sequence listed. The putative active residue is underlined for each protein listed.
Table 10: RuvCIII sequences from Cpf1 and Csm1 proteins [table image not reproduced]
As Tables 9 and 10 show, the RuvCII and RuvCIII domains identified by HHPred for the putative Csm1 proteins (SEQ ID NOs: 134, 147, 159, 160, and 230) are significantly divergent from those found in Cpf1 proteins (representative sequences SEQ ID NOs: 6 and 18). Of particular note, the ANGAY motif following the active residue in the RuvCIII domain is extremely well conserved among Cpf1 proteins (Shmakov et al. (2015) Mol Cell 60:385-397 and data not shown) but is absent or altered in most of these Csm1 proteins. Analysis of the RuvCII and RuvCIII domains in Csm1, Cpf1, and C2c1 proteins (Shmakov et al. (2015) Mol Cell 60:385-397) suggests that Csm1 proteins are intermediate between Cpf1 and C2c1 proteins: Csm1 RuvCII sequences resemble those found in Cpf1 proteins, while Csm1 RuvCIII sequences resemble those found in C2c1 proteins. Most Csm1 RuvCIII domains contain a DXXAA motif that is also conserved in the C2c1 protein sequence.
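Checking an already-delimited RuvCIII segment for the two motifs named above can be sketched with regular expressions, where each "X" in DXXAA stands for any residue. This is a toy illustration, not how the domains were identified (the text relies on HHPred for that), and a motif match in a full-length protein is not by itself evidence of a RuvCIII domain. The example peptide is invented for illustration and is not taken from any SEQ ID NO.

```python
import re

# Lookahead patterns so overlapping occurrences are all reported.
# In DXXAA, each "X" means any amino acid, written "." in regex syntax.
RUVCIII_MOTIFS = {
    "ANGAY": re.compile(r"(?=ANGAY)"),
    "DXXAA": re.compile(r"(?=D..AA)"),
}


def find_motifs(protein: str) -> dict[str, list[int]]:
    """Return 1-based start positions of each motif in a protein sequence."""
    seq = protein.upper()
    return {
        name: [m.start() + 1 for m in pattern.finditer(seq)]
        for name, pattern in RUVCIII_MOTIFS.items()
    }


if __name__ == "__main__":
    toy_peptide = "MKDLEAAGSANGAYK"  # hypothetical sequence, for illustration only
    print(find_motifs(toy_peptide))  # {'ANGAY': [10], 'DXXAA': [3]}
```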
While Csm1 proteins share some sequence similarity with C2c1 proteins, their genomic context suggests that Csm1 proteins function in many ways like Cpf1 proteins. C2c1 proteins require both a crRNA and a tracrRNA, with the tracrRNA being partially complementary to the crRNA sequence, whereas Cpf1 proteins require only a crRNA. The genomic locus comprising the Csm1-encoding ORF from Smithella sp. SCADC (SEQ ID NO: 238) includes a CRISPR array with Cpf1-like direct repeats, preceded by a Csm1 ORF, Cas4 ORF, Cas1 ORF, and Cas2 ORF. This is consistent with the genomic organization found in Cpf1-encoding genomes (Shmakov et al. (2017) Nat Rev Microbiol doi: 10.1038/nrmicro.2016.184). In contrast, C2c1 loci tend to contain a fused Cas1/Cas4 ORF. Further, C2c1-containing genomic loci tend to encode both a crRNA array and a tracrRNA with partial complementarity to the crRNA direct repeat. Examination of the Smithella sp. SCADC genomic locus containing the Csm1-encoding ORF and associated crRNA sequences did not uncover any tracrRNA-like sequences, strongly suggesting that Csm1 does not require a tracrRNA to produce double-stranded breaks.
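The genomic-context argument above amounts to comparing a few locus features. The sketch below encodes that comparison as a toy rule; the feature names are hypothetical, the rule only summarizes the reasoning in this example, and it is not a validated classifier of CRISPR-Cas loci.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocusFeatures:
    """Presence/absence calls for a CRISPR locus (hypothetical feature set)."""
    tracrrna_like_sequence: bool
    fused_cas1_cas4_orf: bool
    separate_cas4_cas1_cas2_orfs: bool


def context_resembles(locus: LocusFeatures) -> str:
    """Summarize which organization a locus resembles, per the reasoning above."""
    if (not locus.tracrrna_like_sequence
            and locus.separate_cas4_cas1_cas2_orfs
            and not locus.fused_cas1_cas4_orf):
        return "Cpf1-like"
    if locus.tracrrna_like_sequence and locus.fused_cas1_cas4_orf:
        return "C2c1-like"
    return "ambiguous"


if __name__ == "__main__":
    # Features described in the text for the Smithella sp. SCADC Csm1 locus.
    smcsm1_locus = LocusFeatures(
        tracrrna_like_sequence=False,
        fused_cas1_cas4_orf=False,
        separate_cas4_cas1_cas2_orfs=True,
    )
    print(context_resembles(smcsm1_locus))  # Cpf1-like
```

Any other combination, such as a tracrRNA without a fused Cas1/Cas4 ORF, returns "ambiguous".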
Recently, a new class of nucleases termed CasX proteins was described (Burstein et al. (2016) Nature http://dx.doi.org/10.1038/nature20159). The CasX protein from Deltaproteobacteria (SEQ ID NO: 239) was described as a ~980-amino acid protein found in a genomic region that included Cas1, Cas4, and Cas2 protein-coding regions as well as a CRISPR repeat region and a tracrRNA. The report describing CasX showed that this tracrRNA was required for endonuclease function, in contrast with Csm1 proteins, which do not appear to require a tracrRNA. BLASTP alignments of SmCsm1 (SEQ ID NO: 160) and Deltaproteobacterial CasX (SEQ ID NO: 239) showed very poor alignment (data not shown). HHPred analysis of this CasX protein was used to identify putative RuvCI, RuvCII, and RuvCIII domains and their respective active site residues.
In addition to the altered amino acid sequences of the putative RuvCII and RuvCIII domains in Csm1 proteins relative to Cpf1 proteins, the protein organization is altered such that the spacing between these domains differs significantly between Csm1 and Cpf1 proteins. Table 11 compares the spacing between the active residues in the RuvC subdomains of known Cpf1 proteins (AsCpf1 and LbCpf1; SEQ ID NOs: 6 and 18) with the spacing in the putative Csm1 proteins (SEQ ID NOs: 134, 147, 159, 160, and 230), the Deltaproteobacterial CasX protein (SEQ ID NO: 239), and a representative C2c1 protein (SEQ ID NO: 237). The data in Table 11 show that Cpf1, CasX, C2c1, and Csm1 proteins each have a characteristic RuvC domain spacing, with the CasX RuvCI-RuvCII spacing resembling Cpf1 spacing and the CasX RuvCII-RuvCIII spacing resembling Csm1/C2c1 spacing. The spacing of the RuvCI, RuvCII, and RuvCIII domains in C2c1 and Csm1 proteins is similar, but the divergent RuvCIII sequences and the lack of a tracrRNA in Csm1 systems support the classification of Csm1 nucleases as separate from C2c1 nucleases.
Table 11: Comparison of RuvC subdomain spacing [table image not reproduced]
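The spacing values compared in Table 11 are differences between active-residue positions. Because the Table 11 values are not given in the text, the sketch below uses the residue pairs named in claim 14 (positions 917 and 1006 of FnCpf1, and 701 and 922 of SmCsm1) and assumes, for illustration only, that each pair corresponds to the RuvCI and RuvCII active residues; the function name is hypothetical.

```python
def active_site_spacing(positions: list[int]) -> list[int]:
    """Residue spacing between consecutive RuvC active residues.

    `positions` are 1-based residue numbers in N- to C-terminal order,
    e.g. [RuvCI, RuvCII, RuvCIII] -> [RuvCI-RuvCII, RuvCII-RuvCIII].
    """
    if len(positions) < 2:
        raise ValueError("need at least two active-residue positions")
    if any(later <= earlier for earlier, later in zip(positions, positions[1:])):
        raise ValueError("positions must be strictly increasing")
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


if __name__ == "__main__":
    # Assumed RuvCI/RuvCII residues, taken from the positions listed in claim 14.
    print("FnCpf1", active_site_spacing([917, 1006]))  # [89]
    print("SmCsm1", active_site_spacing([701, 922]))   # [221]
```

Under that assumption the RuvCI-RuvCII gap is 89 residues in FnCpf1 versus 221 in SmCsm1, in line with the statement that RuvC spacing differs markedly between the two families.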
Along with the divergent RuvCII and RuvCIII amino acid sequences and the altered spacing of these domains in Csm1 proteins relative to Cpf1 proteins, in many cases HHPred analyses did not identify any Csm1 residue corresponding to D1225 in FnCpf1 (SEQ ID NO: 3) (D1234 in AsCpf1 (SEQ ID NO: 6) and D1148 in LbCpf1 (SEQ ID NO: 18)). Mutation of the FnCpf1 D1225 residue very significantly reduced the catalytic activity of this nuclease (Zetsche et al. (2015) Cell 163: 759-771), suggesting that this residue is important for the enzymatic function of Cpf1 enzymes.
In addition to the altered amino acid sequences of the putative RuvC domains in Csm1 proteins relative to Cpf1 proteins, HHPred analyses with Csm1 proteins show no matches with Cpf1 proteins in their N-terminal regions, in contrast with HHPred analyses based on known Cpf1 proteins. An HHPred analysis with the FnCpf1 amino acid sequence (SEQ ID NO: 3) returned only two matches, AsCpf1 (SEQ ID NO: 6) and LbCpf1 (SEQ ID NO: 18), each with 100% probability and covering the entire FnCpf1 amino acid sequence. In contrast, an HHPred analysis with SmCsm1 (SEQ ID NO: 160) finds matches with Cpf1 proteins only in the regions spanning amino acids 391-1017 and 1003-1064 of SmCsm1. Amino acids 1003-1030 also match a variety of other proteins, including a probable lysine biosynthesis protein, an amino acid carrier protein, a transcription initiation factor, 50S and 30S ribosomal proteins, and DNA-directed RNA polymerases. No matches for the first 390 amino acids of SmCsm1 are found in this HHPred analysis. Similar HHPred analyses with the Csm1 proteins of SEQ ID NOs: 134, 147, 159, 160, and 230 also failed to find any match for the N-terminal portions of these proteins, further supporting the conclusion that these proteins share some similarity with Cpf1 proteins but are not bona fide Cpf1 proteins themselves.
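The coverage reasoning for SmCsm1 can be made explicit by merging the HHPred match ranges and reading off the uncovered N-terminal segment. A minimal sketch, using the ranges quoted above (1-based, inclusive); the total length of SmCsm1 is not given in the text, so the sketch does not compute C-terminal coverage, and the function names are hypothetical.

```python
from typing import Optional


def merge_ranges(ranges: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or abutting 1-based inclusive residue ranges."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered_n_terminus(ranges: list[tuple[int, int]]) -> Optional[tuple[int, int]]:
    """Residues preceding the first matched position, or None if residue 1 is covered."""
    merged = merge_ranges(ranges)
    if not merged:
        raise ValueError("no matched ranges supplied")
    if merged[0][0] <= 1:
        return None
    return (1, merged[0][0] - 1)


if __name__ == "__main__":
    smcsm1_cpf1_matches = [(391, 1017), (1003, 1064)]
    print(merge_ranges(smcsm1_cpf1_matches))          # [(391, 1064)]
    print(uncovered_n_terminus(smcsm1_cpf1_matches))  # (1, 390)
```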
Example 14 - Csm1 Functional Characterization
Given the divergent nature of Csm1 proteins relative to Cpf1 proteins, we sought to confirm that these proteins were capable of producing DSBs in vivo. Although the amino acid sequences of the Csm1 proteins are quite divergent relative to Cpf1 proteins, genomic analyses of the organisms that are the source of these Csm1 proteins uncovered CRISPR arrays (data not shown), suggesting that these proteins could in fact be functional.
Experiment 81 (Table 7) tested the ability of a Smithella sp. SCADC Csm1 enzyme (SEQ ID NO: 160, encoded by SEQ ID NO: 185) to effect DSBs at the rice CAO1 locus. Following bombardment of rice callus with the plasmids used for this experiment, DNA was extracted from hygromycin-resistant rice callus pieces and subjected to T7EI assays. Following PCR amplification of the rice CAO1 genomic locus, T7EI assays identified three callus pieces from the experiment that contained an indel at the expected site. PCR products from these rice callus pieces were analyzed by Sanger sequencing to identify the sequence present at the CAO1 locus in these calli. Figure 3A shows the results of these sequencing assays, with an eight base pair deletion present in callus piece #81-46, an identical eight base pair deletion present in callus piece #81-30, and a twelve base pair deletion present in callus piece #81-9 at the predicted site in the CAO1 locus targeted by the respective guide RNA, indicating faithful DSB production at this site by the Smithella sp. SCADC Csm1 enzyme.
Experiment 93 (Table 7) tested the ability of a Sulfuricurvum sp. Csm1 enzyme (SEQ ID NO: 147, encoded by SEQ ID NO: 186) to effect DSBs at the rice CAO1 locus. Following bombardment of rice callus with the plasmids used for this experiment, DNA was extracted from hygromycin-resistant rice callus pieces and subjected to T7EI assays. Following PCR amplification of the rice CAO1 genomic locus, T7EI assays identified one callus piece from the experiment that contained an indel at the expected site. A PCR product from rice callus piece #93-47 was analyzed by Sanger sequencing to identify the sequence present at the CAO1 locus in this callus piece. Figure 3A shows the results of these sequencing assays, with a forty-two base pair deletion present in callus piece #93-47 at the predicted site in the CAO1 locus targeted by the respective guide RNA, indicating faithful DSB production at this site by the Sulfuricurvum sp. Csm1 enzyme.
Experiment 97 (Table 7) tested the ability of a Microgenomates (Roizmanbacteria) bacterium Csm1 enzyme (SEQ ID NO: 134, encoded by SEQ ID NO: 193) to effect DSBs at the rice CAO1 locus. Following bombardment of rice callus with the plasmids used for this experiment, DNA was extracted from hygromycin-resistant rice callus pieces and subjected to T7EI assays. Following PCR amplification of the rice CAO1 genomic locus, T7EI assays identified three callus pieces from the experiment that contained an indel at the expected site. Callus pieces #97-112, #97-130, and #97-141 showed a T7EI banding pattern consistent with DSB production at this site. DNA extracted from callus pieces #97-112 and #97-141 was subjected to sequence analysis (Figure 3A), which showed an identical eight base pair deletion in both of these calli, indicating faithful DSB production at this site by the Microgenomates (Roizmanbacteria) bacterium Csm1 enzyme.

Claims

WE CLAIM:
1. A method of modifying a nucleotide sequence at a target site in the genome of a eukaryotic cell comprising:
introducing into said eukaryotic cell
(i) a DNA-targeting RNA, or a DNA polynucleotide encoding a DNA-targeting RNA, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a Cpf1 or Csm1 polypeptide; and
(ii) a Cpf1 or Csm1 polypeptide, or a polynucleotide encoding a Cpf1 or Csm1 polypeptide, wherein the Cpf1 or Csm1 polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity,
wherein said Cpf1 or Csm1 polypeptide is selected from the group consisting of: SEQ ID NO: 3, 6, 9, 12, 15, 18, 20, 23, 106-173, and 230-236, and wherein said genome of a eukaryotic cell is a nuclear, plastid, or mitochondrial genome.
2. A method of modifying a nucleotide sequence at a target site in the genome of a prokaryotic cell comprising:
introducing into said prokaryotic cell
(i) a DNA-targeting RNA, or a DNA polynucleotide encoding a DNA-targeting RNA, wherein the DNA-targeting RNA comprises: (a) a first segment comprising a nucleotide sequence that is complementary to a sequence in the target DNA; and (b) a second segment that interacts with a Cpf1 or Csm1 polypeptide; and
(ii) a Cpf1 or Csm1 polypeptide, or a polynucleotide encoding a Cpf1 or Csm1 polypeptide, wherein the Cpf1 or Csm1 polypeptide comprises: (a) an RNA-binding portion that interacts with the DNA-targeting RNA; and (b) an activity portion that exhibits site-directed enzymatic activity,
wherein said Cpf1 or Csm1 polypeptide is selected from the group consisting of: SEQ ID NO: 3, 6, 9, 12, 15, 18, 20, 23, 106-173, and 230-236, wherein said genome of a prokaryotic cell is a chromosomal, plasmid, or other intracellular DNA sequence, and wherein said prokaryotic cell is not the natural host of a gene encoding said Cpf1 or Csm1 polypeptide.
3. The method of claim 1 wherein said eukaryotic cell is a plant cell.
4. The method of claim 3, further comprising:
culturing the plant under conditions in which the Cpf1 or Csm1 polypeptide is expressed and cleaves the nucleotide sequence at the target site to produce a modified nucleotide sequence; and selecting a plant comprising said modified nucleotide sequence.
5. The method of claim 1, wherein said DNA-targeting RNA is a guide RNA (gRNA).
6. The method of claim 2, wherein said DNA-targeting RNA is a guide RNA (gRNA).
7. The method of claim 1, wherein said modified nucleotide sequence comprises insertion of heterologous DNA into the genome of the cell, deletion of a nucleotide sequence from the genome of the cell, or mutation of at least one nucleotide in the genome of the cell.
8. The method of claim 2, wherein said modified nucleotide sequence comprises insertion of heterologous DNA into the genome of the cell, deletion of a nucleotide sequence from the genome of the cell, or mutation of at least one nucleotide in the genome of the cell.
9. The method of claim 1 wherein said modified nucleotide sequence comprises insertion of a polynucleotide that encodes a protein conferring antibiotic or herbicide tolerance to transformed cells.
10. The method of claim 9 wherein said polynucleotide that encodes a protein conferring antibiotic or herbicide tolerance comprises SEQ ID NO:76, or encodes a protein that comprises SEQ ID NO:77.
11. A nucleic acid molecule comprising a polynucleotide sequence encoding a Cpf1 or Csm1 polypeptide, wherein said polynucleotide sequence is selected from the group consisting of SEQ ID NOs: 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 21, 22, 24, 25, and 174-206, or a fragment or variant thereof, or wherein said polynucleotide sequence encodes a Cpf1 or Csm1 polypeptide selected from the group consisting of SEQ ID NOs: 3, 6, 9, 12, 15, 18, 20, 23, 106-173, and 230-236, and wherein said polynucleotide sequence encoding a Cpf1 or Csm1 polypeptide is operably linked to a promoter that is heterologous to the polynucleotide sequence encoding a Cpf1 or Csm1 polypeptide.
12. A fusion protein encoded by a nucleic acid molecule comprising
(i) a sequence selected from the group of SEQ ID NOs: 4, 5, 7, 8, 10, 11, 13, 14, 16, 17, 19, 21, 22, 24, 25, and 174-206, or a fragment or variant thereof; and
(ii) a nucleic acid molecule that encodes an effector domain.
13. A Cpf1 or Csm1 polypeptide encoded by the nucleic acid molecule of claim 11.
14. The nucleic acid of claim 11 wherein said fragment or variant encodes a Cpf1 or Csm1 polypeptide comprising one or more mutations in one or more positions corresponding to positions 917 or 1006 of FnCpf1 (SEQ ID NO: 3) or to positions 701 or 922 of SmCsm1 (SEQ ID NO: 160) when aligned for maximum identity.
15. A plant cell, eukaryotic cell, or prokaryotic cell comprising the nucleic acid molecule of claim 11.
16. A plant cell, eukaryotic cell, or prokaryotic cell comprising the fusion protein of claim 12.
17. A eukaryotic cell produced by the method of claim 1.
18. The eukaryotic cell of claim 17 wherein said eukaryotic cell is a plant cell.
19. A plant comprising the nucleic acid molecule of claim 11.
20. A plant comprising the fusion protein or polypeptide of claim 12.
21. A plant produced by the method of claim 3.
22. The seed of the plant of claim 19.
23. The seed of the plant of claim 21.
24. The nucleic acid molecule of claim 11 wherein said polynucleotide sequence encoding a Cpf1 or Csm1 polypeptide is codon-optimized for expression in a plant cell.
25. The method of claim 3 wherein said polynucleotide encoding a Cpf1 or Csm1 polypeptide is codon-optimized for expression in a plant cell.
PCT/IB2017/050845 2016-02-15 2017-02-15 Compositions and methods for modifying genomes WO2017141173A2 (en)

Priority Applications (22)

Application Number Priority Date Filing Date Title
EP17707411.9A EP3307884B1 (en) 2016-02-15 2017-02-15 Compositions and methods for modifying genomes using csm1
EP23207984.8A EP4306642A3 (en) 2016-02-15 2017-02-15 Compositions and methods for modifying genomes
ES17707411T ES2973207T3 (en) 2016-02-15 2017-02-15 Compositions and procedures to modify genomes using Csm1
KR1020237040585A KR20230165368A (en) 2016-02-15 2017-02-15 Compositions and methods for modifying genomes using cpf1 or csm1
CN202211143483.1A CN115927440A (en) 2016-02-15 2017-02-15 Compositions and methods for modifying genomes
BR112018016408A BR112018016408A2 (en) 2016-02-15 2017-02-15 compositions and methods for modifying genomes
JP2018561102A JP2019504649A (en) 2016-02-15 2017-02-15 Compositions and methods for modifying a genome
MYPI2018001434A MY197523A (en) 2016-02-15 2017-02-15 Compositions and methods for modifying genomes
KR1020187023481A KR20180107155A (en) 2016-02-15 2017-02-15 Compositions and methods for modifying the genome using CPF1 or CSM1
EP21212642.9A EP4063501A1 (en) 2016-02-15 2017-02-15 Compositions and methods for modifying genomes
AU2017220789A AU2017220789B2 (en) 2016-02-15 2017-02-15 Compositions and Methods for Modifying Genomes
IL308791A IL308791A (en) 2016-02-15 2017-02-15 Compositions And Methods For Modifying Genomes
CN201780014661.1A CN109312316B (en) 2016-02-15 2017-02-15 Compositions and methods for modifying genomes
MX2018009761A MX2018009761A (en) 2016-02-15 2017-02-15 Compositions and methods for modifying genomes using cpf1 or csm1.
CA3014988A CA3014988A1 (en) 2016-02-15 2017-02-15 Compositions and methods for modifying genomes
IL261082A IL261082A (en) 2016-02-15 2018-08-09 Compositions and methods for modifying genomes
PH12018501722A PH12018501722A1 (en) 2016-02-15 2018-08-14 Compositions and methods for modifying genomes
JP2022142420A JP2022184892A (en) 2016-02-15 2022-09-07 Compositions and methods for modifying genomes
IL304398A IL304398A (en) 2016-02-15 2023-07-11 Compositions and methods for modifying genomes
AU2023226754A AU2023226754A1 (en) 2016-02-15 2023-09-08 Compositions and methods for modifying genomes
JP2023199358A JP2024028753A (en) 2016-02-15 2023-11-24 Composition and method for modifying genome
AU2023270322A AU2023270322A1 (en) 2016-02-15 2023-11-24 Compositions and methods for modifying genomes

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
US201662295325P 2016-02-15 2016-02-15
US62/295,325 2016-02-15
US201662372108P 2016-08-08 2016-08-08
US62/372,108 2016-08-08
US201662403854P 2016-10-04 2016-10-04
US62/403,854 2016-10-04
US201662429112P 2016-12-02 2016-12-02
US62/429,112 2016-12-02
US201762450743P 2017-01-26 2017-01-26
US62/450,743 2017-01-26

Publications (2)

Publication Number Publication Date
WO2017141173A2 true WO2017141173A2 (en) 2017-08-24
WO2017141173A3 WO2017141173A3 (en) 2017-11-02

Family

ID=58162971

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2017/050845 WO2017141173A2 (en) 2016-02-15 2017-02-15 Compositions and methods for modifying genomes

Country Status (14)

Country Link
US (3) US9896696B2 (en)
EP (3) EP4063501A1 (en)
JP (3) JP2019504649A (en)
KR (2) KR20180107155A (en)
CN (2) CN115927440A (en)
AU (3) AU2017220789B2 (en)
BR (1) BR112018016408A2 (en)
CA (2) CA3014988A1 (en)
ES (1) ES2973207T3 (en)
IL (3) IL308791A (en)
MX (2) MX2018009761A (en)
MY (1) MY197523A (en)
PH (1) PH12018501722A1 (en)
WO (1) WO2017141173A2 (en)

Cited By (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018054911A1 (en) 2016-09-23 2018-03-29 Bayer Cropscience Nv Targeted genome optimization in plants
CN108546712A (en) * 2018-04-26 2018-09-18 中国农业科学院作物科学研究所 A method of realizing target gene homologous recombination in plant using CRISPR/LbCpf1 systems
WO2019030695A1 (en) * 2017-08-09 2019-02-14 Benson Hill Biosystems, Inc. Compositions and methods for modifying genomes
CN109593743A (en) * 2018-12-12 2019-04-09 广州普世利华科技有限公司 Novel C RISPR/ScCas12a albumen and preparation method thereof
US10323236B2 (en) 2011-07-22 2019-06-18 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
US10465176B2 (en) 2013-12-12 2019-11-05 President And Fellows Of Harvard College Cas variants for gene editing
US10508298B2 (en) 2013-08-09 2019-12-17 President And Fellows Of Harvard College Methods for identifying a target site of a CAS9 nuclease
KR20200018345A (en) * 2018-08-09 2020-02-19 (주)지플러스 생명과학 Novel crispr associated protein and use thereof
US10597679B2 (en) 2013-09-06 2020-03-24 President And Fellows Of Harvard College Switchable Cas9 nucleases and uses thereof
US10682410B2 (en) 2013-09-06 2020-06-16 President And Fellows Of Harvard College Delivery system for functional nucleases
US10704062B2 (en) 2014-07-30 2020-07-07 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US10745677B2 (en) 2016-12-23 2020-08-18 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
WO2020168315A1 (en) 2019-02-15 2020-08-20 Just-Evotec Biologics, Inc. Automated biomanufacturing systems, facilities, and processes
WO2020234468A1 (en) 2019-05-23 2020-11-26 Nomad Bioscience Gmbh Rna viral rna molecule for gene editing
US10858639B2 (en) 2013-09-06 2020-12-08 President And Fellows Of Harvard College CAS9 variants and uses thereof
EP3612204A4 (en) * 2017-04-21 2021-01-27 The General Hospital Corporation Inducible, tunable, and multiplex human gene regulation using crispr-cpf1
US10947530B2 (en) 2016-08-03 2021-03-16 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11046948B2 (en) 2013-08-22 2021-06-29 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
EP3878958A1 (en) 2020-03-11 2021-09-15 B.R.A.I.N. Biotechnology Research And Information Network AG Crispr-cas nucleases from cpr-enriched metagenome
EP3922719A1 (en) 2020-06-12 2021-12-15 Eligo Bioscience Specific decolonization of antibiotic resistant bacteria for prophylactic purposes
US11214780B2 (en) 2015-10-23 2022-01-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
WO2022003209A1 (en) 2020-07-03 2022-01-06 Eligo Bioscience Method of containment of nucleic acid vectors introduced in a microbiome population
EP3943600A1 (en) 2020-07-21 2022-01-26 B.R.A.I.N. Biotechnology Research And Information Network AG Novel, non-naturally occurring crispr-cas nucleases for genome editing
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
WO2022101286A1 (en) 2020-11-11 2022-05-19 Leibniz-Institut Für Pflanzenbiochemie Fusion protein for editing endogenous dna of a eukaryotic cell
WO2022144382A1 (en) 2020-12-30 2022-07-07 Eligo Bioscience Chimeric receptor binding proteins resistant to proteolytic degradation
WO2022165001A1 (en) 2021-01-29 2022-08-04 Merck Sharp & Dohme Llc Compositions of programmed death receptor 1 (pd-1) antibodies and methods of obtaining the compositions thereof
US11434478B2 (en) 2018-08-09 2022-09-06 Gflas Life Sciences, Inc. Compositions and methods for genome engineering with Cas12a proteins
WO2022184765A1 (en) 2021-03-02 2022-09-09 BRAIN Biotech AG NOVEL CRISPR-Cas NUCLEASES FROM METAGENOMES
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
EP4063501A1 (en) 2016-02-15 2022-09-28 Benson Hill, Inc. Compositions and methods for modifying genomes
WO2022238552A1 (en) 2021-05-12 2022-11-17 Eligo Bioscience Production bacterial cells and use thereof in production methods
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
WO2023287707A1 (en) 2021-07-15 2023-01-19 Just-Evotec Biologics, Inc. Bidirectional tangential flow filtration (tff) perfusion system
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11584781B2 (en) 2019-12-30 2023-02-21 Eligo Bioscience Chimeric receptor binding proteins resistant to proteolytic degradation
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11746352B2 (en) 2019-12-30 2023-09-05 Eligo Bioscience Microbiome modulation of a host by delivery of DNA payloads with minimized spread
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence
US12098372B2 (en) 2019-12-30 2024-09-24 Eligo Bioscience Microbiome modulation of a host by delivery of DNA payloads with minimized spread

Families Citing this family (143)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2015342749B2 (en) 2014-11-07 2022-01-27 Editas Medicine, Inc. Methods for improving CRISPR/Cas-mediated genome-editing
GB201506509D0 (en) 2015-04-16 2015-06-03 Univ Wageningen Nuclease-mediated genome editing
US11390884B2 (en) 2015-05-11 2022-07-19 Editas Medicine, Inc. Optimized CRISPR/cas9 systems and methods for gene editing in stem cells
KR20180031671A (en) 2015-06-09 2018-03-28 Editas Medicine, Inc. CRISPR / CAS-related methods and compositions for improving transplantation
US10648020B2 (en) 2015-06-18 2020-05-12 The Broad Institute, Inc. CRISPR enzymes and systems
WO2017053879A1 (en) 2015-09-24 2017-03-30 Editas Medicine, Inc. Use of exonucleases to improve crispr/cas-mediated genome editing
WO2017106657A1 (en) 2015-12-18 2017-06-22 The Broad Institute Inc. Novel crispr enzymes and systems
EP3433363A1 (en) 2016-03-25 2019-01-30 Editas Medicine, Inc. Genome editing systems comprising repair-modulating enzyme molecules and methods of their use
US11236313B2 (en) 2016-04-13 2022-02-01 Editas Medicine, Inc. Cas9 fusion molecules, gene editing systems, and methods of use thereof
EP3445856A1 (en) * 2016-04-19 2019-02-27 The Broad Institute Inc. Novel crispr enzymes and systems
US10337051B2 (en) 2016-06-16 2019-07-02 The Regents Of The University Of California Methods and compositions for detecting a target RNA
US11293021B1 (en) 2016-06-23 2022-04-05 Inscripta, Inc. Automated cell processing methods, modules, instruments, and systems
AU2017302551B2 (en) * 2016-07-26 2023-04-27 The General Hospital Corporation Variants of CRISPR from Prevotella and Francisella 1 (Cpf1)
WO2018064352A1 (en) 2016-09-30 2018-04-05 The Regents Of The University Of California Rna-guided nucleic acid modifying enzymes and methods of use thereof
GB2569733B (en) 2016-09-30 2022-09-14 Univ California RNA-guided nucleic acid modifying enzymes and methods of use thereof
WO2018067985A1 (en) 2016-10-07 2018-04-12 Altria Client Services Llc Composition and methods for producing tobacco plants and products having reduced tobacco-specific nitrosamines (tsnas)
US20190309317A1 (en) 2016-12-21 2019-10-10 Altria Client Services Llc Compositions and methods for producing tobacco plants and products having altered alkaloid levels
US11859219B1 (en) 2016-12-30 2024-01-02 Flagship Pioneering Innovations V, Inc. Methods of altering a target nucleotide sequence with an RNA-guided nuclease and a single guide RNA
EP4095263A1 (en) 2017-01-06 2022-11-30 Editas Medicine, Inc. Methods of assessing nuclease cleavage
BR112019021719A2 (en) 2017-04-21 2020-06-16 The General Hospital Corporation CPF1 VARIANT (CAS12A) WITH CHANGED PAM SPECIFICITY
EP3615672A1 (en) 2017-04-28 2020-03-04 Editas Medicine, Inc. Methods and systems for analyzing guide rna molecules
WO2018226880A1 (en) 2017-06-06 2018-12-13 Zymergen Inc. A htp genomic engineering platform for improving escherichia coli
CN110719956A (en) 2017-06-06 2020-01-21 Zymergen Inc. High throughput genome engineering platform for improving fungal strains
CN110997908A (en) 2017-06-09 2020-04-10 Editas Medicine, Inc. Engineered CAS9 nucleases
RU2769475C2 (en) * 2017-06-23 2022-04-01 Inscripta, Inc. Nucleic acid-directed nucleases
US9982279B1 (en) 2017-06-23 2018-05-29 Inscripta, Inc. Nucleic acid-guided nucleases
US10011849B1 (en) * 2017-06-23 2018-07-03 Inscripta, Inc. Nucleic acid-guided nucleases
DK3645719T3 (en) 2017-06-30 2022-05-16 Inscripta Inc Automated cell processing methods, modules, instruments and systems
US11866726B2 (en) 2017-07-14 2024-01-09 Editas Medicine, Inc. Systems and methods for targeted integration and genome editing and detection thereof using integrated priming sites
WO2019035003A1 (en) 2017-08-17 2019-02-21 Benson Hill Biosystems, Inc. Increasing plant growth and yield by using a glutaredoxin
US20210054404A1 (en) * 2017-08-22 2021-02-25 Napigen, Inc. Organelle genome modification using polynucleotide guided endonuclease
AU2018320865B2 (en) 2017-08-23 2023-09-14 The General Hospital Corporation Engineered CRISPR-Cas9 nucleases with altered PAM specificity
US10738327B2 (en) 2017-08-28 2020-08-11 Inscripta, Inc. Electroporation cuvettes for automation
GB201713926D0 (en) * 2017-08-30 2017-10-11 Univ Edinburgh Gene editing method
WO2019041296A1 (en) * 2017-09-01 2019-03-07 ShanghaiTech University Base editing system and method
EP3681307A4 (en) 2017-09-15 2021-06-16 Covercress Inc. Low fiber pennycress meal and methods of making
CN111372650A (en) 2017-09-30 2020-07-03 Inscripta, Inc. Flow-through electroporation apparatus
US11578334B2 (en) 2017-10-25 2023-02-14 Monsanto Technology Llc Targeted endonuclease activity of the RNA-guided endonuclease CasX in eukaryotes
SG11202003863VA (en) 2017-11-01 2020-05-28 Univ California Casz compositions and methods of use
WO2019089808A1 (en) 2017-11-01 2019-05-09 The Regents Of The University Of California Class 2 crispr/cas compositions and methods of use
BR112020011350A2 (en) 2017-12-08 2020-11-17 Synthetic Genomics, Inc. improvement of lipid productivity of algae through genetic modification of a protein that contains the tpr domain
WO2019123246A1 (en) 2017-12-19 2019-06-27 Benson Hill Biosystems, Inc. Modified agpase large subunit sequences and methods for detection of precise genome edits
WO2019140297A1 (en) 2018-01-12 2019-07-18 Altria Client Services Llc Compositions and methods for producing tobacco plants and products having altered alkaloid levels
WO2019143926A1 (en) 2018-01-19 2019-07-25 Covercress Inc. Low glucosinolate pennycress meal and methods of making
BR112020018187A2 (en) 2018-03-05 2021-04-27 Altria Client Services Llc compositions and methods for the production of plants and tobacco products that have altered levels of alkaloids with desirable leaf quality
CN112204131A (en) 2018-03-29 2021-01-08 Inscripta, Inc. Automated control of cell growth rate for induction and transformation
EP4062774B1 (en) 2018-04-03 2024-03-20 Altria Client Services LLC Composition and methods for producing tobacco plants and products having increased phenylalanine and reduced tobacco-specific nitrosamines (tsnas)
WO2019200004A1 (en) 2018-04-13 2019-10-17 Inscripta, Inc. Automated cell processing instruments comprising reagent cartridges
US10508273B2 (en) 2018-04-24 2019-12-17 Inscripta, Inc. Methods for identifying selective binding pairs
US10557216B2 (en) 2018-04-24 2020-02-11 Inscripta, Inc. Automated instrumentation for production of T-cell receptor peptide libraries
US10858761B2 (en) 2018-04-24 2020-12-08 Inscripta, Inc. Nucleic acid-guided editing of exogenous polynucleotides in heterologous cells
WO2019213910A1 (en) * 2018-05-10 2019-11-14 Syngenta Participations Ag Methods and compositions for targeted editing of polynucleotides
CA3103500A1 (en) 2018-06-15 2019-12-19 KWS SAAT SE & Co. KGaA Methods for improving genome engineering and regeneration in plant ii
WO2019238909A1 (en) 2018-06-15 2019-12-19 KWS SAAT SE & Co. KGaA Methods for improving genome engineering and regeneration in plant
EP3813974A4 (en) 2018-06-30 2022-08-03 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
US11384361B2 (en) 2018-07-26 2022-07-12 Altria Client Services Llc Compositions and methods based on PMT engineering for producing tobacco plants and products having altered alkaloid levels
WO2020028729A1 (en) 2018-08-01 2020-02-06 Mammoth Biosciences, Inc. Programmable nuclease compositions and methods of use thereof
MX2021001553A (en) * 2018-08-08 2021-07-21 Integrated Dna Tech Inc Novel mutations that enhance the dna cleavage activity of acidaminococcus sp. cpf1.
US10532324B1 (en) 2018-08-14 2020-01-14 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
US10752874B2 (en) 2018-08-14 2020-08-25 Inscripta, Inc. Instruments, modules, and methods for improved detection of edited sequences in live cells
US11142740B2 (en) 2018-08-14 2021-10-12 Inscripta, Inc. Detection of nuclease edited sequences in automated modules and instruments
EP3841205A4 (en) * 2018-08-22 2022-08-17 The Regents of The University of California Variant type v crispr/cas effector polypeptides and methods of use thereof
CN112955540A (en) 2018-08-30 2021-06-11 Inscripta, Inc. Improved detection of nuclease edited sequences in automated modules and instruments
US11459551B1 (en) 2018-08-31 2022-10-04 Inari Agriculture Technology, Inc. Compositions, systems, and methods for genome editing
EP3623379A1 (en) 2018-09-11 2020-03-18 KWS SAAT SE & Co. KGaA Beet necrotic yellow vein virus (bnyvv)-resistance modifying gene
US11214781B2 (en) 2018-10-22 2022-01-04 Inscripta, Inc. Engineered enzyme
CN113227368B (en) 2018-10-22 2023-07-07 Inscripta, Inc. Engineered enzymes
US11407995B1 (en) 2018-10-26 2022-08-09 Inari Agriculture Technology, Inc. RNA-guided nucleases and DNA binding proteins
WO2020092704A1 (en) 2018-10-31 2020-05-07 Zymergen Inc. Multiplexed deterministic assembly of dna libraries
US11434477B1 (en) 2018-11-02 2022-09-06 Inari Agriculture Technology, Inc. RNA-guided nucleases and DNA binding proteins
US11317593B2 (en) 2018-12-04 2022-05-03 Altria Client Services Llc Low-nicotine tobacco plants and tobacco products made therefrom
EP3931313A2 (en) 2019-01-04 2022-01-05 Mammoth Biosciences, Inc. Programmable nuclease improvements and compositions and methods for nucleic acid amplification and detection
WO2020154466A1 (en) 2019-01-24 2020-07-30 Altria Client Services Llc Tobacco plants comprising reduced nicotine and reduced tobacco specific nitrosamines
EP3918059A4 (en) * 2019-01-29 2022-11-30 Flagship Pioneering Innovations V, Inc. Compositions comprising an endonuclease and methods for purifying an endonuclease
EP3918080A1 (en) 2019-01-29 2021-12-08 The University Of Warwick Methods for enhancing genome engineering efficiency
US11053515B2 (en) 2019-03-08 2021-07-06 Zymergen Inc. Pooled genome editing in microbes
CN113728106A (en) 2019-03-08 2021-11-30 Zymergen Inc. Iterative genome editing in microorganisms
US11001831B2 (en) 2019-03-25 2021-05-11 Inscripta, Inc. Simultaneous multiplex genome editing in yeast
CN113631713A (en) 2019-03-25 2021-11-09 Inscripta, Inc. Simultaneous multiplex genome editing in yeast
CA3139122C (en) 2019-06-06 2023-04-25 Inscripta, Inc. Curing for recursive nucleic acid-guided cell editing
TW202113074A (en) 2019-06-07 2021-04-01 Scribe Therapeutics Inc. Engineered casx systems
US10907125B2 (en) 2019-06-20 2021-02-02 Inscripta, Inc. Flow through electroporation modules and instrumentation
CA3139124C (en) 2019-06-21 2023-01-31 Inscripta, Inc. Genome-wide rationally-designed mutations leading to enhanced lysine production in e. coli
US10927385B2 (en) 2019-06-25 2021-02-23 Inscripta, Inc. Increased nucleic-acid guided cell editing in yeast
CA3153301A1 (en) * 2019-09-05 2021-03-11 Benson Hill, Inc. Compositions and methods for modifying genomes
CA3154479A1 (en) * 2019-09-10 2021-03-18 Consejo Nacional De Investigaciones Cientificas Y Tecnicas (Conicet) Novel class 2 type ii and type v crispr-cas rna-guided endonucleases
CN115103910A (en) 2019-10-03 2022-09-23 Artisan Development Labs, Inc. CRISPR system with engineered dual guide nucleic acids
CN114929880A (en) 2019-10-10 2022-08-19 Altria Client Services LLC QPT engineering based compositions and methods for producing tobacco plants and products with altered alkaloid levels
CN114829605A (en) 2019-10-10 2022-07-29 Altria Client Services LLC Compositions and methods for producing tobacco plants and products with altered levels of alkaloids and desired leaf quality by manipulating leaf quality genes
CN111235130B (en) * 2019-11-15 2022-11-25 Wuhan University Class II type V CRISPR protein CeCas12a and application thereof in gene editing
EP4063500A4 (en) * 2019-11-18 2023-12-27 Suzhou Qi Biodesign biotechnology Company Limited Gene editing system derived from flavobacteria
WO2021102059A1 (en) 2019-11-19 2021-05-27 Inscripta, Inc. Methods for increasing observed editing in bacteria
WO2021099996A1 (en) * 2019-11-19 2021-05-27 Benson Hill, Inc. Anti-bacterial crispr compositions and methods
EP4065701A4 (en) * 2019-11-27 2023-11-29 Danmarks Tekniske Universitet Constructs, compositions and methods thereof having improved genome editing efficiency and specificity
JP2023504511A (en) 2019-12-03 2023-02-03 Altria Client Services LLC Compositions and methods for producing tobacco plants and tobacco products with altered alkaloid levels
WO2021113763A1 (en) 2019-12-06 2021-06-10 Scribe Therapeutics Inc. Compositions and methods for the targeting of rhodopsin
WO2021118626A1 (en) 2019-12-10 2021-06-17 Inscripta, Inc. Novel mad nucleases
US10704033B1 (en) 2019-12-13 2020-07-07 Inscripta, Inc. Nucleic acid-guided nucleases
IL292895A (en) 2019-12-18 2022-07-01 Inscripta Inc Cascade/dcas3 complementation assays for in vivo detection of nucleic acid-guided nuclease edited cells
US10689669B1 (en) 2020-01-11 2020-06-23 Inscripta, Inc. Automated multi-module cell processing methods, instruments, and systems
US20210230627A1 (en) 2020-01-17 2021-07-29 Altria Client Services Llc Methods and compositions related to improved nitrogen use efficiency
US20230105789A1 (en) 2020-01-27 2023-04-06 Altria Client Services Llc Compositions and Methods Based on PMT Engineering for Producing Tobacco Plants and Products Having Altered Alkaloid Levels
EP4096770A1 (en) 2020-01-27 2022-12-07 Inscripta, Inc. Electroporation modules and instrumentation
CN115552015A (en) * 2020-02-28 2022-12-30 The Chinese University of Hong Kong Engineering immune cells via simultaneous knock-in and gene disruption
US20210332388A1 (en) 2020-04-24 2021-10-28 Inscripta, Inc. Compositions, methods, modules and instruments for automated nucleic acid-guided nuclease editing in mammalian cells
CN111471702B (en) * 2020-04-30 2022-07-26 Fujian Agriculture and Forestry University Slow-growing rhizobium stable red fluorescent labeling vector and application thereof
US11787841B2 (en) 2020-05-19 2023-10-17 Inscripta, Inc. Rationally-designed mutations to the thrA gene for enhanced lysine production in E. coli
US20230175019A1 (en) * 2020-05-28 2023-06-08 University Of Southern California Scalable trio guide rna approach for integration of large donor dna
BR112022024801A2 (en) 2020-06-03 2023-05-09 Altria Client Services Llc COMPOSITIONS AND METHODS FOR PRODUCING TOBACCO PLANTS AND PRODUCTS HAVING ALTERED ALKALOID LEVELS
WO2022060749A1 (en) 2020-09-15 2022-03-24 Inscripta, Inc. Crispr editing to embed nucleic acid landing pads into genomes of live cells
KR20230074525A (en) * 2020-09-24 2023-05-30 Flagship Pioneering Innovations V, Inc. Compositions and methods for inhibiting gene expression
JP2023548580A (en) * 2020-11-05 2023-11-17 Locus Biosciences, Inc. Phage compositions against Escherichia including CRISPR-CAS system and methods of use thereof
US20240011041A1 (en) * 2020-11-05 2024-01-11 Locus Biosciences, Inc. Phage compositions for pseudomonas comprising crispr-cas systems and methods of use thereof
US11512297B2 (en) 2020-11-09 2022-11-29 Inscripta, Inc. Affinity tag for recombination protein recruitment
US20220195405A1 (en) * 2020-12-17 2022-06-23 Monsanto Technology Llc Engineered ssdnase-free crispr endonucleases
WO2022146497A1 (en) 2021-01-04 2022-07-07 Inscripta, Inc. Mad nucleases
WO2022150269A1 (en) 2021-01-07 2022-07-14 Inscripta, Inc. Mad nucleases
CN112778399A (en) * 2021-01-21 2021-05-11 南开大学 Preparation and property characterization method of nano antibacterial peptide derived from toxic amyloid fiber
US20240115739A1 (en) * 2021-02-12 2024-04-11 The Board Of Trustees Of The Leland Stanford Junior University Synthetic cas12a for enhanced multiplex gene control and editing
US11884924B2 (en) 2021-02-16 2024-01-30 Inscripta, Inc. Dual strand nucleic acid-guided nickase editing
CN114277015B (en) * 2021-03-16 2023-12-15 Shandong Shunfeng Biotechnology Co., Ltd. CRISPR enzyme and application
WO2022236147A1 (en) 2021-05-06 2022-11-10 Artisan Development Labs, Inc. Modified nucleases
CN112920263B (en) * 2021-05-11 2021-08-10 Shanghai Pudong Fudan University Zhangjiang Institute of Science and Technology Application of epigenetic modification OsMOF protein in improvement of rice yield traits
CN113373130B (en) * 2021-05-31 2023-12-22 Fudan University Cas12 protein, gene editing system containing Cas12 protein and application
EP4419672A2 (en) 2021-06-01 2024-08-28 Artisan Development Labs, Inc. Compositions and methods for targeting, editing, or modifying genes
EP4370676A2 (en) 2021-06-18 2024-05-22 Artisan Development Labs, Inc. Compositions and methods for targeting, editing or modifying human genes
EP4367229A1 (en) * 2021-07-09 2024-05-15 Acrigen Biosciences Compositions and methods for nucleic acid modifications
JPWO2023027041A1 (en) * 2021-08-23 2023-03-02
EP4441209A1 (en) * 2021-11-29 2024-10-09 Editas Medicine, Inc. Engineered crispr/cas12a effector proteins, and uses thereof
WO2023167882A1 (en) 2022-03-01 2023-09-07 Artisan Development Labs, Inc. Composition and methods for transgene insertion
CN114277029B (en) * 2022-03-08 2022-05-10 Agro-Environmental Protection Institute, Ministry of Agriculture and Rural Affairs Method for efficiently extracting intestinal contents and extracellular DNA (deoxyribonucleic acid) of earthworms
WO2023225410A2 (en) 2022-05-20 2023-11-23 Artisan Development Labs, Inc. Systems and methods for assessing risk of genome editing events
WO2023240061A2 (en) * 2022-06-06 2023-12-14 Board Of Regents, The University Of Texas System Compositions and methods related to modified cas12a2 molecules
WO2024005863A1 (en) 2022-06-30 2024-01-04 Inari Agriculture Technology, Inc. Compositions, systems, and methods for genome editing
EP4299733A1 (en) 2022-06-30 2024-01-03 Inari Agriculture Technology, Inc. Compositions, systems, and methods for genome editing
WO2024005864A1 (en) 2022-06-30 2024-01-04 Inari Agriculture Technology, Inc. Compositions, systems, and methods for genome editing
EP4299739A1 (en) 2022-06-30 2024-01-03 Inari Agriculture Technology, Inc. Compositions, systems, and methods for genome editing
WO2024062138A1 (en) 2022-09-23 2024-03-28 Mnemo Therapeutics Immune cells comprising a modified suv39h1 gene
WO2024133272A1 (en) * 2022-12-21 2024-06-27 BASF Agricultural Solutions Seed US LLC Increased editing efficiency by co-delivery of rnp with nucleic acid
CN116286737B (en) * 2023-01-02 2023-09-22 Huazhong Agricultural University PAM-free endonuclease and gene editing system mediated by same
CN116179513B (en) * 2023-03-10 2023-12-22 Zhejiang Lab Cpf1 protein and application thereof in gene editing
CN116179511B (en) * 2023-03-10 2023-12-22 Zhejiang Lab Application of Cpf1 protein in preparation of kit for nucleic acid detection
CN116751763B (en) * 2023-05-08 2024-02-13 Zhuhai Shutong Medical Technology Co., Ltd. Cpf1 protein, V-type gene editing system and application

Citations (61)

Publication number Priority date Publication date Assignee Title
US4945050A (en) 1984-11-13 1990-07-31 Cornell Research Foundation, Inc. Method for transporting substances into living cells and tissues and apparatus therefor
US5023179A (en) 1988-11-14 1991-06-11 Eric Lam Promoter enhancer element for gene expression in plant roots
US5110732A (en) 1989-03-14 1992-05-05 The Rockefeller University Selective gene expression in plants
US5240855A (en) 1989-05-12 1993-08-31 Pioneer Hi-Bred International, Inc. Particle gun
US5268463A (en) 1986-11-11 1993-12-07 Jefferson Richard A Plant promoter α-glucuronidase gene construct
US5322783A (en) 1989-10-17 1994-06-21 Pioneer Hi-Bred International, Inc. Soybean transformation by microparticle bombardment
US5324646A (en) 1992-01-06 1994-06-28 Pioneer Hi-Bred International, Inc. Methods of regeneration of Medicago sativa and expressing foreign DNA in same
US5364780A (en) 1989-03-17 1994-11-15 E. I. Du Pont De Nemours And Company External regulation of gene expression by inducible promoters
US5399680A (en) 1991-05-22 1995-03-21 The Salk Institute For Biological Studies Rice chitinase promoter
US5401836A (en) 1992-07-16 1995-03-28 Pioneer Hi-Bred International, Inc. Brassica regulatory sequence for root-specific or root-abundant gene expression
US5459252A (en) 1991-01-31 1995-10-17 North Carolina State University Root specific gene promoter
US5466785A (en) 1990-04-12 1995-11-14 Ciba-Geigy Corporation Tissue-preferential promoters
US5563055A (en) 1992-07-27 1996-10-08 Pioneer Hi-Bred International, Inc. Method of Agrobacterium-mediated transformation of cultured soybean cells
US5569597A (en) 1985-05-13 1996-10-29 Ciba Geigy Corp. Methods of inserting viral DNA into plant material
US5583210A (en) 1993-03-18 1996-12-10 Pioneer Hi-Bred International, Inc. Methods and compositions for controlling plant development
US5602321A (en) 1992-11-20 1997-02-11 Monsanto Company Transgenic cotton plants producing heterologous polyhydroxy(e) butyrate bioplastic
US5604121A (en) 1991-08-27 1997-02-18 Agricultural Genetics Company Limited Proteins with insecticidal properties against homopteran insects and their use in plant protection
US5608144A (en) 1994-08-12 1997-03-04 Dna Plant Technology Corp. Plant group 2 promoters and uses thereof
US5608142A (en) 1986-12-03 1997-03-04 Agracetus, Inc. Insecticidal cotton plants
US5608149A (en) 1990-06-18 1997-03-04 Monsanto Company Enhanced starch biosynthesis in tomatoes
US5633363A (en) 1994-06-03 1997-05-27 Iowa State University Research Foundation, Inc. Root preferential promoter
US5683439A (en) 1993-10-20 1997-11-04 Hollister Incorporated Post-operative thermal blanket
US5703049A (en) 1996-02-29 1997-12-30 Pioneer Hi-Bred Int'l, Inc. High methionine derivatives of α-hordothionin for pathogen-control
US5736369A (en) 1994-07-29 1998-04-07 Pioneer Hi-Bred International, Inc. Method for producing transgenic cereal plants
US5750386A (en) 1991-10-04 1998-05-12 North Carolina State University Pathogen-resistant transgenic plants
WO1998020133A2 (en) 1996-11-01 1998-05-14 Pioneer Hi-Bred International, Inc. Proteins with enhanced levels of essential amino acids
US5789156A (en) 1993-06-14 1998-08-04 Basf Ag Tetracycline-regulated transcriptional inhibitors
US5814618A (en) 1993-06-14 1998-09-29 Basf Aktiengesellschaft Methods for regulating gene expression
US5837876A (en) 1995-07-28 1998-11-17 North Carolina State University Root cortex specific gene promoter
US5850016A (en) 1996-03-20 1998-12-15 Pioneer Hi-Bred International, Inc. Alteration of amino acid compositions in seeds
US5879918A (en) 1989-05-12 1999-03-09 Pioneer Hi-Bred International, Inc. Pretreatment of microprojectiles prior to using in a particle gun
US5885801A (en) 1995-06-02 1999-03-23 Pioneer Hi-Bred International, Inc. High threonine derivatives of α-hordothionin
US5886244A (en) 1988-06-10 1999-03-23 Pioneer Hi-Bred International, Inc. Stable transformation of plant cells
US5885802A (en) 1995-06-02 1999-03-23 Pioneer Hi-Bred International, Inc. High methionine derivatives of α-hordothionin
US5932782A (en) 1990-11-14 1999-08-03 Pioneer Hi-Bred International, Inc. Plant transformation method using agrobacterium species adhered to microprojectiles
WO1999043838A1 (en) 1998-02-24 1999-09-02 Pioneer Hi-Bred International, Inc. Synthetic promoters
WO1999050427A2 (en) 1998-03-27 1999-10-07 Max-Planck-Gesellschaft Zur Förderung Der Wissenschaften E.V. Novel basal endosperm transfer cell layer (betl) specific genes
US5981840A (en) 1997-01-24 1999-11-09 Pioneer Hi-Bred International, Inc. Methods for agrobacterium-mediated transformation
US5990389A (en) 1993-01-13 1999-11-23 Pioneer Hi-Bred International, Inc. High lysine derivatives of α-hordothionin
US6015891A (en) 1988-09-09 2000-01-18 Mycogen Plant Science, Inc. Synthetic insecticidal crystal protein gene having a modified frequency of codon usage
WO2000012733A1 (en) 1998-08-28 2000-03-09 Pioneer Hi-Bred International, Inc. Seed-preferred promoters from end genes
WO2000028058A2 (en) 1998-11-09 2000-05-18 Pioneer Hi-Bred International, Inc. Transcriptional activator lec1 nucleic acids, polypeptides and their uses
US6177611B1 (en) 1998-02-26 2001-01-23 Pioneer Hi-Bred International, Inc. Maize promoters
US6225529B1 (en) 1998-08-20 2001-05-01 Pioneer Hi-Bred International, Inc. Seed-preferred promoters
WO2003092360A2 (en) 2002-04-30 2003-11-13 Verdia, Inc. Novel glyphosate-n-acetyltransferase (gat) genes
US20040082770A1 (en) 2000-10-30 2004-04-29 Verdia, Inc. Novel glyphosate N-acetyltransferase (GAT) genes
US20090049571A1 (en) 2007-08-15 2009-02-19 Pioneer Hi-Bred International, Inc. Seed-Preferred Promoters
US20090089897A1 (en) 2007-09-28 2009-04-02 Pioneer Hi-Bred International, Inc. Seed-Preferred Promoters
WO2009094704A1 (en) 2008-01-31 2009-08-06 The University Of Adelaide Seed specific expression in plants
WO2010019996A1 (en) 2008-08-18 2010-02-25 Australian Centre For Plant Functional Genomics Pty Ltd Seed active transcriptional control sequences
US7700836B2 (en) 2007-08-13 2010-04-20 Pioneer Hi-Bred International, Inc. Seed-preferred regulatory elements
US7745697B2 (en) 2003-11-03 2010-06-29 Biogemma MEG1 endosperm-specific promoters and genes
US7803990B2 (en) 1999-04-16 2010-09-28 Pioneer Hi-Bred International, Inc. Early endosperm promoter eep1
US20100281569A1 (en) 2009-05-04 2010-11-04 Pioneer Hi-Bred International, Inc. Maize 17kd oleosin seed-preferred regulatory element
US20100281570A1 (en) 2009-05-04 2010-11-04 Pioneer Hi-Bred International, Inc. Maize 18kd oleosin seed-preferred regulatory element
WO2010129999A1 (en) 2009-05-13 2010-11-18 Molecular Plant Breeding Nominees Ltd Plant promoter operable in basal endosperm transfer layer of endosperm and uses thereof
US20100313301A1 (en) 2009-06-09 2010-12-09 Pioneer Hi-Bred International, Inc. Early Endosperm Promoter and Methods of Use
US20110296551A1 (en) 2008-11-25 2011-12-01 Algentech Sas Plant mitochondria transformation method
US20110321187A1 (en) 2008-11-25 2011-12-29 Algentech Sas Plant plastid transformation method
US20160138008A1 (en) 2012-05-25 2016-05-19 The Regents Of The University Of California Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription
US20160208243A1 (en) 2015-06-18 2016-07-21 The Broad Institute, Inc. Novel crispr enzymes and systems

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
JP5208842B2 (en) 2009-04-20 2013-06-12 Capcom Co., Ltd. GAME SYSTEM, GAME CONTROL METHOD, PROGRAM, AND COMPUTER-READABLE RECORDING MEDIUM CONTAINING THE PROGRAM
CA3226329A1 (en) * 2011-12-16 2013-06-20 Targetgene Biotechnologies Ltd Compositions and methods for modifying a predetermined target nucleic acid sequence
US20140356956A1 (en) 2013-06-04 2014-12-04 President And Fellows Of Harvard College RNA-Guided Transcriptional Regulation
EP3071695A2 (en) 2013-11-18 2016-09-28 Crispr Therapeutics AG Crispr-cas system materials and methods
RS64527B1 (en) * 2015-01-28 2023-09-29 Caribou Biosciences Inc Crispr hybrid dna/rna polynucleotides and methods of use
WO2017015015A1 (en) 2015-07-17 2017-01-26 Emory University Crispr-associated protein from francisella and uses related thereto
WO2017106657A1 (en) * 2015-12-18 2017-06-22 The Broad Institute Inc. Novel crispr enzymes and systems
US9896696B2 (en) 2016-02-15 2018-02-20 Benson Hill Biosystems, Inc. Compositions and methods for modifying genomes
US20190330659A1 (en) * 2016-07-15 2019-10-31 Zymergen Inc. Scarless dna assembly and genome editing using crispr/cpf1 and dna ligase

Patent Citations (69)

Publication number Priority date Publication date Assignee Title
US4945050A (en) 1984-11-13 1990-07-31 Cornell Research Foundation, Inc. Method for transporting substances into living cells and tissues and apparatus therefor
US5569597A (en) 1985-05-13 1996-10-29 Ciba Geigy Corp. Methods of inserting viral DNA into plant material
US5268463A (en) 1986-11-11 1993-12-07 Jefferson Richard A Plant promoter α-glucuronidase gene construct
US5608142A (en) 1986-12-03 1997-03-04 Agracetus, Inc. Insecticidal cotton plants
US5886244A (en) 1988-06-10 1999-03-23 Pioneer Hi-Bred International, Inc. Stable transformation of plant cells
US6015891A (en) 1988-09-09 2000-01-18 Mycogen Plant Science, Inc. Synthetic insecticidal crystal protein gene having a modified frequency of codon usage
US5023179A (en) 1988-11-14 1991-06-11 Eric Lam Promoter enhancer element for gene expression in plant roots
US5110732A (en) 1989-03-14 1992-05-05 The Rockefeller University Selective gene expression in plants
US5364780A (en) 1989-03-17 1994-11-15 E. I. Du Pont De Nemours And Company External regulation of gene expression by inducible promoters
US5879918A (en) 1989-05-12 1999-03-09 Pioneer Hi-Bred International, Inc. Pretreatment of microprojectiles prior to using in a particle gun
US5240855A (en) 1989-05-12 1993-08-31 Pioneer Hi-Bred International, Inc. Particle gun
US5322783A (en) 1989-10-17 1994-06-21 Pioneer Hi-Bred International, Inc. Soybean transformation by microparticle bombardment
US5466785A (en) 1990-04-12 1995-11-14 Ciba-Geigy Corporation Tissue-preferential promoters
US5608149A (en) 1990-06-18 1997-03-04 Monsanto Company Enhanced starch biosynthesis in tomatoes
US5932782A (en) 1990-11-14 1999-08-03 Pioneer Hi-Bred International, Inc. Plant transformation method using agrobacterium species adhered to microprojectiles
US5459252A (en) 1991-01-31 1995-10-17 North Carolina State University Root specific gene promoter
US5399680A (en) 1991-05-22 1995-03-21 The Salk Institute For Biological Studies Rice chitinase promoter
US5604121A (en) 1991-08-27 1997-02-18 Agricultural Genetics Company Limited Proteins with insecticidal properties against homopteran insects and their use in plant protection
US5750386A (en) 1991-10-04 1998-05-12 North Carolina State University Pathogen-resistant transgenic plants
US5324646A (en) 1992-01-06 1994-06-28 Pioneer Hi-Bred International, Inc. Methods of regeneration of Medicago sativa and expressing foreign DNA in same
US5401836A (en) 1992-07-16 1995-03-28 Pioneer Hi-Bred International, Inc. Brassica regulatory sequence for root-specific or root-abundant gene expression
US5563055A (en) 1992-07-27 1996-10-08 Pioneer Hi-Bred International, Inc. Method of Agrobacterium-mediated transformation of cultured soybean cells
US5602321A (en) 1992-11-20 1997-02-11 Monsanto Company Transgenic cotton plants producing heterologous polyhydroxy(e) butyrate bioplastic
US5990389A (en) 1993-01-13 1999-11-23 Pioneer Hi-Bred International, Inc. High lysine derivatives of α-hordothionin
US5583210A (en) 1993-03-18 1996-12-10 Pioneer Hi-Bred International, Inc. Methods and compositions for controlling plant development
US5789156A (en) 1993-06-14 1998-08-04 Basf Ag Tetracycline-regulated transcriptional inhibitors
US5814618A (en) 1993-06-14 1998-09-29 Basf Aktiengesellschaft Methods for regulating gene expression
US5683439A (en) 1993-10-20 1997-11-04 Hollister Incorporated Post-operative thermal blanket
US5633363A (en) 1994-06-03 1997-05-27 Iowa State University Research Foundation, Inc. Root preferential promoter
US5736369A (en) 1994-07-29 1998-04-07 Pioneer Hi-Bred International, Inc. Method for producing transgenic cereal plants
US5608144A (en) 1994-08-12 1997-03-04 Dna Plant Technology Corp. Plant group 2 promoters and uses thereof
US5885802A (en) 1995-06-02 1999-03-23 Pioneer Hi-Bred International, Inc. High methionine derivatives of α-hordothionin
US5885801A (en) 1995-06-02 1999-03-23 Pioneer Hi-Bred International, Inc. High threonine derivatives of α-hordothionin
US5837876A (en) 1995-07-28 1998-11-17 North Carolina State University Root cortex specific gene promoter
US5703049A (en) 1996-02-29 1997-12-30 Pioneer Hi-Bred Int'l, Inc. High methionine derivatives of α-hordothionin for pathogen-control
US5850016A (en) 1996-03-20 1998-12-15 Pioneer Hi-Bred International, Inc. Alteration of amino acid compositions in seeds
US6072050A (en) 1996-06-11 2000-06-06 Pioneer Hi-Bred International, Inc. Synthetic promoters
WO1998020133A2 (en) 1996-11-01 1998-05-14 Pioneer Hi-Bred International, Inc. Proteins with enhanced levels of essential amino acids
US5981840A (en) 1997-01-24 1999-11-09 Pioneer Hi-Bred International, Inc. Methods for agrobacterium-mediated transformation
WO1999043838A1 (en) 1998-02-24 1999-09-02 Pioneer Hi-Bred International, Inc. Synthetic promoters
US6177611B1 (en) 1998-02-26 2001-01-23 Pioneer Hi-Bred International, Inc. Maize promoters
US7119251B2 (en) 1998-03-27 2006-10-10 Max-Planck-Gesellschaft Zur Forderung Der Wissenschaften E.V. Basal endosperm transfer cell layer (BELT) specific genes
WO1999050427A2 (en) 1998-03-27 1999-10-07 Max-Planck-Gesellschaft Zur Förderung Der Wissenschaften E.V. Novel basal endosperm transfer cell layer (BETL) specific genes
US20040003427A1 (en) 1998-03-27 2004-01-01 Max-Planck-Gesellschaft Zur Forderung Der Wissenschaften Ev Novel basal endosperm transfer cell layer (BELT) specific genes
US6225529B1 (en) 1998-08-20 2001-05-01 Pioneer Hi-Bred International, Inc. Seed-preferred promoters
WO2000012733A1 (en) 1998-08-28 2000-03-09 Pioneer Hi-Bred International, Inc. Seed-preferred promoters from end genes
WO2000028058A2 (en) 1998-11-09 2000-05-18 Pioneer Hi-Bred International, Inc. Transcriptional activator lec1 nucleic acids, polypeptides and their uses
US8049000B2 (en) 1999-04-16 2011-11-01 Pioneer Hi-Bred International, Inc. Early endosperm promoter eep2
US7803990B2 (en) 1999-04-16 2010-09-28 Pioneer Hi-Bred International, Inc. Early endosperm promoter eep1
US20040082770A1 (en) 2000-10-30 2004-04-29 Verdia, Inc. Novel glyphosate N-acetyltransferase (GAT) genes
WO2003092360A2 (en) 2002-04-30 2003-11-13 Verdia, Inc. Novel glyphosate-n-acetyltransferase (gat) genes
US7745697B2 (en) 2003-11-03 2010-06-29 Biogemma MEG1 endosperm-specific promoters and genes
US7700836B2 (en) 2007-08-13 2010-04-20 Pioneer Hi-Bred International, Inc. Seed-preferred regulatory elements
US7847160B2 (en) 2007-08-15 2010-12-07 Pioneer Hi-Bred International, Inc. Seed-preferred promoters
US20090049571A1 (en) 2007-08-15 2009-02-19 Pioneer Hi-Bred International, Inc. Seed-Preferred Promoters
US7964770B2 (en) 2007-09-28 2011-06-21 Pioneer Hi-Bred International, Inc. Seed-preferred promoter from Sorghum kafirin gene
US20090089897A1 (en) 2007-09-28 2009-04-02 Pioneer Hi-Bred International, Inc. Seed-Preferred Promoters
WO2009094704A1 (en) 2008-01-31 2009-08-06 The University Of Adelaide Seed specific expression in plants
WO2010019996A1 (en) 2008-08-18 2010-02-25 Australian Centre For Plant Functional Genomics Pty Ltd Seed active transcriptional control sequences
US20110321187A1 (en) 2008-11-25 2011-12-29 Algentech Sas Plant plastid transformation method
US20110296551A1 (en) 2008-11-25 2011-12-01 Algentech Sas Plant mitochondria transformation method
US20100281570A1 (en) 2009-05-04 2010-11-04 Pioneer Hi-Bred International, Inc. Maize 18kd oleosin seed-preferred regulatory element
US20100281569A1 (en) 2009-05-04 2010-11-04 Pioneer Hi-Bred International, Inc. Maize 17kd oleosin seed-preferred regulatory element
WO2010129999A1 (en) 2009-05-13 2010-11-18 Molecular Plant Breeding Nominees Ltd Plant promoter operable in basal endosperm transfer layer of endosperm and uses thereof
US20120066795A1 (en) 2009-05-13 2012-03-15 Basf Plant Science Company Gmbh Plant Promoter Operable in Basal Endosperm Transfer Layer of Endosperm and Uses Thereof
WO2010147825A1 (en) 2009-06-09 2010-12-23 Pioneer Hi-Bred International, Inc. Early endosperm promoter and methods of use
US20100313301A1 (en) 2009-06-09 2010-12-09 Pioneer Hi-Bred International, Inc. Early Endosperm Promoter and Methods of Use
US20160138008A1 (en) 2012-05-25 2016-05-19 The Regents Of The University Of California Methods and compositions for rna-directed target dna modification and for rna-directed modulation of transcription
US20160208243A1 (en) 2015-06-18 2016-07-21 The Broad Institute, Inc. Novel crispr enzymes and systems

Non-Patent Citations (119)

* Cited by examiner, † Cited by third party
Title
ALTSCHUL ET AL., J. MOL. BIOL., vol. 215, 1990, pages 403
ALTSCHUL ET AL., NUCLEIC ACIDS RES., vol. 25, 1997, pages 3389
AUSUBEL ET AL.: "Current Protocols in Molecular Biology", 2003, JOHN WILEY & SONS
BEERLI ET AL., NAT. BIOTECHNOL., vol. 20, 2002, pages 135 - 141
BELFORT ET AL., NUCLEIC ACIDS RES., vol. 25, 1997, pages 3379 - 3388
BOGUSZ ET AL., PLANT CELL, vol. 2, no. 7, 1990, pages 633 - 641
BORONAT, A. ET AL., PLANT SCI., vol. 47, 1986, pages 95 - 102
BURSTEIN ET AL., NATURE, 2016, Retrieved from the Internet <URL:http://dx.doi.org/10.1038/nature20159>
BYTEBIER ET AL., PROC. NATL. ACAD. SCI. USA, vol. 84, 1987, pages 5345 - 5349
CANEVASCINI ET AL., PLANT PHYSIOL., vol. 112, no. 2, 1996, pages 513 - 524
CAPANA ET AL., PLANT MOL. BIOL., vol. 25, no. 4, 1994, pages 681 - 691
CARRIE ET AL., FEBS J, vol. 276, 2009, pages 1187 - 1195
CARRIE; SMALL, BIOCHIM BIOPHYS ACTA, vol. 1833, 2013, pages 253 - 259
CHOO ET AL., CURR. OPIN. STRUCT. BIOL., vol. 10, 2000, pages 411 - 416
CHRISTOU ET AL., PLANT PHYSIOL., vol. 87, 1988, pages 671 - 674
CHRISTOU; FORD, "Annals of Botany", vol. 75, 1995, pages 407 - 413
CORPET ET AL., NUCLEIC ACIDS RES., vol. 16, 1988, pages 10881 - 90
CROSSWAY ET AL., BIOTECHNIQUES, vol. 4, 1986, pages 320 - 334
DATTA ET AL., BIOTECHNOLOGY, vol. 8, 1990, pages 736 - 740
DE WET ET AL., "The Experimental Manipulation of Ovule Tissues", 1985, LONGMAN, pages 197 - 209
D'HALLUIN ET AL., PLANT CELL, vol. 4, 1992, pages 1495 - 1505
DOYON ET AL., NAT. BIOTECHNOL., vol. 26, 2008, pages 702 - 708
EMBO J., vol. 8, no. 2, pages 343 - 350
FINER; MCMULLEN, IN VITRO CELL DEV. BIOL., vol. 27P, 1991, pages 175 - 182
FROMM ET AL., BIOTECHNOLOGY, vol. 8, 1990, pages 833 - 839
GATZ ET AL., MOL. GEN. GENET., vol. 227, 1991, pages 229 - 237
GLASER ET AL., PLANT MOL BIOL, vol. 38, 1998, pages 311 - 338
GOMEZ; PALLAS, PLOS ONE, vol. 5, 2010, pages E12269
GOTOR ET AL., PLANT J, vol. 3, 1993, pages 509 - 18
GUEVARA-GARCIA ET AL., PLANT J., vol. 4, no. 3, 1993, pages 495 - 505
HANSEN ET AL., MOL. GEN GENET, vol. 254, no. 3, 1997, pages 337 - 343
HARMSTON; LENHARD, NUCLEIC ACIDS RES, vol. 41, 2013, pages 7185 - 7199
HERRMANN; NEUPERT, IUBMB LIFE, vol. 55, 2003, pages 219 - 225
HIGGINS ET AL., CABIOS, vol. 5, 1989, pages 151 - 153
HIGGINS ET AL., GENE, vol. 73, 1988, pages 237 - 244
HIRE ET AL., PLANT MOL. BIOL., vol. 20, no. 2, 1992, pages 207 - 218
HOOYKAAS-VAN SLOGTEREN ET AL., NATURE (LONDON), vol. 311, 1984, pages 763 - 764
HUANG ET AL., CABIOS, vol. 8, 1992, pages 155 - 65
ISALAN ET AL., NAT. BIOTECHNOL., vol. 19, 2001, pages 656 - 660
KAEPPLER ET AL., PLANT CELL REPORTS, vol. 9, 1990, pages 415 - 418
KAEPPLER ET AL., THEOR. APPL. GENET., vol. 84, 1992, pages 560 - 566
KARLIN; ALTSCHUL, PROC. NATL. ACAD. SCI. USA, vol. 87, 1990, pages 2264 - 2268
KARLIN; ALTSCHUL, PROC. NATL. ACAD. SCI. USA, vol. 90, 1993, pages 5873 - 5877
KARVELIS ET AL., GENOME BIOL, vol. 16, 2015, pages 253
KAWAMATA ET AL., PLANT CELL PHYSIOL., vol. 38, no. 7, 1997, pages 792 - 803
KELLER; BAUMGARTNER, PLANT CELL, vol. 3, no. 10, 1991, pages 1051 - 1061
KIRIHARA ET AL., GENE, vol. 71, 1988, pages 359
KLEIN ET AL., BIOTECHNOLOGY, vol. 6, 1988, pages 559 - 563
KLEIN ET AL., PLANT PHYSIOL., vol. 91, 1988, pages 440 - 444
KLEIN ET AL., PROC. NATL. ACAD. SCI. USA, vol. 85, 1988, pages 4305 - 4309
KLOESGEN, R. B. ET AL., MOL. GEN. GENET., vol. 203, 1986, pages 237 - 244
KUNZE; BERGER, FRONT PHYSIOL, 2015
KUSTER ET AL., PLANT MOL. BIOL., vol. 29, no. 4, 1995, pages 759 - 772
KWON ET AL., PLANT PHYSIOL., vol. 105, 1994, pages 357 - 67
LAM, RESULTS PROBL. CELL DIFFER., vol. 20, 1994, pages 181 - 196
LANGE ET AL., J. BIOL. CHEM., vol. 282, 2007, pages 5101 - 5105
LEE ET AL., PLANT MOL BIOL, vol. 57, 2005, pages 805 - 818
LI ET AL., PLANT CELL REPORTS, vol. 12, 1993, pages 250 - 255
LILLEY ET AL., "Proceedings of the World Congress on Vegetable Protein Utilization in Human Foods and Animal Feedstuffs", 1989, AMERICAN OIL CHEMISTS SOCIETY, CHAMPAIGN, pages 497 - 502
LINN ET AL.: "Nucleases", 1993, COLD SPRING HARBOR LABORATORY PRESS
MACKENZIE, TRENDS CELL BIOL, vol. 15, 2005, pages 548 - 554
MATSUOKA ET AL., PROC. NATL. ACAD. SCI. USA, vol. 90, no. 20, 1993, pages 9586 - 9590
MCCABE ET AL., BIOTECHNOLOGY, vol. 6, 1988, pages 923 - 926
MCCORMICK ET AL., PLANT CELL REPORTS, vol. 5, 1986, pages 81 - 84
MCNELLIS ET AL., PLANT J., vol. 14, no. 2, 1998, pages 247 - 257
MIAO ET AL., PLANT CELL, vol. 3, no. 1, 1991, pages 11 - 22
MURA ET AL., SCIENCE, vol. 23, 1983, pages 476 - 482
MURCHA ET AL., J EXP BOT, vol. 65, 2014, pages 6301 - 6335
MURRAY ET AL., NUCL. ACIDS RES, vol. 17, 1989, pages 477 - 508
MASUMURA ET AL., PLANT MOL. BIOL., vol. 12, 1989, pages 123
MYERS; MILLER, CABIOS, vol. 4, 1988, pages 11 - 17
NASSOURY; MORSE, BIOCHIM BIOPHYS ACTA, vol. 1743, 2005, pages 5 - 19
NEEDLEMAN; WUNSCH, J. MOL. BIOL., vol. 48, 1970, pages 443 - 453
OROZCO ET AL., PLANT MOL. BIOL., vol. 23, no. 6, 1993, pages 1129 - 1138
ISHIDA ET AL., NATURE BIOTECHNOLOGY, vol. 14, 1996, pages 745 - 750
PABO ET AL., ANN. REV. BIOCHEM, vol. 70, 2001, pages 313 - 340
PAL MALIGA: "Chloroplast Biotechnology: Methods and Protocols", 2014
PASZKOWSKI ET AL., EMBO J., vol. 3, 1984, pages 2717 - 2722
PEARSON ET AL., METH. MOL. BIOL, vol. 24, 1994, pages 307 - 331
PEARSON; LIPMAN, PROC. NATL. ACAD. SCI., vol. 85, 1988, pages 2444 - 2448
PEDERSEN ET AL., J. BIOL. CHEM., vol. 261, 1986, pages 6279
PEETERS; SMALL, BIOCHIM BIOPHYS ACTA, vol. 1541, 2001, pages 54 - 63
PLANT SCIENCE (LIMERICK), vol. 79, no. 1, pages 69 - 76
REINA, M. ET AL., NUCL. ACIDS RES., vol. 18, no. 21, pages 6426
RIGGS ET AL., PROC. NATL. ACAD. SCI. USA, vol. 83, 1986, pages 5602 - 5606
RINEHART ET AL., PLANT PHYSIOL., vol. 112, no. 3, 1996, pages 1331 - 1341
RUSSELL ET AL., TRANSGENIC RES., vol. 6, no. 2, 1997, pages 157 - 168
SAMBROOK; RUSSELL: "Molecular Cloning: A Laboratory Manual, 3rd ed.", 2001, COLD SPRING HARBOR PRESS
SANFORD ET AL., PARTICULATE SCIENCE AND TECHNOLOGY, vol. 5, 1987, pages 27 - 37
SANGER ET AL., PLANT MOL. BIOL., vol. 14, no. 3, 1990, pages 433 - 443
SANTIAGO ET AL., PROC. NATL. ACAD. SCI. USA, vol. 105, 2008, pages 5809 - 5814
SCHENA ET AL., PROC. NATL. ACAD. SCI. USA, vol. 88, 1991, pages 10421 - 10425
SCHUBERT ET AL., J. BACTERIOL, vol. 170, 1988, pages 5837 - 5847
SEGAL ET AL., CURR. OPIN. BIOTECHNOL, vol. 12, 2001, pages 632 - 637
SENGUPTA-GOPALAN ET AL., PNAS, vol. 82, 1988, pages 3320 - 3324
SHAN ET AL., NATURE PROTOCOLS, vol. 9, 2014, pages 2395 - 2410
SHMAKOV ET AL., MOL CELL, vol. 60, 2016, pages 385 - 397
SHMAKOV ET AL., NAT REV MICROBIOL, 2017
SILVA-FILHO, CURR OPIN PLANT BIOL, vol. 6, 2003, pages 589 - 595
SIMPSON ET AL., EMBO J, vol. 4, 1985, pages 2723 - 2729
SINGH ET AL., THEOR. APPL. GENET., vol. 96, 1998, pages 319 - 324
SMITH ET AL., ADV. APPL. MATH., vol. 2, 1981, pages 482
SODING ET AL., NUCLEIC ACIDS RES, vol. 34, 2006, pages W374 - W378
SOLL, CURR OPIN PLANT BIOL, vol. 5, 2002, pages 529 - 535
SVITASHEV ET AL., NAT COMMUN, 2016
THOMPSON ET AL., NUCLEIC ACIDS RES., vol. 22, 1994, pages 4673 - 4680
TIMKO ET AL., NATURE, vol. 318, 1985, pages 57 - 58
TOMES ET AL.: "Plant Cell, Tissue, and Organ Culture: Fundamental Methods", 1995, SPRINGER-VERLAG
VAN CAMP ET AL., PLANT PHYSIOL., vol. 112, no. 2, 1996, pages 525 - 535
WEISSINGER ET AL., ANN. REV. GENET, vol. 22, 1988, pages 421 - 477
WILLIAMSON ET AL., EUR. J. BIOCHEM., vol. 165, 1987, pages 99 - 106
YAMAMOTO ET AL., PLANT CELL PHYSIOL., vol. 35, no. 5, 1994, pages 773 - 778
YAMAMOTO ET AL., PLANT J., vol. 12, no. 2, 1997, pages 255 - 265
ZETSCHE ET AL., CELL, vol. 163, 2015, pages 759 - 771
ZHANG ET AL., GENE, vol. 105, 1991, pages 61 - 72
ZHANG ET AL., J. BIOL. CHEM., vol. 275, no. 43, 2000, pages 33850 - 33860

Cited By (90)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12006520B2 (en) 2011-07-22 2024-06-11 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
US10323236B2 (en) 2011-07-22 2019-06-18 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
US10954548B2 (en) 2013-08-09 2021-03-23 President And Fellows Of Harvard College Nuclease profiling system
US11920181B2 (en) 2013-08-09 2024-03-05 President And Fellows Of Harvard College Nuclease profiling system
US10508298B2 (en) 2013-08-09 2019-12-17 President And Fellows Of Harvard College Methods for identifying a target site of a CAS9 nuclease
US11046948B2 (en) 2013-08-22 2021-06-29 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US11299755B2 (en) 2013-09-06 2022-04-12 President And Fellows Of Harvard College Switchable CAS9 nucleases and uses thereof
US10682410B2 (en) 2013-09-06 2020-06-16 President And Fellows Of Harvard College Delivery system for functional nucleases
US10912833B2 (en) 2013-09-06 2021-02-09 President And Fellows Of Harvard College Delivery of negatively charged proteins using cationic lipids
US10858639B2 (en) 2013-09-06 2020-12-08 President And Fellows Of Harvard College CAS9 variants and uses thereof
US10597679B2 (en) 2013-09-06 2020-03-24 President And Fellows Of Harvard College Switchable Cas9 nucleases and uses thereof
US10465176B2 (en) 2013-12-12 2019-11-05 President And Fellows Of Harvard College Cas variants for gene editing
US11053481B2 (en) 2013-12-12 2021-07-06 President And Fellows Of Harvard College Fusions of Cas9 domains and nucleic acid-editing domains
US11124782B2 (en) 2013-12-12 2021-09-21 President And Fellows Of Harvard College Cas variants for gene editing
US11578343B2 (en) 2014-07-30 2023-02-14 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US10704062B2 (en) 2014-07-30 2020-07-07 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US12043852B2 (en) 2015-10-23 2024-07-23 President And Fellows Of Harvard College Evolved Cas9 proteins for gene editing
US11214780B2 (en) 2015-10-23 2022-01-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
EP4063501A1 (en) 2016-02-15 2022-09-28 Benson Hill, Inc. Compositions and methods for modifying genomes
EP4306642A2 (en) 2016-02-15 2024-01-17 Benson Hill Holdings, Inc. Compositions and methods for modifying genomes
US11702651B2 (en) 2016-08-03 2023-07-18 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US10947530B2 (en) 2016-08-03 2021-03-16 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11999947B2 (en) 2016-08-03 2024-06-04 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US12084663B2 (en) 2016-08-24 2024-09-10 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
WO2018054911A1 (en) 2016-09-23 2018-03-29 Bayer Cropscience Nv Targeted genome optimization in plants
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US10745677B2 (en) 2016-12-23 2020-08-18 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
US11820969B2 (en) 2016-12-23 2023-11-21 President And Fellows Of Harvard College Editing of CCR2 receptor gene to protect against HIV infection
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
EP3612204A4 (en) * 2017-04-21 2021-01-27 The General Hospital Corporation Inducible, tunable, and multiplex human gene regulation using crispr-cpf1
US11667677B2 (en) 2017-04-21 2023-06-06 The General Hospital Corporation Inducible, tunable, and multiplex human gene regulation using CRISPR-Cpf1
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
EP4317443A3 (en) * 2017-08-09 2024-02-28 RiceTec, Inc. Compositions and methods for modifying genomes
JP7355730B2 (en) 2017-08-09 2023-10-03 ベンソン ヒル,インコーポレイティド Compositions and methods for modifying the genome
US11624070B2 (en) 2017-08-09 2023-04-11 Benson Hill, Inc. Compositions and methods for modifying genomes
WO2019030695A1 (en) * 2017-08-09 2019-02-14 Benson Hill Biosystems, Inc. Compositions and methods for modifying genomes
US10837023B2 (en) 2017-08-09 2020-11-17 Benson Hill, Inc. Compositions and methods for modifying genomes
KR20200062184A (en) * 2017-08-09 2020-06-03 벤슨 힐, 인크. Compositions and methods for modifying the genome
KR102631985B1 (en) * 2017-08-09 2024-02-01 라이스텍, 인크. Compositions and methods for modifying the genome
JP2020533963A (en) * 2017-08-09 2020-11-26 ベンソン ヒル,インコーポレイティド Compositions and methods for modifying the genome
US10316324B2 (en) 2017-08-09 2019-06-11 Benson Hill Biosystems, Inc. Compositions and methods for modifying genomes
US11932884B2 (en) 2017-08-30 2024-03-19 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
CN108546712A (en) * 2018-04-26 2018-09-18 中国农业科学院作物科学研究所 A method of realizing target gene homologous recombination in plant using CRISPR/LbCpf1 systems
CN108546712B (en) * 2018-04-26 2020-08-07 中国农业科学院作物科学研究所 Method for realizing homologous recombination of target gene in plant by using CRISPR/LbCpf1 system
US11434478B2 (en) 2018-08-09 2022-09-06 Gflas Life Sciences, Inc. Compositions and methods for genome engineering with Cas12a proteins
KR20200018364A (en) * 2018-08-09 2020-02-19 (주)지플러스 생명과학 Novel crispr associated protein and use thereof
KR20200018345A (en) * 2018-08-09 2020-02-19 (주)지플러스 생명과학 Novel crispr associated protein and use thereof
KR102096592B1 (en) * 2018-08-09 2020-04-02 (주)지플러스 생명과학 Novel crispr associated protein and use thereof
KR102096604B1 (en) * 2018-08-09 2020-04-02 (주)지플러스 생명과학 Novel crispr associated protein and use thereof
CN109593743A (en) * 2018-12-12 2019-04-09 广州普世利华科技有限公司 Novel CRISPR/ScCas12a protein and preparation method thereof
WO2020168315A1 (en) 2019-02-15 2020-08-20 Just-Evotec Biologics, Inc. Automated biomanufacturing systems, facilities, and processes
US11795452B2 (en) 2019-03-19 2023-10-24 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11643652B2 (en) 2019-03-19 2023-05-09 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
WO2020234468A1 (en) 2019-05-23 2020-11-26 Nomad Bioscience Gmbh Rna viral rna molecule for gene editing
US11746352B2 (en) 2019-12-30 2023-09-05 Eligo Bioscience Microbiome modulation of a host by delivery of DNA payloads with minimized spread
US11584781B2 (en) 2019-12-30 2023-02-21 Eligo Bioscience Chimeric receptor binding proteins resistant to proteolytic degradation
US12098372B2 (en) 2019-12-30 2024-09-24 Eligo Bioscience Microbiome modulation of a host by delivery of DNA payloads with minimized spread
EP3878958A1 (en) 2020-03-11 2021-09-15 B.R.A.I.N. Biotechnology Research And Information Network AG Crispr-cas nucleases from cpr-enriched metagenome
WO2021180359A1 (en) 2020-03-11 2021-09-16 B.R.A.I.N. Biotechnology Research And Information Network Ag Crispr-cas nucleases from cpr-enriched metagenome
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence
US12031126B2 (en) 2020-05-08 2024-07-09 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence
WO2021250284A1 (en) 2020-06-12 2021-12-16 Eligo Bioscience Specific decolonization of antibiotic resistant bacteria for prophylactic purposes
EP3922719A1 (en) 2020-06-12 2021-12-15 Eligo Bioscience Specific decolonization of antibiotic resistant bacteria for prophylactic purposes
WO2022003209A1 (en) 2020-07-03 2022-01-06 Eligo Bioscience Method of containment of nucleic acid vectors introduced in a microbiome population
WO2022017633A3 (en) * 2020-07-21 2022-03-10 BRAIN Biotech AG Novel, non-naturally occurring crispr-cas nucleases for genome editing
EP4279597A3 (en) * 2020-07-21 2024-04-24 BRAIN Biotech AG Novel, non-naturally occurring crispr-cas nucleases for genome editing
EP4279597A2 (en) 2020-07-21 2023-11-22 BRAIN Biotech AG Novel, non-naturally occurring crispr-cas nucleases for genome editing
WO2022017633A2 (en) 2020-07-21 2022-01-27 BRAIN Biotech AG Novel, non-naturally occurring crispr-cas nucleases for genome editing
EP3943600A1 (en) 2020-07-21 2022-01-26 B.R.A.I.N. Biotechnology Research And Information Network AG Novel, non-naturally occurring crispr-cas nucleases for genome editing
WO2022101286A1 (en) 2020-11-11 2022-05-19 Leibniz-Institut Für Pflanzenbiochemie Fusion protein for editing endogenous dna of a eukaryotic cell
WO2022144382A1 (en) 2020-12-30 2022-07-07 Eligo Bioscience Chimeric receptor binding proteins resistant to proteolytic degradation
WO2022144381A1 (en) 2020-12-30 2022-07-07 Eligo Bioscience Microbiome modulation of a host by delivery of dna payloads with minimized spread
WO2022165001A1 (en) 2021-01-29 2022-08-04 Merck Sharp & Dohme Llc Compositions of programmed death receptor 1 (pd-1) antibodies and methods of obtaining the compositions thereof
US12110330B2 (en) 2021-01-29 2024-10-08 Merck Sharp & Dohme Llc Compositions of programmed death receptor 1 (PD-1) antibodies and methods of obtaining the compositions thereof
WO2022184765A1 (en) 2021-03-02 2022-09-09 BRAIN Biotech AG NOVEL CRISPR-Cas NUCLEASES FROM METAGENOMES
US11952595B2 (en) 2021-05-12 2024-04-09 Eligo Bioscience Production of lytic phages
US11939598B2 (en) 2021-05-12 2024-03-26 Eligo Bioscience Production bacterial cells and use thereof in production methods
US11697802B2 (en) 2021-05-12 2023-07-11 Eligo Bioscience Production bacterial cells and use thereof in production methods
WO2022238552A1 (en) 2021-05-12 2022-11-17 Eligo Bioscience Production bacterial cells and use thereof in production methods
WO2022238555A1 (en) 2021-05-12 2022-11-17 Eligo Bioscience Production of lytic phages
US11739304B2 (en) 2021-05-12 2023-08-29 Eligo Bioscience Production of lytic phages
WO2023287707A1 (en) 2021-07-15 2023-01-19 Just-Evotec Biologics, Inc. Bidirectional tangential flow filtration (tff) perfusion system

Also Published As

Publication number Publication date
US20180148735A1 (en) 2018-05-31
EP3307884A2 (en) 2018-04-18
MX2023014014A (en) 2023-12-11
EP4306642A2 (en) 2024-01-17
CN109312316B (en) 2022-10-14
US9896696B2 (en) 2018-02-20
MY197523A (en) 2023-06-20
IL308791A (en) 2024-01-01
CA3014988A1 (en) 2017-08-24
AU2017220789B2 (en) 2023-06-08
JP2019504649A (en) 2019-02-21
IL261082A (en) 2018-11-04
US10113179B2 (en) 2018-10-30
PH12018501722A1 (en) 2019-05-20
MX2018009761A (en) 2018-11-29
CN115927440A (en) 2023-04-07
AU2017220789A1 (en) 2018-08-30
EP3307884B1 (en) 2024-01-03
JP2022184892A (en) 2022-12-13
US20170233756A1 (en) 2017-08-17
EP4306642A3 (en) 2024-07-03
AU2023270322A1 (en) 2023-12-14
CN109312316A (en) 2019-02-05
EP3307884C0 (en) 2024-01-03
KR20230165368A (en) 2023-12-05
ES2973207T3 (en) 2024-06-19
BR112018016408A2 (en) 2018-12-18
JP2024028753A (en) 2024-03-05
US20190071688A1 (en) 2019-03-07
CA3221070A1 (en) 2017-08-24
EP4063501A1 (en) 2022-09-28
AU2023226754A1 (en) 2023-09-28
IL304398A (en) 2023-09-01
KR20180107155A (en) 2018-10-01
WO2017141173A3 (en) 2017-11-02

Similar Documents

Publication Publication Date Title
US10113179B2 (en) Compositions and methods for modifying genomes
US11624070B2 (en) Compositions and methods for modifying genomes
US20210180076A1 (en) Compositions and methods for genome editing in plants
EP4025697A1 (en) Compositions and methods for modifying genomes
WO2023119135A1 (en) Compositions and methods for modifying genomes
EP4453199A1 (en) Compositions and methods for modifying genomes
WO2022236071A1 (en) Genome editing in plants using cas12a nucleases
BR122023025443A2 (en) COMPOSITIONS AND METHODS FOR MODIFYING GENOMES

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17707411

Country of ref document: EP

Kind code of ref document: A2

WWE WIPO information: entry into national phase

Ref document number: 2017707411

Country of ref document: EP

WWE WIPO information: entry into national phase

Ref document number: 261082

Country of ref document: IL

Ref document number: 3014988

Country of ref document: CA

WWE WIPO information: entry into national phase

Ref document number: MX/A/2018/009761

Country of ref document: MX

ENP Entry into the national phase

Ref document number: 2018561102

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20187023481

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017220789

Country of ref document: AU

Date of ref document: 20170215

Kind code of ref document: A

REG Reference to national code

Ref country code: BR

Ref legal event code: B01A

Ref document number: 112018016408

Country of ref document: BR

ENP Entry into the national phase

Ref document number: 112018016408

Country of ref document: BR

Kind code of ref document: A2

Effective date: 20180810