WO2015168158A1 - Targeted genome editing to modify lignin biosynthesis and cell wall composition - Google Patents

Targeted genome editing to modify lignin biosynthesis and cell wall composition

Info

Publication number
WO2015168158A1
WO2015168158A1 PCT/US2015/028057 US2015028057W WO2015168158A1 WO 2015168158 A1 WO2015168158 A1 WO 2015168158A1 US 2015028057 W US2015028057 W US 2015028057W WO 2015168158 A1 WO2015168158 A1 WO 2015168158A1
Authority
WO
WIPO (PCT)
Prior art keywords
plant
cell
sugarcane
comt
tissue
Prior art date
Application number
PCT/US2015/028057
Other languages
French (fr)
Inventor
Fredy Altpeter
Je JUNG
Original Assignee
Fredy Altpeter
Jung Je
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fredy Altpeter, Jung Je
Publication of WO2015168158A1

Links

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8241Phenotypically and genetically modified plants via recombinant DNA technology
    • C12N15/8242Phenotypically and genetically modified plants via recombinant DNA technology with non-agronomic quality (output) traits, e.g. for industrial processing; Value added, non-agronomic traits
    • C12N15/8243Phenotypically and genetically modified plants via recombinant DNA technology with non-agronomic quality (output) traits, e.g. for industrial processing; Value added, non-agronomic traits involving biosynthetic or metabolic pathways, i.e. metabolic engineering, e.g. nicotine, caffeine
    • C12N15/8255Phenotypically and genetically modified plants via recombinant DNA technology with non-agronomic quality (output) traits, e.g. for industrial processing; Value added, non-agronomic traits involving biosynthetic or metabolic pathways, i.e. metabolic engineering, e.g. nicotine, caffeine involving lignin biosynthesis
    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12NMICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09Recombinant DNA-technology
    • C12N15/63Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • C12N15/8201Methods for introducing genetic material into plant cells, e.g. DNA, RNA, stable or transient incorporation, tissue culture methods adapted for transformation
    • C12N15/8213Targeted insertion of genes into the plant genome by homologous recombination

Definitions

  • the present invention relates to methods for reducing lignin content in and/or modifying the lignin profile of a plant, plant cell or plant tissue.
  • the invention further provides plants, cells and/or tissues having reduced lignin content and/or modified lignin profiles as well as products produced from the plants, cells and/or tissues and uses thereof.
  • a potential alternative to fossil fuels for an energy source is lignocellulosic biomass.
  • the sugar fraction in the lignocellulosic biomass is primarily located in the secondary cell wall and can be used for the production of liquid biofuels, such as bioethanol.
  • current techniques for converting lignocellulosic biomass into liquid biofuels are inefficient due in large part to the complexity of the cell wall structure and the presence of lignin therein. As such, there exists the need to develop techniques for improving the conversion of lignocellulosic biomass into liquid biofuels.
  • a method of reducing the lignin content and/or modifying the lignin profile of a sugarcane plant, cell and/or tissue comprising: mutagenizing nucleic acid in a sugarcane plant, cell and/or tissue to produce a stable inactivating deletion or insertion in more than 50% of the alleles and/or copies of said sugarcane plant, cell and/or tissue that encode caffeic acid O-methyltransferase (COMT), thereby reducing the lignin content of the sugarcane plant, cell and/or tissue and/or modifying the lignin profile of the sugarcane plant, cell and/or tissue as compared to a wild type sugarcane plant, cell and/or tissue.
  • a genetically modified sugarcane plant, cell and/or tissue comprising: a stable, inactivating deletion or insertion in more than 50% of the alleles and/or copies of alleles encoding caffeic acid O-methyltransferase (COMT) in said sugarcane plant, cell and/or tissue.
  • a method of increasing the efficiency of conversion of sugarcane lignocellulosic biomass into biofuel comprising: providing a sugarcane or sugarcane crop of the invention; and converting the lignocellulosic biomass from said sugarcane plant and/or crop into biofuel, thereby increasing the efficiency of conversion of sugarcane lignocellulosic biomass into biofuel as compared to a wild type sugarcane plant or crop.
  • a method of providing an animal feed having increased digestibility comprising: providing a sugarcane plant of the invention or crop of the invention, thereby providing a more readily digestible animal feed.
  • a method of providing an animal feed having increased digestibility comprising: providing a sugarcane plant of the invention or crop of the invention; and converting the lignocellulosic biomass from said plant and/or crop into animal feed, thereby providing a more readily digestible animal feed.
  • Fig. 1 shows the lignin biosynthetic pathway in sugarcane.
  • Fig. 2 shows the mechanism behind modification of lignin biosynthesis using TALEN mediated genome editing to selectively disrupt the caffeic acid O-methyltransferase (COMT) gene in sugarcane.
  • Fig. 3 shows one embodiment of a TALEN expression vector used to produce a
  • the TALEN expression vector in Fig. 3 has a DNA sequence corresponding to SEQ ID NO: 14.
  • Fig. 4 shows another embodiment of a TALEN expression vector used to produce a TALEN protein synthesized to recognize specific TALEN binding sites within the genomic sequence corresponding to the sugarcane COMT gene.
  • the TALEN expression vector in Fig. 4 has a DNA sequence corresponding to SEQ ID NO: 15.
  • Fig. 5 shows one embodiment of a minimal TALEN expression cassette with a heat inducible Cre-loxP mediated self-excision system.
  • the minimal TALEN expression cassette of this figure has a DNA sequence corresponding to SEQ ID NO: 17.
  • Fig. 6 shows one embodiment of a minimal TALEN expression cassette with a heat inducible Cre-loxP mediated self-excision system.
  • the minimal TALEN expression cassette of this figure has a DNA sequence corresponding to SEQ ID NO: 18.
  • Fig. 7 shows one embodiment of a section of the TALEN expression vector of Fig. 3 or the minimal expression cassette of Fig. 5 and illustrates the region of a 332 base pair (bp) amplicon that is used for screening cells and plants.
  • the figure shows that the amplified region is located in the NtHSP terminator region of the TALEN expression vector or cassette.
  • This same 332 bp region is amplified to screen cells and plants produced using the TALEN expression vector or minimal TALEN cassette shown in Figs. 4 and 6, respectively.
  • Primers used to amplify this region have sequences corresponding to SEQ ID NO:1 and SEQ ID NO:2.
  • Fig. 8 shows the electrophoretic results from a PCR-based screening assay designed to amplify the 332 bp region of the TALEN expression vector or minimal TALEN cassette described and shown in Fig. 7. Presence of the 332 bp amplicon indicates presence of the
  • Fig. 9 shows a portion of the DNA sequence corresponding to sugarcane COMT.
  • Illustrated in Fig. 9 are the locations of TALEN specific binding and target sites, primer locations for primers used in screening assays, and BsaHI restriction endonuclease sites within the wild-type sugarcane COMT gene sequence. +1 indicates the transcription start site of the sugarcane COMT gene.
  • Fig. 10 shows electrophoretic results of a PCR screening assay in which the PCR amplicon was digested with BsaHI restriction endonuclease prior to running out on an agarose gel.
  • PCR amplicons from samples from plants with a mutated COMT gene were not cut by the BsaHI restriction endonuclease and produced an amplicon of approximately 125 bp.
  • WT is used to designate lanes containing wild-type samples.
  • M is used to designate the molecular weight marker.
  • the boxes indicate amplicons that were uncut by the BsaHI restriction endonuclease.
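  • The digest-based screen described for Fig. 10 can be sketched in a few lines. This is an illustrative sketch only: it assumes the commonly cited BsaHI recognition sequence GR^CGYC (R = A or G, Y = C or T, cut after the R), and the function names and the toy sequences in the usage note are hypothetical, not taken from the patent.

```python
import re

# Assumed BsaHI recognition sequence GR^CGYC, written as a lookahead regex so
# that overlapping sites would also be found.
BSAHI_SITE = re.compile(r"(?=G[AG]CG[CT]C)")


def bsahi_fragments(amplicon: str) -> list[int]:
    """Return fragment lengths expected from a complete BsaHI digest."""
    seq = amplicon.upper()
    # The enzyme is assumed to cut 2 bases into each site (after "GR").
    cuts = [m.start() + 2 for m in BSAHI_SITE.finditer(seq)]
    edges = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(edges, edges[1:])]


def looks_mutated(amplicon: str) -> bool:
    """An amplicon that has lost its BsaHI site stays uncut (one fragment)."""
    return len(bsahi_fragments(amplicon)) == 1
```

  • For example, the hypothetical sequence `"AAAAGACGTCAAAA"` contains one site and is split into two fragments, while `"AAAAGATCAAAA"` has no site and is reported as uncut, which is the pattern the boxed bands in Fig. 10 are said to show.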
  • Fig. 11A-11B shows DNA fragment analysis of a sugarcane amplicon from a wild type sugarcane plant (Fig.
  • Fig. 12A-12I shows the DNA fragment analysis of wild type sugarcane plants (Figs. 12A-12C) and two tissues (T1 and T2) of TALEN induced mutant sugarcane plants (Figs. 12D-12I). The fragment analysis in Fig. 12A-12I suggests uniform and stable TALEN induced mutation of sugarcane COMT genomic DNA.
  • Fig. 13A-13I shows the DNA fragment analysis of wild type sugarcane plants (Figs. 13A-13C) and two tissues (T1 and T2) of TALEN induced mutant sugarcane plants (Figs. 13D-13I). The variation between T1 and T2 in each mutant line suggests progressing mutagenesis or chimeric events.
  • Fig. 14 shows DNA sequence analysis of COMT of wild type (WT) and TALEN induced mutant (M) sugarcane plants.
  • Fig. 15A to 15C shows sugarcane plants with TALEN induced mutations.
  • the sugarcane plant in Fig. 15A is shown actively growing in soil and producing secondary tillers.
  • Fig. 16A to 16C shows COMT mutants with lignin reduction of more than 20% displayed brown coloration in internodes and mid-rib (Fig. 16A and 16B).
  • Fig. 16C shows that the growth performance of sugarcane mutant lines with up to 22% reduction in lignin did not differ from wild-type under greenhouse conditions.
  • Fig. 17 provides sequence confirmation of TALEN induced COMT mutation in line C16 with both significant reduction of lignin and stem diameter compared to WT.
  • COMTa and COMTb are two confirmed COMT homo(eo)logs with a SNP indicated with a red arrow.
  • TALEN binding site is indicated in the boxes. Read length and number of reads with the specific sequences are shown on the right.
  • Fig. 18 provides sequence confirmation of TALEN induced COMT mutation in line C6 with significant reduction of lignin (22%) but not a significant reduction of stem diameter when compared to WT.
  • COMTa and COMTb are two confirmed COMT homo(eo)logs with a SNP indicated with a red arrow.
  • TALEN binding site is indicated with blue boxes. Read length is shown on the right.
  • Fig. 19 shows mutant lines (C16, C6, C14) representing uniform TALEN induced mutation events.
  • WT: original sugarcane without mutation
  • PT, VP indicate primary mutant line and vegetative progeny, respectively.
  • Fig. 20 shows mutant lines (C17, C7) representing uniform TALEN induced mutation events.
  • WT: original sugarcane without mutation
  • PT, VP indicate primary mutant line and vegetative progeny, respectively.
  • Figs. 21A-21I show Cre/loxP mediated site specific recombination for excision of TALEN expression cassette from the sugarcane genome.
  • Fig. 21A shows the TALEN Cre expression cassette (12983 bp).
  • Fig. 21B outlines the method treatments.
  • Fig. 21C shows the excision (474 bp) and Fig. 21D shows the confirmation of excision sites.
  • LoxP: site for site specific recombination driven by the Cre recombinase; 35S poly A: 3' UTR; NPTII: selectable marker neomycin phosphotransferase; GmHSP promoter: heat inducible promoter of the Glycine max heat shock protein; Cre: Cre recombinase for site specific recombination between the loxP sites; Nos: 3' UTR from nopaline synthase; AtHSP terminator: 3' UTR from Arabidopsis thaliana heat shock protein; Right TALEN: TALEN arm targeted to conserved region of the sugarcane COMT gene; YLCV promoter: constitutive promoter of the yellow leaf curl virus; Left TALEN: TALEN arm targeted to conserved region of the sugarcane COMT gene; NtHSP terminator: 3' UTR of Nicotiana tabacum; LoxP: site for site specific recombination driven by the Cre recomb
  • first e.g., a first promoter sequence
  • second e.g., a second promoter sequence
  • the term "about," as used herein when referring to a measurable value such as an amount of a compound, dose, time, temperature, and the like, is meant to encompass variations of 20%, 10%, 5%, 1%, 0.5%, or even 0.1% of the specified amount.
  • phrases such as “between X and Y” and “between about X and Y” should be interpreted to include X and Y.
  • phrases such as “between about X and Y” mean “between about X and about Y” and phrases such as “from about X to Y” mean “from about X to about Y.”
  • the term “decrease,” “inhibit” or “reduce” or grammatical variations thereof as used herein refers to a decrease or diminishment in the specified level or activity of at least about 15%, 25%, 35%, 40%, 50%, 60%, 75%, 80%, 90%, 95% or more. In particular embodiments, the inhibition or reduction results in little or essentially no detectable activity (at most, an insignificant amount, e.g., less than about 10% or even 5%).
  • decreasing or reducing the amount of lignin in a plant, plant cell or plant tissue means decreasing the level of lignin by about 3% to about 30% as compared to the level of lignin in a control plant, plant cell or plant tissue.
  • decreasing or reducing the level of COMT activity in a plant, plant cell or plant tissue means decreasing the level of COMT activity by about 50% to about 100% (e.g., about 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100%, or any range or value therein) as compared to the level of COMT activity in a control plant, plant cell or plant tissue.
  • the terms “increase,” “increases,” “increased,” “increasing,” and similar terms indicate an elevation of at least about 25%, 50%, 75%, 100%, 150%, 200%, 300%, 400%, 500% or more.
  • modulating means an alteration in, for example, the lignin profile of a plant, plant cell or plant tissue as described herein.
  • Modulating can also refer to the expression of a target gene or target polynucleotide by increasing or reducing the expression of said target polynucleotide or target gene.
  • RNA differential production of RNA, including but not limited to mRNA, tRNA, miRNA, siRNA, snRNA, and piRNA transcribed from a gene or regulatory region of a genome, or the protein product encoded by a gene as compared to the level of production of RNA or protein by the same gene or regulatory region in a normal or a control cell.
  • “differentially expressed” also refers to nucleotide sequences or proteins in a cell or tissue which have different temporal and/or spatial expression profiles as compared to a normal or control cell and/or tissue.
  • RNA or protein product encoded by a gene as compared to the level of expression of the RNA or protein product in a normal or control plant, plant cell and/or plant tissue.
  • underexpressed or underexpression refers to decreased expression level of an RNA or protein product encoded by a gene as compared to the level of expression of the RNA or protein product in a normal or control plant, plant cell and/or plant tissue.
  • express refers to the process by which polynucleotides (e.g., RNA or DNA) are transcribed into RNA transcripts and, optionally, translated into peptides, polypeptides, or proteins.
  • a nucleic acid molecule and/or a nucleotide sequence may express a polypeptide of interest or, for example, a functional untranslated RNA.
  • the recombinant nucleic acid molecules, and/or nucleotide sequences of the invention are "isolated.”
  • isolated means separated from constituents, cellular and otherwise, in which the polynucleotide, peptide, polypeptide, protein, antibody, or fragments thereof, are normally associated with in nature.
  • an "isolated" nucleic acid molecule, an “isolated” nucleotide sequence or an “isolated” polypeptide is a nucleic acid molecule, nucleotide sequence or polypeptide that, by the hand of man, exists apart from its native environment and is therefore not a product of nature (i.e., non-naturally occurring).
  • an isolated nucleic acid molecule, nucleotide sequence or polypeptide may exist in a purified form that is at least partially separated from at least some of the other components of the naturally occurring organism or virus, for example, the cell or viral structural components or other polypeptides or nucleic acids commonly found associated with the polynucleotide.
  • the isolated nucleic acid molecule, the isolated nucleotide sequence and/or the isolated polypeptide is at least about 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or more pure.
  • purified is used in reference to a nucleic acid sequence, peptide, or polypeptide or other compound that has increased purity relative to the natural environment.
  • a non-naturally occurring polynucleotide, peptide, polypeptide, protein, antibody, or fragments thereof, does not require "isolation" to distinguish it from its naturally occurring counterpart.
  • an isolated nucleic acid molecule, nucleotide sequence or polypeptide may exist in a non-native environment such as, for example, a recombinant host cell.
  • isolated means that it is separated from the chromosome and/or cell in which it naturally occurs.
  • a polynucleotide is also isolated if it is separated from the chromosome and/or cell in which it naturally occurs in and is then inserted into a genetic context, a chromosome and/or a cell in which it does not naturally occur (e.g., a different host cell, different regulatory sequences, and/or different position in the genome than as found in nature).
  • the recombinant nucleic acid molecules, nucleotide sequences and their encoded functional nucleic acids or polypeptides are "isolated" in that, by the hand of man, they exist apart from their native environment and therefore are not products of nature, however, in some embodiments, they can be introduced into and exist in a recombinant host cell.
  • a “native” or “wild type” nucleic acid, nucleotide sequence, polypeptide or amino acid sequence refers to a naturally occurring or endogenous nucleic acid, nucleotide sequence, polypeptide or amino acid sequence.
  • a wild type nucleic acid or a “wild type protein” is a nucleic acid or protein that is naturally occurring in or endogenous to the organism.
  • a “homologous” nucleic acid sequence is a nucleotide sequence naturally associated with a host cell into which it is introduced.
  • nucleic acid can be used interchangeably and encompass both RNA and DNA, including cDNA, genomic DNA, mRNA, synthetic (e.g., chemically synthesized) DNA or RNA and chimeras of RNA and DNA.
  • polynucleotide, nucleotide sequence, or nucleic acid refers to a chain of nucleotides without regard to length of the chain.
  • the nucleic acid can be double-stranded or single-stranded. Where single-stranded, the nucleic acid can be a sense strand or an antisense strand.
  • the nucleic acid can be synthesized using oligonucleotide analogs or derivatives (e.g., inosine or phosphorothioate nucleotides). Such oligonucleotides can be used, for example, to prepare nucleic acids that have altered base-pairing abilities or increased resistance to nucleases.
  • the present invention further provides a nucleic acid that is the complement (which can be either a full complement or a partial complement) of a nucleic acid, nucleotide sequence, or polynucleotide of this invention.
  • Nucleic acid molecules and/or nucleotide sequences provided herein are presented in the 5' to 3' direction, from left to right, and are represented using the standard code for representing the nucleotide characters as set forth in the U.S. sequence rules, 37 CFR §§ 1.821 - 1.825 and the World Intellectual Property Organization (WIPO) Standard ST.25.
  • concentrate refers to a molecule, including but not limited to a polynucleotide, peptide, polypeptide, protein, antibody, or fragments thereof, that is distinguishable from its naturally occurring counterpart in that the concentration or number of molecules per volume is greater than that of its naturally occurring counterpart.
  • diluted refers to a molecule, including but not limited to a polynucleotide, peptide, polypeptide, protein, antibody, or fragments thereof, that is distinguishable from its naturally occurring counterpart in that the concentration or number of molecules per volume is less than that of its naturally occurring counterpart.
  • separated refers to the state of being physically divided from the original source or population such that the separated compound, agent, particle, or molecule can no longer be considered part of the original source or population.
  • operative when referring to a first nucleic acid sequence that is operatively linked to a second nucleic acid sequence, means a situation when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence.
  • a promoter and/or other transcriptional control elements e.g., enhancers, termination elements and the like
  • a DNA “promoter” is an untranslated DNA sequence upstream of a coding region that contains the binding site for RNA polymerase and initiates transcription of the DNA.
  • promoter includes any sequence capable of driving transcription of a coding sequence.
  • the term “promoter” as used herein refers to a DNA sequence generally described as the 5' regulatory region of a gene, located proximal to the start codon. The transcription of an adjacent coding sequence(s) is initiated at the promoter region.
  • the term "promoter" also includes fragments of a promoter that are functional in initiating/driving transcription of the gene.
  • a “promoter region” can also include other elements that act as regulators of gene expression.
  • Promoters can include, for example, constitutive, inducible, temporally regulated, developmentally regulated, chemically regulated, tissue-preferred and tissue-specific promoters for use in the preparation of recombinant nucleic acid molecules, i.e., chimeric genes.
  • a "promoter" useful with the invention is a promoter capable of initiating transcription of a nucleotide sequence in a cell of a plant.
  • plasmid refers to a non-chromosomal double-stranded DNA sequence including an intact "replicon" such that the plasmid is replicated in a host cell.
  • sequence identity refers to the extent to which two optimally aligned polynucleotide or peptide sequences are invariant throughout a window of alignment of components, e.g., nucleotides or amino acids. "Identity" can be readily calculated by known methods including, but not limited to, those described in: Computational Molecular Biology (Lesk, A. M., ed.) Oxford University Press, New York (1988); Biocomputing: Informatics and Genome Projects (Smith, D. W., ed.) Academic Press, New York (1993); Computer Analysis of Sequence Data, Part I (Griffin, A. M., and Griffin, H.
  • Identity refers to the degree of similarity between two nucleic acid or amino acid sequences.
  • For sequence comparison, typically one sequence acts as a reference sequence to which test sequences are compared.
  • When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters.
  • the percent identity between two sequences can be determined by using analysis software (e.g., Sequence Analysis Software Package of the Genetics Computer Group, Madison Wis.) that incorporates the Needleman and Wunsch (J. Mol. Biol., 1970, 48: 443-453) algorithm (e.g., NBLAST, and XBLAST). The default parameters are used to determine the identity for the polypeptides of the present disclosure.
  • substantially identical in the context of two nucleic acids or two amino acid sequences, refers to two or more sequences or subsequences that have at least about 50% nucleotide or amino acid residue identity when compared and aligned for maximum correspondence as measured using one of the following sequence comparison algorithms or by visual inspection.
  • substantially identical sequences have at least about 60%, or at least about 70%, or at least about 80%, or even at least about 90% or 95% nucleotide or amino acid residue identity.
  • substantial identity exists over a region of the sequences that is at least about 50 residues in length, or over a region of at least about 100 residues, or the sequences are substantially identical over at least about 150 residues.
  • the sequences are substantially identical when they are identical over the entire length of the coding regions.
  • homologous in the context of the invention refers to the level of similarity between nucleic acid or amino acid sequences in terms of nucleotide or amino acid identity or similarity, respectively, i.e., sequence similarity or identity.
  • homologue, and homologous also refers to the concept of similar functional properties among different nucleic acids or proteins.
  • Homologues include genes that are orthologous and paralogous. Homologues can be determined by using the coding sequence for a gene, disclosed herein or found in appropriate database (such as that at NCBI or others) in one or more of the following ways. For an amino acid sequence, the sequences should be compared using algorithms (for instance see section on "identity” and "substantial identity”).
  • the sequence of one DNA molecule can be compared to the sequence of a known or putative homologue in much the same way.
  • Homologues are at least 20% identical, or at least 30% identical, or at least 40% identical, or at least 50% identical, or at least 60% identical, or at least 70% identical, or at least 80% identical, or at least 88% identical, or at least 90% identical, or at least 92% identical, or at least 95% identical, across any substantial region of the molecule (DNA, RNA, or protein molecule).
  • Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2: 482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J Mol. Biol. 48: 443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85: 2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by visual inspection (see generally, Ausubel et al, infra).
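  • As a rough illustration of the homology alignment approach cited above, the following sketch computes a global alignment score in the style of Needleman & Wunsch using dynamic programming. The scoring values (match +1, mismatch -1, linear gap penalty -1) and the function name are assumptions chosen for illustration; they are not parameters stated in this document, and the packaged programs listed above use their own defaults.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b.

    Uses a simple linear gap penalty and keeps only two rows of the
    dynamic-programming table in memory.
    """
    # Row 0: aligning an empty prefix of a against prefixes of b costs gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]
```

  • With these illustrative scores, two identical 4-base sequences score 4, and aligning `"ACGT"` against `"AGT"` scores 2 (three matches and one gap).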
  • HSPs: high scoring sequence pairs
  • initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them.
  • the word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when the cumulative alignment score falls off by the quantity X from its maximum achieved value, the cumulative score goes to zero or below due to the accumulation of one or more negative-scoring residue alignments, or the end of either sequence is reached.
  • the BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment.
  • the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89: 10915 (1989)).
  • In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 90: 5873-5787 (1993)).
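  • The word-hit extension rule described above (reward M, penalty N, drop-off X) can be sketched for the nucleotide case as follows. This is a simplified, ungapped, one-directional illustration and not the BLAST implementation; the default values M=1, N=-3 and X=5 and the function name are arbitrary assumptions, and the seed word is assumed to be an exact match.

```python
def extend_right(query: str, subject: str, q_start: int, s_start: int,
                 word_len: int, M: int = 1, N: int = -3, X: int = 5):
    """Extend an exact seed word to the right without gaps.

    Returns (best_score, best_length), where best_length counts residues
    added beyond the seed at the point of maximum cumulative score.
    """
    score = best = word_len * M  # the seed contributes word_len matches
    best_len = 0
    q, s = q_start + word_len, s_start + word_len
    step = 0
    # Halt at the end of either sequence.
    while q + step < len(query) and s + step < len(subject):
        score += M if query[q + step] == subject[s + step] else N
        step += 1
        if score > best:
            best, best_len = score, step
        # Halt when the score drops X below its maximum, or to zero or below.
        if best - score >= X or score <= 0:
            break
    return best, best_len
```

  • For instance, with a 4-base seed at position 0 of `"ACGTACGTTTTT"` and `"ACGTACGAAAAA"`, three further matches raise the score to 7 before two mismatches trigger the drop-off, so the call returns `(7, 3)`. A full implementation would also extend to the left and, for proteins, score with a substitution matrix.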
  • One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
  • a test nucleic acid sequence is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid sequence to the reference nucleic acid sequence is less than about 0.1, more preferably less than about 0.01, and most preferably less than about 0.001.
  • Another widely used and accepted computer program for performing sequence alignments is CLUSTALW v1.6 (Thompson, et al. Nuc. Acids Res., 22: 4673-4680, 1994).
  • the number of matching bases or amino acids is divided by the total number of bases or amino acids, and multiplied by 100 to obtain a percent identity. For example, if two 580 base pair sequences had 145 matched bases, they would be 25 percent identical. If the two compared sequences are of different lengths, the number of matches is divided by the shorter of the two lengths. For example, if there were 100 matched amino acids between a 200 and a 400 amino acid protein, they are 50 percent identical with respect to the shorter sequence.
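  • The percent identity arithmetic above can be checked with a short sketch. The function name and signature are hypothetical; the rule of dividing by the shorter length follows the text.

```python
def percent_identity(matches: int, len_a: int, len_b: int) -> float:
    """Percent identity: matches divided by the shorter sequence length, times 100."""
    shorter = min(len_a, len_b)
    if shorter <= 0:
        raise ValueError("sequence lengths must be positive")
    if not 0 <= matches <= shorter:
        raise ValueError("matches must be between 0 and the shorter length")
    return 100.0 * matches / shorter


# The two worked examples from the text.
equal_length_case = percent_identity(145, 580, 580)      # two 580 bp sequences
different_length_case = percent_identity(100, 200, 400)  # 200 vs 400 residues
```

  • Both examples reproduce the figures given in the text: 25 percent for the equal-length case and 50 percent with respect to the shorter sequence.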
  • two nucleotide sequences can also be considered to be substantially complementary when the two sequences hybridize to each other under stringent conditions. In some representative embodiments, two nucleotide sequences considered to be substantially complementary hybridize to each other under highly stringent conditions.
  • the term “substantially complementary” means that two nucleic acid sequences are at least about 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or more complementary.
  • the term “substantially complementary” can mean that two nucleic acid sequences can hybridize together under high stringency conditions (as described herein).
  • substantially complementary means about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementary (or any value or range therein).
  • stringent conditions include reference to conditions under which a nucleic acid will selectively hybridize to a target sequence to a detectably greater degree than other sequences (e.g., at least 2-fold over a non-target sequence), and optionally may substantially exclude binding to non-target sequences.
  • Stringent conditions are sequence-dependent and will vary under different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences can be identified that can be up to 100% complementary to the reference nucleotide sequence. Alternatively, conditions of moderate or even low stringency can be used to allow some mismatching in sequences so that lower degrees of sequence similarity are detected.
  • primers or probes can be used under conditions of high, moderate or even low stringency.
  • conditions of low or moderate stringency can be advantageous to detect homolog, ortholog and/or paralog sequences having lower degrees of sequence identity than would be identified under highly stringent conditions.
  • Tm = 81.5 °C + 16.6 (log M) + 0.41 (% GC) - 0.61 (% formamide) - 500/L; where M is the molarity of monovalent cations, % GC is the percentage of guanosine and cytosine nucleotides in the DNA, % formamide is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs.
  • the Tm is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe.
  • Tm is reduced by about 1 °C for each 1% of mismatching; thus, Tm, hybridization and/or wash conditions can be adjusted to hybridize to sequences of the desired degree of identity. For example, if sequences with >90% identity are sought, the Tm can be decreased 10 °C.
  • stringent conditions are selected to be about 5 °C lower than the thermal melting point (Tm) for the specific sequence and its complement at a defined ionic strength and pH.
  • highly stringent conditions can utilize a hybridization and/or wash at the thermal melting point (Tm) or 1, 2, 3 or 4 °C lower than the thermal melting point (Tm); moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9 or 10 °C lower than the thermal melting point (Tm); low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15 or 20 °C lower than the thermal melting point (Tm). If the desired degree of mismatching results in a Tm of less than 45 °C (aqueous solution) or 32 °C (formamide solution), optionally the SSC concentration can be increased so that a higher temperature can be used.
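The melting-temperature relationship and the stringency offsets above can be combined into a small calculator. This is a sketch under stated assumptions: the function and constant names are hypothetical, "log" is assumed to be the base-10 logarithm, and the offset ranges simply span the values listed in the text for each stringency level.

```python
import math

# Offsets below Tm (in °C) spanning the values listed for each level.
STRINGENCY_OFFSETS = {"high": (0, 4), "moderate": (6, 10), "low": (11, 20)}


def melting_temperature(molarity: float, percent_gc: float,
                        percent_formamide: float, length_bp: int) -> float:
    """Tm in °C from monovalent cation molarity, % GC, % formamide and
    hybrid length, assuming log means log10."""
    return (81.5 + 16.6 * math.log10(molarity) + 0.41 * percent_gc
            - 0.61 * percent_formamide - 500.0 / length_bp)


def adjust_for_mismatch(tm: float, percent_mismatch: float) -> float:
    """Lower Tm by about 1 °C per 1% of mismatching."""
    return tm - percent_mismatch


def wash_temperature_range(tm: float, stringency: str) -> tuple:
    """Return (lowest, highest) wash temperature for a stringency level."""
    smallest, largest = STRINGENCY_OFFSETS[stringency]
    return (tm - largest, tm - smallest)


# Example: 1 M monovalent cations, 50% GC, 50% formamide, 500 bp hybrid.
tm = melting_temperature(1.0, 50.0, 50.0, 500)
print(round(tm, 1))                             # 70.5
print(round(adjust_for_mismatch(tm, 10.0), 1))  # 60.5, for >90% identity
print(wash_temperature_range(tm, "moderate"))
```

The example inputs are arbitrary illustrative values rather than conditions taken from the text.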
  • stringent conditions are those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at about pH 7.0 to pH 8.3 and the temperature is at least about 30 °C for short probes (e.g., 10 to 50 nucleotides) and at least about 60 °C for longer probes (e.g., greater than 50 nucleotides).
  • Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide or Denhardt's (5 g Ficoll, 5 g polyvinylpyrrolidone, 5 g bovine serum albumin in 500 ml of water).
  • Exemplary moderate stringency conditions include hybridization in 40% to 45% formamide, 1 M NaCl, 1% SDS at 37 °C and a wash in 0.5X to 1X SSC at 55 °C to 60 °C.
  • Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37 °C and a wash in 0.1X SSC at 60 °C to 65 °C.
  • a further non-limiting example of high stringency conditions includes hybridization in 4X SSC, 5X Denhardt's, 0.1 mg/ml boiled salmon sperm DNA, and 25 mM Na phosphate at 65 °C and a wash in 0.1X SSC, 0.1% SDS at 65 °C.
  • specificity is typically a function of post-hybridization washes, the relevant factors being the ionic strength and temperature of the final wash solution.
  • Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the proteins that they encode are substantially identical (e.g., due to the degeneracy of the genetic code).
  • polypeptides or “proteins” are amino acid residue sequences. Those sequences are written left to right in the direction from the amino to the carboxy terminus. In accordance with standard nomenclature, amino acid residue sequences are denominated by either a three letter or a single letter code as indicated as follows: Alanine (Ala, A), Arginine (Arg, R), Asparagine (Asn, N), Aspartic Acid (Asp, D), Cysteine (Cys, C), Glutamine (Gln, Q), Glutamic Acid (Glu, E), Glycine (Gly, G), Histidine (His, H), Isoleucine (Ile, I), Leucine (Leu, L), Lysine (Lys, K), Methionine (Met, M), Phenylalanine (Phe, F), Proline (Pro, P), Serine (Ser, S), Threonine (Thr, T), Tryptophan (Trp, W), Tyrosine (Tyr, Y), and Valine (Val, V).
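The standard three-letter and single-letter codes for the twenty amino acids listed above can be held in a lookup table. The sketch below is illustrative only; the dictionary and function names are hypothetical.

```python
# Standard three-letter to single-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(residues):
    """Convert three-letter codes, given in amino-to-carboxy order,
    into a single-letter string. Unknown codes raise KeyError."""
    return "".join(THREE_TO_ONE[code.capitalize()] for code in residues)


print(to_one_letter(["Met", "Ala", "Gly", "Trp"]))  # MAGW
```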
  • transformation refers to the introduction of an exogenous/heterologous nucleic acid (RNA and/or DNA) into a host cell.
  • a cell has been “transformed,” “transfected” or “transduced” with an exogenous/heterologous nucleic acid when such nucleic acid has been introduced or delivered into the cell.
  • transgenic refers to a plant, plant part or plant cell that comprises one or more exogenous nucleic acids.
  • the exogenous nucleic acid is stably integrated within the genome such that the polynucleotide is passed on to successive generations.
  • the exogenous nucleic acid may be integrated into the genome alone or as part of a recombinant expression cassette.
  • Transgenic may be used to designate any plant, plant part or plant cell the genotype of which has been altered by the presence of an exogenous nucleic acid, including those transgenics initially so altered and those created by sexual crosses or asexual propagation from the initial transgenic.
  • transgenic does not encompass the alteration of the genome (chromosomal or extra-chromosomal) by conventional plant breeding methods or by naturally occurring events such as random cross-fertilization, non-recombinant viral infection, non-recombinant bacterial transformation, non-recombinant transposition or spontaneous mutation. Additionally, the term “transgenic” does not encompass plants, plant cells, or plant tissues comprising only a TALEN induced mutation and not the TALEN nucleic acid construct (e.g., mutant plants, plant cells, or plant tissues that were transiently transformed or in which the TALEN nucleic acid is removed from the genome following mutagenesis).
  • “Introducing,” in the context of a polynucleotide means presenting the polynucleotide to the plant, plant part, and/or plant cell in such a manner that the polynucleotide gains access to the interior of a cell.
  • these polynucleotides can be assembled as part of a single polynucleotide or nucleic acid construct, or as separate polynucleotides or nucleic acid constructs, and can be located on the same or different transformation vectors. Accordingly, these polynucleotides can be introduced into plant cells in a single transformation event, in separate transformation events, or, e.g., as part of a breeding protocol.
  • "introducing" can encompass transformation of an ancestor plant with a nucleotide sequence of interest followed by conventional breeding process to produce progeny comprising said nucleotide sequence of interest.
  • Transformation of a cell may be stable or transient.
  • Transient transformation in the context of a polynucleotide means that a polynucleotide is introduced into the cell and does not integrate into the genome of the cell.
  • stable transformation or “stably transformed,” “stably introducing,” or “stably introduced” as used herein means that a polynucleotide is introduced into a cell and integrates into the genome of the cell.
  • the integrated polynucleotide is capable of being inherited by the progeny thereof, more particularly, by the progeny of multiple successive generations.
  • Genome as used herein also includes the nuclear and the plastid genome, and therefore includes integration of the polynucleotide into, for example, the chloroplast genome.
  • Stable transformation as used herein can also refer to a transgene that is maintained extrachromosomally, for example, as a minichromosome.
  • a nucleic acid construct for targeted mutagenesis may be transiently transformed/introduced into a plant, plant cell or plant tissue.
  • a nucleic acid construct for targeted mutagenesis may be stably transformed/introduced into a plant, plant cell or plant tissue.
  • a nucleic acid construct for targeted mutagenesis may be stably transformed into a plant, plant cell or plant tissue and later deleted from the genome of the plant, plant cell or plant tissue.
  • Any method for removal of an integrated nucleic acid construct can be used to remove a TALEN nucleic acid construct (e.g., expression cassette or vector).
  • a stably integrated nucleic acid construct may be removed using a well-known site-specific recombination system (e.g., cre/lox, flp/frt) (see, e.g., Example 8).
  • segregation after sexual reproduction/crossing may be used.
  • Crossing to remove integrated TALEN nucleic acid constructs may be particularly useful when developing new germplasm and the mutations are to be introgressed into new germplasm.
  • Transient transformation may be detected by, for example, an enzyme-linked immunosorbent assay (ELISA) or Western blot, which can detect the presence of a peptide or polypeptide encoded by one or more transgenes introduced into an organism.
  • Stable transformation of a cell can be detected by, for example, a Southern blot hybridization assay of genomic DNA of the cell with nucleic acid sequences which specifically hybridize with a nucleotide sequence of a transgene introduced into an organism (e.g., a plant).
  • Stable transformation of a cell can be detected by, for example, a Northern blot hybridization assay of RNA of the cell with nucleic acid sequences which specifically hybridize with a nucleotide sequence of a transgene introduced into a plant or other organism or by quantitative reverse transcription and polymerase chain reaction (qRT-PCR).
  • Stable transformation of a cell can also be detected by, e.g., a polymerase chain reaction (PCR) or other amplification reactions as are well known in the art, employing specific primer sequences that hybridize with target sequence(s) of a transgene, resulting in amplification of the transgene sequence, which can be detected according to standard methods. Transformation can also be detected by direct sequencing and/or hybridization protocols well known in the art.
  • a "transgene” refers to a nucleic acid which is used to transform a cell of an organism, such as a bacterium or a plant.
  • transgenic refers to a cell, tissue, or organism that contains a transgene.
  • heterologous refers to a nucleic acid or polypeptide that originates from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention.
  • the term “recombinant” generally refers to a non-naturally occurring nucleic acid, nucleic acid construct, or polypeptide.
  • Such non-naturally occurring nucleic acids may include natural nucleic acids that have been modified, for example that have deletions, substitutions, inversions, insertions, etc., and/or combinations of nucleic acid sequences of different origin that are joined using molecular biology technologies (e.g., a nucleic acid sequence encoding a “fusion protein”).
  • Recombinant also refers to the polypeptide encoded by the recombinant nucleic acid.
  • Non-naturally occurring nucleic acids or polypeptides include nucleic acids and polypeptides modified by man.
  • targeted mutagenesis refers to mutagenesis procedures that alter a specific or targeted gene in vivo and produce a change in the genetic structure directed at a specific site (e.g., a COMT gene) on the chromosome.
  • targeted mutagenesis is distinguished from naturally occurring mutations and from radiation or chemical mutagenesis which are non-specific methods of generating mutations with the vast majority of events carrying random mutations in off-target sites.
  • Targeted mutagenesis is also distinguished from site-directed mutagenesis in which analogs of nucleotides and other chemicals are used to generate localized point mutations.
  • An example of targeted mutagenesis as used herein is the art-known technique using TALEN.
  • TALEN utilizes a chimeric nuclease comprising programmable, sequence-specific DNA-binding modules linked to a nonspecific DNA cleavage domain (See, e.g., Gaj et al., (2013) Trends Biotech. 31:397-405).
  • Targeted knockout mutations can be similarly achieved with CRISPR/CAS, zinc finger nuclease, meganuclease technology or in vivo site specific mutagenesis using oligonucleotides.
  • random, chemical, radiation or natural mutational events would not be able to generate the COMT mutated plants, cells or tissues of this invention due to the high level of redundancy in the highly polyploid sugarcane genome.
  • targeted mutagenesis comprising TALEN methods can be carried out using nucleic acid constructs introduced using any method for transforming or transfecting a plant, plant cell or plant tissue.
  • targeted mutagenesis comprising TALEN methods can be carried out using polypeptide sequences, which can be introduced into a plant, plant cell or plant tissue using, for example, biolistic gene transfer using any type of particle (for example, a nano particle that can deliver a functional protein).
  • two amino acid sequences may be co-delivered including two chimeric nuclease monomers which are composed of programmable, sequence-specific DNA-binding modules linked to a nonspecific DNA cleavage domain.
  • a TALEN target site can be selected in any exon (in some embodiments, the target site may be in the first exon) of the target gene (e.g., COMT GenBank accession no. AJ231133) using software, for example, TALEN™ Hit software (Cellectis plant sciences, New Brighton, MN). Design principles for TALEN are well known in the art (see, e.g., Cermak et al. Nucleic Acids Res. 2011; 39:e82. doi: 10.1093/nar/gkr218).
  • TALEN target sequence (e.g., target polynucleotide)
  • Each TALEN target sequence includes appropriate binding sites for each of the two TALEN monomers and a spacer region between them.
  • the spacer length can be about 13 nucleotides to about 17 nucleotides (e.g., about 13, 14, 15, 16, 17 nucleotides, or any range or value therein).
  • the repeat regions define the TALEN binding site in each of the two TALEN monomers. Cleavage by the FokI nuclease domains occurs in the 'spacer' sequence that lies between the two regions of the DNA bound by the two TALEN monomers (see, for example, Joung & Sander Nature Reviews Molecular Cell Biology 14, 49-55 (January 2013)).
  • the DNA binding domain of the TALEN contains a repeated highly conserved 33-34 amino acid sequence with the exception of the 12th and 13th amino acids. These two locations (12th, 13th residues) are highly variable (Repeat Variable Diresidue) and show a strong correlation with specific nucleotide recognition (See, Boch et al.
  • the TALEN architecture results in a minimum polynucleotide target of about 53 nucleotides. In some embodiments, alternative TALEN architectures utilize shorter or longer sequence targets.
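The layout of a TALEN target sequence (a binding site for each monomer with a spacer in between) can be illustrated by splitting a target into its three parts. This is a sketch only: the names are hypothetical, the binding-site lengths are supplied by the caller because they depend on the TALEN architecture used, and the "about 13 to about 17" spacer range is applied as hard bounds for simplicity.

```python
from typing import NamedTuple


class TalenTarget(NamedTuple):
    left_site: str   # bound by the first TALEN monomer
    spacer: str      # region in which FokI cleavage occurs
    right_site: str  # bound by the second TALEN monomer


def split_target(target: str, left_len: int, right_len: int,
                 min_spacer: int = 13, max_spacer: int = 17) -> TalenTarget:
    """Split a target sequence into left site, spacer and right site,
    rejecting layouts whose spacer falls outside the allowed range."""
    if left_len <= 0 or right_len <= 0:
        raise ValueError("binding-site lengths must be positive")
    spacer_len = len(target) - left_len - right_len
    if not min_spacer <= spacer_len <= max_spacer:
        raise ValueError(f"spacer of {spacer_len} nt is outside "
                         f"{min_spacer}-{max_spacer} nt")
    spacer_end = left_len + spacer_len
    return TalenTarget(target[:left_len],
                       target[left_len:spacer_end],
                       target[spacer_end:])


# Toy sequence with arbitrary 18 nt binding sites and a 15 nt spacer.
toy = "A" * 18 + "C" * 15 + "G" * 18
parts = split_target(toy, left_len=18, right_len=18)
print(len(parts.left_site), len(parts.spacer), len(parts.right_site))  # 18 15 18
```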
  • a “deletion" mutation in a nucleic acid results in the loss of one or more nucleotides that are present in the corresponding wild type or non-mutated nucleic acid, whereas an "insertion” mutation involves an addition of one or more nucleotides as compared to the corresponding native or wild-type nucleic acid.
  • Such mutations can result in no polypeptide product or a polypeptide product having no or reduced activity relative to a non-mutated gene.
  • nucleic acid encoding COMT can be mutagenized to produce inactivating deletions or inactivating insertions that result in the lack of production of a COMT polypeptide or the production of a COMT polypeptide that has reduced or no activity relative to the wild-type or non-mutated COMT nucleic acid.
  • the invention provides "stable" inactivating deletions and/or insertions, which means that such deletions and insertions are maintained in the genome and are heritable from one generation to another.
  • uniform mutation refers to mutations that display consistency from tissue to tissue and in primary transgenic to vegetative progeny regarding the type of mutations that can be identified in a PCR amplicon from such tissues by, for example, capillary electrophoresis (e.g., peak pattern (length of insertion or deletion)) and/or sequence analysis (same sequence polymorphism across different tissues).
  • chimeric mutations show variation in peak pattern (capillary electrophoresis) and/or variation in sequence polymorphism.
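The distinction between uniform and chimeric mutations can be expressed as a comparison of peak patterns across tissues. The following sketch assumes a hypothetical data layout in which each tissue maps to the indel lengths observed in its amplicon (negative for deletions, positive for insertions); the function name and tissue labels are illustrative.

```python
from typing import Dict, Iterable


def classify_mutation_pattern(peaks_by_tissue: Dict[str, Iterable[int]]) -> str:
    """Return 'uniform' when every tissue shows the same set of indel
    lengths, and 'chimeric' when the pattern varies between tissues."""
    if not peaks_by_tissue:
        raise ValueError("at least one tissue is required")
    # Order and repeated peaks are ignored; only the set of lengths matters.
    patterns = {frozenset(peaks) for peaks in peaks_by_tissue.values()}
    return "uniform" if len(patterns) == 1 else "chimeric"


consistent = {"leaf": [-7, -2, 1], "stem": [1, -2, -7], "root": [-7, -2, 1]}
variable = {"leaf": [-7, -2, 1], "stem": [-7, 1], "root": [-4]}
print(classify_mutation_pattern(consistent))  # uniform
print(classify_mutation_pattern(variable))    # chimeric
```

A sequence-level comparison could follow the same shape by mapping each tissue to the set of sequence variants observed instead of indel lengths.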
  • fusion protein refers to a protein or polypeptide formed from the combination of two different proteins or protein fragments.
  • gene refers to a hereditary unit corresponding to a sequence of DNA that occupies a specific location on a chromosome and that contains the genetic instruction for a characteristic(s) or trait(s) in an organism.
  • locus refers to the position that a given gene or portion thereof occupies on a chromosome of a given species.
  • allele(s) indicates any of one or more alternative forms of a gene, where the alleles relate to at least one trait or characteristic.
  • the two alleles of a given gene occupy corresponding loci on a pair of homologous chromosomes.
  • a polyploid organism such as sugarcane
  • the resulting modern sugarcane cultivars have a genome size of around 10 Gb and typically have around 120 chromosomes, 70-80% of which are entirely derived from S. officinarum, 10-20% from S. spontaneum and a few from interspecific recombinations (D'Hont et al., Mol. Gen. Genet 250:405-413 (1996); Cuadrado et al. J. Exp. Bot. 55 (398): 847-854 (2004); D'Hont et al. Cytogenetic Genome Res. 109:27-33 (2005); Piperidis et al., Mol. Gen. Genet 284:65-73 (2010)).
  • an allele of COMT or copy of a COMT allele or multiple alleles of COMT or multiple copies of a COMT allele may be mutated using targeted mutagenesis.
  • homoeologous also spelled homeologous, is used to describe the relationship of similar chromosomes or parts of chromosomes brought together following inter-species hybridization and allopolyploidization, and whose relationship was completely homologous in an ancestral species.
  • the homologous chromosomes within each parental sub-genome should pair faithfully during meiosis, leading to disomic inheritance; however in some allopolyploids, the homoeologous chromosomes of the parental genomes may be nearly as similar to one another as the homologous chromosomes, leading to tetrasomic inheritance (four chromosomes pairing at meiosis), intergenomic recombination, and reduced fertility.
  • heterozygous refers to a genetic condition where the organism or cell has different alleles at corresponding loci on homologous chromosomes.
  • homozygous refers to a genetic condition where the organism or cell has identical alleles at corresponding loci on homologous chromosomes.
  • exogenous refers to a nucleic acid molecule that is not in the natural genetic background of the cell/organism in which it resides.
  • the exogenous nucleic acid molecule comprises one or more nucleotide sequences that are not found in the natural genetic background of the exogenous nucleic acid molecule
  • the exogenous nucleic acid molecule can comprise one or more additional copies of a nucleotide sequence that is/are endogenous to the
  • the introduced exogenous sequence is a recombinant sequence.
  • lignocellulosic biomass refers to plant biomass that is composed of carbohydrate polymers (e.g., cellulose and hemicellulose) and lignin.
  • biomass refers to biological material derived from living, or recently living organisms.
  • lignin refers to an aromatic polymer that is the result of oxidative combinatorial coupling of 4-hydroxyphenylpropanoids, which are deposited primarily in the secondary cell wall structure. Lignin is formed from the starting compounds of hydroxycinnamyl alcohols (monolignols), coniferyl alcohol, and/or sinapyl alcohol, and typically minor amounts of p-coumaryl alcohol.
  • saccharum refers to any of the several species or varieties or hybrids of tall perennial true grasses of the genus Saccharum. This includes Saccharum officinarum, Saccharum sinense, Saccharum barberi, Saccharum robustum, and Saccharum spontaneum.
  • biofuel refers to solid, liquid, or gas fuels made from biomass.
  • sustainable refers to the creation and maintenance of conditions under which humans and nature can exist in productive harmony, that permit fulfilling the social, economic, and other requirements of present and future generations.
  • cellulosic refers to containing, or derived from cellulose.
  • polyploidy or “polyploid” refers to an organism that contains one or more cells that have more than twice the haploid number of chromosomes.
  • siRNA precursor refers to a molecule capable of being acted upon by cell proteins to produce siRNA molecules within the cell. These include, but are not limited to, shRNA and microRNA.
  • the terms “specifically binds” or “specific binding” refer to binding that occurs between such paired species such as enzyme/substrate, receptor/agonist or antagonist, antibody/antigen, lectin/carbohydrate, oligo DNA primers/DNA, enzyme or protein/DNA, and/or RNA molecule to other nucleic acid (DNA or RNA) or amino acid, which may be mediated by covalent or non-covalent interactions or a combination of covalent and non-covalent interactions.
  • the binding that occurs is typically electrostatic, hydrogen-bonding, or the result of lipophilic interactions.
  • specific binding occurs between a paired species where there is interaction between the two which produces a bound complex having the characteristics of, for example, an antibody/antigen, enzyme/substrate, DNA/DNA, DNA/RNA, DNA/protein, RNA/protein, RNA/amino acid interaction.
  • specific binding may be characterized, for example, by the binding of one member of a pair to a particular species and to no other species within the family of compounds to which the corresponding member of the binding member belongs.
  • a monoclonal antibody preferably binds to a single epitope and to no other epitope within the family of proteins.
  • the DNA binding domain of a TALEN nucleic acid specifically binds to at least a portion of nucleic acid encoding COMT.
  • hybridizing specifically to refers to the binding, duplexing, or hybridizing of a molecule to a particular nucleic acid target sequence under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA) to the substantial exclusion of non-target nucleic acids, or even with no detectable binding, duplexing or hybridizing to non-target sequences.
  • Specifically hybridizing sequences typically are at least about 40% complementary and are optionally substantially complementary or even completely complementary (i.e., 100% identical) to a target nucleic acid sequence.
  • RNA silencing means that the level of RNA transcribed from the gene and/or, in the case of a protein-encoding gene, protein translated from the transcribed mRNA is reduced.
  • the transcribed RNA may be non-coding or protein-encoding.
  • non-coding refers to polynucleotides that do not encode part or all of an expressed protein. Non-coding sequences include but are not limited to introns, enhancers, promoter regions, 3' untranslated regions, and 5' untranslated regions.
  • Measurement of transcribed RNA or translated protein can be done by using molecular techniques such as RNA solution hybridization, nuclease protection, Northern hybridization, reverse transcription, gene expression monitoring with a microarray, antibody binding, enzyme-linked immunosorbent assay (ELISA), Western blotting, radioimmunoassay (RIA), other immunoassays, or fluorescence-activated cell analysis (FACS).
  • Gene suppression can be the result of co-suppression, anti-sense suppression, transcriptional gene silencing, post-transcriptional gene silencing, or translational gene silencing.
  • a “silenced”, “knocked-down”, “reduced”, “inhibited”, “down regulated”, or “suppressed” gene refers to a gene that is subject to silencing. “Target gene” is thus the gene which is to be silenced. Gene silencing is “specific” for a target gene when silencing of the target gene occurs without manifest effects on other genes.
  • Target gene refers to the entire target gene, including exons, introns and regulatory regions such as promoters, enhancers, and terminators, 5' and 3' untranslated regions, the primary transcript, and the mature mRNA.
  • a target gene may be a gene whose silencing has a high likelihood of resulting in a strong phenotype, preferably a knockout or null phenotype.
  • a “target polynucleotide” refers to any nucleic acid that is of interest as a target for modulation of expression.
  • “Target polynucleotide” thus refers to the part of a target gene which is bound or hybridized by a transcription activator-like effector engineered to recognize the target polynucleotide.
  • the target polynucleotide may correspond to the entire target gene or a fragment of the whole target gene.
  • different TAL effector/TALEN construction methods use different portions of the TAL effector protein flanking the repeat region.
  • the target polynucleotide may comprise at least about 15 contiguous nucleotides of the target gene. Accordingly, in some embodiments, the target polynucleotide may be about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96,
  • the target polynucleotide may be a length in the range of about 15 to about 1000 nucleotides, about 15 to about 900 nucleotides, about 15 to about 800 nucleotides, about 15 to about 700 nucleotides, about 15 to about 600 nucleotides, about 15 to about 500 nucleotides, about 15 to about 400 nucleotides, about 15 to about 300 nucleotides, about 15 to about 250 nucleotides, about 15 to about 200 nucleotides, about 15 to about 150 nucleotides, about 15 to about 100 nucleotides, about 25 to about 1000 nucleotides, about 25 to about 900 nucleotides, about 25 to about 800 nucleotides, about 25 to about 700 nucleotides, about 25 to about 600 nucleotides, about 25 to about 500 nucleotides, about 25 to about 400 nucleotides, about 25 to about 300 nucleotides, about 25 to about 250 nucleotides, about 15 to
  • a target polynucleotide for a COMT allele may be a region within the allele that is highly conserved between different COMT alleles so that many COMT alleles (and/or copies of those alleles) that have different nucleotide sequences can be co-targeted with a single vector.
  • a target polynucleotide for a COMT allele may be a region within that gene that is highly specific for the particular COMT target allele so that only that specific COMT allele (and/or copies of that allele) is targeted.
  • multiple TALEN pairs, each one targeting different polynucleotides for a COMT gene encoding a region within each allele that is highly specific for the particular COMT target allele can be co-introduced so that multiple specific COMT alleles and their copies can be co-targeted.
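Choosing a region that is conserved across COMT alleles, so that one construct can co-target them, amounts to scanning aligned allele sequences for windows with no polymorphism. The sketch below is illustrative only: the function name is hypothetical, the alleles are assumed to be pre-aligned strings of equal length, and the toy sequences are invented for the example rather than taken from any COMT sequence.

```python
from typing import List, Sequence


def conserved_windows(alleles: Sequence[str], window: int) -> List[int]:
    """Return the start positions of every window of the given length
    in which all aligned alleles carry identical bases."""
    if not alleles:
        raise ValueError("at least one allele is required")
    length = len(alleles[0])
    if any(len(allele) != length for allele in alleles):
        raise ValueError("alleles must be aligned to the same length")
    if not 0 < window <= length:
        raise ValueError("window must be between 1 and the alignment length")

    starts = []
    run = 0  # number of consecutive conserved columns ending at position i
    for i in range(length):
        column = {allele[i] for allele in alleles}
        run = run + 1 if len(column) == 1 else 0
        if run >= window:
            starts.append(i - window + 1)
    return starts


toy_alleles = ["ACGTACGTAA", "ACGTTCGTAA", "ACGTACGTAC"]
print(conserved_windows(toy_alleles, 4))  # [0, 5]
```

Gap characters in an alignment are treated like any other symbol here. Selecting an allele-specific target would invert the test, looking for windows in which the allele of interest differs from all others.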
  • the number of mutated COMT copies and alleles may vary from event to event after introduction of a TALEN to the sugarcane genome offering opportunities to select the events wherein the genetically modified sugarcane plant cell or plant tissue comprises agronomic performance that is substantially the same as that of the non-modified wild type sugarcane plant, cell and/or tissue, while the lignin content is reduced and the biofuel yield is increased.
  • direct embryogenesis refers to a method of delivering an expression cassette and/or vector to cells wherein the embryos formed on an explant directly regenerate shoots without producing a callus.
  • indirect embryogenesis refers to a method of delivering an expression cassette and/or vector to cells, wherein the embryos form after callus development on the surface of the callus.
  • expression cassette refers to the part of a DNA expression vector or plasmid that is capable of directing the cell to make RNA and protein.
  • An expression cassette contains one or more nucleotide sequences of interest that code a polypeptide or functional nucleic acid and other regulatory sequences controlling and/or influencing the expression of the polynucleotide sequence(s) of interest contained within the expression cassette.
  • An expression cassette may be chimeric, meaning that at least one of its components is heterologous with respect to at least one of its other components.
  • An expression cassette may also be one that is naturally occurring but has been obtained in a recombinant form useful for heterologous expression. Typically, however, the expression cassette is heterologous with respect to the host, i.e., the particular nucleic acid sequence of the expression cassette does not occur naturally in the host cell and must have been introduced into the host cell or an ancestor of the host cell by a transformation event.
  • an expression cassette of the invention can also include other regulatory sequences.
  • regulatory sequences means nucleotide sequences located upstream (5' non-coding sequences), within or downstream (3' non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include, but are not limited to, promoters, enhancers, introns, translation leader sequences, termination signals, and polyadenylation signal sequences.
  • the regulatory sequences or regions can be native/analogous to the plant, plant part and/or plant cell and/or the regulatory sequences can be native/analogous to the other regulatory sequences.
  • the regulatory sequences may be heterologous to the plant (and/or plant part and/or plant cell) and/or to each other (i.e., the regulatory sequences).
  • a promoter can be heterologous when it is operatively linked to a polynucleotide from a species different from the species from which the polynucleotide was derived.
  • a promoter can also be heterologous to a selected nucleotide sequence if the promoter is from the same/analogous species from which the polynucleotide is derived, but one or both (i.e., promoter and/or polynucleotide) are substantially modified from their original form and/or genomic locus, and/or the promoter is not the native promoter for the operably linked polynucleotide.
  • leader sequences derived from viruses are known to enhance gene expression. Specifically, leader sequences from Tobacco Mosaic Virus (TMV, the "Ω-sequence"), Maize Chlorotic Mottle Virus (MCMV) and Alfalfa Mosaic Virus (AMV) have been shown to be effective in enhancing expression (Gallie et al. (1987) Nucleic Acids Res. 15:8693-8711; and Skuzeski et al. (1990) Plant Mol. Biol. 15:65-79).
  • leader sequences known in the art include, but are not limited to, picornavirus leaders such as an encephalomyocarditis (EMCV) 5' noncoding region leader (Elroy-Stein et al. (1989) Proc. Natl. Acad. Sci. USA 86:6126-6130); potyvirus leaders such as a Tobacco Etch Virus (TEV) leader (Allison et al. (1986) Virology 154:9-20); Maize Dwarf Mosaic Virus (MDMV) leader (Allison et al.
  • An expression cassette also can optionally include a transcriptional and/or translational termination region (i.e., termination region) that is functional in plants.
  • a variety of transcriptional terminators are available for use in expression cassettes and are responsible for the termination of transcription beyond the heterologous nucleotide sequence of interest and correct mRNA polyadenylation.
  • the termination region may be native to the transcriptional initiation region, may be native to the operably linked nucleotide sequence of interest, may be native to the plant host, or may be derived from another source (i.e., foreign or heterologous to the promoter, the nucleotide sequence of interest, the plant host, or any combination thereof).
  • Appropriate transcriptional terminators include, but are not limited to, the CaMV 35S terminator, the tml terminator, the nopaline synthase terminator and/or the pea rbcS E9 terminator. These can be used in both monocotyledons and dicotyledons. In addition, a coding sequence's native transcription terminator can be used.
  • An expression cassette of the invention also can include a nucleotide sequence for a selectable marker, which can be used to select a transformed plant, plant part and/or plant cell.
  • selectable marker means a nucleotide sequence that when expressed imparts a distinct phenotype to the plant, plant part and/or plant cell expressing the marker and thus allows such transformed plants, plant parts and/or plant cells to be distinguished from those that do not have the marker.
  • Such a nucleotide sequence may encode either a selectable or screenable marker, depending on whether the marker confers a trait that can be selected for by chemical means, such as by using a selective agent (e.g., an antibiotic, herbicide, or the like), or on whether the marker is simply a trait that one can identify through observation or testing, such as by screening (e.g., the R-locus trait).
  • selectable markers include, but are not limited to, a nucleotide sequence encoding neo or nptII, which confers resistance to kanamycin, G418, and the like (Potrykus et al. (1985) Mol. Gen. Genet. 199:183-188); a nucleotide sequence encoding bar, which confers resistance to phosphinothricin; a nucleotide sequence encoding an altered 5-enolpyruvylshikimate-3-phosphate (EPSP) synthase, which confers resistance to glyphosate (Hinchee et al. (1988) Biotech.
  • a nucleotide sequence encoding a nitrilase such as bxn from Klebsiella ozaenae that confers resistance to bromoxynil (Stalker et al. (1988) Science 242:419-423); a nucleotide sequence encoding an altered acetolactate synthase (ALS) that confers resistance to imidazolinone, sulfonylurea or other ALS-inhibiting chemicals (EP Patent Application No. 154204); a nucleotide sequence encoding a methotrexate-resistant dihydrofolate reductase (DHFR) (Thillet et al. (1988) J. Biol. Chem.
  • a nucleotide sequence encoding a dalapon dehalogenase that confers resistance to dalapon; a nucleotide sequence encoding a mannose-6-phosphate isomerase (also referred to as phosphomannose isomerase (PMI)) that confers an ability to metabolize mannose
  • a nucleotide sequence encoding an altered anthranilate synthase that confers resistance to 5-methyl tryptophan; and/or a nucleotide sequence encoding hph that confers resistance to hygromycin.
  • One of skill in the art is capable of choosing a suitable selectable marker for use in an expression cassette of the invention.
  • Additional selectable markers include, but are not limited to, a nucleotide sequence encoding β-glucuronidase or uidA (GUS) that encodes an enzyme for which various chromogenic substrates are known; an R-locus nucleotide sequence that encodes a product that regulates the production of anthocyanin pigments (red color) in plant tissues (Dellaporta et al., "Molecular cloning of the maize R-nj allele by transposon-tagging with Ac," pp.
  • tyrosinase, an enzyme capable of oxidizing tyrosine to DOPA and dopaquinone, which in turn condenses to form melanin
  • β-galactosidase, an enzyme for which there are chromogenic substrates
  • luciferase (lux)
  • the nucleic acids and expression cassettes described herein can be used in connection with vectors.
  • vehicle e.g., nucleic acid construct
  • a vector may include a DNA molecule, linear or circular (e.g. plasmids), which includes a segment encoding a polypeptide of interest or a functional polynucleotide (e.g., siRNA) operatively linked to additional segments that provide for its transcription and, where relevant, its translation, upon introduction into a host cell or host cell organelles.
  • Such additional segments may include promoter and terminator sequences, and may also include one or more origins of replication, one or more selectable markers, an enhancer, a polyadenylation signal, etc.
  • Vectors for use in transformation of plants and other organisms are well known in the art.
  • Non-limiting examples of general classes of vectors include a viral vector including but not limited to an adenovirus vector, a retroviral vector, an adeno-associated viral vector, a plasmid vector, a phage vector, a phagemid vector, a cosmid, a fosmid, a bacteriophage, or an artificial chromosome.
  • expression vectors are derived from yeast or bacterial genomic or plasmid DNA, or viral DNA, or may contain elements of both.
  • the size of a vector can vary considerably depending on whether the vector comprises one or multiple expression cassettes (e.g., for molecular stacking). Thus, a vector size can range from about 3 kb to about 30 kb.
  • a vector is about 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 9 kb, 10 kb, 11 kb, 12 kb, 13 kb, 14 kb, 15 kb, 16 kb, 17 kb, 18 kb, 19 kb, 20 kb, 21 kb, 22 kb, 23 kb, 24 kb, 25 kb, 26 kb, 27 kb, 28 kb, 29 kb, 30 kb, or any range therein, in size.
  • a vector can be about 3 kb to about 15 kb in size.
  • Vectors may be engineered to contain sequences encoding selectable markers that provide for the selection of cells that contain the vector and/or have incorporated the nucleic acid of the vector into the cellular genome. Such markers allow identification and/or selection of host cells that incorporate and express the proteins encoded by the marker.
  • a "recombinant" vector refers to a viral or non-viral vector that comprises one or more heterologous nucleotide sequences (i.e., transgenes).
  • Vectors may be introduced into cells by any suitable method known in the art, including, but not limited to, transfection, electroporation, microinjection, transduction, cell fusion, DEAE dextran, calcium phosphate precipitation, lipofection (liposome fusion), and use of a gene gun or nucleic acid vector transporter.
  • amplicon refers to the nucleic acid product of artificial amplification or replication events, such as the product formed from conducting PCR.
  • TALEN-induced modification refers to an alteration of genomic DNA
  • the TALEN-induced modification can be an insertion of additional nucleotides to the genomic DNA, wherein such an insertion is not present in a given wild-type genomic DNA sequence for a given gene.
  • the TALEN-induced modification can be a deletion of nucleotides that are normally present in a given wild-type genomic DNA sequence for a given gene.
  • the TALEN-induced modification can be a change in the nucleotide sequence of the genomic DNA of a given gene without an alteration in the total number of nucleotides present (e.g., a substitution of one or more nucleotides with different nucleotides).
  • the TALEN-induced modification can include any one of or any combination of the aforementioned modifications to result in an alteration of the genomic DNA for a given gene as compared to the wild-type.
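  • As a minimal sketch (not part of the disclosure), the three kinds of modification described above can be told apart by comparing an edited allele with its wild-type counterpart. The function name and the example sequences are hypothetical, and the comparison looks only at net length, so a combined insertion and deletion would need a proper sequence alignment to resolve.

```python
def classify_modification(wild_type: str, edited: str) -> str:
    """Classify an edited allele relative to wild type by net length change.

    Hypothetical helper for illustration only: it does not align the
    sequences, so mixed insertion/deletion events are reported by their
    net effect on length.
    """
    wt, ed = wild_type.upper(), edited.upper()
    if ed == wt:
        return "none"          # identical to wild type
    if len(ed) > len(wt):
        return "insertion"     # additional nucleotides present
    if len(ed) < len(wt):
        return "deletion"      # nucleotides missing
    return "substitution"      # same length, different nucleotides


if __name__ == "__main__":
    # Made-up sequences, not taken from any SEQ ID NO in the document.
    print(classify_modification("ACGTACGT", "ACGACGT"))
```

  • A real analysis would start from aligned sequencing reads of the target amplicon; the sketch only illustrates the definitions.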
  • Methods for using TALEN are well-known and fully described in the art (see, e.g., Mussolino et al. Curr Opin Biotechnol. 23:644-650 (2012)) and can be readily adapted for producing COMT mutations in plant genomes including the highly polyploid sugarcane genome.
  • transcription activator-like effector nuclease (TALEN) protein refers to synthetic restriction enzymes generated by fusing a TAL effector DNA binding domain to a DNA cleavage domain as described in Gaj et al. (2013) Trends Biotech. 31:397-405, which is hereby incorporated by reference.
  • modified lignin profile refers to a change in the actual and relative amounts (relative to the other types of lignins present in the cell wall) and types of lignins present in the cell wall of a plant cell that has a TALEN-induced modification as compared to a wild-type lignin profile.
  • the genetically modified sugarcane plant, cell and/or tissue of the invention comprises, consists of, or consists essentially of a ratio of syringyl to guaiacyl in the lignin that is reduced by about 6% to about 70% (e.g., about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70%, or any range or value therein) as
  • the modified sugarcane plant, cell and/or tissue of the invention comprises, consists of, or consists essentially of a ratio of syringyl to guaiacyl in the lignin of about 1.4 to about 0.45 (e.g., about 1.4, 1.35, 1.3, 1.25, 1.2, 1.15, 1.1, 1.05, 1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45, or any range or value therein).
  • the ratio of syringyl to guaiacyl in the lignin in wild type sugarcane lignin is about 1.5.
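  • The percentage reduction and the absolute syringyl-to-guaiacyl (S/G) ratios given above can be cross-checked with simple arithmetic. The sketch below assumes the wild-type ratio of about 1.5 stated in the text; the function and constant names are illustrative only.

```python
WILD_TYPE_SG = 1.5  # approximate wild-type S/G ratio stated in the text


def reduced_sg_ratio(percent_reduction: float,
                     wild_type: float = WILD_TYPE_SG) -> float:
    """Return the S/G ratio after reducing the wild-type ratio by a percentage."""
    return wild_type * (1.0 - percent_reduction / 100.0)


if __name__ == "__main__":
    # The two ends of the "about 6% to about 70%" reduction range.
    for pct in (6, 70):
        print(f"{pct}% reduction -> S/G of about {reduced_sg_ratio(pct):.2f}")
```

  • A 6% reduction from 1.5 gives about 1.41 and a 70% reduction gives about 0.45, which agrees with the stated range of about 1.4 to about 0.45.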
  • modified plants of the invention further comprise, consist essentially of or consist of a phenotype of a brown midrib and/or brown internode, which can be useful, for example, as a screenable, visible marker facilitating the introgression of this trait into new germplasm.
  • forage refers to plants consumed by animals, particularly by grazing animals.
  • Biofuels are such an alternative.
  • examples of biofuels include, but are not limited to bioethanol, biodiesel, bioethers, and syngas.
  • a source of biofuels can be the sugars derived from the grain of agricultural crops.
  • Sugar for biofuel production can also be found in fast growing plants such as poplar, eucalyptus, and various grass residues such as corn stover and sugarcane bagasse. Advantages of using these types of sources for biofuel are that they are sustainable and do not usually compete with the food and feed supply. While the use of these types of plants as a source for biofuels is promising, the actual benefits have yet to be fully realized primarily due to the presence of lignin in the cell wall(s) of these plants. The physical structure and strength of plants is due primarily to the presence of a cell wall in plant cells. The cell wall is made of lignin and sugar molecules, such as cellulose. Cellulose can be converted into glucose, which can then be used in a classical fermentation process to produce alcohol.
  • One function of lignin is to embed the sugar molecules to give firmness to plants. As such, even tall plants can maintain their upright stature.
  • Unfortunately, the very aspects of lignin that make it important to the plant make the current techniques for producing biofuels from lignin-containing plants and plant parts inefficient and, in most cases, not economical. Insofar as lignin embeds the sugar molecules needed to produce biofuels, lignin reduces the accessibility of the sugar molecules for biofuel production. Lignin is resistant to physical, chemical, and biological degradation. As such, current techniques for removal of lignin from biofuel feedstock are energy consuming and are environmentally unsound.
  • Plants with reduced lignin content, that contain a modified amount of lignin, or that contain a modified lignin profile that is easier to break down can be, at least in theory, used as a source for biofuels that requires less energy and reduces the negative impact on the environment as compared to current lignin-containing sources.
  • RNAi techniques and random mutagenesis have been used to modify the expression of proteins involved in lignin biosynthesis to produce plants with reduced lignin content or a modified lignin profile on a laboratory scale.
  • RNAi approaches to modifying lignin biosynthesis only allow for knockdown of multiple alleles, as opposed to allowing for knockout of specific or multiple alleles. Further, the knockdown effect induced by RNAi may be only transient, as an siRNA precursor transgene may not be stably inherited by progeny. The instability of transgene transfer may be because there is a need for continued expression of the RNAi precursor product. Low gene stability, and thus low phenotype predictability, can negatively impact the value of the plant as a commercial product.
  • compositions and methods are directed to modifying lignin biosynthesis of sugarcane and other commercially relevant plants by genome editing using synthesized transcription activator-like effector nuclease (TALEN) proteins.
  • the present compositions, systems, and methods do not rely on the addition/expression of transgenes to modify lignin biosynthesis. Instead, the compositions, methods, and systems disclosed herein use TALEN proteins to modify the sugarcane genetic content by selectively disrupting the coding sequence of genes on one or more alleles within the lignin biosynthetic pathway of sugarcane.
  • a sugarcane plant may contain cells having a TALEN-disrupted COMT gene on at least one allele or copy.
  • a sugarcane plant may comprise, consist essentially of or consist of cells having a TALEN-disrupted COMT gene in more than 50% of the alleles or copies encoding COMT.
  • a sugarcane plant may comprise, consist essentially of or consist of cells having a TALEN-disrupted COMT gene in less than 100% of the alleles or copies encoding COMT.
  • a sugarcane plant may comprise, consist essentially of or consist of cells having a TALEN-disrupted COMT gene in more than 50% and less than 100% of the alleles or copies encoding COMT.
  • a disrupted COMT gene within an allele or copy can result in a reduction of the amount of lignin in the plant or cell relative to wild-type, without being lethal.
  • compositions and methods described and claimed herein can be used to generate a lignocellulosic biomass feedstock for biofuels that allows a more efficient, economical, and environmentally friendly biofuel synthesis than is currently available.
  • the lignocellulosic biomass can also be used for production of products requiring a cellulosic feedstock, such as paper products.
  • the plants, plant cells and/or tissues of the invention produce a lignocellulosic biomass having increased yields of directly fermentable sugars as compared to lignocellulosic biomass produced from wild type plants, plant cells and/or tissues.
  • the yield increase can be from about 10% to about 40% (e.g., about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40%, or any range or value therein) using lignocellulosic biomass from plants, plant cells and/or tissues of the invention as compared to lignocellulosic biomass from a corresponding wild type plant, plant cell and/or tissue.
  • sugarcane plants, plant cells and/or tissues of the invention can produce a lignocellulosic biomass having increased yields of directly fermentable sugars (e.g., ethanol) of about 10% to about 40% as compared to the corresponding wild type sugarcane plant, plant cell and/or tissue.
  • a lignocellulosic biomass produced from the plants, plant cells and/or tissues of the invention requires a reduced amount of cell wall degrading enzymes or reduced chemical or physical pretreatment of the lignocellulosic biomass for conversion to biofuel and other bioproducts as compared to lignocellulosic biomass produced from wild type plants, plant cells and/or tissues.
  • the amount of cell wall degrading enzymes for conversion of a lignocellulosic biomass produced from the plants, plant cells and/or tissues of the invention can be reduced by about two to six fold as compared to lignocellulosic biomass produced from corresponding wild type plants, plant cells and/or tissues (e.g., about 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6 fold, or any range or value therein).
  • the amount of cell wall degrading enzymes for conversion of a lignocellulosic biomass produced from a sugarcane plant, plant cell and/or tissue of the invention can be reduced by about two to six fold as compared to the amount of cell wall degrading enzymes for conversion of a lignocellulosic biomass produced from corresponding wild type plants, plant cells and/or tissues.
  • biobased product is defined as commercial or industrial products that are composed in whole, or in significant part, of biological products or renewable domestic agricultural materials (see, www.usda.gov/biobased).
  • Biomass produced from the plants, cells and by the methods claimed and described herein may also have improved digestibility when used as forage, compared to forages incorporating wild-type biomass.
  • the present disclosure encompasses non-naturally occurring genetically modified sugarcane plants and cells, which contain a COMT gene on at least one allele or copy that has one or more nucleotides deleted from the wild type COMT sequence.
  • Fig. 1 shows the general lignin biosynthetic pathway.
  • the pathway involves the coordinated regulation of three biosynthetic pathways: the shikimate pathway, the general phenylpropanoid pathway and the lignin branch pathway.
  • the shikimate pathway is a primary metabolic pathway which leads to the biosynthesis of the amino acids phenylalanine, tyrosine, or tryptophan, which are subsequently incorporated into a variety of plant products.
  • the general phenylpropanoid pathway begins with the deamination of L-phenylalanine to cinnamic acid, which is catalyzed by phenylalanine ammonia-lyase (PAL).
  • the next step in the phenylpropanoid pathway is the hydroxylation of p-coumaric acid at the 3-carbon to form caffeic acid, which is catalyzed by p-coumarate 3'-hydroxylase (C3H).
  • Caffeic acid is methylated by caffeoyl-CoA O-methyltransferase (CCoAOMT) or caffeic acid O-methyltransferase (COMT) to form ferulate.
  • Ferulate can be hydroxylated to 5-hydroxyferulate by the enzyme ferulate 5-hydroxylase (F5H).
  • the next step in the phenylpropanoid pathway is the methylation of 5-hydroxyferulate to form sinapate.
  • p-coumarate, caffeate, ferulate, 5-hydroxyferulate, and sinapate can be converted to their respective hydroxycinnamoyl CoA esters by 4-coumarate:CoA ligase (4CL).
  • When hydroxycinnamoyl CoA esters are formed, they are then reduced by enzymes of the lignin branch pathway. Hydroxycinnamoyl CoA esters are converted to their corresponding hydroxycinnamaldehydes by cinnamoyl-CoA reductase (CCR). Next, hydroxycinnamaldehydes are converted to their corresponding hydroxycinnamyl alcohols by the enzyme cinnamyl alcohol dehydrogenase (CAD). Finally, laccases and peroxidases convert the hydroxycinnamyl alcohols into the various types (hydroxyphenyl, guaiacyl, or syringyl) of lignin.
  • the ratio of the type of lignin may vary and can thus form various lignin profiles. Different ratios of lignin types result in cell walls with different properties. Some lignin profiles are easier to degrade than others. As such, manipulation of the lignin profile through genome editing of the genes involved in the lignin biosynthetic pathway as disclosed herein can result in plants with a lignin composition that is easier to degrade.
  • COMT is a key enzyme in the lignin biosynthetic pathway. As such, complete ablation of COMT expression is most likely lethal to the plant because lignin is necessary for plant structure. Therefore, it is desirable to produce a sugarcane plant that has reduced, rather than abolished, COMT expression.
  • an RNAi approach can be used to knock down COMT expression at the COMT RNA transcript level.
  • an RNAi approach has several disadvantages, including the requirement of a transgene to generate stable production of an siRNA precursor, or impractical transient delivery of siRNA or siRNA precursor to the plant.
  • the invention disclosed herein produces plants, plant tissues and/or plant cells that contain a COMT gene disrupted at the genomic DNA level on at least one, but not all, alleles or copies.
  • Fig. 2 shows the general mechanism by which a reduction in COMT can occur using TALEN-mediated COMT gene disruption.
  • sugarcane is highly polyploid and, in this example, has ten hom(oe)ologous copies of a chromosome (1a-1j). Each chromosome carries one copy or allele of the COMT gene.
  • although some of the COMT alleles (COMTa, COMTb, COMTc, COMTd, COMTe, COMTf, COMTg, COMTh, COMTi) located on homologous and homeologous chromosomes differ in their COMT sequence, the TALEN in this embodiment was targeted to bind to a highly conserved sequence between all of these alleles and therefore co-mutates these alleles.
  • the COMTa through COMTi alleles contain TALEN binding sites corresponding to, for example, SEQ ID NOs:5 and 6.
  • COMTj allele does not contain TALEN binding sites corresponding to, for example, SEQ ID NOs:5 and 6 and is therefore not mutated with this specific TALEN.
  • a TALEN protein that specifically binds the COMT gene is delivered by an appropriate method and modifies the COMT genomic DNA of a sugarcane plant cell or population of sugarcane plant cells.
  • when the restriction endonuclease associated with the TALEN protein acts at its target site located between the TALEN binding sites within the COMT genomic DNA, a double-strand break is introduced into the COMT genomic DNA.
  • the cell repairs the double stranded break using the cell DNA repair mechanism, which may result, for example, in a frame shift mutation within the coding region of the COMT gene.
  • the mutation within the COMT gene knocks out the COMT gene function by keeping the gene from producing a wild-type COMT RNA transcript that can be translated into a functional COMT protein.
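  • The frame shift mentioned above follows from codon arithmetic: a net insertion or deletion within the coding region shifts the reading frame unless its length is a multiple of three. The following is a minimal sketch with a hypothetical function name; it does not model other ways an in-frame change can still inactivate the gene (for example, loss of essential residues or a premature stop codon).

```python
def is_frameshift(net_indel_length: int) -> bool:
    """Return True if a net indel of this length shifts the reading frame.

    net_indel_length is inserted minus deleted nucleotides within the
    coding region (negative for a net deletion). Illustrative helper only.
    """
    return net_indel_length % 3 != 0


if __name__ == "__main__":
    # Hypothetical indel sizes, not taken from the sequences in the document.
    for size in (-1, -3, 2, 6):
        label = "frameshift" if is_frameshift(size) else "in-frame"
        print(f"net change of {size:+d} nt: {label}")
```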
  • the invention provides a sugarcane plant that contains modified COMT genomic DNA on an allele or copy, wherein the modification to the COMT genomic DNA is a stable, inactivating deletion of nucleotides that are present in a corresponding wild-type allele or copy in a sugarcane plant.
  • the COMT gene of the sugarcane plant contains modified COMT genomic DNA that comprises any one of the nucleotide sequences of SEQ ID NOs:7-13.
  • modified COMT genomic DNA of the present disclosure is not limited to the nucleotide sequences of SEQ ID NOs:7-13.
  • the invention provides a sugarcane plant that contains modified COMT genomic DNA on an allele or copy, wherein the modification to the COMT genomic DNA is an insertion of nucleotides that are in addition to those present in a corresponding wild-type allele or copy in a sugarcane plant.
  • the insertion comprises the nucleotide sequence of
  • the sugarcane plant that contains the modified COMT genomic DNA on an allele or copy has reduced gene expression of wild-type COMT as compared to a wild-type sugarcane plant.
  • the expression of COMT in the plant comprising the modified COMT is reduced by more than 50% to less than about 100%.
  • the expression of COMT in the plant comprising the modified COMT is reduced by about 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% or any range or value therein.
  • the functional activity of COMT in the plant comprising the modified COMT is reduced by at least about 50% to about 98%, about 55% to about 95%, about 60% to about 95%, about 65% to about 95%, about 70% to about 95%, about 75% to about 95%, about 60% to about 92%, about 65% to about 92%, about 70% to about 92%, about 75% to about 92%, and the like, and any range or value therein.
  • a plant, plant cell and/or plant tissue modified through targeted mutagenesis has a mutation frequency of more than 50% (i.e., more than 50% of the alleles or copies encoding COMT comprise, consist essentially of or consist of a stable, inactivating deletion or insertion mutation).
  • the mutation frequency using targeted mutagenesis may be less than 100% (i.e., less than 100% of the alleles or copies encoding COMT comprise, consist essentially of or consist of a stable, inactivating deletion or insertion mutation).
  • the mutation frequency may be more than 50% and less than 100% (e.g., 51%, 52%, 53%.
  • the mutation frequency in a plant, plant cell and/or plant tissue using targeted mutagenesis may be about 51% to about 99%, about 51% to about 98%, about 51% to about 95%, about 55% to about 95%, 60% to about 95%, about 65% to about 95%, about 70% to about 95%, about 75% to about 95%, about 60% to about 92%, about 65% to about 92%, about 70% to about 92%, about 75% to about 92%, and the like, and any range or value therein.
  • Sugarcane is a highly polyploid plant species with the number of chromosomes varying from about 10 to about 12 depending on the particular sugarcane variety.
  • a mutation frequency of more than 50% and less than 100%
  • at least about 6 to about 11 copies e.g., about 6, 7, 8, 9, 10 copies, or in the case of a sugarcane variety having 12 chromosomes, 11 copies, and the like
  • the COMT gene may be knocked out (i.e., a stable, inactivating mutation).
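  • To illustrate how the "more than 50% and less than 100%" mutation frequency maps onto whole gene copies, the sketch below enumerates the admissible number of knocked-out copies for a given total. The function name is hypothetical, and the copy totals of 10 to 12 are taken from the range of hom(oe)ologous copies mentioned in the text.

```python
def partial_knockout_counts(total_copies: int) -> list[int]:
    """List knocked-out copy counts k with 50% < k / total_copies < 100%.

    Illustrative helper only; integer arithmetic avoids rounding issues.
    """
    return [k for k in range(1, total_copies) if 2 * k > total_copies]


if __name__ == "__main__":
    for total in (10, 11, 12):
        counts = partial_knockout_counts(total)
        print(f"{total} copies: knock out between {counts[0]} and {counts[-1]}")
```

  • With 10 copies the admissible counts are 6 to 9, with 11 copies 6 to 10, and with 12 copies 7 to 11, which together span the "about 6 to about 11 copies" given above.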
  • the sugarcane plant that comprises a modified COMT genomic DNA on an allele or copy has a reduced amount of lignin as compared to a wild-type sugarcane plant.
  • the amount of lignin in the plant comprising a modified COMT genomic DNA on an allele or copy is reduced by about 3% to about 30% (e.g., about 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, and any range or value therein) as compared to a plant comprising the wild type or unmodified COMT.
  • the sugarcane plant that contains the modified COMT genomic DNA on an allele or copy comprises, consists essentially of or consists of a modified lignin profile (e.g., a ratio of syringyl to guaiacyl in the lignin that is reduced by about 6% to about 70% as compared to wild type) and/or comprises, consists essentially of or consists of a ratio of syringyl to guaiacyl in the lignin of about 1.4 to about 0.45.
  • the present disclosure also encompasses a sugarcane cell or a population of sugarcane cells, wherein the modification to the COMT genomic DNA is a stable, inactivating deletion or insertion of nucleotides that are present in a corresponding wild-type allele or copy of a sugarcane cell or population of cells.
  • the COMT gene of the sugarcane cell or population of sugarcane cells contains modified COMT genomic DNA that comprises a nucleotide sequence of SEQ ID NOs:7-13 or SEQ ID NO:21, or any combination thereof. Insofar as the DNA repair process is not completely understood, one of ordinary skill in the art cannot predict or otherwise know what the resulting sequence of the modified COMT gene will be after TALEN binds to the genomic DNA and creates the double stranded break.
  • the sugarcane cell or population of sugarcane cells that contain modified COMT genomic DNA on an allele or copy has reduced gene expression of wild-type COMT as compared to wild-type sugarcane cells or a population of wild-type sugarcane cells.
  • the sugarcane cell or population of cells that contains the modified COMT genomic DNA on an allele or copy has a reduced amount of lignin as compared to a wild-type sugarcane cell or population of cells.
  • the sugarcane cell or population of sugarcane cells that contain the modified COMT genomic DNA on an allele or copy has a modified lignin profile as compared to wild-type sugarcane cell or population of sugarcane cells.
  • the methods of the present invention provide plants, cells, and/or tissues with significantly reduced lignin/altered lignin composition in combination with agronomic performance (See, e.g., stem diameter in Table 3) that is not significantly different from the non-modified wildtype plants, cells, and/or tissues.
  • This range of mutation (less than 100% and more than 50% mutation) of the functional COMT copies can only be accomplished in and remain genetically stable in a polyploid species, and more particularly, in a highly polyploid species.
  • Non-limiting examples of methods for transformation of plants include transformation via bacterial-mediated nucleic acid delivery (e.g., via Agrobacteria), viral-mediated nucleic acid delivery, silicon carbide or nucleic acid whisker-mediated nucleic acid delivery, liposome-mediated nucleic acid delivery, microinjection, microparticle bombardment, calcium-phosphate-mediated transformation, cyclodextrin-mediated transformation, electroporation, nanoparticle-mediated transformation, sonication, infiltration, PEG-mediated nucleic acid uptake, as well as any other electrical, chemical, physical (mechanical) and/or biological mechanism that results in the introduction of nucleic acid into the plant cell, including any combination thereof.
  • the vector, expression cassette, or naked nucleic acid may be introduced directly into the genomic DNA of a plant cell using techniques such as, but not limited to, electroporation and microinjection of plant cell protoplasts, or the recombinant nucleic acid can be introduced directly to plant tissue using ballistic methods, such as DNA particle (biolistic) bombardment.
  • Microinjection techniques are known in the art and well described in the scientific and patent literature.
  • the introduction of a recombinant nucleic acid using polyethylene glycol precipitation is described in Paszkowski et al. EMBO J. 1984, 3:2717-2722.
  • Electroporation techniques are described in Fromm et al. Proc. Natl. Acad. Sci. USA. 1985, 82:5824.
  • Ballistic transformation techniques are described in Klein et al. Nature. 1987, 327:70-73.
  • Another method for transforming plants, plant parts and/or plant cells involves propelling inert or biologically active particles at plant tissues and cells (particle (biolistic) bombardment). See, e.g., US Patent Nos. 4,945,050; 5,036,006 and 5,100,792. Generally, this method involves propelling inert or biologically active particles at the plant cells under conditions effective to penetrate the outer surface of the cell and afford incorporation within the interior thereof. When inert particles are utilized, the vector can be introduced into the cell by coating the particles with the vector containing the nucleic acid of interest.
  • a cell or cells can be surrounded by the vector so that the vector is carried into the cell by the wake of the particle.
  • Biologically active particles (e.g., dried yeast cells, dried bacteria or bacteriophage, each containing one or more nucleic acids sought to be introduced) can also be propelled at plant tissues and cells.
  • biolistic techniques can be used to introduce one or more heterologous polypeptides into a cell.
  • the recombinant nucleic acid may also be combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector, or other suitable vector.
  • the virulence functions of the Agrobacterium tumefaciens host will direct the insertion of the recombinant nucleic acid including the exogenous nucleic acid and adjacent marker into the plant cell DNA when the cell is infected by the bacteria.
  • Agrobacterium tumefaciens-mediated transformation techniques, including disarming and use of binary vectors, are known to those of skill in the art and are well described in the scientific literature. See, for example, Horsch et al. Science. 1984, 223:496-498; Fraley et al. Proc. Natl. Acad. Sci. USA. 1983, 80:4803; and Gene Transfer to Plants, Potrykus, ed., Springer-Verlag, Berlin, 1995.
  • a further method for introduction of the vector or recombinant nucleic acid into a plant cell is by transformation of plant cell protoplasts (stable or transient). Plant protoplasts are enclosed only by a plasma membrane and will therefore more readily take up macromolecules, for example, exogenous DNA. These engineered protoplasts can be capable of regenerating whole plants. Suitable methods for introducing exogenous DNA into plant cell protoplasts include electroporation and polyethylene glycol (PEG) transformation. Following electroporation, transformed cells are identified by growth on appropriate medium containing a selective agent.
  • TALEN-induced COMT gene disruption (e.g., insertion, deletion) can be analyzed by, e.g., PCR analysis, amplicon sequencing, and capillary electrophoresis.
  • Expression of the wild-type COMT in a plant or cell may be confirmed by detecting an increase or decrease of wild-type COMT mRNA or polypeptide in the modified plant. Methods for detecting and quantifying mRNA or proteins are well known in the art (e.g., COMT activity assay).
  • Transformed plant cells that are derived by any of the above transformation techniques, or other techniques now known or later developed, can be cultured to regenerate a whole plant.
  • regeneration techniques may rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide or herbicide selectable marker that has been introduced together with the exogenous nucleic acid. Plant regeneration from cultured protoplasts is described in Evans et al. (1983)
  • Regeneration can also be obtained from plant callus, explants, organs, or parts thereof.
  • the genetic properties engineered into the transgenic seeds and plants, plant parts, and/or plant cells of the invention described above can be passed on by sexual reproduction or vegetative growth and therefore can be maintained and propagated in progeny plants.
  • maintenance and propagation make use of known agricultural methods developed to fit specific purposes such as harvesting, sowing or tilling.
  • a nucleotide sequence (or a polypeptide) can be introduced into the plant, plant part and/or plant cell in any number of ways that are well known in the art.
  • the methods of the invention do not depend on a particular method for introducing one or more nucleotide sequences into a plant, only that they gain access to the interior of at least one cell of the plant.
  • where more than one nucleotide sequence is to be introduced, they can be assembled as part of a single nucleic acid construct, or as separate nucleic acid constructs, and can be located on the same or different nucleic acid constructs.
  • the nucleotide sequences can be introduced into the cell of interest in a single transformation event, in separate transformation events, or, for example, in plants, as part of a breeding protocol.
  • a plant may refer to any suitable plant, including, but not limited to, spermatophytes (e.g., angiosperms and gymnosperms) and embryophytes (e.g., bryophytes, ferns and fern allies).
  • a plant useful with this invention includes any monocot plant and/or any dicot plant.
  • Representative host plants include soybean (Glycine max), corn (Zea mays), canola (Brassica napus, Brassica rapa ssp.), alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), sunflower (Helianthus annuus), wheat (Triticum aestivum), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Mus
  • Additional host plants of the invention are crop plants, for example, cereals and pulses, maize, wheat, potatoes, tapioca, rice, sorghum, millet, cassava, barley, pea, and other root, tuber, or seed crops or turf grasses.
  • Important seed crops for the invention are oil-seed rape, sugar beet, maize, sunflower, soybean, and sorghum.
  • Horticultural plants to which the invention may be applied may include lettuce, endive, and vegetable brassica including cabbage, broccoli, and cauliflower, and carnations, geraniums, petunias, and begonias.
  • the invention may be applied to tobacco, cucurbits, carrot, strawberry, sunflower, tomato, pepper, chrysanthemum, poplar, eucalyptus, and pine.
  • plants of the invention include grain seeds, such as corn, wheat, barley, rice, sorghum, rye, etc.
  • plants of the invention include oil-seed plants.
  • Oil seed plants include canola, cotton, soybean, safflower, sunflower, brassica, maize, alfalfa, palm, coconut, etc.
  • plants of the invention include leguminous plants. Leguminous plants include beans and peas.
  • Beans include guar, locust bean, fenugreek, soybean, garden beans, cowpea, mung bean, lima bean, fava bean, lentils, chickpea, etc.
  • Host plants useful in the invention are row crops and broadcast crops.
  • Non-limiting examples of useful row crops are corn, soybeans, cotton, amaranth, vegetables, rice, sorghum, wheat, milo, barley, sunflower, durum, and oats.
  • Non-limiting examples of useful broadcast crops are sunflower, millet, rice, sorghum, wheat, milo, barley, durum, and oats.
  • Host plants useful in the invention are monocots and dicots.
  • Non-limiting examples of useful monocots are rice, corn, wheat, palm trees, turf grasses, barley, and oats.
  • Non-limiting examples of useful dicots are soybean, cotton, alfalfa, canola, flax, tomato, sugar beet, sunflower, potato, tobacco, lettuce, celery, cucumber, carrot, cauliflower, and grape.
  • Host plants useful in the invention include plants cultivated for aesthetic or olfactory benefits. Non-limiting examples include flowering plants, trees, grasses, shade plants, and flowering and non-flowering ornamental plants.
  • Host plants useful in the invention include plants cultivated for nutritional value, fibers, wood, and industrial products.
  • a plant of the invention includes, but is not limited to, a soybean plant, a sugar beet plant, a corn plant, a cotton plant, a canola plant, a sugarcane plant, a wheat plant, a rice plant or a turf grass plant.
  • a plant of the invention can include but is not limited to a forage grass, a forage legume, and/or a fodder crop (e.g., ryegrass, bahiagrass, Bermuda grass, tall fescue, signal grass, gamma grass, alfalfa, clover, and the like).
  • the plant of the invention can include biomass crops (e.g., switchgrass, willow, Arundo donax, elephantgrass, miscanthus).
  • the plant of the invention is sugarcane.
  • the plant of the invention can be bahiagrass.
  • plant part includes but is not limited to embryos, pollen, ovules, seeds, leaves, flowers, stems, branches, fruit, kernels, ears, cobs, husks, stalks, roots, root tips, anthers, plant cells including plant cells that are intact in plants and/or parts of plants, plant protoplasts, plant tissues, plant cell tissue cultures, plant calli, plant clumps, and the like.
  • plant cell refers to a structural and physiological unit of the plant, which comprises a cell wall and also may refer to a protoplast.
  • a plant cell of the invention can be in the form of an isolated single cell or can be a cultured cell or can be a part of a higher-organized unit such as, for example, a plant tissue or a plant organ.
  • a "protoplast" is an isolated plant cell without a cell wall or with only parts of the cell wall.
  • a transgenic cell comprising a nucleic acid molecule and/or nucleotide sequence of the invention is a cell of any plant or plant part including, but not limited to, a root cell, a leaf cell, a tissue culture cell, a seed cell, a flower cell, a fruit cell, a pollen cell, and the like.
  • the plant part can be a plant germplasm.
  • a plant cell can be a non-propagating plant cell that does not regenerate into a plant.
  • Plant cell culture means cultures of plant units such as, for example, protoplasts, cell culture cells, cells in plant tissues, pollen, pollen tubes, ovules, embryo sacs, zygotes and embryos at various stages of development.
  • a transgenic tissue culture or transgenic plant cell culture is provided, wherein the transgenic tissue or cell culture comprises a nucleic acid molecule/nucleotide sequence of the invention.
  • a "plant organ" is a distinct and visibly structured and differentiated part of a plant such as a root, stem, leaf, flower bud, or embryo.
  • Plant tissue as used herein means a group of plant cells organized into a structural and functional unit. Any tissue of a plant in planta or in culture is included. This term includes, but is not limited to, whole plants, plant organs, plant seeds, tissue culture and any groups of plant cells organized into structural and/or functional units. The use of this term in conjunction with, or in the absence of, any specific type of plant tissue as listed above or otherwise embraced by this definition is not intended to be exclusive of any other type of plant tissue.
  • the invention provides plants, plant parts, and/or plant cells produced by the methods of the invention.
  • the invention further provides a plant crop comprising a plurality of transgenic plants of the invention planted together in, for example, an agricultural field, a golf course, a residential lawn, a road side, an athletic field, and/or a recreational field.
  • a sugarcane plant having modified COMT genomic DNA of at least one allele or copy is produced by a method of indirect embryogenesis using agrobacteria.
  • the method involves inoculating a callus induced from an immature sugarcane leaf whorl with agrobacteria containing a TALEN expression vector configured to produce a TALEN that binds sugarcane COMT and creates a double stranded break in the COMT DNA at a target site recognized by a restriction endonuclease of the TALEN.
  • the TALEN expression vector comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 14, SEQ ID NO: 15, and/or SEQ ID NO: 16.
  • a sugarcane plant having modified COMT genomic DNA of at least one allele or copy is produced by a method of direct embryogenesis using ballistic bombardment.
  • the method involves bombarding a pre-cultured immature sugarcane leaf whorl with a TALEN expression vector or a minimal TALEN expression cassette configured to produce a TALEN that binds sugarcane COMT and creates a double stranded break in the COMT DNA at a target site recognized by a restriction endonuclease of the TALEN.
  • the TALEN expression vector comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 14 or SEQ ID NO: 15 and/or any combination thereof.
  • the minimal TALEN expression cassette comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO:20 and/or any combination thereof.
  • the pre-cultured immature sugarcane leaf whorl is bombarded with expression vectors or minimal TALEN expression cassettes comprising, consisting essentially of, or consisting of the nucleotide sequence of SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, and/or any combination thereof.
  • the method further comprises selecting a leaf whorl that contains a minimal TALEN expression cassette and/or a TALEN expression vector and regenerating sugarcane roots from the selected leaf whorl without first producing a callus.
  • the TALEN-induced modified COMT gene can be introduced into other plants by, for example, sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.
  • The present disclosure also encompasses biomass and byproducts derived from the sugarcane plants and cells containing a modified COMT mediated by TALEN specific for COMT as described herein.
  • the biomass derived from sugarcane having TALEN-induced modified COMT genomic DNA may be used as a biofuel feedstock.
  • the biomass derived from sugarcane having TALEN-induced modified COMT genomic DNA has a reduced or modified amount of lignin relative to biomass derived from wild-type sugarcane.
  • the biomass derived from sugarcane having TALEN-induced modified COMT genomic DNA has a modified lignin profile relative to biomass derived from wild-type sugarcane.
  • the biomass derived from sugarcane having TALEN-induced modified COMT genomic DNA can also be used as a feedstock to produce cellulosic products, such as paper.
  • Production of cellulosic products from lignocellulosic biomass also needs the removal of lignin from the biomass feedstock.
  • production of cellulosic products from lignocellulosic biomass derived from wild-type sugarcane suffers the same shortcomings as removal of lignin from biofuel feedstock produced from wild-type sugarcane.
  • the lignocellulosic biomass derived from sugarcane having TALEN-induced modified COMT genomic DNA has a reduced or modified amount of lignin relative to lignocellulosic biomass derived from wild-type sugarcane.
  • the lignocellulosic biomass derived from sugarcane having TALEN-induced modified COMT genomic DNA has a modified lignin profile relative to biomass derived from wild-type sugarcane. Therefore, the use of biomass having a reduced amount of lignin and/or a modified lignin profile may remedy some of the shortcomings of current lignin removal techniques previously described.
  • the biomass derived from sugarcane having TALEN-induced modified COMT genomic DNA may be used as forage for animals.
  • Lignin may be difficult for many animals to digest as they may lack the proper or sufficient amount of enzymes to break it down into digestible components.
  • forages derived from plants, tissues and/or cells described herein may be more digestible by animals relative to wild type sugarcane. The increased digestibility may be due to a reduced amount of lignin or a modified lignin profile of the forage derived from the plants, tissues and/or cells described herein as compared to wild type.
  • the plant can be sugarcane.
  • the plant can be bahiagrass.
  • the present invention provides a novel approach to reducing lignin levels in a plant, plant cell or plant tissue, in particular, in a sugarcane plant, plant cell or plant tissue, as well as the plants, cells and tissues produced by the inventive method and products produced therefrom including methods of producing said products.
  • the present invention provides bahiagrass produced as described herein and having reduced lignin levels and modified lignin profiles
  • a method of reducing the lignin content and/or modifying the lignin profile of a plant, cell and/or tissue comprising, consisting essentially of, or consisting of: mutagenizing nucleic acid in a plant, cell and/or tissue to produce a stable inactivating deletion or insertion in more than 50% of the alleles or copies of said plant, cell and/or tissue that encode CoA O-methyltransferase (COMT), thereby reducing the lignin content of the plant, cell and/or tissue and/or modifying the lignin profile of the plant, cell and/or tissue as compared to a wild type plant, cell and/or tissue.
  • a method of reducing the lignin content and/or modifying the lignin profile of a sugarcane plant, cell and/or tissue comprising, consisting essentially of, or consisting of: mutagenizing nucleic acid in a sugarcane plant, cell and/or tissue to produce a stable inactivating deletion or insertion in more than 50% of the alleles or copies of said sugarcane plant, cell and/or tissue that encode CoA O-methyltransferase (COMT), thereby reducing the lignin content of the sugarcane plant, cell and/or tissue and/or modifying the lignin profile of the sugarcane plant, cell and/or tissue as compared to a wild type sugarcane plant, cell and/or tissue.
  • the mutagenesis produces a stable inactivating deletion or insertion in less than 100% of the COMT alleles or copies. In some embodiments, the mutagenesis produces a stable inactivating deletion or insertion in more than 50% and less than 100% of the alleles or copies (e.g., 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99% or any range or value therein).
  • about 7 to about 11 COMT alleles or copies of a sugarcane variety that comprises 12 COMT alleles or copies can be mutated to achieve a mutation frequency of more than 50% and less than 100%.
  • about 7 to about 10 COMT alleles or copies of a sugarcane variety that comprises 11 COMT alleles or copies can be mutated to achieve a mutation frequency of more than 50% and less than 100%.
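  • The copy-number arithmetic behind the two examples above can be sketched as follows. This is a minimal illustration, not part of the disclosed method; the function name `mutated_copy_range` is hypothetical, and the bounds are read strictly (more than 50%, less than 100%). Under that strict reading, a variety with 11 copies admits 6 to 10 mutated copies (6/11 is about 54.5%), which brackets the "about 7 to about 10" recited above.

```python
def mutated_copy_range(total_copies):
    """Return (min, max) mutated COMT copies giving a mutation frequency
    strictly above 50% and strictly below 100%.

    Hypothetical helper for illustration only.
    """
    if total_copies < 3:
        raise ValueError("need at least 3 copies for a non-empty range")
    min_copies = total_copies // 2 + 1   # smallest count exceeding half
    max_copies = total_copies - 1        # at least one copy left unmodified
    return min_copies, max_copies


for n in (11, 12):
    low, high = mutated_copy_range(n)
    print(f"{n} copies: {low} to {high} mutated "
          f"({100 * low / n:.1f}% to {100 * high / n:.1f}%)")
```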
  • the mutagenesis produces a stable inactivating deletion or insertion in about 51% to about 99%, about 51% to about 98%, about 51% to about 95%, about 55% to about 95%, about 58% to about 95%, about 58% to about 92%, about 60% to about 95%, about 60% to about 92%, about 65% to about 95%, about 70% to about 95%, about 75% to about 95%, about 65% to about 92%, about 70% to about 92%, about 75% to about 92%, and the like, and any range or value therein.
  • the mutagenizing comprises targeted mutagenesis.
  • a sugarcane plant may be regenerated from the sugarcane plant cell or plant tissue.
  • a plant, plant cell and/or plant tissue modified through targeted mutagenesis comprises, consists essentially of or consists of a mutation frequency of more than 50%. In some embodiments, the mutation frequency using targeted mutagenesis may be less than 100%.
  • a sugarcane plant, plant cell and/or plant tissue modified through targeted mutagenesis comprises, consists essentially of or consists of a mutation frequency of more than 50%.
  • a sugarcane plant, plant cell and/or plant tissue modified through targeted mutagenesis comprises, consists essentially of or consists of a mutation frequency of more than 50% and less than 100%.
  • the mutation frequency may be more than 50% to less than 100% (e.g., 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% or any range or value therein).
  • the mutation frequency in a plant, plant cell and/or plant tissue using targeted mutagenesis may be about 51% to about 99%, about 51% to about 98%, about 51% to about 95%, about 55% to about 95%, about 58% to about 95%, about 58% to about 92%, about 60% to about 95%, about 60% to about 92%, about 65% to about 95%, about 70% to about 95%, about 75% to about 95%, about 65% to about 92%, about 70% to about 92%, about 75% to about 92%, and the like, and any range or value therein.
  • the COMT alleles or copies comprising said stable, inactivating deletion or insertion produce no COMT protein or produce COMT protein with reduced and/or no activity.
  • the reduction in COMT activity can be more than 50% to less than 100% of the activity of a control (e.g., the COMT activity in the corresponding wild type plant, plant cell or plant tissue).
  • the targeted mutagenesis comprises, consists essentially of, or consists of introducing into said sugarcane plant, cell and/or tissue a nucleic acid construct comprising a transcription activator-like effector nuclease (TALEN) and a DNA binding domain that binds to at least a portion of nucleic acid encoding COMT.
  • the nucleic acid construct is transiently transformed into the sugarcane plant, cell and/or tissue.
  • the nucleic acid construct is stably transformed into the genome of the sugarcane plant, cell and/or tissue.
  • the nucleic acid construct comprising a transcription activator-like effector nuclease (TALEN) and a DNA binding domain that binds to at least a portion of nucleic acid encoding COMT comprises, consists essentially of, or consists of a TALEN expression vector encoded by the nucleotide sequence of SEQ ID NO: 14, SEQ ID NO: 15, and/or SEQ ID NO: 16, and/or a TALEN minimal expression cassette encoded by the nucleotide sequence of SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, and/or SEQ ID NO: 20.
  • the methods of the invention result in the amount of lignin in the mutagenized sugarcane plant, plant cell, or plant tissue being reduced by about 3% to about 30% (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30%, or any value or range therein) as compared to wild type.
  • the lignin in the mutagenized sugarcane plant, plant cell, or plant tissue is reduced by about 5% to about 30%, about 7% to about 30%, about 10% to about 30%, about 3% to about 25%, about 5% to about 25%, about 7% to about 25%, about 10% to about 25%, about 5% to about 20%, about 7% to about 20%, about 10% to about 20%, and the like.
  • the ratio of syringyl to guaiacyl in the lignin of the mutagenized sugarcane plant, plant cell, or plant tissue is modified, wherein the ratio of syringyl to guaiacyl in the lignin is reduced by 6% to 70% (as compared to WT), thereby resulting in a modified lignin profile as compared to the lignin profile of a wild type sugarcane plant, plant cell, or plant tissue.
  • the ratio of syringyl to guaiacyl in the lignin of the mutagenized sugarcane plant, plant cell, or plant tissue is modified to be about 1.4 to about 0.45.
  • a genetically modified sugarcane plant, cell and/or tissue is produced by the methods of this invention.
  • the present invention provides a genetically modified plant, cell and/or tissue comprising, consisting essentially of, or consisting of a stable, inactivating deletion or insertion in more than 50% of the alleles or copies encoding CoA O-methyltransferase (COMT) in said plant, cell and/or tissue.
  • a genetically modified plant, cell and/or tissue is provided, comprising, consisting essentially of, or consisting of a stable, inactivating deletion or insertion in less than 100% of the COMT alleles or copies.
  • the present invention provides a genetically modified sugarcane plant, cell and/or tissue comprising, consisting essentially of, or consisting of a stable, inactivating deletion or insertion in more than 50% of the alleles or copies encoding CoA O-methyltransferase (COMT) in said sugarcane plant, cell and/or tissue.
  • a genetically modified sugarcane plant, cell and/or tissue is provided, comprising, consisting essentially of, or consisting of a stable, inactivating deletion or insertion in less than 100% of the COMT alleles or copies.
  • a genetically modified sugarcane plant, cell and/or tissue comprising, consisting essentially of, or consisting of a stable, inactivating deletion or insertion in more than 50% and less than 100% of the alleles or copies encoding CoA O-methyltransferase (COMT) in said sugarcane plant, cell and/or tissue.
  • the COMT alleles or copies comprising said stable, inactivating deletion or insertion produce no COMT protein or produce COMT protein with reduced or no activity.
  • the genetically modified sugarcane plant, cell and/or tissue of the invention comprises, consists essentially of, or consists of a reduced amount of lignin, wherein the amount of lignin is reduced by about 3% to about 30% (as compared to WT).
  • the genetically modified sugarcane plant, cell and/or tissue of the invention comprises, consists essentially of, or consists of a modified lignin profile, wherein the lignin profile is modified such that the ratio of syringyl to guaiacyl in the lignin is reduced by 6% to 70% as compared to wild type.
  • the genetically modified sugarcane plant, cell and/or tissue of the invention comprises, consists essentially of, or consists of a modified lignin profile, wherein the ratio of syringyl to guaiacyl in the lignin is about 1.4 to about 0.45 (as compared to 1.5 in the wild type sugarcane).
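  • The relationship between the recited syringyl-to-guaiacyl (S/G) ratios and the recited percent reduction can be checked with a short calculation. This is a sketch only, assuming the wild type S/G ratio of 1.5 stated above and a simple relative-reduction formula; the function name `sg_ratio_reduction` is hypothetical.

```python
def sg_ratio_reduction(wild_type_ratio, modified_ratio):
    """Percent reduction of the S/G ratio relative to wild type.

    Hypothetical helper; uses (WT - modified) / WT * 100.
    """
    return (wild_type_ratio - modified_ratio) / wild_type_ratio * 100.0


WILD_TYPE_SG = 1.5  # wild type sugarcane S/G ratio as recited in the text

for modified in (1.4, 0.45):  # endpoints of the recited modified range
    reduction = sg_ratio_reduction(WILD_TYPE_SG, modified)
    print(f"S/G {modified}: {reduction:.1f}% reduction vs. wild type")
```

  • Under these assumptions, the endpoints 1.4 and 0.45 correspond to reductions of roughly 6.7% and 70%, in line with the recited range of 6% to 70%.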
  • the genetically modified sugarcane plant, cell and/or tissue of the invention further comprises an agronomic performance that is substantially the same as that of the non-modified wild type sugarcane plant, cell and/or tissue.
  • agronomic performance refers to biomass yield, resistance to lodging, resistance to disease and resistance to insects, and the like.
  • An agronomic performance that is substantially the same as wild type can be, for example, about 80-100% of the biomass production of the wild type.
  • An indicator for biomass production in sugarcane is, for example, stem diameter (see, e.g., Table 3).
  • a genetically modified sugarcane plant cell or plant tissue of the invention may be regenerated into a genetically modified sugarcane plant.
  • a crop comprising a plurality of the genetically modified sugarcane plant of the present invention is planted together in an agricultural field.
  • a product is provided that is produced from a genetically modified sugarcane plant, cell and/or tissue of the invention or a sugarcane crop of the invention.
  • the product can be biomass, bagasse, biofuel, and/or a biobased product.
  • the present invention further provides a method of increasing the efficiency of conversion of lignocellulosic biomass into biofuel, comprising: providing a plant of the invention or a crop of the invention; and converting the lignocellulosic biomass from said plant and/or crop into biofuel, thereby increasing the efficiency of conversion of lignocellulosic biomass into biofuel as compared to a wild type plant or crop.
  • the present invention provides a method of increasing the efficiency of conversion of sugarcane lignocellulosic biomass into biofuel, comprising: providing a sugarcane plant of the invention or a crop of the invention; and converting the lignocellulosic biomass from said sugarcane plant and/or crop into biofuel, thereby increasing the efficiency of conversion of sugarcane lignocellulosic biomass into biofuel as compared to a wild type sugarcane plant or crop.
  • the efficiency can be measured as an increase in fermentable sugar or a percent increase of biofuel and the increase can be about 15% to about 45% (e.g., about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45% or any value or range therein) as compared to conversion of wild type sugarcane lignocellulosic biomass into biofuel.
  • the increase in efficiency can be about 15% to about 40%, about 15% to about 35%, about 15% to about 30%, about 15% to about 25%, about 20% to about 40%, about 20% to about 35%, about 20% to about 30%, and the like.
  • the present invention further provides a method of providing an animal feed having increased digestibility, comprising: providing a plant having reduced lignin content generated as described herein; and converting the lignocellulosic biomass from said plant and/or crop into animal feed, thereby providing a more readily digestible animal feed.
  • the plant is sugarcane.
  • the plant is bahiagrass.
  • a genetically modified sugarcane plant comprising: modified COMT genomic DNA on an allele or copy, wherein the modification to the COMT genomic DNA comprises a stable, inactivating deletion or insertion (as compared to a wild-type allele or copy).
  • the modified COMT genomic DNA comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12 and/or SEQ ID NO: 13.
  • the sugarcane plant of the invention has reduced gene expression of COMT as compared to a wild-type plant and/or a reduced amount of COMT protein expression as compared to a wild-type plant. In some embodiments, the sugarcane plant of the invention has a reduced amount of lignin or a modified lignin profile as compared to a wild-type sugarcane plant.
  • the invention provides a sugarcane plant cell comprising: modified COMT genomic DNA on an allele or copy, wherein the modification to the COMT genomic DNA comprises a stable, inactivating deletion or insertion (as compared to a wild-type allele or copy).
  • the modified COMT genomic DNA comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12 and/or SEQ ID NO: 13.
  • the sugarcane plant cell of the invention has reduced gene expression of COMT as compared to a wild-type plant cell and/or a reduced amount of COMT protein expression as compared to a wild-type plant cell.
  • the sugarcane plant cell of the invention has a reduced amount of lignin or a modified lignin profile as compared to a wild-type sugarcane plant cell.
  • the present invention further provides a population of sugarcane plant cells comprising a sugarcane plant cell of this invention.
  • the invention provides a method of producing a genetically modified sugarcane plant having modified COMT genomic DNA of at least of one allele or copy produced by a method of indirect embryogenesis using agrobacteria, the method comprising: inoculating a callus induced from an immature leaf whorl of sugarcane with agrobacteria containing a TALEN expression vector, wherein the TALEN expression vector comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID: 16, or any combination thereof.
  • the invention provides a method of producing a genetically modified sugarcane plant having modified COMT genomic DNA of at least one allele or copy produced by a method of direct embryogenesis, the method comprising: bombarding a pre-cultured immature sugarcane leaf whorl with a TALEN expression vector that comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 14, SEQ ID NO: 15 or any combination thereof, or a minimal TALEN expression cassette that comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, or any combination thereof.
  • the method further comprises selecting a leaf whorl that contains a TALEN minimal expression cassette or TALEN expression vector; and regenerating sugarcane shoots from the selected leaf whorl without producing a callus.
  • the present invention provides a method for reducing the amount of lignin present in sugarcane relative to a wild-type sugarcane plant, the method comprising: modifying the COMT genomic DNA of at least one allele or copy in a sugarcane plant cell by inoculating a callus induced from immature leaf whorls of sugarcane with agrobacteria containing a TALEN expression cassette, wherein the TALEN expression cassette comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16, or any combination thereof; and growing a sugarcane plant from the inoculated callus, whereby the sugarcane plant has a reduced amount of lignin relative to a wild-type sugarcane plant.
  • the present invention provides a method for reducing the amount of lignin present in a sugarcane relative to a wild-type sugarcane plant, the method comprising: bombarding a pre-cultured immature sugarcane leaf whorl with a TALEN expression vector that comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 14, SEQ ID NO: 15 or any combination thereof, or a minimal TALEN expression cassette that comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20 or any combination thereof; selecting a leaf whorl that contains a TALEN minimal expression cassette or TALEN expression vector; regenerating sugarcane shoots from the selected leaf whorl without producing a callus; and growing a sugarcane plant from the regenerated sugarcane shoots, whereby the sugarcane plant has at least one cell with a reduced amount of lignin relative to a wild-type sugarcane plant cell.
  • the present invention provides a biofuel feedstock comprising biomass, a cellulosic material comprising cellulose each of which is derived from a sugarcane plant, cell or tissue of this invention.
  • the present invention provides a forage comprising biomass derived from a plant (e.g., sugarcane, bahiagrass, and the like) produced according to the methods of the invention.
  • Example 1 Selection of a TALEN binding sequence within sugarcane COMT gene
  • the first 200 bp region in the first exon of the sugarcane COMT gene was pre-selected to search for potential TALEN binding sites.
  • TALEN™ Hit software (http://talen-hit.cellectis-bioresearch.com/) was used to identify candidate TALEN binding and target (spacer) sites within the pre-selected region. Binding DNA sequences for the left and right TALEN, which should be preceded by a 5'-T, and the corresponding target sites (spacer) were selected between +49 and +101 in the COMT ORF (numbers counted from the A in the start codon sequence, ATG).
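The selection rule described above (each binding sequence preceded by a 5'-T, with a spacer between the two arms) can be sketched as a simple scan. This is only an illustration and not the TALEN Hit algorithm: the function name, the arm length and the spacer length are hypothetical parameters, and the toy sequence is not the sugarcane COMT sequence. Because the right arm binds the opposite strand, its 5'-T appears as an A immediately after the right site when read on the top strand.

```python
def find_talen_pairs(seq, arm_len=17, spacer_len=15):
    """Return candidate left-arm / spacer / right-arm windows on the top strand.

    arm_len and spacer_len are hypothetical defaults chosen for illustration.
    """
    seq = seq.upper()
    total = 2 * arm_len + spacer_len
    hits = []
    # i is the first base of the left binding site; i - 1 must be a T.
    for i in range(1, len(seq) - total):
        right_end = i + total  # index of the base just after the right site
        left_preceded_by_t = seq[i - 1] == "T"
        # A on the top strand corresponds to the 5'-T on the bottom strand.
        right_preceded_by_t = seq[right_end] == "A"
        if left_preceded_by_t and right_preceded_by_t:
            hits.append({
                "left_start": i,
                "left": seq[i:i + arm_len],
                "spacer": seq[i + arm_len:i + arm_len + spacer_len],
                "right_site_top_strand": seq[i + arm_len + spacer_len:right_end],
            })
    return hits


if __name__ == "__main__":
    toy = "TGGCCCGCGGCCA"  # toy sequence, not COMT
    for hit in find_talen_pairs(toy, arm_len=4, spacer_len=3):
        print(hit)
```

A real design tool applies further criteria (for example repeat composition and off-target checks) that this sketch does not attempt.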
  • TALEN arms were custom synthesized by Cellectis Bioresearch (TALEN™ Sure KO) and Life Technologies (GeneArt® Precision TAL).
  • the backbone for the entry vector was custom-synthesized and constructed by subcloning and carried loxP sites for optional removal of the TALEN cassette from genomic DNA with cre recombinase, nptII selectable marker, and promoter / terminator pairs for the expression of TALEN arms.
  • Left and right TALEN arms were cloned into the entry vector under the control of YLCV promoter / NtHSP terminator and YLCV promoter / AtHSP terminator, respectively.
  • A Cre expression cassette was introduced into the TALEN constructs, resulting in pTALCOMT Cellectics Cre (pTALCreCell) (SEQ ID NO: 17) and pTALCOMT Life Cre (pTALCreLife) (SEQ ID NO: 18) as shown in Figs. 5 and 6.
  • the Cre expression cassette contained a heat inducible promoter (GmHSP) to allow for conditional activation of cre, NLS fused / codon optimized Cre gene, and Nos terminator.
  • GmHSP heat inducible promoter
  • TALEN expression cassette was introduced into sugarcane genome through indirect (IE) and direct embryogenesis (DE) using agrobacteria and biolistic mediated gene transfer.
  • IE indirect embryogenesis
  • DE direct embryogenesis
  • Agrobacteria mediated transformation via IE: 8 week-old callus induced from immature leaf whorls of sugarcane (Saccharum spp. Hybrid) var. CP88-1762 was inoculated with Agrobacterium strain AGL1 carrying pTALCell (SEQ ID NO: 14), pTALLife (SEQ ID NO: 15), or the nptII control (SEQ ID NO: 16).
  • pTALCell SEQ ID NO: 14
  • pTALLife SEQ ID NO: 15
  • pTALCreCell SEQ ID NO: 17
  • pTALCreLife SEQ ID NO: 18
  • Example 3 PCR of genomic DNA to screen for the presence of TALEN expression cassette.
  • Genomic DNA was extracted from leaves using DNeasy 96 Plant Kit (Qiagen), and 25 ng was used per reaction as a template for amplification.
  • the cassette specific forward (TALS F: 5'-AAAGGCGTGTTTGATGTGAA-3') (SEQ ID NO:1) and reverse (TLAS R: 5'-TCCAAGGACAACTTTAGAAAGAAAA-3') (SEQ ID NO:2) primers were designed from the NtHSP terminator region as shown in Fig. 7, with an expected amplicon size of 332 bp.
  • PCR was performed in the Mastercycler (Eppendorf) with Hot Start Taq DNA polymerase (New England Biolabs (NEB)) under the following conditions: 95°C for 30 s denaturation, 35 cycles at 95 °C for 30 s, 60°C for 30 s, 68°C for 1 min and final extension at 68°C for 5 min.
  • PCR products were separated by electrophoresis on 1.0% agarose gel and visualized after ethidium bromide staining as shown in Fig. 8.
  • the sugarcane lines generating the 332 bp PCR product were considered TALEN integrated transgenic lines (Fig. 8). The number of regenerated transgenic lines and PCR screening results are summarized in Table 1.
  • nptII control: binary vector harboring only the nptII expression cassette
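As a quick sanity check on the screening PCR described above, the sketch below computes the length and GC content of the two cassette specific primers and the total programmed hold time of the cycling conditions. The primer strings are taken from the text with internal spaces removed, on the assumption that those spaces are extraction artifacts; the function names and the data layout are illustrative only, and the time estimate ignores ramping between temperatures.

```python
# Primer sequences from the text, with internal spaces removed
# (assumption: the spaces are extraction artifacts).
PRIMERS = {
    "forward (SEQ ID NO:1)": "AAAGGCGTGTTTGATGTGAA",
    "reverse (SEQ ID NO:2)": "TCCAAGGACAACTTTAGAAAGAAAA",
}

# Cycling conditions as stated in the text: (step label, seconds, repeats).
PROGRAM = [
    ("initial denaturation, 95 C", 30, 1),
    ("denaturation, 95 C", 30, 35),
    ("annealing, 60 C", 30, 35),
    ("extension, 68 C", 60, 35),
    ("final extension, 68 C", 300, 1),
]


def gc_percent(seq):
    """Percentage of G and C bases in a primer."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def total_hold_seconds(program):
    """Sum of programmed hold times, excluding temperature ramps."""
    return sum(seconds * repeats for _, seconds, repeats in program)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.1f}%")
    print(f"programmed hold time: {total_hold_seconds(PROGRAM) / 60:.1f} min")
```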
  • TALEN induced targeted mutagenesis was screened by PCR followed by restriction enzyme digestion assay.
  • a 125 bp PCR fragment encompassing the TALEN binding and target sites was amplified using the 4F (5'-GGCTCGACCGCCGAGGAC-3') (SEQ ID NO:3) and 128R (5'-TCCAGCAGGCCCAGCTCCAG-3') (SEQ ID NO:4) primers (Fig. 9).
  • PCR was performed in the Mastercycler (Eppendorf) with Phusion Hot Start II High-Fidelity DNA Polymerase (Thermo Scientific) under the following conditions: 98°C for 30 s denaturation, 33 cycles at 98 °C for 10 s, 68°C for 15 s, 72°C for 15 s and final extension at 72°C for 5 min.
  • Half of each PCR product (10 µl) was then treated with 5 units of BsaHI restriction enzyme (NEB) at 37 °C for 1 h. Restriction enzyme treated PCR products were separated by electrophoresis on 3.5% agarose gel and visualized after ethidium bromide staining as shown in Fig. 10.
  • A BsaHI site exists in the target site (spacer region), where TALEN induced double strand breaks and mutations are most likely to occur (Fig. 9).
  • Alterations in the BsaHI restriction site indicate mutation events.
  • PCR products that remained undigested (~125 bp) in the transgenic sugarcane lines (boxes in Fig. 10) indicated mutation events in the target region, whereas amplicons from the rest of the lines, including wild type (WT), were digested by the enzyme and released 68 bp and 49 bp fragments (Fig. 10).
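The logic of the restriction assay above can be sketched in a few lines: an amplicon with an intact BsaHI site is cut, while an amplicon whose site has been disrupted by an indel stays full length. The sketch assumes the BsaHI recognition sequence GR^CGYC (R = A or G, Y = C or T, cut after the second base). The sequences are short toy strings, not the 125 bp COMT amplicon, so the fragment sizes here do not correspond to the 68 bp and 49 bp fragments reported in the text.

```python
import re

# BsaHI recognition sequence GR^CGYC; the lookahead lets finditer report
# every site start, including overlapping ones.
BSAHI_SITE = re.compile(r"(?=G[AG]CG[CT]C)")
CUT_OFFSET = 2  # the enzyme cuts after "GR" on the top strand


def digest(seq):
    """Return top-strand fragments after complete BsaHI digestion."""
    seq = seq.upper()
    cuts = [m.start() + CUT_OFFSET for m in BSAHI_SITE.finditer(seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def is_undigested(seq):
    """True when the amplicon carries no intact BsaHI site."""
    return len(digest(seq)) == 1


if __name__ == "__main__":
    wild_type = "AAAAGACGTCTTTT"  # toy amplicon with one BsaHI site
    mutant = "AAAAGATCTTTT"       # toy amplicon, site destroyed by a 2 bp deletion
    print([len(f) for f in digest(wild_type)], is_undigested(wild_type))
    print([len(f) for f in digest(mutant)], is_undigested(mutant))
```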
  • Example 5 Amplicon analysis using capillary electrophoresis to select the modified sugarcane lines.
  • Modified sugarcane lines were selected based on the size variations in the amplicon encompassing the targeted mutation site. Genomic DNA was extracted from two different leaves or tillers in each of the TALEN integrated transgenic lines using DNeasy 96 Plant Kit (Qiagen). PCR was carried out from each sample with 25 ng of genomic DNA and the 4F and 6-FAM labelled 128R primers. PCR was performed under the same conditions mentioned above in the restriction analysis, and the products were separated by capillary electrophoresis. Capillary electrophoresis for the amplicon was performed by the service company, GENEWIZ. Size variations in the amplicon were analyzed using the Peak Scanner Software (Life Technologies).
  • TALEN induced double strand break results in deletion and/or insertions (indels) at the target site in COMT gene.
  • Individual peaks generated from capillary electrophoresis correspond to DNA fragments with different sizes.
  • COMT amplicon from wild type (WT) displayed only one single peak at 125 bp (Fig. 11A), which is the expected amplicon size, while mutant lines showed different peak patterns from WT (Fig. 11B). Lines with at least one different peak from WT were considered to be a mutant.
  • the number of mutant lines selected is shown in Table 2. Mutation rate in the TALEN integrated transgenic lines varied upon different TALEN constructs and transformation procedures, ranging from about 5% to about 77%.
  • Mutation frequency in amplicon = 100 × (total area of mutant peaks / total area of mutant and wild type peaks in fragment analysis). Values are medians among the lines in each column.
  • Mutation frequency in amplicon, which indicates the proportion of mutant over wild type COMT alleles or copies in an individual plant, varied among individual lines.
  • the lines generated by agrobacteria transformation of pTALCell via IE tended to have higher mutation frequency in amplicon (about 94% in median value), while those generated by biolistic bombardment of pTALLife via DE showed lower mutation frequency (about 13% in median value) as shown in Table 2. These differences are likely a consequence of the different tissue culture procedures used with the two gene transfer methods. Direct embryogenesis was used in combination with biolistic gene transfer, while indirect embryogenesis was used in combination with agrobacterium mediated gene transfer.
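The mutation frequency defined above, and the rule that a line with at least one peak differing from the 125 bp wild type peak is treated as a mutant, can be written out as a short calculation. This is a minimal sketch: the function names and the example peak values are hypothetical, and it assumes fragment sizes have already been rounded to whole base pairs.

```python
WT_SIZE_BP = 125  # expected wild type amplicon size stated in the text


def is_mutant(peaks, wt_size=WT_SIZE_BP):
    """A line is called mutant if any peak differs in size from wild type.

    peaks is a list of (size_bp, peak_area) tuples.
    """
    return any(size != wt_size for size, _ in peaks)


def mutation_frequency(peaks, wt_size=WT_SIZE_BP):
    """100 x (mutant peak area / total area of mutant and wild type peaks)."""
    total_area = sum(area for _, area in peaks)
    if total_area <= 0:
        raise ValueError("no peak area to analyze")
    mutant_area = sum(area for size, area in peaks if size != wt_size)
    return 100.0 * mutant_area / total_area


if __name__ == "__main__":
    # Hypothetical peak table for one line: a wild type peak plus two deletions.
    example = [(125, 100.0), (120, 200.0), (96, 100.0)]
    print(is_mutant(example), f"{mutation_frequency(example):.1f}%")
```

In practice, sized peaks from capillary electrophoresis are fractional, so a tolerance around the wild type size would be needed before comparing.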
  • BsaHI treated PCR products (as mentioned in restriction analysis) from mutant lines were cloned into pC l I-TOPO (Invitrogen) sequencing vector. A total of six clones were sequenced and analyzed. 5 bp to 29 bp deletions were detected in the analyzed mutant lines, confirming the TALEN induced mutagenesis in the COMT gene (Fig. 14). Plants with mutated COMT genes are shown in Fig. 15A-B. Some are shown actively growing in soil and producing secondary tillers (Fig. 15A).
  • mutation frequency refers to % of de novo sequence modifications (deletions or insertions) within the 125 bp PCR amplicon of the conserved sugarcane COMT region, which was targeted for cleavage by COMT-TALEN, relative to WT.
  • the mutation frequency data are based on >1,000 sequence reads per line (using 454 sequencing, e.g., high throughput sequencing).
  • WT: original sugarcane CP88-1762 without mutation.
  • Mutant sugarcane lines C6, C3, C7 with 75% to 92% mutation frequencies display a significant reduction of lignin (11 to 22%) but no significant reduction of stem diameter compared to WT.
  • C6 shows the best agronomic performance as measured by stem diameter (22 mm) along with significant reduction in lignin (22% reduction) despite only a 75% mutation frequency. This suggests that both agronomic performance and lignin reduction are not only influenced by the frequency of the targeted mutations but also by which type of COMT copies are mutated or remain unmutated.
  • GCTGGAGCTGGGCCACGTCCATGAGGACCTTG (SEQ ID NO:21) was identified.
  • Both primary transgenic and vegetative progeny displayed the same type of mutations with 13 to 194 reads per mutation following 454 sequencing (i.e., high throughput screening).
  • Fig. 18 provides sequence confirmation of TALEN induced COMT mutation in line C6, which has significant reduction of lignin (22%) but no significant reduction of stem diameter compared to WT.
  • both primary transgenic and vegetative progeny displayed the same type of mutations with 3 to 97 reads per mutation following 454 sequencing.
  • Figs. 19 and 20 Identifying uniform mutation events.
  • mutation patterns were examined in the primary mutant line and its vegetative progeny using capillary electrophoresis (Figs. 19 and 20) in addition to amplicon sequencing (Figs. 17 and 18).
  • Fig. 19 provides the results for mutant lines C16, C6, and C14.
  • Fig. 20 provides the results for mutant lines C17 and C7.
  • WT is the original sugarcane without mutation
  • PT and VP indicate the primary mutant line and the vegetative progeny, respectively.
  • Identical mutations were confirmed by both sequencing and capillary electrophoresis, suggesting that the lines represent uniform mutation events.
  • Example 8 Cre/loxP mediated site specific recombination for excision of TALEN expression cassette from the sugarcane genome.
  • a TALEN expression cassette is stably integrated into the sugarcane genome.
  • site specific recombination can be used as shown and described in Figs. 22A-22D.
  • heat inducible activation of the cre recombinase in vivo activates the excision of the entire nucleic acid sequences flanked by loxP sites from the sugarcane genome by site specific recombination.
  • Excised sequences include the nptII selectable marker cassette, the cre expression cassette, and the TALEN expression cassettes.
  • Table 4 shows the results of mutagenesis using a TALEN expression cassette and the removal of the cassette from the resultant regenerated mutant plants.
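The outcome of the Cre/loxP excision described in this example can be modeled with a toy list of cassette elements: recombination between two loxP sites in the same orientation removes everything between them and leaves a single loxP site behind. The element names and their order below are a simplified illustration, not the actual construct map, and the function name is hypothetical.

```python
def excise_between_loxp(elements):
    """Model Cre mediated excision between the outermost loxP sites.

    elements is an ordered list of element names along the integrated DNA.
    Returns a new list in which the floxed region is replaced by one loxP.
    """
    lox_positions = [i for i, name in enumerate(elements) if name == "loxP"]
    if len(lox_positions) < 2:
        # Recombination needs two sites; nothing can be excised.
        return list(elements)
    first, last = lox_positions[0], lox_positions[-1]
    return elements[:first] + ["loxP"] + elements[last + 1:]


if __name__ == "__main__":
    integrated = [
        "genomic_left",
        "loxP",
        "nptII_cassette",
        "cre_cassette",
        "TALEN_cassettes",
        "loxP",
        "genomic_right",
    ]
    print(excise_between_loxp(integrated))
```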

Abstract

The present invention relates to methods for reducing lignin content in and/or modifying the lignin profile of a plant, plant cell or plant tissue. The invention further provides plants, cells and/or tissues having reduced lignin content and/or modified lignin profiles as well as products produced from the plants, cells and/or tissues and uses thereof.

Description

TARGETED GENOME EDITING TO MODIFY
LIGNIN BIOSYNTHESIS AND CELL WALL COMPOSITION
STATEMENT OF PRIORITY
This application claims the benefit, under 35 U.S.C. § 119(e), of U.S. Provisional Application No. 61/985,070 filed April 28, 2014, the entire contents of which are incorporated by reference herein.
STATEMENT REGARDING ELECTRONIC FILING OF A SEQUENCE LISTING
A Sequence Listing in ASCII text format, submitted under 37 C.F.R. § 1.821, entitled 9207-147WO_ST25.txt, 123,151 bytes in size, generated on April 27, 2015 and filed via EFS-Web, is provided in lieu of a paper copy. This Sequence Listing is hereby incorporated by reference into the specification for its disclosures.
FIELD OF INVENTION
The present invention relates to methods for reducing lignin content in and/or modifying the lignin profile of a plant, plant cell or plant tissue. The invention further provides plants, cells and/or tissues having reduced lignin content and/or modified lignin profiles as well as products produced from the plants, cells and/or tissues and uses thereof.
BACKGROUND
Exploitation of non-renewable fossil fuels for energy has resulted in depletion of petroleum reserves, geopolitical tension, and climate change. As such, there is a critical need for alternative and sustainable sources of energy. A potential alternative to fossil fuels for an energy source is lignocellulosic biomass. The sugar fraction in the lignocellulosic biomass is primarily located in the secondary cell wall and can be used for the production of liquid biofuels, such as bioethanol. However, current techniques for converting lignocellulosic biomass into liquid biofuels are inefficient due in large part to the complexity of the cell wall structure and the presence of lignin therein. As such, there exists the need to develop techniques for improving the conversion of lignocellulosic biomass into liquid biofuels.
SUMMARY OF THE INVENTION
In a first aspect, a method of reducing the lignin content and/or modifying the lignin profile of a sugarcane plant, cell and/or tissue is provided, the method comprising: mutagenizing nucleic acid in a sugarcane plant, cell and/or tissue to produce a stable inactivating deletion or insertion in more than 50% of the alleles and/or copies of said sugarcane plant, cell and/or tissue that encode CoA O-methyltransferase (COMT), thereby reducing the lignin content of the sugarcane plant, cell and/or tissue and/or modifying the lignin profile of the sugarcane plant, cell and/or tissue as compared to a wild type sugarcane plant, cell and/or tissue.
In a second aspect, a genetically modified sugarcane plant, cell and/or tissue is provided, comprising: a stable, inactivating deletion or insertion in more than 50% of the alleles and/or copies of alleles encoding CoA O-methyltransferase (COMT) in said sugarcane plant, cell and/or tissue.
In a third aspect, a method of increasing the efficiency of conversion of sugarcane lignocellulosic biomass into biofuel is provided, the method comprising: providing a sugarcane or sugarcane crop of the invention; and converting the lignocellulosic biomass from said sugarcane plant and/or crop into biofuel, thereby increasing the efficiency of conversion of sugarcane lignocellulosic biomass into biofuel as compared to a wild type sugarcane plant or crop.
In a fourth aspect, a method of providing an animal feed having increased digestibility is provided, comprising: providing a sugarcane plant of the invention or crop of the invention, thereby providing a more readily digestible animal feed.
In a fifth aspect, a method of providing an animal feed having increased digestibility is provided, comprising: providing a sugarcane plant of the invention or crop of the invention; and converting the lignocellulosic biomass from said plant and/or crop into animal feed, thereby providing a more readily digestible animal feed.
BRIEF DESCRIPTION OF THE DRAWINGS
Fig. 1 shows the lignin biosynthetic pathway in sugarcane.
Fig. 2 shows the mechanism behind modification of lignin biosynthesis using TALEN mediated genome editing to selectively disrupt the CoA O-methyltransferase (COMT) gene in sugarcane.
Fig. 3 shows one embodiment of a TALEN expression vector used to produce a TALEN protein synthesized to recognize specific TALEN binding sites within the genomic sequence corresponding to the sugarcane COMT gene. The TALEN expression vector in Fig. 3 has a DNA sequence corresponding to SEQ ID NO: 14.
Fig. 4 shows another embodiment of a TALEN expression vector used to produce a TALEN protein synthesized to recognize specific TALEN binding sites within the genomic sequence corresponding to the sugarcane COMT gene. The TALEN expression vector in Fig. 4 has a DNA sequence corresponding to SEQ ID NO: 15.
Fig. 5 shows one embodiment of a minimal TALEN expression cassette with a heat inducible Cre-loxP mediated self-excision system. The minimal TALEN expression cassette of this figure has a DNA sequence corresponding to SEQ ID NO: 17.
Fig. 6 shows one embodiment of a minimal TALEN expression cassette with a heat inducible Cre-loxP mediated self-excision system. The minimal TALEN expression cassette of this figure has a DNA sequence corresponding to SEQ ID NO: 18.
Fig. 7 shows one embodiment of a section of the TALEN expression vector of Fig. 3 or the minimal expression cassette of Fig. 5 and illustrates the region of a 332 base pair (bp) amplicon that is used for screening cells and plants. The figure shows that the amplified region is located in the NtHSP terminator region of the TALEN expression vector or cassette. This same 332 bp region is amplified to screen cells and plants produced using the TALEN expression vector or minimal TALEN cassette shown in Figs. 4 and 6, respectively. Primers used to amplify this region have sequences corresponding to SEQ ID NO:1 and SEQ ID NO:2.
Fig. 8 shows the electrophoretic results from a PCR-based screening assay designed to amplify the 332 bp region of the TALEN expression vector or minimal TALEN cassette described and shown in Fig. 7. Presence of the 332 bp amplicon indicates presence of the TALEN expression cassette or vector in the genomic DNA extract.
Fig. 9 shows a portion of the DNA sequence corresponding to sugarcane COMT. Illustrated in Fig. 9 are the location of TALEN specific binding and target sites, primer locations for primers used in screening assays, and BsaHI restriction endonuclease sites within the wild-type sugarcane COMT gene sequence. +1 indicates the transcription start site of the sugarcane COMT gene.
Fig. 10 shows electrophoretic results of a PCR screening assay in which the PCR amplicon was digested with BsaHI restriction endonuclease prior to running out on an agarose gel. PCR amplicons from samples from plants with a mutated COMT gene were not cut by the BsaHI restriction endonuclease and produced an amplicon of approximately 125 bp. "WT" is used to designate lanes containing wild-type samples. "M" is used to designate the molecular weight marker. The boxes indicate amplicons that were uncut by the BsaHI restriction endonuclease.
Fig. 11A-11B shows DNA fragment analysis of a sugarcane amplicon from a wild type sugarcane plant (Fig. 11A) and a TALEN induced mutant sugarcane plant (Fig. 11B). The COMT amplicon from wild-type sugarcane displayed only one single peak at 125 bp. The COMT amplicon from TALEN induced mutated sugarcane showed different peak patterns relative to wild-type sugarcane, which suggests the presence of indels in the amplicon.
Fig. 12A-12I shows the DNA fragment analysis of wild type sugarcane plants (Figs. 12A-12C) and two tissues (T1 and T2) of TALEN induced mutant sugarcane plants (Figs. 12D-12I). The fragment analysis in Fig. 12A-12I suggests uniform and stable TALEN induced mutation of sugarcane COMT genomic DNA.
Fig. 13A-13I shows the DNA fragment analysis of wild type sugarcane plants (Figs. 13A-13C) and two tissues (T1 and T2) of TALEN induced mutant sugarcane plants (Figs. 13D-13I). The variation between T1 and T2 in each mutant line suggests progressing mutagenesis or chimeric events.
Fig. 14 shows DNA sequence analysis of COMT of wild type (WT) and TALEN induced mutant (M) sugarcane plants.
Fig. 15A to 15C shows sugarcane plants with TALEN induced mutations. The sugarcane plant in Fig. 15A is shown actively growing in soil and producing secondary tillers.
Fig. 16A to 16C shows that COMT mutants with lignin reduction of more than 20% displayed brown coloration in internodes and mid-rib (Fig. 16A and 16B). Fig. 16C shows that the growth performance of sugarcane mutant lines with up to 22% reduction in lignin did not differ from wild-type under greenhouse conditions.
Fig. 17 provides sequence confirmation of TALEN induced COMT mutation in line C16 with both significant reduction of lignin and stem diameter compared to WT. COMTa and COMTb are two confirmed COMT homo(eo)logs with a SNP indicated with a red arrow. The TALEN binding site is indicated in the boxes. Read length and number of reads with the specific sequences are shown on the right.
Fig. 18 provides sequence confirmation of TALEN induced COMT mutation in line C6 with significant reduction of lignin (22%) but not a significant reduction of stem diameter when compared to WT. COMTa and COMTb are two confirmed COMT homo(eo)logs with a SNP indicated with a red arrow. The TALEN binding site is indicated with blue boxes. Read length is shown on the right.
Fig. 19 shows mutant lines (C16, C6, C14) representing uniform TALEN induced mutation events. WT: original sugarcane without mutation; PT and VP indicate primary mutant line and vegetative progeny, respectively.
Fig. 20 shows mutant lines (C17, C7) representing uniform TALEN induced mutation events. WT: original sugarcane without mutation; PT and VP indicate primary mutant line and vegetative progeny, respectively.
Figs. 21A-21D show Cre/loxP mediated site specific recombination for excision of the TALEN expression cassette from the sugarcane genome. Fig. 21A shows the TALEN Cre expression cassette (12983 bp). Fig. 21B outlines the method treatments. Fig. 21C shows the excision (474 bp) and Fig. 21D shows the confirmation of excision sites. LoxP: site for site specific recombination driven by the cre recombinase, 35S poly A: 3' UTR, NPTII: selectable marker neomycin phosphotransferase, GmHSP promoter: heat inducible promoter of the Glycine max heat shock protein, Cre: cre recombinase for site specific recombination between the loxP sites, Nos: 3' UTR from nopaline synthase, AtHSP terminator: 3' UTR from Arabidopsis thaliana heat shock protein, Right TALEN: TALEN arm targeted to conserved region of the sugarcane COMT gene, YLCV promoter: constitutive promoter of the yellow leaf curl virus, Left TALEN: TALEN arm targeted to conserved region of the sugarcane COMT gene, NtHSP terminator: 3' UTR of Nicotiana Tobacco, LoxP: site for site specific recombination driven by the cre recombinase.
DETAILED DESCRIPTION
Before the present disclosure is described in greater detail, it is to be understood that this disclosure is not limited to particular embodiments described, and as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting.
Unless the context indicates otherwise, it is specifically intended that the various features of the invention described herein can be used in any combination. For example, features described in relation to one embodiment may also be applicable to and combinable with other embodiments and aspects of the invention. Moreover, the present invention also contemplates that in some embodiments of the invention, any feature or combination of features set forth herein can be excluded or omitted. To illustrate, if the specification states that a complex comprises components A, B and C, it is specifically intended that any of A, B or C, or a combination thereof, can be omitted and disclaimed singularly or in any combination.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the disclosure. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges and are also encompassed within the disclosure, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the disclosure.
It will be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. Thus, a "first" element (e.g., a first promoter sequence) as described herein could also be termed a "second" element (e.g., a second promoter sequence) without departing from the teachings of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present disclosure, the preferred methods and materials are now described. All patents, patent applications and publications referred to herein are incorporated by reference in their entirety. In case of a conflict in terminology, the present specification is controlling.
As used in the description of the embodiments of the invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Also, as used herein, "and/or" refers to and encompasses any and all possible combinations of one or more of the associated listed items.
Furthermore, the term "about," as used herein when referring to a measurable value such as an amount of a compound, dose, time, temperature, and the like, is meant to encompass variations of 20%, 10%, 5%, 1%, 0.5%, or even 0.1% of the specified amount.
It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The term "consists essentially of" (and grammatical variants), as applied to the compositions of this invention, means the composition can contain additional components as long as the additional components do not materially alter the composition. Thus, the term "consisting essentially of" when used in a claim of this invention is not intended to be interpreted to be equivalent to "comprising."
As used herein, phrases such as "between X and Y" and "between about X and Y" should be interpreted to include X and Y. As used herein, phrases such as "between about X and Y" mean "between about X and about Y" and phrases such as "from about X to Y" mean "from about X to about Y."
The term "decrease," "inhibit" or "reduce" or grammatical variations thereof as used herein refers to a decrease or diminishment in the specified level or activity of at least about 15%, 25%, 35%, 40%, 50%, 60%, 75%, 80%, 90%, 95% or more. In particular embodiments, the inhibition or reduction results in little or essentially no detectible activity (at most, an insignificant amount, e.g., less than about 10% or even 5%). Thus, for example, decreasing or reducing the amount of lignin in a plant, plant cell or plant tissue means decreasing the level of lignin by about 3% to about 30% as compared to the level of lignin in a control plant, plant cell or plant tissue. In some embodiments, decreasing or reducing the level of COMT activity in a plant, plant cell or plant tissue means decreasing the level of COMT activity by about 50% to about 100% (e.g., about 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100%, or any range or value therein) as compared to the level of COMT activity in a control plant, plant cell or plant tissue.
As used herein, the terms "increase," "increases," "increased," "increasing," and similar terms indicate an elevation of at least about 25%, 50%, 75%, 100%, 150%, 200%, 300%, 400%, 500% or more.
As used herein, the terms "modulating," "modulate," "modulates" or grammatical variations thereof, mean an alteration in, for example, the lignin profile of a plant, plant cell or plant tissue as described herein. "Modulating," "modulate," "modulates" or grammatical variations thereof, can also refer to the expression of a target gene or target polynucleotide by increasing or reducing the expression of said target polynucleotide or target gene.
As used herein, "differentially expressed," refers to the differential production of RNA, including but not limited to mRNA, tRNA, miRNA, siRNA, snRNA, and piRNA transcribed from a gene or regulatory region of a genome, or the protein product encoded by a gene as compared to the level of production of RNA or protein by the same gene or regulatory region in a normal or a control cell. In another context, "differentially expressed," also refers to nucleotide sequences or proteins in a cell or tissue which have different temporal and/or spatial expression profiles as compared to a normal or control cell and/or tissue.
As used herein, "overexpressed" or "overexpression" refers to an increased expression level of an RNA or protein product encoded by a gene as compared to the level of expression of the RNA or protein product in a normal or control plant, plant cell and/or plant tissue.
As used herein, "underexpressed" or "underexpression" refers to decreased expression level of an RNA or protein product encoded by a gene as compared to the level of expression of the RNA or protein product in a normal or control plant, plant cell and/or plant tissue.
As used herein, "express," "expresses," "expressed" or "expression" refers to the process by which polynucleotides (e.g., RNA or DNA) are transcribed into RNA transcripts and, optionally, translated into peptides, polypeptides, or proteins. Thus, a nucleic acid molecule and/or a nucleotide sequence may express a polypeptide of interest or, for example, a functional untranslated RNA.
In some embodiments, the recombinant nucleic acid molecules, and/or nucleotide sequences of the invention are "isolated." As used herein, "isolated" means separated from constituents, cellular and otherwise, in which the polynucleotide, peptide, polypeptide, protein, antibody, or fragments thereof, are normally associated with in nature. Thus, an "isolated" nucleic acid molecule, an "isolated" nucleotide sequence or an "isolated" polypeptide is a nucleic acid molecule, nucleotide sequence or polypeptide that, by the hand of man, exists apart from its native environment and is therefore not a product of nature (i.e., non-naturally occurring). An isolated nucleic acid molecule, nucleotide sequence or polypeptide may exist in a purified form that is at least partially separated from at least some of the other components of the naturally occurring organism or virus, for example, the cell or viral structural components or other polypeptides or nucleic acids commonly found associated with the polynucleotide. In representative embodiments, the isolated nucleic acid molecule, the isolated nucleotide sequence and/or the isolated polypeptide is at least about 1%, 5%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, or more pure. As used herein, "purified" is used in reference to a nucleic acid sequence, peptide, or polypeptide or other compound that has increased purity relative to the natural environment. A non-naturally occurring polynucleotide, peptide, polypeptide, protein, antibody, or fragments thereof, does not require "isolation" to distinguish it from its naturally occurring counterpart. In other embodiments, an isolated nucleic acid molecule, nucleotide sequence or polypeptide may exist in a non-native environment such as, for example, a recombinant host cell. Thus, for example, with respect to nucleotide sequences, the term "isolated" means that it is separated from the chromosome and/or cell in which it naturally occurs.
A polynucleotide is also isolated if it is separated from the chromosome and/or cell in which it naturally occurs and is then inserted into a genetic context, a chromosome and/or a cell in which it does not naturally occur (e.g., a different host cell, different regulatory sequences, and/or different position in the genome than as found in nature). Accordingly, the recombinant nucleic acid molecules, nucleotide sequences and their encoded functional nucleic acids or polypeptides are "isolated" in that, by the hand of man, they exist apart from their native environment and therefore are not products of nature; however, in some embodiments, they can be introduced into and exist in a recombinant host cell.
A "native" or "wild type" nucleic acid, nucleotide sequence, polypeptide or amino acid sequence refers to a naturally occurring or endogenous nucleic acid, nucleotide sequence, polypeptide or amino acid sequence. Thus, for example, a "wild type nucleic acid" or a "wild type protein" is a nucleic acid or protein that is naturally occurring in or endogenous to the organism. A "homologous" nucleic acid sequence is a nucleotide sequence naturally associated with a host cell into which it is introduced.
Also as used herein, the terms "nucleic acid," "nucleic acid molecule," "nucleotide sequence" and "polynucleotide" can be used interchangeably and encompass both RNA and DNA, including cDNA, genomic DNA, mRNA, synthetic (e.g., chemically synthesized) DNA or RNA and chimeras of RNA and DNA. The term polynucleotide, nucleotide sequence, or nucleic acid refers to a chain of nucleotides without regard to length of the chain. The nucleic acid can be double-stranded or single-stranded. Where single-stranded, the nucleic acid can be a sense strand or an antisense strand. The nucleic acid can be synthesized using oligonucleotide analogs or derivatives (e.g., inosine or phosphorothioate nucleotides). Such oligonucleotides can be used, for example, to prepare nucleic acids that have altered base-pairing abilities or increased resistance to nucleases. The present invention further provides a nucleic acid that is the complement (which can be either a full complement or a partial complement) of a nucleic acid, nucleotide sequence, or polynucleotide of this invention.
Nucleic acid molecules and/or nucleotide sequences provided herein are presented herein in the 5' to 3' direction, from left to right and are represented using the standard code for representing the nucleotide characters as set forth in the U.S. sequence rules, 37 CFR §§1.821 - 1.825 and the World Intellectual Property Organization (WIPO) Standard ST.25.
As used herein, "concentrated" refers to a molecule, including but not limited to a polynucleotide, peptide, polypeptide, protein, antibody, or fragments thereof, that is distinguishable from its naturally occurring counterpart in that the concentration or number of molecules per volume is greater than that of its naturally occurring counterpart.
As used herein, "diluted" refers to a molecule, including but not limited to a polynucleotide, peptide, polypeptide, protein, antibody, or fragments thereof, that is distinguishable from its naturally occurring counterpart in that the concentration or number of molecules per volume is less than that of its naturally occurring counterpart.
As used herein, "separated" refers to the state of being physically divided from the original source or population such that the separated compound, agent, particle, or molecule can no longer be considered part of the original source or population.
As used herein, "operatively associated with," "operatively linked to," or "operably linked to," when referring to a first nucleic acid sequence that is operatively linked to a second nucleic acid sequence, means a situation when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter and/or other transcriptional control elements (e.g., enhancers, termination elements and the like) are operatively linked to a coding sequence if the promoter and/or other transcriptional control elements effect the transcription or expression of the coding sequence.
A DNA "promoter" is an untranslated DNA sequence upstream of a coding region that contains the binding site for RNA polymerase and initiates transcription of the DNA. Thus, as used herein, "promoter" includes any sequence capable of driving transcription of a coding sequence. In particular, the term "promoter" as used herein refers to a DNA sequence generally described as the 5' regulatory region of a gene, located proximal to the start codon. The transcription of an adjacent coding sequence(s) is initiated at the promoter region. The term "promoter" also includes fragments of a promoter that are functional in initiating/driving transcription of the gene. A "promoter region" can also include other elements that act as regulators of gene expression. Promoters can include, for example, constitutive, inducible, temporally regulated, developmentally regulated, chemically regulated, tissue-preferred and tissue-specific promoters for use in the preparation of recombinant nucleic acid molecules, i.e., chimeric genes. In particular aspects, a "promoter" useful with the invention is a promoter capable of initiating transcription of a nucleotide sequence in a cell of a plant. As used herein, "plasmid" refers to a non-chromosomal double-stranded DNA sequence including an intact "replicon" such that the plasmid is replicated in a host cell.
As used herein, "sequence identity" refers to the extent to which two optimally aligned polynucleotide or peptide sequences are invariant throughout a window of alignment of components, e.g., nucleotides or amino acids. "Identity" can be readily calculated by known methods including, but not limited to, those described in: Computational Molecular Biology (Lesk, A. M., ed.) Oxford University Press, New York (1988); Biocomputing: Informatics and Genome Projects (Smith, D. W., ed.) Academic Press, New York (1993); Computer Analysis of Sequence Data, Part I (Griffin, A. M., and Griffin, H. G., eds.) Humana Press, New Jersey (1994); Sequence Analysis in Molecular Biology (von Heijne, G., ed.) Academic Press (1987); and Sequence Analysis Primer (Gribskov, M. and Devereux, J., eds.) Stockton Press, New York (1991).
"Identity" or "percent identity" refers to the degree of similarity between two nucleic acid or amino acid sequences. For sequence comparison, typically one sequence acts as a reference sequence to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are input into a computer, subsequence coordinates are designated if necessary, and sequence algorithm program parameters are designated. The sequence comparison algorithm then calculates the percent sequence identity for the test sequence(s) relative to the reference sequence, based on the designated program parameters. The percent identity between two sequences can be determined by using analysis software (e.g., Sequence Analysis Software Package of the Genetics Computer Group, Madison Wis.) that incorporates the Needleman and Wunsch (J. Mol. Biol., 1970, 48: 443-453) algorithm (e.g., NBLAST, and XBLAST). The default parameters are used to determine the identity for the polypeptides of the present disclosure.
The phrase "substantially identical," in the context of two nucleic acids or two amino acid sequences, refers to two or more sequences or subsequences that have at least about 50% nucleotide or amino acid residue identity when compared and aligned for maximum correspondence as measured using one of the following sequence comparison algorithms or by visual inspection. In certain embodiments, substantially identical sequences have at least about 60%, or at least about 70%, or at least about 80%, or even at least about 90% or 95% nucleotide or amino acid residue identity. In certain embodiments, substantial identity exists over a region of the sequences that is at least about 50 residues in length, or over a region of at least about 100 residues, or the sequences are substantially identical over at least about 150 residues. In further embodiments, the sequences are substantially identical when they are identical over the entire length of the coding regions.
The term "homology" in the context of the invention refers to the level of similarity between nucleic acid or amino acid sequences in terms of nucleotide or amino acid identity or similarity, respectively, i.e., sequence similarity or identity. Homology, homologue, and homologous also refer to the concept of similar functional properties among different nucleic acids or proteins. Homologues include genes that are orthologous and paralogous. Homologues can be determined by using the coding sequence for a gene, disclosed herein or found in an appropriate database (such as that at NCBI or others) in one or more of the following ways. For an amino acid sequence, the sequences should be compared using algorithms (for instance see section on "identity" and "substantial identity"). For nucleotide sequences the sequence of one DNA molecule can be compared to the sequence of a known or putative homologue in much the same way. Homologues are at least 20% identical, or at least 30% identical, or at least 40% identical, or at least 50% identical, or at least 60% identical, or at least 70% identical, or at least 80% identical, or at least 88% identical, or at least 90% identical, or at least 92% identical, or at least 95% identical, across any substantial region of the molecule (DNA, RNA, or protein molecule).
Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2: 482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J Mol. Biol. 48: 443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85: 2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by visual inspection (see generally, Ausubel et al, infra).
One example of an algorithm that is suitable for determining percent sequence identity and sequence similarity is the BLAST algorithm, which is described in Altschul et al., J. Mol. Biol. 215: 403-410 (1990). Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., 1990). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are then extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when the cumulative alignment score falls off by the quantity X from its maximum achieved value, the cumulative score goes to zero or below due to the accumulation of one or more negative-scoring residue alignments, or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, a cutoff of 100, M=5, N=-4, and a comparison of both strands.
For amino acid sequences, the BLASTP program uses as defaults a wordlength (W) of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89: 10915 (1989)).
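By way of non-limiting illustration, the seed-and-extend procedure described above can be sketched in Python as follows. This is a simplified sketch and not the NCBI implementation: it assumes exact-match seed words for nucleotide sequences, extends a hit only to the right and without gaps, and applies the three stopping conditions stated above. The function names find_word_hits and extend_right and the default drop-off value x_drop=20 are hypothetical choices made for this example; the reward and penalty values (M=5, N=-4) and the wordlength (W=11) are the BLASTN defaults recited above.

```python
from collections import defaultdict


def find_word_hits(query, subject, w=11):
    """Return (query_pos, subject_pos) pairs where a word of length w matches exactly."""
    index = defaultdict(list)
    for s in range(len(subject) - w + 1):
        index[subject[s:s + w]].append(s)
    hits = []
    for q in range(len(query) - w + 1):
        for s in index.get(query[q:q + w], []):
            hits.append((q, s))
    return hits


def extend_right(query, subject, q_start, s_start, seed_len,
                 match=5, mismatch=-4, x_drop=20):
    """Extend an exact seed to the right without gaps.

    Returns (best_score, best_length), where best_length counts residues
    from the start of the seed to the end of the best-scoring extension.
    """
    score = seed_len * match          # the seed is assumed to match exactly
    best_score, best_len = score, seed_len
    i, j = q_start + seed_len, s_start + seed_len
    # Stop at the end of either sequence.
    while i < len(query) and j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        if score > best_score:
            best_score, best_len = score, i - q_start + 1
        # Stop when the score falls off by X from its maximum,
        # or when the cumulative score goes to zero or below.
        if best_score - score >= x_drop or score <= 0:
            break
        i += 1
        j += 1
    return best_score, best_len


if __name__ == "__main__":
    q, s = "ACGTACGTTTTT", "ACGTACGTAAAA"
    for q_pos, s_pos in find_word_hits(q, s, w=8):
        print(q_pos, s_pos, extend_right(q, s, q_pos, s_pos, seed_len=8))
```

A complete implementation would also extend to the left, handle both strands, and score amino acid sequences with a matrix such as BLOSUM62 rather than fixed match and mismatch values.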
In addition to calculating percent sequence identity, the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 90: 5873-5787 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a test nucleic acid sequence is considered similar to a reference sequence if the smallest sum probability in a comparison of the test nucleic acid sequence to the reference nucleic acid sequence is less than about 0.1, more preferably less than about 0.01, and most preferably less than about 0.001.
Another widely used and accepted computer program for performing sequence alignments is CLUSTALW v1.6 (Thompson, et al. Nuc. Acids Res., 22: 4673-4680, 1994). The number of matching bases or amino acids is divided by the total number of bases or amino acids, and multiplied by 100 to obtain a percent identity. For example, if two 580 base pair sequences had 145 matched bases, they would be 25 percent identical. If the two compared sequences are of different lengths, the number of matches is divided by the shorter of the two lengths. For example, if there were 100 matched amino acids between a 200 and a 400 amino acid protein, they are 50 percent identical with respect to the shorter sequence. If the shorter sequence is less than 150 bases or 50 amino acids in length, the number of matches is divided by 150 (for nucleic acid bases) or 50 (for amino acids), and multiplied by 100 to obtain a percent identity. In some embodiments, two nucleotide sequences can also be considered to be substantially complementary when the two sequences hybridize to each other under stringent conditions. In some representative embodiments, two nucleotide sequences considered to be substantially complementary hybridize to each other under highly stringent conditions.
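The percent identity arithmetic described above can be illustrated with a short Python sketch. The function name percent_identity and its parameter names are hypothetical and chosen only for this example; the sketch assumes the number of matched positions has already been obtained from an alignment program.

```python
def percent_identity(matches, len_a, len_b, kind="nucleotide"):
    """Percent identity using the divisor rule stated in the text.

    matches: number of matched positions reported by the alignment
    len_a, len_b: lengths of the two compared sequences
    kind: "nucleotide" (minimum divisor 150) or "protein" (minimum divisor 50)
    """
    floors = {"nucleotide": 150, "protein": 50}
    if kind not in floors:
        raise ValueError("kind must be 'nucleotide' or 'protein'")
    shorter = min(len_a, len_b)
    if matches < 0 or matches > shorter:
        raise ValueError("matches must be between 0 and the shorter length")
    # Divide by the shorter length, but never by less than 150 bases
    # or 50 amino acids.
    divisor = max(shorter, floors[kind])
    return 100.0 * matches / divisor


if __name__ == "__main__":
    print(percent_identity(145, 580, 580))             # two 580 bp sequences
    print(percent_identity(100, 200, 400, "protein"))  # 200 vs 400 residues
    print(percent_identity(30, 40, 60, "protein"))     # shorter than 50 residues
```

The first two calls reproduce the worked examples given above (25 percent and 50 percent, respectively); the third shows the minimum divisor of 50 being applied to a 40 residue sequence.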
As used herein, the term "substantially complementary" (and similar terms) means that two nucleic acid sequences are at least about 70%, 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, 99% or more complementary. Alternatively, the term "substantially complementary" (and similar terms) can mean that two nucleic acid sequences can hybridize together under high stringency conditions (as described herein). Thus, in some embodiments, "substantially complementary" means about 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% complementary, or any value or range therein.
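As a non-limiting illustration of the percentage-based definition above, the following Python sketch computes the percent complementarity of two DNA strands. It is a simplification that assumes two equal-length, ungapped, uppercase DNA strands, each written 5' to 3', paired in antiparallel orientation; the function names and the default threshold of 70 percent (the lowest value recited above) are hypothetical choices for this example.

```python
# Watson-Crick pairing for DNA bases.
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementary(strand_a, strand_b):
    """Percent of positions that base-pair when strand_b is read antiparallel to strand_a."""
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    paired = sum(
        1
        for a, b in zip(strand_a, reversed(strand_b))
        if _COMPLEMENT.get(a) == b
    )
    return 100.0 * paired / len(strand_a)


def is_substantially_complementary(strand_a, strand_b, threshold=70.0):
    """True when percent complementarity meets or exceeds the chosen threshold."""
    return percent_complementary(strand_a, strand_b) >= threshold


if __name__ == "__main__":
    print(percent_complementary("AAGGCCTTAC", "GTAAGGCCTT"))  # full reverse complement
    print(percent_complementary("AAGGCCTTAC", "GTAAGGCCTA"))  # one mispaired position
```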
The terms "stringent conditions" or "stringent hybridization conditions" include reference to conditions under which a nucleic acid will selectively hybridize to a target sequence to a detectably greater degree than other sequences (e.g., at least 2-fold over a non-target sequence), and optionally may substantially exclude binding to non-target sequences. Stringent conditions are sequence-dependent and will vary under different circumstances. By controlling the stringency of the hybridization and/or washing conditions, target sequences can be identified that can be up to 100% complementary to the reference nucleotide sequence. Alternatively, conditions of moderate or even low stringency can be used to allow some mismatching in sequences so that lower degrees of sequence similarity are detected. For example, those skilled in the art will appreciate that to function as a primer or probe, a nucleic acid sequence only needs to be sufficiently complementary to the target sequence to substantially bind thereto so as to form a stable double-stranded structure under the conditions employed. Thus, primers or probes can be used under conditions of high, moderate or even low stringency. Likewise, conditions of low or moderate stringency can be advantageous to detect homolog, ortholog and/or paralog sequences having lower degrees of sequence identity than would be identified under highly stringent conditions.
For DNA-DNA hybrids, the Tm can be approximated from the equation of Meinkoth and Wahl, Anal. Biochem., 138:267-84 (1984): Tm = 81.5°C + 16.6 (log M) + 0.41 (% GC) - 0.61 (% formamide) - 500/L; where M is the molarity of monovalent cations, % GC is the percentage of guanosine and cytosine nucleotides in the DNA, % formamide is the percentage of formamide in the hybridization solution, and L is the length of the hybrid in base pairs. The Tm is the temperature (under defined ionic strength and pH) at which 50% of a complementary target sequence hybridizes to a perfectly matched probe. Tm is reduced by about 1°C for each 1% of mismatching; thus, Tm, hybridization and/or wash conditions can be adjusted to hybridize to sequences of the desired degree of identity. For example, if sequences with >90% identity are sought, the Tm can be decreased 10°C. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence and its complement at a defined ionic strength and pH. However, highly stringent conditions can utilize a hybridization and/or wash at the thermal melting point (Tm) or 1, 2, 3 or 4°C lower than the thermal melting point (Tm); moderately stringent conditions can utilize a hybridization and/or wash at 6, 7, 8, 9 or 10°C lower than the thermal melting point (Tm); low stringency conditions can utilize a hybridization and/or wash at 11, 12, 13, 14, 15 or 20°C lower than the thermal melting point (Tm). If the desired degree of mismatching results in a Tm of less than 45°C (aqueous solution) or 32°C (formamide solution), optionally the SSC concentration can be increased so that a higher temperature can be used.
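The Tm approximation above can be evaluated with a short Python sketch. The function and parameter names are hypothetical and chosen only for this example, and the sketch assumes that "log" in the equation denotes the base-10 logarithm. The second function applies the stated rule of thumb that Tm falls by about 1°C for each 1% of mismatching.

```python
import math


def melting_temperature(molarity, percent_gc, percent_formamide, length_bp):
    """Approximate Tm in degrees Celsius for a DNA-DNA hybrid.

    molarity: molarity of monovalent cations (M), must be positive
    percent_gc: percentage of G and C nucleotides (0 to 100)
    percent_formamide: percentage of formamide in the hybridization solution
    length_bp: length of the hybrid in base pairs (L), must be positive
    """
    if molarity <= 0 or length_bp <= 0:
        raise ValueError("molarity and length_bp must be positive")
    return (
        81.5
        + 16.6 * math.log10(molarity)   # assumption: base-10 logarithm
        + 0.41 * percent_gc
        - 0.61 * percent_formamide
        - 500.0 / length_bp
    )


def tm_for_identity(tm_perfect_match, percent_identity_sought):
    """Lower the Tm by about 1 degree C per 1% mismatch tolerated."""
    return tm_perfect_match - (100.0 - percent_identity_sought)


if __name__ == "__main__":
    tm = melting_temperature(1.0, 50.0, 0.0, 500)
    print(round(tm, 1))                          # aqueous solution, 1 M cations
    print(round(tm_for_identity(tm, 90.0), 1))   # sequences of about 90% identity sought
```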
An extensive guide to the hybridization of nucleic acids is found in Tijssen, Laboratory Techniques in Biochemistry and Molecular Biology-Hybridization with Nucleic Acid Probes, part I, chapter 2, "Overview of principles of hybridization and the strategy of nucleic acid probe assays," Elsevier, New York (1993); Current Protocols in Molecular Biology, chapter 2, Ausubel, et al., eds, Greene Publishing and Wiley-Interscience, New York (1995); and Green & Sambrook, In: Molecular Cloning, A Laboratory Manual, 4th Edition, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (2012).
Typically, stringent conditions are those in which the salt concentration is less than about 1.5 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at about pH 7.0 to pH 8.3 and the temperature is at least about 30°C for short probes (e.g., 10 to 50 nucleotides) and at least about 60°C for longer probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide or Denhardt's (5 g Ficoll, 5 g polyvinylpyrrolidone, 5 g bovine serum albumin in 500 ml of water). Exemplary low stringency conditions include hybridization with a buffer solution of 30% to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulfate) at 37°C and a wash in 1X to 2X SSC (20X SSC = 3.0 M NaCl/0.3 M trisodium citrate) at 50°C to 55°C. Exemplary moderate stringency conditions include hybridization in 40% to 45% formamide, 1 M NaCl, 1% SDS at 37°C and a wash in 0.5X to 1X SSC at 55°C to 60°C. Exemplary high stringency conditions include hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37°C and a wash in 0.1X SSC at 60°C to 65°C. A further non-limiting example of high stringency conditions includes hybridization in 4X SSC, 5X Denhardt's, 0.1 mg/ml boiled salmon sperm DNA, and 25 mM Na phosphate at 65°C and a wash in 0.1X SSC, 0.1% SDS at 65°C. Another illustration of high stringency hybridization conditions includes hybridization in 7% SDS, 0.5 M NaPO4, 1 mM EDTA at 50°C with washing in 2X SSC, 0.1% SDS at 50°C, alternatively with washing in 1X SSC, 0.1% SDS at 50°C, alternatively with washing in 0.5X SSC, 0.1% SDS at 50°C, or alternatively with washing in 0.1X SSC, 0.1% SDS at 50°C, or even with washing in 0.1X SSC, 0.1% SDS at 65°C. Those skilled in the art will appreciate that specificity is typically a function of post-hybridization washes, the relevant factors being the ionic strength and temperature of the final wash solution.
Nucleic acids that do not hybridize to each other under stringent conditions are still substantially identical if the proteins that they encode are substantially identical (e.g., due to the degeneracy of the genetic code).
As used herein, "polypeptides" or "proteins" are amino acid residue sequences. Those sequences are written left to right in the direction from the amino to the carboxy terminus. In accordance with standard nomenclature, amino acid residue sequences are denominated by either a three letter or a single letter code as indicated as follows: Alanine (Ala, A), Arginine (Arg, R), Asparagine (Asn, N), Aspartic Acid (Asp, D), Cysteine (Cys, C), Glutamine (Gln, Q), Glutamic Acid (Glu, E), Glycine (Gly, G), Histidine (His, H), Isoleucine (Ile, I), Leucine (Leu, L), Lysine (Lys, K), Methionine (Met, M), Phenylalanine (Phe, F), Proline (Pro, P), Serine (Ser, S), Threonine (Thr, T), Tryptophan (Trp, W), Tyrosine (Tyr, Y), and Valine (Val, V).
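For illustration only, the three letter and single letter codes listed above can be represented as a lookup table. In the following Python sketch, the names THREE_TO_ONE and to_one_letter and the hyphen-separated input format are hypothetical choices made for this example; the sequence is read left to right from the amino to the carboxy terminus, as stated above.

```python
# Standard three letter to single letter amino acid codes (20 residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_seq, sep="-"):
    """Convert e.g. 'Met-Ala-Gly' to 'MAG'; raises KeyError on an unknown code."""
    residues = three_letter_seq.split(sep)
    return "".join(THREE_TO_ONE[r.strip().capitalize()] for r in residues)


if __name__ == "__main__":
    print(to_one_letter("Met-Ala-Gly-Trp"))
```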
As used herein, the terms "transformation," "transfection," and "transduction" refer to the introduction of an exogenous/heterologous nucleic acid (RNA and/or DNA) into a host cell. A cell has been "transformed," "transfected" or "transduced" with an exogenous/heterologous nucleic acid when such nucleic acid has been introduced or delivered into the cell.
As used herein with respect to plants and plant parts, the term "transgenic" refers to a plant, plant part or plant cell that comprises one or more exogenous nucleic acids. Generally, the exogenous nucleic acid is stably integrated within the genome such that the polynucleotide is passed on to successive generations. The exogenous nucleic acid may be integrated into the genome alone or as part of a recombinant expression cassette. "Transgenic" may be used to designate any plant, plant part or plant cell the genotype of which has been altered by the presence of an exogenous nucleic acid, including those transgenics initially so altered and those created by sexual crosses or asexual propagation from the initial transgenic. As used herein, the term "transgenic" does not encompass the alteration of the genome (chromosomal or extra-chromosomal) by conventional plant breeding methods or by naturally occurring events such as random cross-fertilization, non-recombinant viral infection, non-recombinant bacterial transformation, non-recombinant transposition or spontaneous mutation. Additionally, the term "transgenic" does not encompass plants, plant cells, or plant tissues comprising only a TALEN-induced mutation and not the TALEN nucleic acid construct (e.g., mutant plants, plant cells, or plant tissues that were transiently transformed or in which the TALEN nucleic acid is removed from the genome following mutagenesis).
"Introducing," in the context of a polynucleotide, means presenting the polynucleotide to the plant, plant part, and/or plant cell in such a manner that the polynucleotide gains access to the interior of a cell. Where more than one polynucleotide is to be introduced these polynucleotides can be assembled as part of a single polynucleotide or nucleic acid construct, or as separate polynucleotides or nucleic acid constructs, and can be located on the same or different transformation vectors. Accordingly, these polynucleotides can be introduced into plant cells in a single transformation event, in separate transformation events, or, e.g., as part of a breeding protocol. Thus, for example, "introducing" can encompass transformation of an ancestor plant with a nucleotide sequence of interest followed by conventional breeding process to produce progeny comprising said nucleotide sequence of interest.
Transformation of a cell may be stable or transient. "Transient transformation" in the context of a polynucleotide means that a polynucleotide is introduced into the cell and does not integrate into the genome of the cell. "Stable transformation" or "stably transformed," "stably introducing," or "stably introduced" as used herein means that a polynucleotide is introduced into a cell and integrates into the genome of the cell. As such, the integrated polynucleotide is capable of being inherited by the progeny thereof, more particularly, by the progeny of multiple successive generations. "Genome" as used herein also includes the nuclear and the plastid genome, and therefore includes integration of the polynucleotide into, for example, the chloroplast genome. Stable transformation as used herein can also refer to a transgene that is maintained extrachromosomally, for example, as a minichromosome. In some embodiments, a nucleic acid construct for targeted mutagenesis may be transiently transformed/introduced into a plant, plant cell or plant tissue. In other embodiments, a nucleic acid construct for targeted mutagenesis may be stably transformed/introduced into a plant, plant cell or plant tissue. In further embodiments, a nucleic acid construct for targeted mutagenesis may be stably transformed into a plant, plant cell or plant tissue and later deleted from the genome of the plant, plant cell or plant tissue. Any method for removal of an integrated nucleic acid construct can be used to remove a TALEN nucleic acid construct (e.g., expression cassette or vector). For example, a stably integrated nucleic acid construct may be removed using a well-known site-specific recombination system (e.g., cre/lox, flp/frt) (see, e.g., Example 8). Alternatively, segregation after sexual reproduction/crossing may be used.
Crossing to remove integrated TALEN nucleic acid constructs may be particularly useful when developing new germplasm and the mutations are to be introgressed into new germplasm.
Transient transformation may be detected by, for example, an enzyme-linked immunosorbent assay (ELISA) or Western blot, which can detect the presence of a peptide or polypeptide encoded by one or more transgenes introduced into an organism. Stable transformation of a cell can be detected by, for example, a Southern blot hybridization assay of genomic DNA of the cell with nucleic acid sequences which specifically hybridize with a nucleotide sequence of a transgene introduced into an organism (e.g., a plant). Stable transformation of a cell can be detected by, for example, a Northern blot hybridization assay of RNA of the cell with nucleic acid sequences which specifically hybridize with a nucleotide sequence of a transgene introduced into a plant or other organism or by quantitative reverse transcription and polymerase chain reaction (qRT-PCR). Stable transformation of a cell can also be detected by, e.g., a polymerase chain reaction (PCR) or other amplification reactions as are well known in the art, employing specific primer sequences that hybridize with target sequence(s) of a transgene, resulting in amplification of the transgene sequence, which can be detected according to standard methods. Transformation can also be detected by direct sequencing and/or hybridization protocols well known in the art.
As used herein, a "transgene" refers to a nucleic acid which is used to transform a cell of an organism, such as a bacterium or a plant.
As used herein, "transgenic" refers to a cell, tissue, or organism that contains a transgene.
As used herein, the term "heterologous" refers to a nucleic acid or polypeptide that originates from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention.
As used herein, the term "recombinant" generally refers to a non-naturally occurring nucleic acid, nucleic acid construct, or polypeptide. Such non-naturally occurring nucleic acids may include natural nucleic acids that have been modified, for example that have deletions, substitutions, inversions, insertions, etc., and/or combinations of nucleic acid sequences of different origin that are joined using molecular biology technologies (e.g., a nucleic acid sequence encoding a "fusion protein"). Recombinant also refers to the polypeptide encoded by the recombinant nucleic acid. Non-naturally occurring nucleic acids or polypeptides include nucleic acids and polypeptides modified by man.
As used herein, the term "targeted mutagenesis" refers to mutagenesis procedures that alter a specific or targeted gene in vivo and produce a change in the genetic structure directed at a specific site (e.g., a COMT gene) on the chromosome. Thus, targeted mutagenesis is distinguished from naturally occurring mutations and from radiation or chemical mutagenesis, which are non-specific methods of generating mutations with the vast majority of events carrying random mutations in off-target sites. Targeted mutagenesis is also distinguished from site-directed mutagenesis in which analogs of nucleotides and other chemicals are used to generate localized point mutations. An example of targeted mutagenesis as used herein is the art-known technique using TALEN. TALEN utilizes a chimeric nuclease comprising programmable, sequence-specific DNA-binding modules linked to a nonspecific DNA cleavage domain (see, e.g., Gaj et al. (2013) Trends Biotech. 31:397-405). Targeted knockout mutations can be similarly achieved with CRISPR/CAS, zinc finger nuclease, meganuclease technology or in vivo site-specific mutagenesis using oligonucleotides. Notably, random, chemical, radiation or natural mutational events would not be able to generate the COMT mutated plants, cells or tissues of this invention due to the high level of redundancy in the highly polyploid sugarcane genome.
In some embodiments, targeted mutagenesis comprising TALEN methods can be carried out using nucleic acid constructs introduced using any method for transforming or transfecting a plant, plant cell or plant tissue. In other embodiments, targeted mutagenesis comprising TALEN methods can be carried out using polypeptide sequences, which can be introduced into a plant, plant cell or plant tissue using, for example, biolistic gene transfer using any type of particle (for example, a nano particle that can deliver a functional protein). Thus, for example, two amino acid sequences may be co-delivered including two chimeric nuclease monomers which are composed of programmable, sequence-specific DNA-binding modules linked to a nonspecific DNA cleavage domain.
As known in the art, a TALEN target site can be selected in any exon (in some embodiments, the target site may be in the first exon) of the target gene (e.g., COMT GenBank accession no. AJ231133) using software, for example, TALEN™ Hit software (Cellectis plant sciences, New Brighton, MN). Design principles for TALEN are well known in the art (see, e.g., Cermak et al. Nucleic Acids Res. 2011; 39:e82. doi: 10.1093/nar/gkr218). For co-mutation of multiple COMT alleles and/or copies of COMT alleles a single TALEN pair is preferred, targeting an exon sequence (e.g., target polynucleotide) that is conserved between the targeted alleles and/or copies. For mutation of specific COMT alleles a single TALEN pair for each of the targeted alleles is preferred, targeting, for example, a non-conserved and allele-specific exon sequence (e.g., target polynucleotide). Each TALEN target sequence (conserved or specific) includes appropriate binding sites for each of the two TALEN monomers and a spacer region between them. As known in the art, different TAL effector/TALEN construction methods use different portions of the TAL effector protein flanking the repeat region. These different "architectures" have different optimal ranges for spacer lengths and number of repeats (see, e.g., Schmid-Burgk et al. Nature Biotechnology 31:76-81 (2013); Kim et al. Nature Reviews Genetics 15:321-334 (2014)). In some embodiments, the spacer length can be about 13 nucleotides to about 17 nucleotides (e.g., about 13, 14, 15, 16, 17 nucleotides, or any range or value therein).
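As a rough illustration of the target-site layout described above (a binding site for each of the two TALEN monomers flanking a spacer of about 13 to about 17 nucleotides), the following Python sketch enumerates candidate windows in an exon sequence. The function name, the 18-nucleotide binding-site length and the toy sequence are assumptions made for illustration only. They are not taken from the TALEN Hit software or from the cited design guidelines, which apply additional criteria.

```python
def candidate_talen_windows(exon, site_len=18, spacer_min=13, spacer_max=17):
    """Enumerate (start, left_site, spacer, right_site) layouts in an exon.

    site_len is a hypothetical binding-site length per monomer; the spacer
    bounds follow the about 13 to about 17 nucleotide range given in the text.
    """
    exon = exon.upper()
    windows = []
    for spacer_len in range(spacer_min, spacer_max + 1):
        total = 2 * site_len + spacer_len
        for start in range(len(exon) - total + 1):
            left_end = start + site_len
            spacer_end = left_end + spacer_len
            windows.append((
                start,
                exon[start:left_end],            # left monomer binding site
                exon[left_end:spacer_end],       # spacer between the two sites
                exon[spacer_end:start + total],  # right monomer binding site
            ))
    return windows


toy_exon = "ATGGCTACCGTTGACCTGAA" * 3  # 60-nt made-up sequence, not COMT
windows = candidate_talen_windows(toy_exon)
```

Each returned window would still need to be screened against the design principles of the chosen TALEN architecture before use.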
The repeat regions define the TALEN binding site in each of the two TALEN monomers. Cleavage by the FokI nuclease domains occurs in the "spacer" sequence that lies between the two regions of the DNA bound by the two TALEN monomers (see, for example, Joung & Sander Nature Reviews Molecular Cell Biology 14, 49-55 (January 2013)). The DNA binding domain of the TALEN contains a repeated highly conserved 33-34 amino acid sequence with the exception of the 12th and 13th amino acids. These two locations (12th, 13th residues) are highly variable (Repeat Variable Diresidue, RVD) and show a strong correlation with specific nucleotide recognition (see, Boch et al. (December 2009) Science 326 (5959): 1509-12. doi: 10.1126/science.1178811; Moscou et al. (December 2009) Science 326 (5959): 1501. doi: 10.1126/science.1178817). This simple relationship between amino acid sequence and DNA recognition has allowed for the engineering of specific DNA binding domains by selecting a combination of repeat segments containing the appropriate RVDs (see, Boch et al. (February 2011) Nature Biotechnology 29 (2): 135-6. doi: 10.1038/nbt.1767). In some embodiments, the TALEN architecture results in a minimum polynucleotide target of about 53 nucleotides. In some embodiments, alternative TALEN architectures utilize shorter or longer sequence targets.
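The relationship between RVDs and recognized nucleotides can be sketched as a simple lookup. The mapping below (NI for A, HD for C, NN for G, NG for T) is the commonly cited association and is an assumption of this sketch rather than a statement from this specification. NN is also reported to recognize A, so actual designs may select different RVDs. The function name is hypothetical.

```python
# Commonly cited RVD-to-nucleotide associations (assumed here; NN can also
# recognize A, so real designs may choose other RVDs for G).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def rvds_for_binding_site(site):
    """Return one RVD per nucleotide of a binding site, read 5' to 3'."""
    try:
        return [RVD_FOR_BASE[base] for base in site.upper()]
    except KeyError as err:
        raise ValueError(f"unsupported nucleotide: {err.args[0]}") from None


print("-".join(rvds_for_binding_site("ACGT")))  # prints NI-HD-NN-NG
```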
A "deletion" mutation in a nucleic acid (e.g., a gene) results in the loss of one or more nucleotides that are present in the corresponding wild type or non-mutated nucleic acid, whereas an "insertion" mutation involves an addition of one or more nucleotides as compared to the corresponding native or wild-type nucleic acid. Such mutations can result in no polypeptide product or a polypeptide product having no or reduced activity relative to a non-mutated gene. Deletions and insertions that result in no polypeptide product or a polypeptide product having no or reduced activity relative to a non-mutated gene are termed "inactivating deletions" or "inactivating insertions." Thus, for example, utilizing the methods of the invention, nucleic acid encoding COMT can be mutagenized to produce inactivating deletions or inactivating insertions that result in the lack of production of a COMT polypeptide or the production of a COMT polypeptide that has reduced or no activity relative to the wild-type or non-mutated COMT nucleic acid. Furthermore, in some embodiments, the invention provides "stable" inactivating deletions and/or insertions, which means that such deletions and insertions are maintained in the genome and are heritable from one generation to another.
As used herein, "uniform" mutation refers to mutations that display consistency from tissue to tissue and in primary transgenic to vegetative progeny regarding the type of mutations that can be identified in a PCR amplicon from such tissues by, for example, capillary electrophoresis (e.g., peak pattern (length of insertion or deletion)) and/or sequence analysis (same sequence polymorphism across different tissues). In contrast, chimeric mutations show variation in peak pattern (capillary electrophoresis) and/or variation in sequence polymorphism.
As used herein, the term "fusion protein" refers to a protein or polypeptide formed from the combination of two different proteins or protein fragments.
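The distinction between uniform and chimeric mutations can be sketched as a comparison of indel-length patterns across tissues. This is a simplified illustration under stated assumptions: the function name, the input format (a mapping from a tissue label to the indel lengths read from that tissue's PCR amplicon peaks) and the example values are hypothetical, and the sketch ignores sequence-level polymorphism and peak intensity.

```python
def classify_mutation_pattern(indels_by_tissue):
    """Return "uniform" if every tissue shows the same set of indel lengths,
    otherwise "chimeric".

    indels_by_tissue maps a tissue label to the indel lengths (bp; negative
    for deletions) read from that tissue's PCR amplicon peaks.
    """
    if not indels_by_tissue:
        raise ValueError("no tissues provided")
    patterns = {frozenset(lengths) for lengths in indels_by_tissue.values()}
    return "uniform" if len(patterns) == 1 else "chimeric"


# Made-up peak patterns for illustration only.
same_pattern = {"leaf": [-5, 2], "stem": [2, -5], "root": [-5, 2]}
mixed_pattern = {"leaf": [-5, 2], "stem": [-5], "root": [-5, 2, 7]}
```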
As used herein, "gene" refers to a hereditary unit corresponding to a sequence of DNA that occupies a specific location on a chromosome and that contains the genetic instruction for a characteristic(s) or trait(s) in an organism.
As used herein, "locus" refers to the position that a given gene or portion thereof occupies on a chromosome of a given species.
As used herein, "allele(s)" indicates any of one or more alternative forms of a gene, where the alleles relate to at least one trait or characteristic. In a diploid cell or organism, the two alleles of a given gene occupy corresponding loci on a pair of homologous chromosomes. In a polyploid organism, such as sugarcane, there are both alleles as well as identical copies (not variants) of the same gene occupying the corresponding locus. Specifically, sugarcane is an interspecific hybrid of autooctoploid Saccharum officinarum (2n = 8x = 80) and autoploid Saccharum spontaneum (2n = 5x = 40 to 16x = 128). The resulting modern sugarcane cultivars have a genome size of around 10 Gb and typically have around 120 chromosomes, 70-80% of which are entirely derived from S. officinarum, 10-20% from S. spontaneum and a few from interspecific recombinations (D'Hont et al., Mol. Gen. Genet. 250:405-413 (1996); Cuadrado et al. J. Exp. Bot. 55 (398): 847-854 (2004); D'Hont et al. Cytogenetic Genome Res. 109:27-33 (2005); Piperidis et al., Mol. Gen. Genet. 284:65-73 (2010)). Remarkably high general colinearity of genes between sugarcane haplotypes and high gene structure and sequence conservation of homologous and homoeologous alleles have been reported. Strikingly, all the hom(oe)ologous genes were predicted to be functional, based on their structure, causing an enormous redundancy (Garsmeur et al. New Phytologist 189: 629-642 (2011)). Thus, in some embodiments, an allele of COMT or copy of a COMT allele or multiple alleles of COMT or multiple copies of a COMT allele may be mutated using targeted mutagenesis.
As used herein, the term "homoeologous," also spelled homeologous, is used to describe the relationship of similar chromosomes or parts of chromosomes brought together following inter-species hybridization and allopolyploidization, and whose relationship was completely homologous in an ancestral species. In allopolyploids, the homologous chromosomes within each parental sub-genome should pair faithfully during meiosis, leading to disomic inheritance; however in some allopolyploids, the homoeologous chromosomes of the parental genomes may be nearly as similar to one another as the homologous chromosomes, leading to tetrasomic inheritance (four chromosomes pairing at meiosis), intergenomic recombination, and reduced fertility.
The term "heterozygous" refers to a genetic condition where the organism or cell has different alleles at corresponding loci on homologous chromosomes.
As used herein, "homozygous" refers to a genetic condition where the organism or cell has identical alleles at corresponding loci on homologous chromosomes.
As used herein, with respect to nucleic acids, the term "exogenous" refers to a nucleic acid molecule that is not in the natural genetic background of the cell/organism in which it resides. In some embodiments, the exogenous nucleic acid molecule comprises one or more nucleotide sequences that are not found in the natural genetic background of the cell/organism. In some embodiments, the exogenous nucleic acid molecule can comprise one or more additional copies of a nucleotide sequence that is/are endogenous to the cell/organism. Typically, the introduced exogenous sequence is a recombinant sequence.
As used herein, the term "lignocellulosic biomass" refers to plant biomass that is composed of carbohydrate polymers (e.g., cellulose and hemicellulose) and lignin.
As used herein, the term "biomass" refers to biological material derived from living, or recently living organisms.
As used herein, the term "lignin" refers to an aromatic polymer that is the result of oxidative combinatorial coupling of 4-hydroxyphenylpropanoids, which are deposited primarily in the secondary cell wall structure. Lignin is formed from the starting compounds of hydroxycinnamyl alcohols (monolignols), coniferyl alcohol, and/or sinapyl alcohol, and typically minor amounts of p-coumaryl alcohol.
As used herein, the term "sugarcane" refers to any of the several species or varieties or hybrids of tall perennial true grasses of the genus Saccharum. This includes Saccharum officinarum, Saccharum sinense, Saccharum barberi, Saccharum robustum, and Saccharum spontaneum.
As used herein, the term "biofuel" refers to solid, liquid, or gas fuels made from biomass.
As used herein, "sustainable" refers to the creation and maintenance of conditions under which humans and nature can exist in productive harmony, that permit fulfilling the social, economic, and other requirements of present and future generations.
As used herein, "cellulosic" refers to containing, or derived from cellulose.
As used herein, "polyploidy" or "polyploid" refers to an organism that contains one or more cells that have more than twice the haploid number of chromosomes.
As used herein, "siRNA precursor" refers to a molecule capable of being acted upon by cell proteins to produce siRNA molecules within the cell. These include, but are not limited to, shRNA and microRNA.
As used herein, the terms "specifically binds" or "specific binding" (and similar terms) refer to binding that occurs between such paired species such as enzyme/substrate, receptor/agonist or antagonist, antibody/antigen, lectin/carbohydrate, oligo DNA primers/DNA, enzyme or protein/DNA, RNA molecule to other nucleic acid (DNA or RNA) or amino acid, which may be mediated by covalent or non-covalent interactions or a combination of covalent and non-covalent interactions. When the interaction of the two species produces a non-covalently bound complex, the binding that occurs is typically electrostatic, hydrogen-bonding, or the result of lipophilic interactions. Accordingly, "specific binding" occurs between a paired species where there is interaction between the two which produces a bound complex having the characteristics of, for example, an antibody/antigen, enzyme/substrate, DNA/DNA, DNA/RNA, DNA/protein, RNA/protein, RNA/amino acid interaction. In particular, specific binding may be characterized, for example, by the binding of one member of a pair to a particular species and to no other species within the family of compounds to which the corresponding member of the binding member belongs. Thus, for example, a monoclonal antibody preferably binds to a single epitope and to no other epitope within the family of proteins. In some embodiments, the DNA binding domain of a TALEN nucleic acid specifically binds to at least a portion of nucleic acid encoding COMT.
In some embodiments, the phrase "hybridizing specifically to" refers to the binding, duplexing, or hybridizing of a molecule to a particular nucleic acid target sequence under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular DNA or RNA) to the substantial exclusion of non-target nucleic acids, or even with no detectable binding, duplexing or hybridizing to non-target sequences. Specifically hybridizing sequences typically are at least about 40% complementary and are optionally substantially complementary or even completely complementary (i.e., 100% identical) to a target nucleic acid sequence.
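As a minimal illustration of what a percent-complementarity figure measures, the following Python sketch counts the positions at which a probe can base-pair with a target of equal length in antiparallel orientation. It assumes ungapped DNA sequences and simple Watson-Crick pairing. The function name and example sequences are hypothetical, and the sketch says nothing about hybridization behavior under stringent conditions.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def percent_complementarity(probe, target):
    """Percent of probe positions that pair with the target when the two
    equal-length DNA strands are aligned antiparallel."""
    probe, target = probe.upper(), target.upper()
    if not probe or len(probe) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    # Read the target 3'->5' so position i of the probe faces its partner.
    paired = sum(COMPLEMENT.get(p) == t for p, t in zip(probe, reversed(target)))
    return 100.0 * paired / len(probe)


full = percent_complementarity("ATGC", "GCAT")     # fully complementary pair
partial = percent_complementarity("ATGC", "GCAA")  # one mismatched position
```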
The terms "gene silencing", "gene knockdown", "reduction of gene expression", "inhibition of gene expression", "gene downregulation", and "gene suppression" are used interchangeably to generally describe reductions of the amount of RNA transcribed from the gene and/or, in the case of a protein-encoding gene, protein translated from the transcribed mRNA. The transcribed RNA may be non-coding or protein-encoding. The term "non-coding" refers to polynucleotides that do not encode part or all of an expressed protein. Non-coding sequences include but are not limited to introns, enhancers, promoter regions, 3' untranslated regions, and 5' untranslated regions. Measurement of transcribed RNA or translated protein can be done by using molecular techniques such as RNA solution hybridization, nuclease protection, Northern hybridization, reverse transcription, gene expression monitoring with a microarray, antibody binding, enzyme-linked immunosorbent assay (ELISA), Western blotting, radioimmunoassay (RIA), other immunoassays, or fluorescence-activated cell analysis (FACS). Gene suppression can be the result of co-suppression, anti-sense suppression, transcriptional gene silencing, post-transcriptional gene silencing, or translational gene silencing. A "silenced", "knocked-down", "reduced", "inhibited", "down regulated", or "suppressed" gene refers to a gene that is subject to silencing. "Target gene" is thus the gene which is to be silenced. Gene silencing is "specific" for a target gene when silencing of the target gene occurs without manifest effects on other genes.
"Target gene" refers to the entire target gene, including exons, introns and regulatory regions such as promoters, enhancers, and terminators, 5' and 3' untranslated regions, the primary transcript, and the mature mRNA.
A target gene may be a gene whose silencing has a high likelihood of resulting in a strong phenotype, preferably a knockout or null phenotype. A "target polynucleotide" refers to any nucleic acid that is of interest as a target for modulation of expression. "Target polynucleotide" thus refers to the part of a target gene which is bound or hybridized by a transcription activator-like effector engineered to recognize the target polynucleotide. The target polynucleotide may correspond to the entire target gene or a fragment of the whole target gene. As known in the art, different TAL effector/TALEN construction methods use different portions of the TAL effector protein flanking the repeat region. These different "architectures" have different optimal ranges for spacer lengths and number of repeats. Thus, in some embodiments, the target polynucleotide may comprise at least about 15 contiguous nucleotides of the target gene. Accordingly, in some embodiments, the target polynucleotide may be about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160, 165, 170, 175, 180, 185, 190, 195, 200, 205, 210, 215, 220, 225, 230, 235, 240, 245, 250, 255, 260, 265, 270, 275, 280, 285, 290, 295, 300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500, 510, 520, 530, 540, 550, 560, 570, 580, 590, 600, 610, 620, 630, 640, 650, 660, 670, 680, 690, 700, 710, 720, 730, 740, 750, 760, 770, 780, 790, 800, 810, 820, 830, 840, 850, 860, 870, 880, 890, 900, 910, 920, 930, 940, 950, 960, 970, 980, 990, 1000 nucleotides or more in length up to the full length of the target gene, and any range or value therein. Thus, in some embodiments, the target polynucleotide may be a length in the range of about 15 to about 1000 nucleotides, about 15 to about 900 nucleotides, about 15 to about 800 nucleotides, about 15 to about 700 nucleotides, about 15 to about 600 nucleotides, about 15 to about 500 nucleotides, about 15 to about 400 nucleotides, about 15 to about 300 nucleotides, about 15 to about 250 nucleotides, about 15 to about 200 nucleotides, about 15 to about 150 nucleotides, about 15 to about 100 nucleotides, about 25 to about 1000 nucleotides, about 25 to about 900 nucleotides, about 25 to about 800 nucleotides, about 25 to about 700 nucleotides, about 25 to about 600 nucleotides, about 25 to about 500 nucleotides, about 25 to about 400 nucleotides, about 25 to about 300 nucleotides, about 25 to about 250 nucleotides, about 25 to about 200 nucleotides, about 25 to about 150 nucleotides, about 25 to about 100 nucleotides, about 35 to about 1000 nucleotides, about 35 to about 900 nucleotides, about 35 to about 800 nucleotides, about 35 to about 700 nucleotides, about 35 to about 600 nucleotides, about 35 to about 500 nucleotides, about 35 to about 400 nucleotides, about 35 to about 300 nucleotides, about 35 to about 250 nucleotides, about 35 to about 200 nucleotides, about 35 to about 150 nucleotides, about 35 to about 100 nucleotides, about 50 to about 1000 nucleotides, about 50 to about 900 nucleotides, about 50 to about 800 nucleotides, about 50 to about 700 nucleotides, about 50 to about 600 nucleotides, about 50 to about 500 nucleotides, about 50 to about 400 nucleotides, about 50 to about 300 nucleotides, about 50 to about 250 nucleotides, about 50 to about 200 nucleotides, about 50 to about 150 nucleotides, about 50 to about 100 nucleotides, about 20 to about 225 nucleotides, about 25 to about 200 nucleotides, about 30 to about 175 nucleotides, about 40 to about 125 nucleotides, about 45 to about 100 nucleotides, about 55 to about 100 nucleotides, about 60 to about 100 nucleotides, or any range or value therein. In representative embodiments, the target polynucleotide may be about 53 nucleotides in length.
The skilled person is aware of methods for identifying the most suitable target polynucleotide within the context of the full-length target gene. For example, a target polynucleotide for a COMT allele (and/or copies of that allele) may be a region within the allele that is highly conserved between different COMT alleles so that many COMT alleles (and/or copies of those alleles) that have different nucleotide sequences can be co-targeted with a single vector. Alternatively, a target polynucleotide for a COMT allele (and/or copies of that allele) may be a region within that gene that is highly specific for the particular COMT target allele so that only that specific COMT allele (and/or copies of that allele) is targeted. Alternatively, multiple TALEN pairs, each one targeting different polynucleotides for a COMT gene encoding a region within each allele that is highly specific for the particular COMT target allele can be co-introduced so that multiple specific COMT alleles and their copies can be co-targeted. In cases where all COMT alleles (and/or copies of that allele) share a target polynucleotide within the allele that is conserved or identical between all COMT alleles, some, none or all of these conserved or identical target polynucleotides may be accessible for binding and cleavage by the same TALEN depending on the genomic context and the chromatin structure or methylation pattern at the specific allele or copy. Therefore, the number of mutated COMT copies and alleles may vary from event to event after introduction of a TALEN to the sugarcane genome offering opportunities to select the events wherein the genetically modified sugarcane plant cell or plant tissue comprises agronomic performance that is substantially the same as that of the non-modified wild type sugarcane plant, cell and/or tissue, while the lignin content is reduced and the biofuel yield is increased. 
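One way to look for a target polynucleotide that is conserved between alleles, as described above, is to scan aligned allele sequences for windows in which every allele carries the same nucleotides. The Python sketch below is a simplified illustration. It assumes the alleles have already been aligned to equal length without gaps, and the function name, the default width of 53 nucleotides (taken from the representative target length mentioned in the text) and the toy sequences are illustrative choices. It does not account for chromatin structure or methylation, which the text notes can affect accessibility.

```python
def conserved_windows(aligned_alleles, width=53):
    """Return start positions of windows of `width` nucleotides that are
    identical in all alleles.

    aligned_alleles: equal-length, gap-free sequences (an assumption of
    this sketch).
    """
    if not aligned_alleles:
        raise ValueError("no alleles provided")
    length = len(aligned_alleles[0])
    if any(len(seq) != length for seq in aligned_alleles):
        raise ValueError("alleles must be pre-aligned to equal length")
    # Mark each column where every allele carries the same nucleotide.
    identical = [
        len({seq[i].upper() for seq in aligned_alleles}) == 1
        for i in range(length)
    ]
    starts, run = [], 0
    for i, same in enumerate(identical):
        run = run + 1 if same else 0  # length of the current identical run
        if run >= width:
            starts.append(i - width + 1)
    return starts


# Made-up 8-nt "alleles" differing at one position, scanned with width 3.
toy_alleles = ["AAAACCCC", "AAAATCCC", "AAAACCCC"]
toy_starts = conserved_windows(toy_alleles, width=3)
```

Windows that are absent from this list but unique to one allele would instead be candidates for allele-specific targeting.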
As used herein, "direct embryogenesis" refers to a method of delivering an expression cassette and/or vector to cells wherein the embryos formed on an explant directly regenerate shoots without producing a callus.
As used herein, "indirect embryogenesis" refers to a method of delivering an expression cassette and/or vector to cells, wherein the embryos form after callus development on the surface of the callus.
As used herein, "expression cassette" refers to the part of a DNA expression vector or plasmid that is capable of directing the cell to make RNA and protein. An expression cassette contains one or more nucleotide sequences of interest that code a polypeptide or functional nucleic acid and other regulatory sequences controlling and/or influencing the expression of the polynucleotide sequence(s) of interest contained within the expression cassette.
An expression cassette may be chimeric, meaning that at least one of its components is heterologous with respect to at least one of its other components. An expression cassette may also be one that is naturally occurring but has been obtained in a recombinant form useful for heterologous expression. Typically, however, the expression cassette is heterologous with respect to the host, i.e., the particular nucleic acid sequence of the expression cassette does not occur naturally in the host cell and must have been introduced into the host cell or an ancestor of the host cell by a transformation event.
In addition to the promoters operatively linked to the nucleotide sequences of the invention, an expression cassette of the invention can also include other regulatory sequences. As used herein, "regulatory sequences" means nucleotide sequences located upstream (5' non-coding sequences), within or downstream (3' non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences include, but are not limited to, promoters, enhancers, introns, translation leader sequences, termination signals, and polyadenylation signal sequences.
For purposes of the invention, the regulatory sequences or regions can be native/analogous to the plant, plant part and/or plant cell and/or the regulatory sequences can be native/analogous to the other regulatory sequences. Alternatively, the regulatory sequences may be heterologous to the plant (and/or plant part and/or plant cell) and/or to each other (i.e., the regulatory sequences). Thus, for example, a promoter can be heterologous when it is operatively linked to a polynucleotide from a species different from the species from which the polynucleotide was derived. Alternatively, a promoter can also be heterologous to a selected nucleotide sequence if the promoter is from the same/analogous species from which the polynucleotide is derived, but one or both (i.e., promoter and/or polynucleotide) are substantially modified from their original form and/or genomic locus, and/or the promoter is not the native promoter for the operably linked polynucleotide.
A number of non-translated leader sequences derived from viruses are known to enhance gene expression. Specifically, leader sequences from Tobacco Mosaic Virus (TMV, the "ω-sequence"), Maize Chlorotic Mottle Virus (MCMV) and Alfalfa Mosaic Virus (AMV) have been shown to be effective in enhancing expression (Gallie et al. (1987) Nucleic Acids Res. 15:8693-8711; and Skuzeski et al. (1990) Plant Mol. Biol. 15:65-79). Other leader sequences known in the art include, but are not limited to, picornavirus leaders such as an encephalomyocarditis (EMCV) 5' noncoding region leader (Elroy-Stein et al. (1989) Proc. Natl. Acad. Sci. USA 86:6126-6130); potyvirus leaders such as a Tobacco Etch Virus (TEV) leader (Allison et al. (1986) Virology 154:9-20); Maize Dwarf Mosaic Virus (MDMV) leader (Allison et al. (1986), supra); human immunoglobulin heavy-chain binding protein (BiP) leader (Macejak & Sarnow (1991) Nature 353:90-94); untranslated leader from the coat protein mRNA of AMV (AMV RNA 4; Jobling & Gehrke (1987) Nature 325:622-625); tobacco mosaic TMV leader (Gallie et al. (1989) Molecular Biology of RNA 237-256); and MCMV leader (Lommel et al. (1991) Virology 81:382-385). See also, Della-Cioppa et al. (1987) Plant Physiol. 84:965-968.
An expression cassette also can optionally include a transcriptional and/or translational termination region (i.e., termination region) that is functional in plants. A variety of transcriptional terminators are available for use in expression cassettes and are responsible for the termination of transcription beyond the heterologous nucleotide sequence of interest and correct mRNA polyadenylation. The termination region may be native to the transcriptional initiation region, may be native to the operably linked nucleotide sequence of interest, may be native to the plant host, or may be derived from another source (i.e., foreign or heterologous to the promoter, the nucleotide sequence of interest, the plant host, or any combination thereof). Appropriate transcriptional terminators include, but are not limited to, the CaMV 35S terminator, the tml terminator, the nopaline synthase terminator and/or the pea rbcS E9 terminator. These can be used in both monocotyledons and dicotyledons. In addition, a coding sequence's native transcription terminator can be used.
An expression cassette of the invention also can include a nucleotide sequence for a selectable marker, which can be used to select a transformed plant, plant part and/or plant cell. As used herein, "selectable marker" means a nucleotide sequence that when expressed imparts a distinct phenotype to the plant, plant part and/or plant cell expressing the marker and thus allows such transformed plants, plant parts and/or plant cells to be distinguished from those that do not have the marker. Such a nucleotide sequence may encode either a selectable or screenable marker, depending on whether the marker confers a trait that can be selected for by chemical means, such as by using a selective agent (e.g., an antibiotic, herbicide, or the like), or on whether the marker is simply a trait that one can identify through observation or testing, such as by screening (e.g., the R-locus trait). Of course, many examples of suitable selectable markers are known in the art and can be used in the expression cassettes described herein.
Examples of selectable markers include, but are not limited to, a nucleotide sequence encoding neo or nptII, which confers resistance to kanamycin, G418, and the like (Potrykus et al. (1985) Mol. Gen. Genet. 199:183-188); a nucleotide sequence encoding bar, which confers resistance to phosphinothricin; a nucleotide sequence encoding an altered 5-enolpyruvylshikimate-3-phosphate (EPSP) synthase, which confers resistance to glyphosate (Hinchee et al. (1988) Biotech. 6:915-922); a nucleotide sequence encoding a nitrilase such as bxn from Klebsiella ozaenae that confers resistance to bromoxynil (Stalker et al. (1988) Science 242:419-423); a nucleotide sequence encoding an altered acetolactate synthase (ALS) that confers resistance to imidazolinone, sulfonylurea or other ALS-inhibiting chemicals (EP Patent Application No. 154204); a nucleotide sequence encoding a methotrexate-resistant dihydrofolate reductase (DHFR) (Thillet et al. (1988) J. Biol. Chem. 263:12500-12508); a nucleotide sequence encoding a dalapon dehalogenase that confers resistance to dalapon; a nucleotide sequence encoding a mannose-6-phosphate isomerase (also referred to as phosphomannose isomerase (PMI)) that confers an ability to metabolize mannose (US Patent Nos. 5,767,378 and 5,994,629); a nucleotide sequence encoding an altered anthranilate synthase that confers resistance to 5-methyl tryptophan; and/or a nucleotide sequence encoding hph that confers resistance to hygromycin. One of skill in the art is capable of choosing a suitable selectable marker for use in an expression cassette of the invention.
Additional selectable markers include, but are not limited to, a nucleotide sequence encoding β-glucuronidase or uidA (GUS) that encodes an enzyme for which various chromogenic substrates are known; an R-locus nucleotide sequence that encodes a product that regulates the production of anthocyanin pigments (red color) in plant tissues (Dellaporta et al., "Molecular cloning of the maize R-nj allele by transposon-tagging with Ac," pp. 263-282 In: Chromosome Structure and Function: Impact of New Concepts, 18th Stadler Genetics Symposium (Gustafson & Appels eds., Plenum Press 1988)); a nucleotide sequence encoding β-lactamase, an enzyme for which various chromogenic substrates are known (e.g., PADAC, a chromogenic cephalosporin) (Sutcliffe (1978) Proc. Natl. Acad. Sci. USA 75:3737-3741); a nucleotide sequence encoding xylE that encodes a catechol dioxygenase (Zukowsky et al. (1983) Proc. Natl. Acad. Sci. USA 80:1101-1105); a nucleotide sequence encoding tyrosinase, an enzyme capable of oxidizing tyrosine to DOPA and dopaquinone, which in turn condenses to form melanin (Katz et al. (1983) J. Gen. Microbiol. 129:2703-2714); a nucleotide sequence encoding β-galactosidase, an enzyme for which there are chromogenic substrates; a nucleotide sequence encoding luciferase (lux) that allows for bioluminescence detection (Ow et al. (1986) Science 234:856-859); a nucleotide sequence encoding aequorin, which may be employed in calcium-sensitive bioluminescence detection (Prasher et al. (1985) Biochem. Biophys. Res. Comm. 126:1259-1268); or a nucleotide sequence encoding green fluorescent protein (Niedz et al. (1995) Plant Cell Reports 14:403-406). One of skill in the art is capable of choosing a suitable selectable marker for use in an expression cassette of the invention.
In addition to expression cassettes, the nucleic acid molecules and nucleotide sequences described herein can be used in connection with vectors. As used herein, the term "vector" is used in reference to a vehicle (e.g., nucleic acid construct) used to introduce an exogenous nucleic acid sequence into a cell. A vector may include a DNA molecule, linear or circular (e.g., plasmids), which includes a segment encoding a polypeptide of interest or a functional polynucleotide (e.g., siRNA) operatively linked to additional segments that provide for its transcription and, where relevant, its translation, upon introduction into a host cell or host cell organelles. Such additional segments may include promoter and terminator sequences, and may also include one or more origins of replication, one or more selectable markers, an enhancer, a polyadenylation signal, etc. Vectors for use in transformation of plants and other organisms are well known in the art. Non-limiting examples of general classes of vectors include a viral vector including but not limited to an adenovirus vector, a retroviral vector, an adeno-associated viral vector, a plasmid vector, a phage vector, a phagemid vector, a cosmid, a fosmid, a bacteriophage, or an artificial chromosome. In some embodiments, expression vectors are derived from yeast or bacterial genomic or plasmid DNA, or viral DNA, or may contain elements of both. The size of a vector can vary considerably depending on whether the vector comprises one or multiple expression cassettes (e.g., for molecular stacking). Thus, a vector size can range from about 3 kb to about 30 kb. Thus, in some embodiments, a vector is about 3 kb, 4 kb, 5 kb, 6 kb, 7 kb, 8 kb, 9 kb, 10 kb, 11 kb, 12 kb, 13 kb, 14 kb, 15 kb, 16 kb, 17 kb, 18 kb, 19 kb, 20 kb, 21 kb, 22 kb, 23 kb, 24 kb, 25 kb, 26 kb, 27 kb, 28 kb, 29 kb, 30 kb, or any range therein, in size. In some particular embodiments, a vector can be about 3 kb to about 15 kb in size.
Vectors may be engineered to contain sequences encoding selectable markers that provide for the selection of cells that contain the vector and/or have incorporated the nucleic acid of the vector into the cellular genome. Such markers allow identification and/or selection of host cells that incorporate and express the proteins encoded by the marker. A "recombinant" vector refers to a viral or non-viral vector that comprises one or more heterologous nucleotide sequences (i.e., transgenes). Vectors may be introduced into cells by any suitable method known in the art, including, but not limited to, transfection, electroporation, microinjection, transduction, cell fusion, DEAE dextran, calcium phosphate precipitation, lipofection (lysosome fusion), and use of a gene gun or nucleic acid vector transporter.
As used herein, "amplicon" refers to the nucleic acid product of artificial amplification or replication events, such as the product formed from conducting PCR.
As used herein, "TALEN-induced modification" refers to an alteration of genomic DNA in a cell that is brought about by action of a synthetic TALEN protein on the genomic DNA followed by the DNA repair process of the cell. The TALEN-induced modification can be an insertion of additional nucleotides to the genomic DNA, wherein such an insertion is not present in a given wild-type genomic DNA sequence for a given gene. The TALEN-induced modification can be a deletion of nucleotides that are normally present in a given wild-type genomic DNA sequence for a given gene. The TALEN-induced modification can be a change in the nucleotide sequence of the genomic DNA of a given gene without an alteration in the total number of nucleotides present (e.g., a substitution of one or more nucleotides with different nucleotides). The TALEN-induced modification can include any one of or any combination of the aforementioned modifications to result in an alteration of the genomic DNA for a given gene as compared to the wild-type. Methods for using TALEN are well-known and fully described in the art (see, e.g., Mussolino et al. Curr Opin Biotechnol. 23:644-650 (2012)) and can be readily adapted for producing COMT mutations in plant genomes including the highly polyploid sugarcane genome.
As used herein, "transcription activator-like effector nuclease (TALEN) protein" refers to synthetic restriction enzymes generated by fusing a TAL effector DNA binding domain to a DNA cleavage domain as described in Gaj et al., (2013) Trends Biotech. 31:397-405, which is hereby incorporated by reference. As used herein, "modified lignin profile" refers to a change in the actual and relative amounts (relative to the other types of lignins present in the cell wall) and types of lignins present in the cell wall of a plant cell that has a TALEN-induced modification as compared to a wild-type lignin profile. In some instances this may be expressed as a change in the ratio or percentages of the types of lignins present in the cell wall of plant cells having a TALEN-induced modification as compared to a wild-type cell. Thus, in some embodiments, the genetically modified sugarcane plant, cell and/or tissue of the invention comprises, consists of, or consists essentially of a ratio of syringyl to guaiacyl in the lignin that is reduced by about 6% to about 70% (e.g., about 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70%, or any range or value therein) as compared to wild type. In some embodiments, the modified sugarcane plant, cell and/or tissue of the invention comprises, consists of, or consists essentially of a ratio of syringyl to guaiacyl in the lignin of about 1.4 to about 0.45 (e.g., about 1.4, 1.35, 1.3, 1.25, 1.2, 1.15, 1.1, 1.05, 1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, 0.45, or any range or value therein). The ratio of syringyl to guaiacyl in wild type sugarcane lignin is about 1.5.
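The two ways of stating the modified lignin profile (a percent reduction of the syringyl-to-guaiacyl ratio, and an absolute ratio) can be related through the wild-type ratio of about 1.5. The following is a minimal sketch of that arithmetic; the function name and the sample values are illustrative assumptions, not part of the disclosure.

```python
# Approximate wild-type S/G ratio as stated in the text above.
WILD_TYPE_SG = 1.5


def sg_reduction_percent(modified_sg, wild_type_sg=WILD_TYPE_SG):
    """Percent reduction of the S/G ratio relative to wild type."""
    if wild_type_sg <= 0:
        raise ValueError("wild-type S/G ratio must be positive")
    return (wild_type_sg - modified_sg) / wild_type_sg * 100.0


# Illustrative values taken from the ends and middle of the stated range.
for sg in (1.4, 1.0, 0.45):
    print(f"S/G = {sg:.2f}: reduced by {sg_reduction_percent(sg):.1f}%")
```

Under this reading, a ratio of about 1.4 corresponds to a reduction of roughly 6 to 7%, and a ratio of about 0.45 to a reduction of about 70%, which matches the two ranges recited above.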
In some embodiments, modified plants of the invention further comprise, consist essentially of or consist of a phenotype of a brown midrib and/or brown internode, which can be useful, for example, as a screenable, visible marker facilitating the introgression of this trait into new germplasm.
As used herein, "forage" refers to plants consumed by animals, particularly by grazing animals.
The increasingly limited availability of non-renewable energy sources, such as fossil fuels, and the negative impact of their derivation and use on the earth drives the search for alternative energy resources. Biofuels are such an alternative. Examples of biofuels include, but are not limited to bioethanol, biodiesel, bioethers, and syngas. A source of biofuels can be the sugars derived from the grain of agricultural crops.
Sugar for biofuel production can also be found in fast growing plants such as poplar, eucalyptus, and various grass residues such as corn stover and sugarcane bagasse. Advantages of using these types of sources for biofuel are that they are sustainable and do not usually compete with the food and feed supply. While the use of these types of plants as a source for biofuels is promising, the actual benefits have yet to be fully realized primarily due to the presence of lignin in the cell wall(s) of these plants. The physical structure and strength of plants is due primarily to the presence of a cell wall in plant cells. The cell wall is made of lignin and sugar molecules, such as cellulose. Cellulose can be converted into glucose, which can then be used in a classical fermentation process to produce alcohol. One function of lignin is to embed the sugar molecules to give firmness to plants. As such, even tall plants can maintain their upright stature. There are three main types of lignin within the cell wall, distinguished by the type and amount of their building blocks. Examples of types of lignin are shown in Fig. 1.
Unfortunately, the very aspects of lignin that make it important to the plant make the current techniques for producing biofuels from lignin-containing plants and plant parts inefficient and, in most cases, not economical. Insofar as lignin embeds the sugar molecules needed to produce biofuels, lignin reduces the accessibility of the sugar molecules for biofuel production. Lignin is resistant to physical, chemical, and biological degradation. As such, current techniques for removal of lignin from biofuel feedstock are energy consuming and are environmentally unsound.
Efforts have been made to reduce or modify the lignin content in the cell wall. Plants with reduced lignin content, that contain a modified amount of lignin, or that contain a modified lignin profile that is easier to break down can be, at least in theory, used as a source for biofuels that requires less energy and reduces the negative impact on the environment as compared to current lignin-containing sources. For example, RNAi techniques and random mutagenesis have been used to modify the expression of proteins involved in lignin biosynthesis to produce plants with reduced lignin content or a modified lignin profile on a laboratory scale. These approaches, while promising in theory, have disadvantages.
First, RNAi approaches to modifying lignin biosynthesis only allow for knockdown of multiple alleles, as opposed to allowing for knockout of specific or multiple alleles. Further, the knockdown effect induced by RNAi may be only transient as a siRNA precursor transgene may not be stably inherited by progeny. The instability of transgene transfer may be because there is a need for continued expression of the RNAi precursor product. Low gene stability, and thus low phenotype predictability, can negatively impact the value of the plant as a commercial product.
With that said, the present compositions and methods are directed to modifying lignin biosynthesis of sugarcane and other commercially relevant plants by genome editing using synthesized transcription activator-like effector nuclease (TALEN) proteins. As such, the present compositions, systems, and methods do not rely on the addition/expression of transgenes to modify lignin biosynthesis. Instead, the compositions, methods, and systems disclosed herein use TALEN proteins to modify the sugarcane genetic content by selectively disrupting the coding sequence of genes on one or more alleles within the lignin biosynthetic pathway of sugarcane. For example, in one embodiment, a sugarcane plant may contain cells having a TALEN-disrupted COMT gene on at least one allele or copy. In some embodiments, a sugarcane plant may comprise, consist essentially of or consist of cells having a TALEN-disrupted COMT gene in more than 50% of the alleles or copies encoding COMT. In some embodiments, a sugarcane plant may comprise, consist essentially of or consist of cells having a TALEN-disrupted COMT gene in less than 100% of the alleles or copies encoding COMT. In some embodiments, a sugarcane plant may comprise, consist essentially of or consist of cells having a TALEN-disrupted COMT gene in more than 50% and less than 100% of the alleles or copies encoding COMT. In some embodiments, a disrupted COMT gene within an allele or copy can result in a reduction of the amount of lignin in the plant or cell relative to wild-type, without being lethal.
As such, the compositions and methods described and claimed herein can be used to generate a lignocellulosic biomass feedstock for biofuels that allows a more efficient, economical, and environmentally friendly biofuel synthesis than is currently available. The lignocellulosic biomass can also be used for production of products requiring a cellulosic feedstock, such as paper products. Thus, in some embodiments, the plants, plant cells and/or tissues of the invention produce a lignocellulosic biomass having increased yields of directly fermentable sugars as compared to lignocellulosic biomass produced from wild type plants, plant cells and/or tissues. In particular embodiments, the yield increase can be from about 10% to about 40% (e.g., about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40%, or any range or value therein) using lignocellulosic biomass from plants, plant cells and/or tissues of the invention as compared to lignocellulosic biomass from a corresponding wild type plant, plant cell and/or tissue. In representative embodiments, sugarcane plants, plant cells and/or tissues of the invention can produce a lignocellulosic biomass having increased yields of directly fermentable sugars (e.g., ethanol) of about 10% to about 40% as compared to the corresponding wild type sugarcane plant, plant cell and/or tissue.
In some embodiments, a lignocellulosic biomass produced from the plants, plant cells and/or tissues of the invention requires a reduced amount of cell wall degrading enzymes or reduced chemical or physical pretreatment of the lignocellulosic biomass for conversion to biofuel and other bioproducts as compared to lignocellulosic biomass produced from wild type plants, plant cells and/or tissues. In particular embodiments, the amount of cell wall degrading enzymes for conversion of a lignocellulosic biomass produced from the plants, plant cells and/or tissues of the invention can be reduced by about two to six fold as compared to lignocellulosic biomass produced from corresponding wild type plants, plant cells and/or tissues (e.g., about 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6 fold, or any range or value therein). Accordingly, in representative embodiments, the amount of cell wall degrading enzymes for conversion of a lignocellulosic biomass produced from a sugarcane plant, plant cell and/or tissue of the invention can be reduced by about two to six fold as compared to the amount of cell wall degrading enzymes for conversion of a lignocellulosic biomass produced from corresponding wild type plants, plant cells and/or tissues.
As used herein, the term "biobased product" is defined as commercial or industrial products that are composed in whole, or in significant part, of biological products or renewable domestic agricultural materials (see, www.usda.gov/biobased).
Biomass produced from the plants, cells and by the methods claimed and described herein may also have improved digestibility when used as forage, compared to forages incorporating wild-type biomass.
I. Sugarcane plants and cells containing TALEN-modified COMT
The present disclosure encompasses non-naturally occurring genetically modified sugarcane plants and cells, which contain a COMT gene on at least one allele or copy that has one or more nucleotides deleted from the wild type COMT sequence. With this in mind, attention is directed to Fig. 1, which shows the general lignin biosynthetic pathway. The pathway involves the coordinated regulation of three biosynthetic pathways: the shikimate pathway, the general phenylpropanoid pathway and the lignin branch pathway. The shikimate pathway is a primary metabolic pathway which leads to the biosynthesis of the amino acids phenylalanine, tyrosine, or tryptophan, which are subsequently incorporated into a variety of plant products.
The general phenylpropanoid pathway begins with the deamination of L-phenylalanine to cinnamic acid, which is catalyzed by phenylalanine ammonia-lyase (PAL). The next step in the phenylpropanoid pathway is the hydroxylation of p-coumaric acid at the 3-carbon to form caffeic acid, which is catalyzed by p-coumarate 3-hydroxylase (C3H). Caffeic acid is methylated by caffeoyl-CoA O-methyltransferase (CCoAOMT) or caffeic acid O-methyltransferase (COMT) to form ferulate. Ferulate can be hydroxylated to 5-hydroxyferulate by the enzyme ferulate 5-hydroxylase (F5H). The next step in the phenylpropanoid pathway is the methylation of 5-hydroxyferulate to form sinapate. In the last step of the phenylpropanoid pathway, coumarate, caffeate, ferulate, 5-hydroxyferulate, and sinapate can be converted to their respective hydroxycinnamoyl CoA esters by 4-coumarate:CoA ligase (4CL).
When hydroxycinnamoyl CoA esters are formed they are then reduced by enzymes of the lignin branch pathway. Hydroxycinnamoyl CoA esters are converted to their corresponding hydroxycinnamaldehydes by cinnamoyl-CoA reductase (CCR). Next, hydroxycinnamaldehydes are converted to their corresponding hydroxycinnamyl alcohols by the enzyme cinnamyl alcohol dehydrogenase (CAD). Finally, laccases and peroxidases convert the hydroxycinnamyl alcohols into the various types (hydroxyphenyl, guaiacyl, or syringyl) of lignin. Within any given cell or cell type, the ratio of the type of lignin may vary and can thus form various lignin profiles. Different ratios of lignin types result in cell walls with different properties. Some lignin profiles are easier to degrade than others. As such, manipulation of the lignin profile through genome editing of the genes involved in the lignin biosynthetic pathway as disclosed herein can result in plants with a lignin composition that is easier to degrade.
As shown in Fig. 1, COMT is a key enzyme in the lignin biosynthetic pathway. As such, complete ablation of COMT expression is most likely lethal to the plant because lignin is necessary for plant structure. Therefore, it is desirable to produce a sugarcane plant that has reduced COMT expression. As previously discussed, an RNAi approach can be used to knock down COMT expression at the COMT RNA transcript level. However, an RNAi approach has several disadvantages, including the requirement of the use of a transgene to generate stable production of a siRNA precursor or impractical transient delivery of siRNA or siRNA precursor to the plant. In some embodiments, the invention disclosed herein produces plants, plant tissues and/or plant cells that contain a COMT gene disrupted at the genomic DNA level on at least one, but not all, alleles or copies.
With this in mind, attention is directed to Fig. 2, which shows the general mechanism by which a reduction in COMT can occur using TALEN-mediated COMT gene disruption. As shown, sugarcane is highly polyploid and, in this example, has ten hom(oe)ologous copies of a chromosome (1a-1j). Each chromosome carries one copy or allele of the COMT gene. In Fig. 2, although some of the COMTa, COMTb, COMTc, COMTd, COMTe, COMTf, COMTg, COMTh, and COMTi alleles located on homologous and homeologous chromosomes differ in their COMT sequence, the TALEN in this embodiment was targeted to bind to a sequence that is highly conserved among all of these alleles and therefore co-mutates these alleles. The COMTa, COMTb, COMTc, COMTd, COMTe, COMTf, COMTg, COMTh, and COMTi alleles contain TALEN binding sites corresponding to, for example, SEQ ID NOs:5 and 6. However, the COMTj allele does not contain TALEN binding sites corresponding to, for example, SEQ ID NOs:5 and 6 and is therefore not mutated with this specific TALEN, although a different TALEN could be designed to co-mutate this allele. The residual COMT activity provided by COMTj, which is not cleaved by the TALEN, supports agronomic performance that is not significantly different from the non-modified wild type plants. COMT copies or alleles that were cleaved by TALEN (COMTa-i) have disrupted COMT, resulting in reduced lignin content and altered lignin composition, along with improved biofuel yields per land area of the crop. Fig. 2 also explains the contrasting situation in diploid species: with two homologous copies of each lignin biosynthetic gene, only three outcomes are expected from knocking out a lignin biosynthetic gene. (A) Both COMT copies and/or alleles are knocked out, giving drastically reduced lignin content (high saccharification and biofuel conversion efficiency) but severely compromised agronomic plant performance and biomass yield, leading to reduced biofuel yield per land area of cultivated crop. (B) One COMT allele or copy is knocked out, leading to a non-significant reduction in lignin (saccharification and biofuel conversion efficiency like wild type) along with agronomic plant performance, biomass yield and biofuel yield like wild type. (C) No COMT allele or copy is knocked out, leading to no change in lignin content and to agronomic plant performance, biomass yield and biofuel yield like wild type.
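The co-mutation logic described for Fig. 2 (alleles carrying both TALEN binding sites are cleaved, while an allele lacking them is spared) can be sketched as a simple sequence check. Everything in this sketch is a hypothetical illustration: the sequences are toy strings and are not SEQ ID NOs:5 and 6, the spacer length range is an assumed parameter, and both sites are written as they would appear on the same strand.

```python
# Toy sequences only; these are NOT SEQ ID NOs:5 and 6.
LEFT_SITE = "TACCGGTTAC"
RIGHT_SITE = "GGATCCTTAA"
SPACER = "ACGTACGTACGTACG"  # hypothetical 15-nt spacer between the sites


def is_targetable(allele, left_site, right_site, spacer_range=(12, 21)):
    """Return True if left_site is followed by right_site after a spacer
    whose length lies within spacer_range (an assumed range)."""
    start = allele.find(left_site)
    while start != -1:
        spacer_start = start + len(left_site)
        for spacer_len in range(spacer_range[0], spacer_range[1] + 1):
            if allele.startswith(right_site, spacer_start + spacer_len):
                return True
        start = allele.find(left_site, start + 1)
    return False


# Hypothetical alleles: COMTa carries both sites; COMTj differs by one
# nucleotide in the left site and so is not recognized.
alleles = {
    "COMTa": "AAA" + LEFT_SITE + SPACER + RIGHT_SITE + "TTT",
    "COMTj": "AAA" + "TACCGATTAC" + SPACER + RIGHT_SITE + "TTT",
}

targeted = [name for name, seq in alleles.items()
            if is_targetable(seq, LEFT_SITE, RIGHT_SITE)]
print("Alleles predicted to be cleaved:", targeted)
```

An exact-match test is a deliberate simplification; it only illustrates why a single TALEN pair aimed at a conserved region can act on many alleles at once while leaving a divergent allele intact.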
In an embodiment for producing founder plants, a TALEN protein that specifically binds the COMT gene is delivered by an appropriate method and modifies the COMT genomic DNA of a sugarcane plant cell or population of sugarcane plant cells. When the restriction endonuclease associated with the TALEN protein acts at its target site located between the TALEN binding sites within the COMT genomic DNA, a double strand break is introduced into COMT genomic DNA. The cell repairs the double stranded break using the cell DNA repair mechanism, which may result, for example, in a frame shift mutation within the coding region of the COMT gene. The mutation within the COMT gene knocks out the COMT gene function by keeping the gene from producing a wild-type COMT RNA transcript that can be translated into a functional COMT protein. Instead, the RNA transcripts produced from the modified COMT cannot be translated into functional COMT proteins. The founder modified plants can then be reproduced and produce offspring that 1) contain the COMT gene disruptions on one or more alleles or copies, 2) do not contain a TALEN protein, and 3) do not contain a foreign transgene. In some embodiments, the invention provides a sugarcane plant that contains modified COMT genomic DNA on an allele or copy, wherein the modification to the COMT genomic DNA is a stable, inactivating deletion of nucleotides that are present in a corresponding wild-type allele or copy in a sugarcane plant. In some embodiments, the COMT gene of the sugarcane plant contains modified COMT genomic DNA that comprises any one of the nucleotide sequences of SEQ ID NOs:7-13. 
Insofar as the DNA repair process is not completely understood, one of ordinary skill in the art cannot predict or otherwise know what the resulting sequence of a modified COMT gene will be after TALEN binds to the genomic DNA and creates the double stranded break. Thus, the modified COMT genomic DNA of the present disclosure is not limited to the nucleotide sequences of SEQ ID NOs:7-13.
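Because the repair outcome cannot be predicted, each recovered allele is assessed after the fact. One common first check is whether the net change in length is a multiple of three, since an insertion or deletion that is not a multiple of three shifts the reading frame. The sketch below assumes the edit lies within the coding region; the function name and example lengths are hypothetical.

```python
def classify_indel(wild_type_len, modified_len):
    """Describe a net length change and whether it shifts the reading frame.

    Assumes the change falls inside the coding region of the gene.
    """
    delta = modified_len - wild_type_len
    if delta == 0:
        return "no net length change"
    kind = "insertion" if delta > 0 else "deletion"
    frame = "in-frame" if delta % 3 == 0 else "frameshift"
    return f"{abs(delta)} bp {kind} ({frame})"


# Hypothetical amplicon lengths, for illustration only.
for modified in (296, 297, 305):
    print(modified, "->", classify_indel(300, modified))
```

An in-frame change is not necessarily harmless, so this check is a screen rather than a test of function; loss of function would still need to be confirmed, for example by the expression and activity assays described herein.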
In some embodiments, the invention provides a sugarcane plant that contains modified COMT genomic DNA on an allele or copy, wherein the modification to the COMT genomic DNA is an insertion of nucleotides that are in addition to those present in a corresponding wild-type allele or copy in a sugarcane plant. In some embodiments, the insertion comprises the nucleotide sequence of GCTGGAGCTGGGCCACGTCCATGAGGACCTTG (SEQ ID NO:21).
In one embodiment, the sugarcane plant that contains the modified COMT genomic DNA on an allele or copy (e.g., mutated via TALEN) has reduced gene expression of wild-type COMT as compared to a wild-type sugarcane plant. In some embodiments, the expression of COMT in the plant comprising the modified COMT is reduced by more than 50% to less than about 100%. Accordingly, the expression of COMT in the plant comprising the modified COMT is reduced by about 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% or any range or value therein. In particular embodiments, the functional activity of COMT in the plant comprising the modified COMT is reduced by at least about 50% to about 98%, about 55% to about 95%, about 60% to about 95%, about 65% to about 95%, about 70% to about 95%, about 75% to about 95%, about 60% to about 92%, about 65% to about 92%, about 70% to about 92%, about 75% to about 92%, and the like, and any range or value therein.
In some embodiments, a plant, plant cell and/or plant tissue modified through targeted mutagenesis (e.g., TALENs) has a mutation frequency of more than 50% (i.e., more than 50% of the alleles or copies encoding COMT comprise, consist essentially of or consist of a stable, inactivating deletion or insertion mutation). In some embodiments, the mutation frequency using targeted mutagenesis may be less than 100% (i.e., less than 100% of the alleles or copies encoding COMT comprise, consist essentially of or consist of a stable, inactivating deletion or insertion mutation). Accordingly, in some embodiments, the mutation frequency may be more than 50% and less than 100% (e.g., 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% or any range or value therein). In particular embodiments, the mutation frequency in a plant, plant cell and/or plant tissue using targeted mutagenesis may be about 51% to about 99%, about 51% to about 98%, about 51% to about 95%, about 55% to about 95%, about 60% to about 95%, about 65% to about 95%, about 70% to about 95%, about 75% to about 95%, about 60% to about 92%, about 65% to about 92%, about 70% to about 92%, about 75% to about 92%, and the like, and any range or value therein.
Sugarcane is a highly polyploid plant species with the number of chromosomes varying from about 10 to about 12 depending on the particular sugarcane variety. Thus, when referring to a mutation frequency of more than 50% and less than 100%, at least about 6 to about 11 copies (e.g., about 6, 7, 8, 9, 10 copies, or in the case of a sugarcane variety having 12 chromosomes, 11 copies, and the like) of the COMT gene may be knocked out (i.e., a stable, inactivating mutation).
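The relationship between copy number and the mutation frequency window can be made concrete by enumerating, for a given total number of COMT copies, how many knocked-out copies give a frequency strictly above 50% and strictly below 100%. This is a minimal sketch with a hypothetical function name; integer arithmetic is used so that the 50% boundary is tested exactly.

```python
def admissible_knockouts(total_copies):
    """Knockout counts k satisfying 50% < k / total_copies < 100%."""
    if total_copies < 1:
        raise ValueError("total_copies must be at least 1")
    # 2 * k > total_copies is k / total_copies > 50% without floating point;
    # the range stops before total_copies, so 100% is excluded.
    return [k for k in range(1, total_copies) if 2 * k > total_copies]


for copies in (10, 11, 12):
    counts = admissible_knockouts(copies)
    print(f"{copies} copies: knock out {counts[0]} to {counts[-1]}")
```

Read strictly, ten copies admit six to nine knockouts and twelve copies admit seven to eleven; the "about" language in the paragraph above is broader than this strict reading.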
In further embodiments, the sugarcane plant that comprises a modified COMT genomic DNA on an allele or copy (e.g., mutated via TALEN) has a reduced amount of lignin as compared to a wild-type sugarcane plant. In some embodiments, the amount of lignin in the plant comprising a modified COMT genomic DNA on an allele or copy is reduced by about 3% to about 30% (e.g., about 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10%, 11%, 12%, 13%, 14%, 15%, 16%, 17%, 18%, 19%, 20%, 21%, 22%, 23%, 24%, 25%, 26%, 27%, 28%, 29%, 30%, and any range or value therein) as compared to a plant comprising the wild type or unmodified COMT.
In other embodiments, the sugarcane plant that contains the modified COMT genomic DNA on an allele or copy (e.g., mutated via TALEN) comprises, consists essentially of or consists of a modified lignin profile (e.g., a ratio of syringyl to guaiacyl in the lignin that is reduced by about 6% to about 70% as compared to wild type) and/or comprises, consists essentially of or consists of a ratio of syringyl to guaiacyl in the lignin of about 1.4 to about 0.45. The present disclosure also encompasses a sugarcane cell or a population of sugarcane cells, wherein the modification to the COMT genomic DNA is a stable, inactivating deletion or insertion of nucleotides that are present in a corresponding wild-type allele or copy of a sugarcane cell or population of cells. In some embodiments, the COMT gene of the sugarcane cell or population of sugarcane cells contains modified COMT genomic DNA that comprises a nucleotide sequence of SEQ ID NOs:7-13 or SEQ ID NO:21, or any combination thereof. Insofar as the DNA repair process is not completely understood, one of ordinary skill in the art cannot predict or otherwise know what the resulting sequence of the modified COMT gene will be after TALEN binds to the genomic DNA and creates the double stranded break.
In one embodiment, the sugarcane cell or population of sugarcane cells that contain modified COMT genomic DNA on an allele or copy has reduced gene expression of wild-type COMT as compared to wild-type sugarcane cells or a population of wild-type sugarcane cells. In further embodiments, the sugarcane cell or population of cells that contains the modified COMT genomic DNA on an allele or copy has a reduced amount of lignin as compared to a wild-type sugarcane cell or population of cells. In other embodiments, the sugarcane cell or population of sugarcane cells that contain the modified COMT genomic DNA on an allele or copy has a modified lignin profile as compared to a wild-type sugarcane cell or population of sugarcane cells. Importantly, the methods of the present invention provide plants, cells, and/or tissues with significantly reduced lignin/altered lignin composition in combination with agronomic performance (see, e.g., stem diameter in Table 3) that is not significantly different from the non-modified wild type plants, cells, and/or tissues. This is accomplished by targeting less than 100% and more than 50% mutation of the functional COMT copies. This range of mutation (less than 100% and more than 50% mutation) of the functional COMT copies can only be accomplished in and remain genetically stable in a polyploid species, and more particularly, in a highly polyploid species.
II. Methods of producing plants and cells containing TALEN-modified COMT.
Techniques for transforming a wide variety of plant cells with vectors or naked nucleic acids are well known in the art and described in the technical and scientific literature. See, for example, Weising et al. Ann. Rev. Genet. 1988, 22:421-477. Non-limiting examples of methods for transformation of plants include transformation via bacterial-mediated nucleic acid delivery (e.g., via Agrobacteria), viral-mediated nucleic acid delivery, silicon carbide or nucleic acid whisker-mediated nucleic acid delivery, liposome mediated nucleic acid delivery, microinjection, microparticle bombardment, calcium-phosphate-mediated transformation, cyclodextrin-mediated transformation, electroporation, nanoparticle-mediated transformation, sonication, infiltration, PEG-mediated nucleic acid uptake, as well as any other electrical, chemical, physical (mechanical) and/or biological mechanism that results in the introduction of nucleic acid into the plant cell, including any combination thereof. General guides to various plant transformation methods known in the art include Miki et al. ("Procedures for Introducing Foreign DNA into Plants" in Methods in Plant Molecular Biology and Biotechnology, Glick, B. R. and Thompson, J. E., Eds. (CRC Press, Inc., Boca Raton, 1993), pages 67-88) and Rakowoczy-Trojanowska (Cell. Mol. Biol. Lett. 7:849-858 (2002)). Thus, in some embodiments, the vector, expression cassette, or naked nucleic acid may be introduced directly into the genomic DNA of a plant cell using techniques such as, but not limited to, electroporation and microinjection of plant cell protoplasts, or the recombinant nucleic acid can be introduced directly to plant tissue using ballistic methods, such as DNA particle (biolistic) bombardment.
Microinjection techniques are known in the art and well described in the scientific and patent literature. The introduction of a recombinant nucleic acid using polyethylene glycol precipitation is described in Paszkowski et al. EMBO J. 1984, 3:2717-2722. Electroporation techniques are described in Fromm et al. Proc. Natl. Acad. Sci. USA. 1985, 82:5824. Ballistic transformation techniques are described in Klein et al. Nature. 1987, 327:70-73.
Another method for transforming plants, plant parts and/or plant cells involves propelling inert or biologically active particles at plant tissues and cells (particle (biolistic) bombardment). See, e.g., US Patent Nos. 4,945,050; 5,036,006 and 5,100,792. Generally, this method involves propelling inert or biologically active particles at the plant cells under conditions effective to penetrate the outer surface of the cell and afford incorporation within the interior thereof. When inert particles are utilized, the vector can be introduced into the cell by coating the particles with the vector containing the nucleic acid of interest.
Alternatively, a cell or cells can be surrounded by the vector so that the vector is carried into the cell by the wake of the particle. Biologically active particles (e.g., dried yeast cells, dried bacterium or a bacteriophage, each containing one or more nucleic acids sought to be introduced) also can be propelled into plant tissue. In some embodiments, biolistic techniques can be used to introduce one or more heterologous polypeptides into a cell.
The recombinant nucleic acid may also be combined with suitable T-DNA flanking regions and introduced into a conventional Agrobacterium tumefaciens host vector, or other suitable vector. The virulence functions of the Agrobacterium tumefaciens host will direct the insertion of the recombinant nucleic acid including the exogenous nucleic acid and adjacent marker into the plant cell DNA when the cell is infected by the bacteria. Agrobacterium tumefaciens-mediated transformation techniques, including disarming and use of binary vectors, are known to those of skill in the art and are well described in the scientific literature. See, for example, Horsch et al. Science. 1984, 223:496-498; Fraley et al. Proc. Natl. Acad. Sci. USA. 1983, 80:4803; and Gene Transfer to Plants, Potrykus, ed., Springer-Verlag, Berlin, 1995.
A further method for introduction of the vector or recombinant nucleic acid into a plant cell is by transformation of plant cell protoplasts (stable or transient). Plant protoplasts are enclosed only by a plasma membrane and will therefore more readily take up macromolecules, for example, exogenous DNA. These engineered protoplasts can be capable of regenerating whole plants. Suitable methods for introducing exogenous DNA into plant cell protoplasts include electroporation and polyethylene glycol (PEG) transformation. Following electroporation, transformed cells are identified by growth on appropriate medium containing a selective agent.
The presence of a TALEN-induced COMT gene disruption (e.g., insertion, deletion) can be determined using methods well known in the art, e.g., PCR analysis, amplicon sequencing, capillary electrophoresis. Expression of the wild-type COMT in a plant or cell may be confirmed by detecting an increase or decrease of wild-type COMT mRNA or polypeptide in the modified plant. Methods for detecting and quantifying mRNA or proteins are well known in the art (e.g., COMT activity assay).
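As a rough illustration of how size-based screening of this kind can be tallied, the sketch below compares amplicon fragment lengths against the wild-type amplicon length and counts fragments whose size has shifted. The function name `summarize_indels`, the input lengths, and the 300 bp wild-type length are hypothetical and are not taken from this disclosure. A size comparison of this sort would not detect substitutions or insertions and deletions that cancel in length.

```python
from collections import Counter


def summarize_indels(fragment_lengths, wild_type_length):
    """Tally amplicon fragments whose length differs from the wild-type amplicon.

    fragment_lengths: iterable of observed amplicon lengths in base pairs
                      (hypothetical input, e.g. from sequencing reads or
                      capillary electrophoresis peaks).
    wild_type_length: length of the unmodified amplicon in base pairs.
    """
    lengths = list(fragment_lengths)
    if not lengths:
        raise ValueError("at least one fragment length is required")

    # Negative shift = net deletion, positive shift = net insertion, 0 = unchanged size.
    shifts = Counter(length - wild_type_length for length in lengths)
    total = len(lengths)
    edited = total - shifts.get(0, 0)
    return {
        "total": total,
        "edited": edited,
        "edited_percent": 100.0 * edited / total,
        "size_shifts": dict(shifts),
    }


# Illustrative values only: three unchanged fragments and three with size shifts.
summary = summarize_indels([300, 300, 295, 301, 300, 293], wild_type_length=300)
print(summary["edited"], summary["total"], summary["edited_percent"])
```

The fraction of size-shifted fragments is only a proxy for the proportion of modified alleles or copies, since amplification bias between copies is not accounted for in this sketch.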
Transformed plant cells that are derived by any of the above transformation techniques, or other techniques now known or later developed, can be cultured to regenerate a whole plant. In embodiments, such regeneration techniques may rely on manipulation of certain phytohormones in a tissue culture growth medium, typically relying on a biocide or herbicide selectable marker that has been introduced together with the exogenous nucleic acid. Plant regeneration from cultured protoplasts is described in Evans et al. (1983) Protoplasts Isolation and Culture, Handbook of Plant Cell Culture, pp. 124-176, Macmillan Publishing Company, New York; and Binding (1985) Regeneration of Plants, Plant Protoplasts, pp. 21-73, CRC Press, Boca Raton. Regeneration can also be obtained from plant callus, explants, organs, or parts thereof.
The genetic properties engineered into the transgenic seeds and plants, plant parts, and/or plant cells of the invention described above can be passed on by sexual reproduction or vegetative growth and therefore can be maintained and propagated in progeny plants. Generally, maintenance and propagation make use of known agricultural methods developed to fit specific purposes such as harvesting, sowing or tilling.
Thus, a nucleotide sequence (or a polypeptide) can be introduced into the plant, plant part and/or plant cell in any number of ways that are well known in the art. The methods of the invention do not depend on a particular method for introducing one or more nucleotide sequences into a plant, only that they gain access to the interior of at least one cell of the plant. Where more than one nucleotide sequence is to be introduced, they can be assembled as part of a single nucleic acid construct, or as separate nucleic acid constructs, and can be located on the same or different nucleic acid constructs. Accordingly, the nucleotide sequences can be introduced into the cell of interest in a single transformation event, in separate transformation events, or, for example, in plants, as part of a breeding protocol.
As used herein, the term "plant" may refer to any suitable plant, including, but not limited to, spermatophytes (e.g., angiosperms and gymnosperms) and embryophytes (e.g., bryophytes, ferns and fern allies). In some embodiments, a plant useful with this invention includes any monocot plant and/or any dicot plant.
Representative host plants include soybean (Glycine max), corn (Zea mays), canola (Brassica napus, Brassica rapa ssp.), alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), sunflower (Helianthus annuus), wheat (Triticum aestivum), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia (Macadamia integrifolia), almond (Prunus amygdalus), sugar beets (Beta vulgaris), oats, barley, vegetables, ornamentals, and conifers.
Additional host plants of the invention are crop plants, for example, cereals and pulses, maize, wheat, potatoes, tapioca, rice, sorghum, millet, cassava, barley, pea, and other root, tuber, or seed crops or turf grasses. Important seed crops for the invention are oil-seed rape, sugar beet, maize, sunflower, soybean, and sorghum. Horticultural plants to which the invention may be applied may include lettuce, endive, and vegetable brassica including cabbage, broccoli, and cauliflower, and carnations, geraniums, petunias, and begonias. The invention may be applied to tobacco, cucurbits, carrot, strawberry, sunflower, tomato, pepper, chrysanthemum, poplar, eucalyptus, and pine. Optionally, plants of the invention include grain seeds, such as corn, wheat, barley, rice, sorghum, rye, etc. Optionally, plants of the invention include oil-seed plants. Oil seed plants include canola, cotton, soybean, safflower, sunflower, brassica, maize, alfalfa, palm, coconut, etc. Optionally, plants of the invention include leguminous plants. Leguminous plants include beans and peas. Beans include guar, locust bean, fenugreek, soybean, garden beans, cowpea, mung bean, lima bean, fava bean, lentils, chickpea, etc. Host plants useful in the invention are row crops and broadcast crops. Non-limiting examples of useful row crops are corn, soybeans, cotton, amaranth, vegetables, rice, sorghum, wheat, milo, barley, sunflower, durum, and oats. Non-limiting examples of useful broadcast crops are sunflower, millet, rice, sorghum, wheat, milo, barley, durum, and oats. Host plants useful in the invention are monocots and dicots. Non-limiting examples of useful monocots are rice, corn, wheat, palm trees, turf grasses, barley, and oats. Non-limiting examples of useful dicots are soybean, cotton, alfalfa, canola, flax, tomato, sugar beet, sunflower, potato, tobacco, corn, wheat, rice, lettuce, celery, cucumber, carrot, and cauliflower, grape, and turf grasses. 
Host plants useful in the invention include plants cultivated for aesthetic or olfactory benefits. Non-limiting examples include flowering plants, trees, grasses, shade plants, and flowering and non-flowering ornamental plants. Host plants useful in the invention include plants cultivated for nutritional value, fibers, wood, and industrial products.
In some particular embodiments, a plant of the invention includes, but is not limited to, a soybean plant, a sugar beet plant, a corn plant, a cotton plant, a canola plant, a sugarcane plant, a wheat plant, a rice plant or a turf grass plant. Thus, in some embodiments, a plant of the invention can include but is not limited to a forage grass, a forage legume, and/or a fodder crop (e.g., ryegrass, bahiagrass, Bermuda grass, tall fescue, signal grass, gamma grass, alfalfa, clover, and the like). In some embodiments, the plant of the invention can include biomass crops (e.g., switchgrass, willow, Arundo donax, elephantgrass, miscanthus). In representative embodiments, the plant of the invention is sugarcane. In some embodiments, the plant of the invention can be bahiagrass.
As used herein, the term "plant part" includes but is not limited to embryos, pollen, ovules, seeds, leaves, flowers, stems, branches, fruit, kernels, ears, cobs, husks, stalks, roots, root tips, anthers, plant cells including plant cells that are intact in plants and/or parts of plants, plant protoplasts, plant tissues, plant cell tissue cultures, plant calli, plant clumps, and the like. Further, as used herein, "plant cell" refers to a structural and physiological unit of the plant, which comprises a cell wall and also may refer to a protoplast. A plant cell of the invention can be in the form of an isolated single cell or can be a cultured cell or can be a part of a higher-organized unit such as, for example, a plant tissue or a plant organ. A "protoplast" is an isolated plant cell without a cell wall or with only parts of the cell wall. Thus, in some embodiments of the invention, a transgenic cell comprising a nucleic acid molecule and/or nucleotide sequence of the invention is a cell of any plant or plant part including, but not limited to, a root cell, a leaf cell, a tissue culture cell, a seed cell, a flower cell, a fruit cell, a pollen cell, and the like. In some aspects of the invention, the plant part can be a plant germplasm. In some aspects, a plant cell can be a non-propagating plant cell that does not regenerate into a plant.
"Plant cell culture" means cultures of plant units such as, for example, protoplasts, cell culture cells, cells in plant tissues, pollen, pollen tubes, ovules, embryo sacs, zygotes and embryos at various stages of development. In some embodiments of the invention, a transgenic tissue culture or transgenic plant cell culture is provided, wherein the transgenic tissue or cell culture comprises a nucleic acid molecule/nucleotide sequence of the invention.
As used herein, a "plant organ" is a distinct and visibly structured and differentiated part of a plant such as a root, stem, leaf, flower bud, or embryo.
"Plant tissue" as used herein means a group of plant cells organized into a structural and functional unit. Any tissue of a plant in planta or in culture is included. This term includes, but is not limited to, whole plants, plant organs, plant seeds, tissue culture and any groups of plant cells organized into structural and/or functional units. The use of this term in conjunction with, or in the absence of, any specific type of plant tissue as listed above or otherwise embraced by this definition is not intended to be exclusive of any other type of plant tissue.
In some aspects, the invention provides plants, plant parts, and/or plant cells produced by the methods of the invention. In some embodiments, the invention further provides a plant crop comprising a plurality of transgenic plants of the invention planted together in, for example, an agricultural field, a golf course, a residential lawn, a road side, an athletic field, and/or a recreational field.
In one embodiment, a sugarcane plant having modified COMT genomic DNA of at least one allele or copy is produced by a method of indirect embryogenesis using agrobacteria. The method involves inoculating a callus induced from an immature sugarcane leaf whorl with agrobacteria containing a TALEN expression vector configured to produce a TALEN that binds sugarcane COMT and creates a double stranded break in the COMT DNA at a target site recognized by a restriction endonuclease of the TALEN. In some embodiments, the TALEN expression vector comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 14, SEQ ID NO: 15, and/or SEQ ID NO: 16.
In another embodiment, a sugarcane plant having modified COMT genomic DNA of at least one allele or copy is produced by a method of direct embryogenesis using ballistic bombardment. The method involves bombarding a pre-cultured immature sugarcane leaf whorl with a TALEN expression vector or a minimal TALEN expression cassette configured to produce a TALEN that binds sugarcane COMT and creates a double stranded break in the COMT DNA at a target site recognized by a restriction endonuclease of the TALEN. In some embodiments, the TALEN expression vector comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 14 or SEQ ID NO: 15 and/or any combination thereof. In some embodiments, the minimal TALEN expression cassette comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20 and/or any combination thereof. In further embodiments, the pre-cultured immature sugarcane leaf whorl is bombarded with expression vectors or minimal TALEN expression cassettes comprising, consisting essentially of, or consisting of the nucleotide sequence of SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, and/or any combination thereof. In some embodiments, the method further comprises selecting a leaf whorl that contains a minimal TALEN expression cassette and/or a TALEN expression vector and regenerating sugarcane roots from the selected leaf whorl without first producing a callus.
Once the plants containing the TALEN-induced COMT gene disruption have been identified by a suitable screening method (e.g., PCR, Southern blotting), the TALEN-induced modified COMT gene can be introduced into other plants by, for example, sexual crossing. Any of a number of standard breeding techniques can be used, depending upon the species to be crossed.
III. Uses of sugarcane plants and cells containing TALEN-modified COMT
The present disclosure also encompasses biomass and byproducts derived from the sugarcane plants and cells containing a modified COMT mediated by TALEN specific for COMT as described herein. In one embodiment, the biomass derived from sugarcane having TALEN-induced modified COMT genomic DNA may be used as a biofuel feedstock. In some embodiments, the biomass derived from sugarcane having TALEN-induced modified COMT genomic DNA has a reduced or modified amount of lignin relative to biomass derived from wild-type sugarcane. In further embodiments, the biomass derived from sugarcane having TALEN-induced modified COMT genomic DNA has a modified lignin profile relative to biomass derived from wild-type sugarcane. There are several advantages to using a biofuel feedstock with a reduced or modified amount of lignin or a modified lignin profile relative to feedstock derived from a wild-type sugarcane. Reducing or otherwise modifying the lignin content in a biofuel feedstock may increase the efficiency of, reduce the cost of, and/or reduce the environmental impact of lignin removal as compared to current techniques for lignin removal from biofuel feedstock, which as discussed above are inefficient, uneconomical, and environmentally unsound.
The biomass derived from sugarcane having TALEN-induced modified COMT genomic DNA can also be used as a feedstock to produce cellulosic products, such as paper. Production of cellulosic products from lignocellulosic biomass also needs the removal of lignin from the biomass feedstock. Thus, production of cellulosic products from lignocellulosic biomass derived from wild-type sugarcane suffers the same shortcomings as removal of lignin from biofuel feedstock produced from wild-type sugarcane.
In some embodiments, the lignocellulosic biomass derived from sugarcane having TALEN-induced modified COMT genomic DNA has a reduced or modified amount of lignin relative to lignocellulosic biomass derived from wild-type sugarcane. In further embodiments, the lignocellulosic biomass derived from sugarcane having TALEN-induced modified COMT genomic DNA has a modified lignin profile relative to biomass derived from wild-type sugarcane. Therefore, the use of biomass having a reduced amount of lignin and/or a modified lignin profile may remedy some of the shortcomings of current lignin removal techniques previously described.
Moreover, the biomass derived from sugarcane having TALEN-induced modified COMT genomic DNA may be used as forage for animals. Lignin may be difficult for many animals to digest as they may lack the proper or sufficient amount of enzymes to break it down into digestible components. As such, in some embodiments, forages derived from plants, tissues and/or cells described herein may be more digestible by animals relative to wild type sugarcane. The increased digestibility may be due to a reduced amount of lignin or a modified lignin profile of the forage derived from the plants, tissues and/or cells described herein as compared to wild type. In some embodiments, the plant can be sugarcane. In other embodiments, the plant can be bahiagrass.
Accordingly, the present invention provides a novel approach to reducing lignin levels in a plant, plant cell or plant tissue, in particular, in a sugarcane plant, plant cell or plant tissue, as well as the plants, cells and tissues produced by the inventive method and products produced therefrom including methods of producing said products. In some embodiments, the present invention provides bahiagrass produced as described herein and having reduced lignin levels and modified lignin profiles.
Thus, in some embodiments, a method of reducing the lignin content and/or modifying the lignin profile of a plant, cell and/or tissue is provided, comprising, consisting essentially of, or consisting of: mutagenizing nucleic acid in a plant, cell and/or tissue to produce a stable inactivating deletion or insertion in more than 50% of the alleles or copies of said plant, cell and/or tissue that encode CoA O-methyltransferase (COMT), thereby reducing the lignin content of the plant, cell and/or tissue and/or modifying the lignin profile of the plant, cell and/or tissue as compared to a wild type plant, cell and/or tissue. In representative embodiments, a method of reducing the lignin content and/or modifying the lignin profile of a sugarcane plant, cell and/or tissue is provided, comprising, consisting essentially of, or consisting of: mutagenizing nucleic acid in a sugarcane plant, cell and/or tissue to produce a stable inactivating deletion or insertion in more than 50% of the alleles or copies of said sugarcane plant, cell and/or tissue that encode CoA O-methyltransferase (COMT), thereby reducing the lignin content of the sugarcane plant, cell and/or tissue and/or modifying the lignin profile of the sugarcane plant, cell and/or tissue as compared to a wild type sugarcane plant, cell and/or tissue. In some embodiments, the mutagenesis produces a stable inactivating deletion or insertion in less than 100% of the COMT alleles or copies. In some embodiments, the mutagenesis produces a stable inactivating deletion or insertion in more than 50% and less than 100% of the alleles or copies (e.g., 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99% or any range or value therein).
As an example, about 7 to about 11 COMT alleles or copies of a sugarcane variety that comprises 12 COMT alleles or copies can be mutated to achieve a mutation frequency of more than 50% and less than 100%. In a further example, about 7 to about 10 COMT alleles or copies of a sugarcane variety that comprises 11 COMT alleles or copies can be mutated to achieve a mutation frequency of more than 50% and less than 100%. Thus, in some embodiments, the mutagenesis produces a stable inactivating deletion or insertion in about 51% to about 99%, about 51% to about 98%, about 51% to about 95%, about 55% to about 95%, about 58% to about 95%, about 58% to about 92%, 60% to about 95%, 60% to about 92%, about 65% to about 95%, about 70% to about 95%, about 75% to about 95%, about 60% to about 92%, about 65% to about 92%, about 70% to about 92%, about 75% to about 92%, and the like, and any range or value therein. In some embodiments, the mutagenizing comprises targeted mutagenesis. In some embodiments, a sugarcane plant may be regenerated from the sugarcane plant cell or plant tissue.
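The arithmetic behind the allele-count examples above can be checked with a short sketch. The function names `mutation_frequency` and `in_target_window` are illustrative only and are not part of this disclosure; the copy numbers (12 and 11) and the mutated counts (7 to 11 and 7 to 10) are the ones given in the examples.

```python
def mutation_frequency(mutated_copies: int, total_copies: int) -> float:
    """Percentage of COMT alleles or copies carrying the inactivating mutation."""
    if total_copies <= 0:
        raise ValueError("total_copies must be positive")
    if not 0 <= mutated_copies <= total_copies:
        raise ValueError("mutated_copies must be between 0 and total_copies")
    return 100.0 * mutated_copies / total_copies


def in_target_window(frequency_percent: float) -> bool:
    """True when the frequency is more than 50% and less than 100%."""
    return 50.0 < frequency_percent < 100.0


# End points of the two examples in the text: 7 or 11 of 12 copies, 7 or 10 of 11 copies.
for total, counts in ((12, (7, 11)), (11, (7, 10))):
    for mutated in counts:
        freq = mutation_frequency(mutated, total)
        print(f"{mutated}/{total} copies: {freq:.1f}%, in window: {in_target_window(freq)}")
```

Under these inputs, 7 and 11 of 12 copies correspond to roughly 58% and 92%, which is consistent with the "about 58% to about 92%" range recited above.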
In some embodiments, a plant, plant cell and/or plant tissue modified through targeted mutagenesis (e.g., TALENS) comprises, consists essentially of or consists of a mutation frequency of more than 50%. In some embodiments, the mutation frequency using targeted mutagenesis may be less than 100%. In representative embodiments, a sugarcane plant, plant cell and/or plant tissue modified through targeted mutagenesis (e.g., TALENS) comprises, consists essentially of or consists of a mutation frequency of more than 50%. In some embodiments, a sugarcane plant, plant cell and/or plant tissue modified through targeted mutagenesis (e.g., TALENS) comprises, consists essentially of or consists of a mutation frequency of more than 50% and less than 100%. Accordingly, in some embodiments, the mutation frequency may be more than 50% to less than 100% (e.g., 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, or 99% or any range or value therein). In particular embodiments, the mutation frequency in a plant, plant cell and/or plant tissue using targeted mutagenesis may be about 51% to about 99%, about 51% to about 98%, about 51% to about 95%, about 55% to about 95%, about 58% to about 95%, about 58% to about 92%, 60% to about 95%, 60% to about 92%, about 65% to about 95%, about 70% to about 95%, about 75% to about 95%, about 60% to about 92%, about 65% to about 92%, about 70% to about 92%, about 75% to about 92%, and the like, and any range or value therein.
In some embodiments, the COMT alleles or copies comprising said stable, inactivating deletion or insertion produce no COMT protein or produce COMT protein with reduced and/or no activity. In some embodiments, the reduction in COMT activity can be more than 50% to less than 100% of the activity of a control (e.g., the COMT activity in the corresponding wild type plant, plant cell or plant tissue).
In some embodiments, the targeted mutagenesis comprises, consists essentially of, or consists of introducing into said sugarcane plant, cell and/or tissue a nucleic acid construct comprising a transcription activator-like effector nuclease (TALEN) and a DNA binding domain that binds to at least a portion of nucleic acid encoding COMT. In some embodiments, the nucleic acid construct is transiently transformed into the sugarcane plant, cell and/or tissue. In some embodiments, the nucleic acid construct is stably transformed into the genome of the sugarcane plant, cell and/or tissue. In some embodiments, the nucleic acid construct comprising a transcription activator-like effector nuclease (TALEN) and a DNA binding domain that binds to at least a portion of nucleic acid encoding COMT comprises, consists essentially of, or consists of a TALEN expression vector encoded by the nucleotide sequence of SEQ ID NO: 14, SEQ ID NO: 15, and/or SEQ ID NO: 16, and/or a TALEN minimal expression cassette encoded by the nucleotide sequence of SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, and/or SEQ ID NO: 20.
In some embodiments, the methods of the invention result in the amount of lignin in the mutagenized sugarcane plant, plant cell, or plant tissue being reduced by about 3% to about 30% (e.g., 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30%, or any value or range therein) as compared to wild type. Thus, in some embodiments, the lignin in the mutagenized sugarcane plant, plant cell, or plant tissue is reduced by about 5% to about 30%, about 7% to about 30%, about 10% to about 30%, about 3% to about 25%, about 5% to about 25%, about 7% to about 25%, about 10% to about 25%, about 5% to about 20%, about 7% to about 20%, about 10% to about 20%, and the like.
In some embodiments, the ratio of syringyl to guaiacyl in the lignin of the mutagenized sugarcane plant, plant cell, or plant tissue is modified, wherein the ratio of syringyl to guaiacyl in the lignin is reduced by 6% to 70% (as compared to WT), thereby resulting in a modified lignin profile as compared to the lignin profile of a wild type sugarcane plant, plant cell, or plant tissue. In some embodiments, the ratio of syringyl to guaiacyl in the lignin of the mutagenized sugarcane plant, plant cell, or plant tissue is modified to be about 1.4 to about 0.45.
In some embodiments, a genetically modified sugarcane plant, cell and/or tissue is produced by the methods of this invention.
In some embodiments, the present invention provides a genetically modified plant, cell and/or tissue comprising, consisting essentially of, or consisting of a stable, inactivating deletion or insertion in more than 50% of the alleles or copies encoding CoA O-methyltransferase (COMT) in said plant, cell and/or tissue. In some embodiments, a genetically modified plant, cell and/or tissue is provided, comprising, consisting essentially of, or consisting of a stable, inactivating deletion or insertion in less than 100% of the COMT alleles or copies. In representative embodiments, the present invention provides a genetically modified sugarcane plant, cell and/or tissue comprising, consisting essentially of, or consisting of a stable, inactivating deletion or insertion in more than 50% of the alleles or copies encoding CoA O-methyltransferase (COMT) in said sugarcane plant, cell and/or tissue. In some embodiments, a genetically modified sugarcane plant, cell and/or tissue is provided, comprising, consisting essentially of, or consisting of a stable, inactivating deletion or insertion in less than 100% of the COMT alleles or copies. In still further embodiments, a genetically modified sugarcane plant, cell and/or tissue is provided, comprising, consisting essentially of, or consisting of a stable, inactivating deletion or insertion in more than 50% and less than 100% of the alleles or copies encoding CoA O-methyltransferase (COMT) in said sugarcane plant, cell and/or tissue. In some embodiments, the COMT alleles or copies comprising said stable, inactivating deletion or insertion produce no COMT protein or produce COMT protein with reduced or no activity. In some embodiments, the genetically modified sugarcane plant, cell and/or tissue of the invention comprises, consists essentially of, or consists of a reduced amount of lignin, wherein the amount of lignin is reduced by about 3% to about 30% (as compared to WT).
In some embodiments, the genetically modified sugarcane plant, cell and/or tissue of the invention comprises, consists essentially of, or consists of a modified lignin profile, wherein the lignin profile is modified such that the ratio of syringyl to guaiacyl in the lignin is reduced by 6% to 70% as compared to wild type. In some embodiments, the genetically modified sugarcane plant, cell and/or tissue of the invention comprises, consists essentially of, or consists of a modified lignin profile, wherein the ratio of syringyl to guaiacyl in the lignin is about 1.4 to about 0.45 (as compared to 1.5 in the wild type sugarcane).
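The relationship between the two ways the modified lignin profile is stated above (a 6% to 70% reduction in the syringyl to guaiacyl ratio, and a ratio of about 1.4 to about 0.45 against a wild type value of 1.5) can be checked with a brief calculation. The function name `reduced_sg_ratio` is illustrative only; the wild type ratio of 1.5 and the reduction end points are taken from the text.

```python
def reduced_sg_ratio(wild_type_ratio: float, reduction_percent: float) -> float:
    """Syringyl:guaiacyl ratio after a given percent reduction from wild type."""
    if not 0.0 <= reduction_percent <= 100.0:
        raise ValueError("reduction_percent must be between 0 and 100")
    return wild_type_ratio * (1.0 - reduction_percent / 100.0)


WILD_TYPE_SG = 1.5  # wild type sugarcane S/G ratio stated in the text

for reduction in (6.0, 70.0):
    ratio = reduced_sg_ratio(WILD_TYPE_SG, reduction)
    print(f"{reduction:.0f}% reduction -> S/G ratio of {ratio:.2f}")
```

A 6% reduction from 1.5 gives 1.41 and a 70% reduction gives 0.45, which agrees with the recited range of about 1.4 to about 0.45.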
In some embodiments, the genetically modified sugarcane plant, cell and/or tissue of the invention further comprises an agronomic performance that is substantially the same as that of the non-modified wild type sugarcane plant, cell and/or tissue. As used herein, "agronomic performance" refers to biomass yield, resistance to lodging, resistance to disease and resistance to insects, and the like. An agronomic performance that is substantially the same as wild type can be, for example, about 80-100% of the biomass production of the wild type. An indicator for biomass production in sugarcane is, for example, stem diameter (see, e.g., Table 3).
In some embodiments, a genetically modified sugarcane plant cell or plant tissue of the invention may be regenerated into a genetically modified sugarcane plant.
In some embodiments, a crop comprising a plurality of the genetically modified sugarcane plant of the present invention is planted together in an agricultural field.
In some embodiments, a product is provided that is produced from a genetically modified sugarcane plant, cell and/or tissue of the invention or a sugarcane crop of the invention. In some embodiments, the product can be biomass, bagasse, biofuel, and/or a biobased product.
In some embodiments, the present invention further provides a method of increasing the efficiency of conversion of lignocellulosic biomass into biofuel, comprising: providing a plant of the invention or a crop of the invention; and converting the lignocellulosic biomass from said plant and/or crop into biofuel, thereby increasing the efficiency of conversion of lignocellulosic biomass into biofuel as compared to a wild type plant or crop. Thus, in representative embodiments, the present invention provides a method of increasing the efficiency of conversion of sugarcane lignocellulosic biomass into biofuel, comprising: providing a sugarcane plant of the invention or a crop of the invention; and converting the lignocellulosic biomass from said sugarcane plant and/or crop into biofuel, thereby increasing the efficiency of conversion of sugarcane lignocellulosic biomass into biofuel as compared to a wild type sugarcane plant or crop. In some embodiments, the efficiency can be measured as an increase in fermentable sugar or a percent increase of biofuel and the increase can be about 15% to about 45% (e.g., about 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45% or any value or range therein) as compared to conversion of wild type sugarcane lignocellulosic biomass into biofuel. Thus, in some embodiments, the increase in efficiency can be about 15% to about 40%, about 15% to about 35%, about 15% to about 30%, about 15% to about 25%, about 20% to about 40%, about 20% to about 35%, about 20% to about 30%, and the like.
The present invention further provides a method of providing an animal feed having increased digestibility, comprising: providing a plant having reduced lignin content generated as described herein; and converting the lignocellulosic biomass from said plant and/or crop into animal feed, thereby providing a more readily digestible animal feed. In some embodiments, the plant is sugarcane. In some embodiments, the plant is bahiagrass.
In some embodiments, a genetically modified sugarcane plant is provided comprising: modified COMT genomic DNA on an allele or copy, wherein the modification to the COMT genomic DNA comprises a stable, inactivating deletion or insertion (as compared to a wild-type allele or copy). In some embodiments, the modified COMT genomic DNA comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 7, SEQ ID NO: 8, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID NO: 12 and/or SEQ ID NO: 13. In some embodiments, the sugarcane plant of the invention has reduced gene expression of COMT as compared to a wild-type plant and/or a reduced amount of COMT protein expression as compared to a wild-type plant. In some embodiments, the sugarcane plant of the invention has a reduced amount of lignin or a modified lignin profile as compared to a wild-type sugarcane plant.
In some embodiments, the invention provides a sugarcane plant cell comprising: modified COMT genomic DNA on an allele or copy, wherein the modification to the COMT genomic DNA comprises a stable, inactivating deletion or insertion (as compared to a wild- type allele or copy). In some embodiments, the modified COMT genomic DNA comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO: 10, SEQ ID NO:l l, SEQ ID NO: 12 and/or SEQ ID NO: 13. In some embodiments, the sugarcane plant cell of the invention has reduced gene expression of COMT as compared to a wild-type plant cell and/or a reduced amount of COMT protein expression as compared to a wild-type plant cell. In some embodiments, the sugarcane plant of the invention has a reduced amount of lignin or a modified lignin profile as compared to a wild-type sugarcane plant cell. In some embodiments, the present invention further provides a population of sugarcane plant cells comprising a sugarcane plant cell of this invention.
In some embodiments, the invention provides a method of producing a genetically modified sugarcane plant having modified COMT genomic DNA of at least of one allele or copy produced by a method of indirect embryogenesis using agrobacteria, the method comprising: inoculating a callus induced from an immature leaf whorl of sugarcane with agrobacteria containing a TALEN expression vector, wherein the TALEN expression vector comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID: 16, or any combination thereof. In some embodiments, the invention provides a method of producing a genetically modified sugarcane plant having modified COMT genomic DNA of at least one allele or copy produced by a method of direct embryogenesis, the method comprising: bombarding a pre-cultured immature sugarcane leaf whorl with a TALEN expression vector comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 14, SEQ ID NO: 15 or any combination thereof, or a minimal TALEN expression cassette comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 17. SEQ ID NO: 18. SEQ ID NO: 19. SEQ ID NO:20, or any combination thereof. In some embodiments, the method further comprises, selecting leaf a whorl that contains a TALEN minimal expression cassette or TALEN expression vector; and regenerating sugarcane shoots from the selected leaf whorl without producing a callus.
In some embodiments, the present invention provides a method for reducing the amount of lignin present in sugarcane relative to a wild-type sugarcane plant, the method comprising: modifying the COMT genomic DNA of at least one allele or copy in a sugarcane plant cell by inoculating a callus induced from immature leaf whorls of sugarcane with agrobacteria containing a TALEN expression cassette, wherein the TALEN expression cassette comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 14, SEQ ID NO: 15. SEQ ID 16, or any combination thereof; and growing a sugarcane plant from the inoculated callus, whereby the sugarcane plant has a reduced amount of lignin relative to a wild-type sugarcane plant.
In some embodiments, the present invention provides method for reducing the amount of lignin present in a sugarcane relative to a wild-type sugarcane plant, the method comprising: bombarding a pre-cultured immature sugarcane leaf whorl with a TALEN expression vector comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 14, SEQ ID NO: 15 or any combination thereof, or a minimal TALEN expression cassette comprises, consists essentially of, or consists of the nucleotide sequence of SEQ ID NO: 17, SEQ ID NO: 1 8. SEQ ID NO: 19. SEQ ID NO:20, or any combination thereof; selecting a leaf whorl that contains a TALEN minimal expression cassette or TALEN expression vector; regenerating sugarcane shoots from the selected leaf whorl without producing a callus; and growing a sugarcane plant from the regenerated sugarcane shoots, whereby the sugarcane plant has at least one cell with a reduced amount of lignin relative to a wild-type sugarcane plant cell.
In some embodiments, the present invention provides a biofuel feedstock comprising biomass, a cellulosic material comprising cellulose each of which is derived from a sugarcane plant, cell or tissue of this invention.
In some embodiments, the present invention provides a forage comprising biomass derived from a plant (e.g., sugarcane, bahiagrass, and the like) produced according to the methods of the invention.
The invention will now be described with reference to the following examples. It should be appreciated that these examples are not intended to limit the scope of the claims to the invention, but are rather intended to be exemplary of certain embodiments. Any variations in the exemplified methods that occur to the skilled artisan are intended to fall within the scope of the invention.
EXAMPLES
Example 1: Selection of a TALEN binding sequence within sugarcane COMT gene
The first 200 bp region in the first exon of the sugarcane COMT gene (GenBank accession no. AJ231133) was pre-selected to search for potential TALEN binding sites. TALEN™ Hit software (http://talen-hit.cellectis-bioresearch.com/) was used to identify candidate TALEN binding and target (spacer) sites within the pre-selected region. Binding DNA sequences for the left and right TALEN, each of which should be preceded by a 5'-T, and the corresponding target site (spacer) were selected between +49 and +101 in the COMT ORF (numbers counted from the A in the start codon sequence, ATG).
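The site-selection constraint described above (each binding site preceded by a 5'-T, with a spacer between the two arms) can be illustrated with a minimal Python sketch. This is not the TALEN™ Hit algorithm; the function name `find_talen_pairs`, the arm length, and the spacer range are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass(frozen=True)
class TalenPair:
    left_start: int   # 0-based top-strand index of the first left-arm base
    left_site: str    # left binding site, read 5'->3' on the top strand
    spacer: str       # target (spacer) region between the two arms
    right_site: str   # right binding site, read 5'->3' on the bottom strand


def find_talen_pairs(region, arm_len=16, spacer_range=(12, 20)):
    """List candidate pairs in which both binding sites are preceded by a 5'-T.

    arm_len and spacer_range are illustrative assumptions, not values
    taken from the text.
    """
    region = region.upper()
    pairs = []
    for t_pos, base in enumerate(region):
        if base != "T":          # the left site must follow a T on the top strand
            continue
        left_start = t_pos + 1
        left_end = left_start + arm_len
        for spacer_len in range(spacer_range[0], spacer_range[1] + 1):
            right_start = left_end + spacer_len
            right_end = right_start + arm_len
            if right_end >= len(region):
                break            # longer spacers would also run off the region
            # A 5'-T before the right site on the bottom strand corresponds
            # to an A immediately 3' of the site on the top strand.
            if region[right_end] != "A":
                continue
            pairs.append(TalenPair(
                left_start,
                region[left_start:left_end],
                region[left_end:right_start],
                reverse_complement(region[right_start:right_end]),
            ))
    return pairs
```

A real design tool would additionally score repeat composition and off-target risk, which this sketch does not attempt.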
Example 2: TALEN Expression Vector Construction
TALEN arms were custom synthesized by Cellectis Bioresearch (TALEN™ Sure KO) and Life Technologies (GeneArt® Precision TAL). The backbone for the entry vector was custom-synthesized and constructed by subcloning and carried loxP sites for optional removal of the TALEN cassette from genomic DNA with cre recombinase, the nptII selectable marker, and promoter / terminator pairs for the expression of the TALEN arms. Left and right TALEN arms were cloned into the entry vector under the control of the YLCV promoter / NtHSP terminator and the YLCV promoter / AtHSP terminator, respectively. The backbone of the entry vectors was then replaced by the backbone of the agrobacteria transformation vector, resulting in pTALCOMT Cellectics (pTALCell) (SEQ ID NO:14) and pTALCOMT Life (pTALLife) (SEQ ID NO:15) as shown in Figs. 3 and 4.
In an attempt to induce Cre-loxP mediated self-excision of the expression cassette, a Cre expression cassette was introduced into the TALEN constructs, resulting in pTALCOMT Cellectics Cre (pTALCreCell) (SEQ ID NO:17) and pTALCOMT Life Cre (pTALCreLife) (SEQ ID NO:18) as shown in Figs. 5 and 6. The Cre expression cassette contained a heat inducible promoter (GmHSP) to allow for conditional activation of cre, an NLS fused / codon optimized Cre gene, and the Nos terminator.
Example 2: Generation of TALEN integrated transgenic sugarcane plants
The TALEN expression cassette was introduced into the sugarcane genome through indirect (IE) and direct embryogenesis (DE) using agrobacteria and biolistic mediated gene transfer. For the Agrobacteria mediated transformation via IE, 8 week-old callus induced from immature leaf whorls of sugarcane (Saccharum spp. hybrid) var. CP88-1762 was inoculated with Agrobacterium strain AGL1 carrying pTALCell (SEQ ID NO:14), pTALLife (SEQ ID NO:15), or the nptII control (SEQ ID NO:16). For the biolistic transfer, pre-cultured immature leaf whorls of CP88-1762 were bombarded with a minimal expression cassette (without vector backbone) from each construct, pTALCell (SEQ ID NO:14), pTALLife (SEQ ID NO:15), pTALCreCell (SEQ ID NO:17), and pTALCreLife (SEQ ID NO:18), followed by selection and regeneration via direct embryogenesis.
Example 3: PCR of genomic DNA to screen for the presence of TALEN expression cassette.
The integration of TALEN cassette in sugarcane genome was investigated by PCR.
Genomic DNA was extracted from leaves using the DNeasy 96 Plant Kit (Qiagen), and 25 ng was used per reaction as a template for amplification. The cassette specific forward (TALS F: 5'-AAAGGCGTGTTTGATGTGAA-3') (SEQ ID NO:1) and reverse (TLAS R: 5'-TCCAAGGACAACTTTAGAAAGAAAA-3') (SEQ ID NO:2) primers were designed from the NtHSP terminator region as shown in Fig. 7, with an expected amplicon size of 332 bp. PCR was performed in the Mastercycler (Eppendorf) with Hot Start Taq DNA polymerase (New England Biolabs (NEB)) under the following conditions: 95°C for 30 s denaturation, 35 cycles at 95°C for 30 s, 60°C for 30 s, 68°C for 1 min and final extension at 68°C for 5 min. PCR products were separated by electrophoresis on a 1.0% agarose gel and visualized after ethidium bromide staining as shown in Fig. 8. The sugarcane lines generating the 332 bp PCR product were considered to be TALEN integrated transgenic lines (Fig. 8). The number of regenerated transgenic lines and PCR screening results are summarized in Table 1.
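As a quick sanity check on the two screening primers, their length, GC content, and a rough melting temperature can be computed as sketched below. The Wallace rule used here (2 °C per A/T, 4 °C per G/C) is only a coarse approximation for oligonucleotides of this length and is an assumption of this sketch; the text does not state how the primers were designed or how the 60°C annealing temperature was chosen.

```python
TALS_F = "AAAGGCGTGTTTGATGTGAA"          # SEQ ID NO:1, forward primer
TLAS_R = "TCCAAGGACAACTTTAGAAAGAAAA"     # SEQ ID NO:2, reverse primer


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    gc = primer.count("G") + primer.count("C")
    return 100 * gc / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) from the Wallace rule: 2(A+T) + 4(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in (("TALS F", TALS_F), ("TLAS R", TLAS_R)):
        print(name, len(seq), "nt", gc_percent(seq), "% GC", wallace_tm(seq), "C")
```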
Table 1. Results of the TALEN mutagenesis.
Figure imgf000058_0001
nptII control: Binary vector harboring only the nptII expression cassette
One gram of calli is equivalent to one shot of bombardment.
Example 4: Restriction analysis to screen for mutations in the TALEN targeting site
TALEN induced targeted mutagenesis was screened by PCR followed by a restriction enzyme digestion assay. A 125 bp PCR fragment encompassing the TALEN binding and target sites was amplified using the 4F (5'-GGCTCGACCGCCGAGGAC-3') (SEQ ID NO:3) and 128R (5'-TCCAGCAGGCCCAGCTCCAG-3') (SEQ ID NO:4) primers (Fig. 9). PCR was performed in the Mastercycler (Eppendorf) with Phusion Hot Start II High-Fidelity DNA Polymerase (Thermo Scientific) under the following conditions: 98°C for 30 s denaturation, 33 cycles at 98°C for 10 s, 68°C for 15 s, 72°C for 15 s and final extension at 72°C for 5 min. Half of the PCR product (10 µl) was then treated with 5 units of BsaHI restriction enzyme (NEB) at 37°C for 1 h. Restriction enzyme treated PCR products were separated by electrophoresis on a 3.5% agarose gel and visualized after ethidium bromide staining as shown in Fig. 10.
The recognition site of BsaHI exists in the target site (spacer region) where the TALEN induced double strand break and mutations are most likely to occur (Fig. 9). Thus, it could be anticipated that alterations in the BsaHI restriction site indicate mutation events. PCR products that remained undigested (~125 bp) in the transgenic sugarcane lines (boxes in Fig. 10) indicated mutation events in the target region, whereas amplicons from the rest of the lines, including wild type (WT), were digested by the enzyme and released 68 bp and 49 bp fragments (Fig. 10).
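The logic of the restriction screen can be sketched in Python as an in-silico digest. The sketch assumes the commonly listed BsaHI recognition sequence GR^CGYC (R = A or G, Y = C or T, cut after the second base on the top strand); the toy sequences are hypothetical and are not the COMT amplicon, and only top-strand fragment lengths are reported.

```python
import re

# BsaHI recognition sequence GR^CGYC, written as a regular expression.
BSAHI_SITE = re.compile(r"G[AG]CG[CT]C")
CUT_OFFSET = 2  # top-strand cut falls after the second base of the site


def bsahi_fragments(amplicon: str) -> list:
    """Return top-strand fragment lengths after a complete BsaHI digest."""
    amplicon = amplicon.upper()
    cuts = [m.start() + CUT_OFFSET for m in BSAHI_SITE.finditer(amplicon)]
    bounds = [0] + cuts + [len(amplicon)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def is_digest_resistant(amplicon: str) -> bool:
    """True when no BsaHI site remains, i.e. the amplicon stays uncut."""
    return len(bsahi_fragments(amplicon)) == 1


if __name__ == "__main__":
    wild_type = "A" * 10 + "GACGTC" + "A" * 10   # hypothetical, site intact
    mutant = "A" * 10 + "GATC" + "A" * 10        # hypothetical 2 bp deletion
    print(bsahi_fragments(wild_type), is_digest_resistant(wild_type))
    print(bsahi_fragments(mutant), is_digest_resistant(mutant))
```

An amplicon that loses the site through an indel is reported as a single full-length fragment, which corresponds to the undigested band used to flag candidate mutant lines.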
Example 5: Amplicon analysis using capillary electrophoresis to select the modified sugarcane lines.
Modified sugarcane lines were selected based on the size variations in the amplicon encompassing the targeted mutation site. Genomic DNA was extracted from two different leaves or tillers in each of the TALEN integrated transgenic lines using the DNeasy 96 Plant Kit (Qiagen). PCR was carried out from each sample with 25 ng of genomic DNA and the 4F and 6-FAM labelled 128R primers. PCR was performed under the same conditions described above for the restriction analysis, and the products were separated by capillary electrophoresis. Capillary electrophoresis of the amplicon was performed by a service company, GENEWIZ. Size variations in the amplicon were analyzed using the Peak Scanner Software (Life Technologies).
TALEN induced double strand breaks result in deletions and/or insertions (indels) at the target site in the COMT gene. Individual peaks generated from capillary electrophoresis correspond to DNA fragments of different sizes. The COMT amplicon from wild type (WT) displayed only one single peak at 125 bp (Fig. 11A), which is the expected amplicon size, while mutant lines showed different peak patterns from WT (Fig. 11B). Lines with at least one peak different from WT were considered to be mutants. The number of mutant lines selected is shown in Table 2. The mutation rate in the TALEN integrated transgenic lines varied with the TALEN construct and transformation procedure, ranging from about 5% to about 77%. The lines generated by agrobacteria transformation of pTALCell via IE showed the highest mutation rate (about 77%, Table 2). The COMT amplicon from TALEN induced mutated sugarcane displayed an almost complete conversion of the 125 bp peak found in wild-type to mutant peaks, suggesting an almost complete conversion of the wild-type COMT to mutant COMT.
Table 2
Figure imgf000060_0001
1) Mutation rate in transgenic sugarcane lines: 100 × (No. of mutant lines / No. of analyzed lines)
2) Mutation frequency in amplicon: 100 × (Total area of mutant peaks / total area of mutant and wild type peaks in fragment analysis). Values are medians among the lines in each column.
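The two footnote formulas, together with the rule that a line with at least one non-wild-type peak counts as a mutant, can be written out as a short Python sketch. Peaks are represented as hypothetical `(size_bp, area)` pairs, and the size tolerance used to decide whether a peak matches the 125 bp wild-type amplicon is an assumption of this sketch, not a value from the text.

```python
from statistics import median

WT_SIZE_BP = 125.0     # expected wild-type amplicon size
SIZE_TOLERANCE = 0.5   # assumed tolerance (bp) for calling a peak wild type


def is_wt_peak(size_bp: float) -> bool:
    return abs(size_bp - WT_SIZE_BP) <= SIZE_TOLERANCE


def is_mutant_line(peaks) -> bool:
    """A line is a mutant if at least one peak differs from the WT size."""
    return any(not is_wt_peak(size) for size, _area in peaks)


def mutation_frequency(peaks) -> float:
    """100 x (total mutant peak area / total mutant + WT peak area)."""
    total = sum(area for _size, area in peaks)
    mutant = sum(area for size, area in peaks if not is_wt_peak(size))
    return 100 * mutant / total


def mutation_rate(lines) -> float:
    """100 x (number of mutant lines / number of analyzed lines)."""
    n_mutant = sum(1 for peaks in lines if is_mutant_line(peaks))
    return 100 * n_mutant / len(lines)


def median_mutation_frequency(lines) -> float:
    """Median amplicon mutation frequency among the mutant lines only."""
    return median(mutation_frequency(p) for p in lines if is_mutant_line(p))
```

For example, a line with peaks `[(125.0, 100.0), (120.0, 900.0)]` would be called a mutant with a 90% mutation frequency in the amplicon under these assumptions.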
Amplicon analysis was performed in two different tissues to investigate mutation patterns. The majority of mutation lines showed no variations in peak pattern between different tissues, representing a uniform and stable TALEN induced mutation event (Table 2 and Fig. 12). Some of the lines displayed variations, indicating progressing mutation or chimeric events (Table 2 and Fig. 13).
Mutation frequency in amplicon, which indicates the proportion of mutant over wild type COMT alleles or copies in an individual plant, varied among individual lines. The lines generated by agrobacteria transformation of pTALCell via IE tended to have a higher mutation frequency in amplicon (median of about 94%), while those generated by biolistic bombardment of pTALLife via DE showed a lower mutation frequency (median of about 13%), as shown in Table 2. These differences are likely a consequence of the different tissue culture procedures used with the two gene transfer methods. Direct embryogenesis was used in combination with biolistic gene transfer, while indirect embryogenesis was used in combination with Agrobacterium mediated gene transfer.
Example 6: Sequence Confirmation and plant development
BsaHI treated PCR products (as described in the restriction analysis) from mutant lines were cloned into the pCRII-TOPO (Invitrogen) sequencing vector. A total of six clones were sequenced and analyzed. Deletions of 5 bp to 29 bp were detected in the analyzed mutant lines, confirming the TALEN induced mutagenesis in the COMT gene (Fig. 14). Plants with mutated COMT genes are shown in Fig. 15A-B. Some are shown actively growing in soil and producing secondary tillers (Fig. 15A).
Example 7. Analysis of mutant sugarcane plants.
Six mutants were analyzed for lignin content, stem diameter, browning of the midrib and/or internode and percent reduction in lignin as compared to the wild type. The results are provided in Table 3, below. In Table 3, mutation frequency refers to the % of de novo sequence modifications (deletions or insertions) relative to WT within the 125 bp PCR amplicon of the conserved sugarcane COMT region, which was targeted for cleavage by COMT-TALEN. The mutation frequency data are based on >1,000 sequence reads per line (using 454 sequencing, i.e., high throughput sequencing).
Table 3. Lignin content, plant performance (stem diameter) in TALEN COMT-mutant lines with different mutation frequencies
Figure imgf000062_0001
WT = original sugarcane CP 88-1762 without mutation. C7-C16 TALEN lines with COMT mutations.
* indicates significant difference p<0.05 from WT value in the same column.
Mutant sugarcane lines C6, C3 and C7, with 75% to 92% mutation frequencies, display a significant reduction of lignin (11% to 22%) but no significant reduction of stem diameter compared to WT. C6 shows the best agronomic performance as measured by stem diameter (22 mm) along with a significant reduction in lignin (22% reduction) despite only a 75% mutation frequency. This suggests that both agronomic performance and lignin reduction are influenced not only by the frequency of the targeted mutations but also by which type of COMT copies are mutated or remain unmutated. Identification of these potentially superior COMT targets for mutation may then be used to design specific TALENs for those targets in this and/or other sugarcane cultivars, targeting the non-conserved portion of the COMT for cleavage. Low mutation frequencies (e.g., Line C14) result in no significant difference in either stem diameter or lignin content as compared to WT. Mutation frequencies of 98% or higher (C17 and C16) result in a significant reduction of both lignin and stem diameter compared to WT. Brown stem internodes and midribs are reliable indicators of significant lignin reduction.
Phenotype and plant growth. COMT mutants with lignin reduction of more than 20% displayed brown coloration in internodes and midrib (see Table 3 and Fig. 16A-16C). This phenotype has earlier been described in COMT mutants of sorghum and maize with reduced lignin. The growth performance (e.g., stem diameter) of sugarcane mutant lines with up to 22% reduction in lignin did not differ from wild-type under greenhouse conditions (Table 3, Fig. 16A-16C).
Sequence confirmation by amplicon sequencing. Mutant lines were confirmed by sequencing of the COMT amplicon, which revealed the presence of deletions or insertions at the target site. Amplicon sequencing also revealed simultaneous mutation of different COMT homo(eo)logs (e.g., COMTa and COMTb), which are two confirmed COMT homo(eo)logs with a SNP indicated with a red arrow. Fig. 17 shows sequence confirmation of the TALEN induced COMT mutation in line C16, which has a significant reduction of both lignin and stem diameter compared to WT. In Fig. 17 the SNP site is indicated with a red arrow and the TALEN binding site is indicated by the boxes. An insertion at M1a of GCTGGAGCTGGGCCACGTCCATGAGGACCTTG (SEQ ID NO:21) was identified. Both primary transgenic and vegetative progeny displayed the same type of mutations with 13 to 194 reads per mutation following 454 sequencing (i.e., high throughput sequencing). Fig. 18 provides sequence confirmation of the TALEN induced COMT mutation in line C6, which has a significant reduction of lignin (22%) but no significant reduction of stem diameter compared to WT. Here, both primary transgenic and vegetative progeny displayed the same type of mutations with 3 to 97 reads per mutation following 454 sequencing.
Identifying uniform mutation events. In addition, mutation patterns were examined in the primary mutant line and its vegetative progeny using capillary electrophoresis (Figs. 19 and 20) in addition to amplicon sequencing (Figs. 17 and 18). Fig. 19 provides the results for mutant lines C16, C6, and C14 and Fig. 20 provides the results for mutant lines C17 and C7. In both Figs. 19 and 20, WT is the original sugarcane without mutation, while PT and VP indicate the primary mutant line and the vegetative progeny, respectively. Identical mutations were confirmed by both sequencing and capillary electrophoresis, suggesting that the lines represent uniform mutation events.
Example 8. Cre/loxP mediated site specific recombination for excision of TALEN expression cassette from the sugarcane genome.
In some cases, a TALEN expression cassette is stably integrated into the sugarcane genome. To reduce the possibility of the creation of additional mutations, it may be desirable to remove the TALEN expression cassette. For this purpose site specific recombination can be used as shown and described in Figs. 22A-22D. Here, heat inducible activation of the cre recombinase in vivo activates the excision of the entire nucleic acid sequence flanked by loxP sites from the sugarcane genome by site specific recombination. Excised sequences include the nptII selectable marker cassette, the cre expression cassette, and the TALEN expression cassettes.
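The outcome of the excision step can be pictured with a small Python sketch that models the integrated locus as an ordered list of feature names. The feature order and names below are hypothetical, and the sketch assumes the two loxP sites are in the same orientation, in which case recombination removes everything between them and leaves a single loxP site behind.

```python
from typing import List


def excise_between_loxp(locus: List[str]) -> List[str]:
    """Return the locus after Cre-mediated excision between loxP sites.

    Assumes directly repeated (same-orientation) loxP sites. If fewer than
    two loxP sites are present, the locus is returned unchanged.
    """
    loxp_positions = [i for i, feature in enumerate(locus) if feature == "loxP"]
    if len(loxp_positions) < 2:
        return list(locus)
    first, last = loxp_positions[0], loxp_positions[-1]
    # Keep everything up to and including the first loxP, then everything
    # after the last loxP: one loxP "scar" remains in the genome.
    return locus[: first + 1] + locus[last + 1:]


if __name__ == "__main__":
    integrated = [
        "genome_5prime", "loxP", "nptII_cassette", "cre_cassette",
        "TALEN_left_cassette", "TALEN_right_cassette", "loxP", "genome_3prime",
    ]
    print(excise_between_loxp(integrated))
```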
Table 4 shows the results of mutagenesis using a TALEN expression cassette and the removal of the cassette from the resultant regenerated mutant plants.
Table 4. Results of site specific recombination to remove an integrated TALEN expression cassette.
Figure imgf000064_0001
Removal of the constitutively expressed TALEN from the COMT mutants by site specific recombination or sexual reproduction (causing segregation of TALEN and mutations) will remove the potential of transgene derived instability of the created events.

Claims

That which is claimed is:
1. A method of reducing the lignin content and/or modifying the lignin profile of a sugarcane plant, cell and/or tissue, comprising:
mutagenizing nucleic acid in a sugarcane plant, cell and/or tissue to produce a stable, inactivating deletion or insertion in more than 50% of the alleles or copies of said sugarcane plant, cell and/or tissue that encode CoA O-methyltransferase (COMT), thereby reducing the lignin content of the sugarcane plant, cell and/or tissue and/or modifying the lignin profile of the sugarcane plant, cell and/or tissue as compared to a wild type sugarcane plant, cell and/or tissue.
2. The method of claim 1, wherein said COMT alleles or copies comprising said stable, inactivating deletion or insertion produce no COMT protein or produce COMT protein with reduced and/or no activity.
3. The method of Claim 1 or Claim 2, wherein the mutagenizing comprises targeted mutagenesis.
4. The method of Claim 3, wherein the targeted mutagenesis comprises introducing into said sugarcane plant, cell and/or tissue a nucleic acid construct comprising a transcription activator-like effector nuclease (TALEN) and a DNA binding domain that binds to at least a portion of nucleic acid encoding COMT.
5. The method of any one of Claims 1 to 4, wherein the mutagenesis produces a stable, inactivating deletion or an insertion in less than 100% of the COMT alleles or copies.
6. The method of any one of Claims 1 to 5, wherein the amount of lignin is reduced by about 3% to about 30%.
7. The method of any one of Claims 1 to 6, wherein the lignin profile is modified such that the ratio of syringyl to guaiacyl in the lignin is reduced by 6% to 70%.
8. The method of any one of Claims 1 to 7, wherein the ratio of syringyl to guaiacyl in the lignin is about 1.4 to about 0.45.
9. The method of any one of Claims 1 to 8, further comprising regenerating a sugarcane plant from the sugarcane plant cell or plant tissue.
10. A genetically modified sugarcane plant, cell and/or tissue produced by the method of any one of Claims 1 to 9.
11. A genetically modified sugarcane plant, cell and/or tissue comprising a stable, inactivating deletion or insertion in more than 50% of the alleles or copies encoding CoA O-methyltransferase (COMT) in said sugarcane plant, cell and/or tissue.
12. The genetically modified sugarcane plant, cell and/or tissue of Claim 11, comprising a stable, inactivating deletion or insertion in less than 100% of the COMT alleles or copies.
13. The genetically modified sugarcane plant, cell and/or tissue of Claim 11 or Claim 12, wherein said COMT alleles or copies comprising said stable, inactivating deletion or insertion produce no COMT protein or produce COMT protein with reduced or no activity.
14. The genetically modified sugarcane plant, cell and/or tissue of any one of Claims 11 to 13, wherein the amount of lignin is reduced by about 3% to about 30%.
15. The genetically modified sugarcane plant, cell and/or tissue of any one of Claims 11 to 14, wherein the lignin profile is modified such that the ratio of syringyl to guaiacyl in the lignin is reduced by 6% to 70%.
16. The genetically modified sugarcane plant, cell and/or tissue of Claim 15, wherein the ratio of syringyl to guaiacyl in the lignin is about 1.4 to about 0.45.
17. The genetically modified sugarcane plant, cell and/or tissue of any one of Claims 10 to 16, wherein the genetically modified sugarcane plant cell or plant tissue comprises agronomic performance that is substantially the same as that of the non-modified wild type sugarcane plant, cell and/or tissue.
18. The genetically modified sugarcane plant cell or plant tissue of any one of Claims 11 to 17, wherein the genetically modified sugarcane plant cell or plant tissue is regenerated into a genetically modified sugarcane plant.
19. A crop comprising a plurality of the genetically modified sugarcane plant of any one of Claims 10 to 18 planted together in an agricultural field.
20. A product produced from the genetically modified sugarcane plant, cell and/or tissue of any one of Claims 10-18 or the crop of Claim 19.
21. The product of claim 20, wherein the product is biomass, bagasse, biofuel, and/or a biobased product.
22. A method of increasing the efficiency of conversion of sugarcane lignocellulosic biomass into biofuel, comprising:
providing a sugarcane plant of any one of Claims 10-18 or the crop of Claim 19; and converting the lignocellulosic biomass from said sugarcane plant and/or crop into biofuel, thereby increasing the efficiency of conversion of sugarcane lignocellulosic biomass into biofuel as compared to a wild type sugarcane plant or crop.
23. The method of Claim 22, wherein the efficiency is measured as an increase in fermentable sugar or a % increase of biofuel and the increase is about 15% to 45% as compared to conversion of wild type sugarcane lignocellulosic biomass into biofuel.
24. A method of providing an animal feed having increased digestibility, comprising:
providing a plant of any one of Claims 10-18 or crop of Claim 19; and converting the lignocellulosic biomass from said plant and/or crop into animal feed, thereby providing a more readily digestible animal feed.
PCT/US2015/028057 2014-04-28 2015-04-28 Targeted genome editing to modify lignin biosynthesis and cell wall composition WO2015168158A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201461985070P 2014-04-28 2014-04-28
US61/985,070 2014-04-28

Publications (1)

Publication Number Publication Date
WO2015168158A1 true WO2015168158A1 (en) 2015-11-05

Family

ID=54359252

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2015/028057 WO2015168158A1 (en) 2014-04-28 2015-04-28 Targeted genome editing to modify lignin biosynthesis and cell wall composition

Country Status (1)

Country Link
WO (1) WO2015168158A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100058496A1 (en) * 2007-02-21 2010-03-04 Nagarjuna Energy Private Limited Transgenic sweet sorghum with altered lignin composition and process of preparation thereof
US20120272406A1 (en) * 2011-04-21 2012-10-25 Basf Plant Science Company Gmbh Methods of modifying lignin biosynthesis and improving digestibility
WO2013074999A1 (en) * 2011-11-16 2013-05-23 Sangamo Biosciences, Inc. Modified dna-binding proteins and uses thereof
US20140090099A1 (en) * 2009-10-06 2014-03-27 The Regents Of The University Of California Generation of haploid plants and improved plant breeding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JUNG ET AL.: "RNA interference suppression of lignin biosynthesis increases fermentable sugar yields for biofuel production from field-grown sugarcane", PLANT BIOTECHNOLOGY JOURNAL, vol. 11, no. Iss. 6, 2 April 2013 (2013-04-02), pages 709 - 716, XP055233998, ISSN: 1467-7644 *

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10323236B2 (en) 2011-07-22 2019-06-18 President And Fellows Of Harvard College Evaluation and improvement of nuclease cleavage specificity
US11920181B2 (en) 2013-08-09 2024-03-05 President And Fellows Of Harvard College Nuclease profiling system
US10954548B2 (en) 2013-08-09 2021-03-23 President And Fellows Of Harvard College Nuclease profiling system
US10508298B2 (en) 2013-08-09 2019-12-17 President And Fellows Of Harvard College Methods for identifying a target site of a CAS9 nuclease
US11046948B2 (en) 2013-08-22 2021-06-29 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US10227581B2 (en) 2013-08-22 2019-03-12 President And Fellows Of Harvard College Engineered transcription activator-like effector (TALE) domains and uses thereof
US9388430B2 (en) 2013-09-06 2016-07-12 President And Fellows Of Harvard College Cas9-recombinase fusion proteins and uses thereof
US10682410B2 (en) 2013-09-06 2020-06-16 President And Fellows Of Harvard College Delivery system for functional nucleases
US11299755B2 (en) 2013-09-06 2022-04-12 President And Fellows Of Harvard College Switchable CAS9 nucleases and uses thereof
US10912833B2 (en) 2013-09-06 2021-02-09 President And Fellows Of Harvard College Delivery of negatively charged proteins using cationic lipids
US9340800B2 (en) 2013-09-06 2016-05-17 President And Fellows Of Harvard College Extended DNA-sensing GRNAS
US9999671B2 (en) 2013-09-06 2018-06-19 President And Fellows Of Harvard College Delivery of negatively charged proteins using cationic lipids
US9737604B2 (en) 2013-09-06 2017-08-22 President And Fellows Of Harvard College Use of cationic lipids to deliver CAS9
US9526784B2 (en) 2013-09-06 2016-12-27 President And Fellows Of Harvard College Delivery system for functional nucleases
US9340799B2 (en) 2013-09-06 2016-05-17 President And Fellows Of Harvard College MRNA-sensing switchable gRNAs
US10597679B2 (en) 2013-09-06 2020-03-24 President And Fellows Of Harvard College Switchable Cas9 nucleases and uses thereof
US10858639B2 (en) 2013-09-06 2020-12-08 President And Fellows Of Harvard College CAS9 variants and uses thereof
US11124782B2 (en) 2013-12-12 2021-09-21 President And Fellows Of Harvard College Cas variants for gene editing
US9840699B2 (en) 2013-12-12 2017-12-12 President And Fellows Of Harvard College Methods for nucleic acid editing
US10465176B2 (en) 2013-12-12 2019-11-05 President And Fellows Of Harvard College Cas variants for gene editing
US11053481B2 (en) 2013-12-12 2021-07-06 President And Fellows Of Harvard College Fusions of Cas9 domains and nucleic acid-editing domains
US10704062B2 (en) 2014-07-30 2020-07-07 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US11578343B2 (en) 2014-07-30 2023-02-14 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US10077453B2 (en) 2014-07-30 2018-09-18 President And Fellows Of Harvard College CAS9 proteins including ligand-dependent inteins
US11214780B2 (en) 2015-10-23 2022-01-04 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US10167457B2 (en) 2015-10-23 2019-01-01 President And Fellows Of Harvard College Nucleobase editors and uses thereof
US11702651B2 (en) 2016-08-03 2023-07-18 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US10947530B2 (en) 2016-08-03 2021-03-16 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US10113163B2 (en) 2016-08-03 2018-10-30 President And Fellows Of Harvard College Adenosine nucleobase editors and uses thereof
US11661590B2 (en) 2016-08-09 2023-05-30 President And Fellows Of Harvard College Programmable CAS9-recombinase fusion proteins and uses thereof
US11542509B2 (en) 2016-08-24 2023-01-03 President And Fellows Of Harvard College Incorporation of unnatural amino acids into proteins using base editing
US11306324B2 (en) 2016-10-14 2022-04-19 President And Fellows Of Harvard College AAV delivery of nucleobase editors
US10745677B2 (en) 2016-12-23 2020-08-18 President And Fellows Of Harvard College Editing of CCR5 receptor gene to protect against HIV infection
US11820969B2 (en) 2016-12-23 2023-11-21 President And Fellows Of Harvard College Editing of CCR2 receptor gene to protect against HIV infection
US11898179B2 (en) 2017-03-09 2024-02-13 President And Fellows Of Harvard College Suppression of pain by gene editing
US11542496B2 (en) 2017-03-10 2023-01-03 President And Fellows Of Harvard College Cytosine to guanine base editor
US11268082B2 (en) 2017-03-23 2022-03-08 President And Fellows Of Harvard College Nucleobase editors comprising nucleic acid programmable DNA binding proteins
US11479782B2 (en) 2017-04-25 2022-10-25 Cellectis Alfalfa with reduced lignin composition
WO2018198049A1 (en) * 2017-04-25 2018-11-01 Cellectis Alfalfa with reduced lignin composition
US11560566B2 (en) 2017-05-12 2023-01-24 President And Fellows Of Harvard College Aptazyme-embedded guide RNAs for use with CRISPR-Cas9 in genome editing and transcriptional activation
US11732274B2 (en) 2017-07-28 2023-08-22 President And Fellows Of Harvard College Methods and compositions for evolving base editors using phage-assisted continuous evolution (PACE)
US11319532B2 (en) 2017-08-30 2022-05-03 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11932884B2 (en) 2017-08-30 2024-03-19 President And Fellows Of Harvard College High efficiency base editors comprising Gam
US11795443B2 (en) 2017-10-16 2023-10-24 The Broad Institute, Inc. Uses of adenosine base editors
WO2020117837A1 (en) * 2018-12-04 2020-06-11 Monsanto Technology Llc Methods and compositions for improving silage
US11795452B2 (en) 2019-03-19 2023-10-24 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11643652B2 (en) 2019-03-19 2023-05-09 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11447770B1 (en) 2019-03-19 2022-09-20 The Broad Institute, Inc. Methods and compositions for prime editing nucleotide sequences
US11912985B2 (en) 2020-05-08 2024-02-27 The Broad Institute, Inc. Methods and compositions for simultaneous editing of both strands of a target double-stranded nucleotide sequence

Similar Documents

Publication Publication Date Title
WO2015168158A1 (en) Targeted genome editing to modify lignin biosynthesis and cell wall composition
US20220380790A1 (en) Spatially modified gene expression in plants
Jung et al. RNAi suppression of lignin biosynthesis in sugarcane reduces recalcitrance for biofuel production from lignocellulosic biomass
US8173866B1 (en) Modulation of plant xylan synthases
WO1999010498A2 (en) Genes encoding enzymes for lignin biosynthesis and uses thereof
WO2019038417A1 (en) Methods for increasing grain yield
US11473086B2 (en) Loss of function alleles of PtEPSP-TF and its regulatory targets in rice
US20200255846A1 (en) Methods for increasing grain yield
US9683241B2 (en) Polynucleotides encoding enzymes from the jute lignin biosynthetic pathway
Liu et al. Bn.YCO affects chloroplast development in Brassica napus L.
Xie et al. Overexpression of ARAhPR10, a member of the PR10 family, decreases levels of Aspergillus flavus infection in peanut seeds
US20200283786A1 (en) Lodging resistance in plants
EP3709792A1 (en) Plant promoter for transgene expression
US11629356B2 (en) Regulating lignin biosynthesis and sugar release in plants
US9932601B2 (en) Inhibition of Snl6 expression for biofuel production
CN115244178A (en) Cis-acting regulatory elements
CN108473997B (en) Plant promoters for transgene expression
WO2019099191A1 (en) Plant promoter for transgene expression
EP3703488A1 (en) Plant promoter for transgene expression
WO2019060145A1 (en) Use of a maize untranslated region for transgene expression in plants
US11959089B2 (en) Recombinant LAC polynucleotides and uses thereof to increase production of C-lignin in plants
WO2009104181A1 (en) Plants having genetically modified lignin content and methods of producing same

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 15786265

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 15786265

Country of ref document: EP

Kind code of ref document: A1