WO2023141464A1 - Method for designing synthetic nucleotide sequences - Google Patents

Info

Publication number
WO2023141464A1
Authority
WO
WIPO (PCT)
Prior art keywords
polynucleotide sequence
motifs
motif
counted
target
Application number
PCT/US2023/060837
Other languages
French (fr)
Inventor
Matthew Bryon BIGGS
Original Assignee
AgBiome, Inc.
Application filed by AgBiome, Inc.

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/67 General methods for enhancing the expression
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/63 Introduction of foreign genetic material using vectors; Vectors; Use of hosts therefor; Regulation of expression
    • C12N15/79 Vectors or expression systems specially adapted for eukaryotic hosts
    • C12N15/82 Vectors or expression systems specially adapted for eukaryotic hosts for plant cells, e.g. plant artificial chromosomes (PACs)
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Definitions

  • Optimization of the nucleic acid sequence of a gene is desirable in a number of circumstances. Genes that are cloned from one organism and then expressed in another of a different type (i.e., transgenes) often fail to express, or express poorly, in the target organism. Sequence optimization can correct this poor expression. In other cases, it may be desirable to alter or tune the expression of a specific gene in its native or recombinant context. Optimization may also be desirable not because of the target organism or system, but because of the technical manipulations used to handle a gene. One example is the removal of restriction endonuclease sites to simplify the construction of a vector for generating transgenic organisms.
  • Optimization removes known sequences that have a negative impact on gene expression in the target organism or expression system. For example, some sequences can cause message termination in transcribed RNA, cause folding of the nascent RNA that hinders translation, or target transcribed RNA for degradation. Removing these sequences can improve expression by stabilizing the RNA message and allowing more abundant translation into protein. Other sequences may affect messages through other mechanisms, for example, by directing splicing or trafficking of the RNA or by changing the rate of protein translation.
  • Sequences and sequence patterns that govern protein production from a gene have been discovered, but sequences that were previously believed to govern protein production do not always do so in every circumstance. For example, some sequences and sequence patterns behave differently in different organisms, or even in closely related organisms, or behave in other complex ways depending on context or other poorly recognized variables. It would therefore be advantageous to develop further methods for gene optimization that account for the potential regulatory effects of sequences and sequence patterns found in genes and in the systems in which genes are heterologously expressed.
  • The method comprises steps including identifying and preserving certain motifs, identifying and removing certain other motifs, optimizing GC content, and removing undesired polynucleotide repeats.
  • Figure 1A and Figure 1B show a visualization of an optimization solution.
  • The x-axis displays the amino acids of an AgBiome protein; the y-axis displays the codon selection rank in the new host. Different amino acids can be encoded by as few as one or as many as six codons.
  • Fig. 1A shows the initial conditions.
  • Fig. 1B shows the optimization solution.
  • Figure 2 shows expression of the optimized form of Gene A in T0 plants.
  • Figure 3 shows a Western blot of the expression of APG06396 (not optimized) versus APG06396.5 (optimized) in transformed N. benthamiana.
  • The constructs in bold are the optimized version of APG06396 (APG06396.5), while the constructs not in bold are unoptimized or native versions (932 and APG06396).
  • A target expression system can include, for example, a target organism or an in vitro expression system.
  • Additional sequence modifications are known to enhance gene expression in a cellular host. These include elimination of sequences encoding spurious polyadenylation signals, RNA destabilizing sites, termination signals, exon-intron splice site signals, transposon-like repeats, and other well-characterized sequences that may be deleterious to gene expression.
  • The GC content of the sequence may be adjusted to levels typical for a given cellular host, as calculated by reference to known genes expressed in the host cell. When possible, the sequence can also be modified to avoid predicted hairpin secondary structures in the mRNA.
  • The present method allows a native polynucleotide sequence to be optimized for expression of the polypeptide that it encodes.
  • The method can be applied to the optimization of any DNA or RNA sequence.
  • A method for making a synthetic gene for expression in a target expression system comprises identifying and counting one or more counted motifs in a native polynucleotide sequence. Counted motifs are sequences that should be preserved in the synthetic construct (i.e., the optimized version of the gene).
  • The method further comprises identifying and removing “avoided motifs”, optimizing GC content, and identifying and removing polynucleotide repeats longer than a set length, as appropriate for the given gene.
  • The method comprises:
  • Step (c): selecting a target expression system and creating a preliminary codon-optimized polynucleotide sequence by selecting the target system’s optimal codons for each amino acid in the amino acid sequence of step (b); and (d) generating an optimized polynucleotide sequence for use in expressing the amino acid sequence in the target system by repeatedly modifying at least one codon for at least one amino acid position, such that the criteria set forth in (i)-(v) are satisfied in order:
  • (i) the optimized polynucleotide sequence comprises the optimal set of added motifs.
  • Optimization refers to modifications of the nucleic acid sequence of a gene, or of another nucleic acid sequence encoding a separate nucleic acid or protein, that modulate expression of the gene when it is expressed in a selected target expression system.
  • Optimization of the gene results in enhanced expression of the synthetic gene in the target expression system.
  • Expression of the optimized gene can be enhanced by at least 5%, 10%, 15%, 20%, 25%, 50%, 75%, 100%, 200%, 500%, 1000%, or more, when compared to the expression of the native gene in the target expression system under similar conditions.
  • A gene can be optimized to achieve a desired level of expression, higher or lower than the expression of the native gene in the target expression system under similar conditions.
  • Modifying refers to a change in the candidate gene nucleic acid sequence that can include, for example, substitutions, deletions, truncations, and/or insertions.
  • Modifications to the nucleic acid sequence, such as substitutions of codons or sequence patterns or the removal of avoided motifs, do not alter the encoded amino acid sequence.
  • A modification can be an essential modification.
  • An essential modification is a modification that cannot be altered by subsequent modifications, which prevents the algorithm from entering infinite loops.
  • An essential modification can comprise adding the optimal set of added motifs as set forth in (i).
  • An essential modification can comprise optimizing GC content as set forth in (ii).
  • An essential modification can comprise removing avoided motifs as set forth in (iii).
  • An essential modification can comprise removing polynucleotide repeats longer than a set number as set forth in (iv).
  • As used herein, the terms “DNA”, “nucleic acid”, and “polynucleotide” are not intended to limit the present invention to polynucleotides comprising DNA.
  • Polynucleotides can comprise ribonucleotides and combinations of ribonucleotides and deoxyribonucleotides. Such deoxyribonucleotides and ribonucleotides include both naturally occurring molecules and synthetic analogues.
  • The polynucleotides of the invention also encompass all forms of sequences including, but not limited to, single-stranded forms, double-stranded forms, hairpins, stem-and-loop structures, and the like.
  • A “native polynucleotide sequence” or “native sequence” optimized by the method can be any gene of interest and can encode any polypeptide of interest. Exemplary genes of interest that can be optimized by the method are further described elsewhere herein.
  • The native polynucleotide sequence can be a modified sequence that does not exist in native, or natural, form. In particular embodiments, the native polynucleotide sequence is the polynucleotide sequence prior to optimization.
  • The optimized polynucleotide sequence is generated in the methods disclosed herein by applying the algorithm to the native polynucleotide sequence. Modifications to the native polynucleotide sequence can be made in silico within sequence windows. Different parts of the statistical model can be applied concurrently or sequentially.
  • A polynucleotide repeat refers to a polynucleotide having tandemly linked repetitive nucleotide units.
  • In some embodiments, the repetitive nucleotide consists of a single base pair.
  • In some embodiments, the repetitive nucleotide consists of 2 base pairs.
  • In some embodiments, the repetitive nucleotide consists of 3 base pairs, such as CAG, CGG, CTG, GAA, GCC, and GCG.
  • In some embodiments, the repetitive nucleotide consists of 4 base pairs, such as CCTG.
  • In some embodiments, the repetitive nucleotide consists of 5 base pairs, such as TGGAA.
  • In some embodiments, the repetitive nucleotide consists of 6 base pairs, such as GGCCTG and GGGGCC. In other embodiments, the repetitive nucleotide consists of 7, 8, 9, 10, or 11 base pairs, or of 12 base pairs, such as CCCCGCCCCGCG (Handb Clin Neurol. 2018; 147: 105-123).
  • In some embodiments, the polynucleotide repeat comprises at least 3, at least 4, at least 5, at least 6, at least 7, at least 8, at least 9, or at least 10 repetitive nucleotides.
  • A rare codon refers to a codon that is seldom or never used in a target organism or system.
  • In some embodiments, the rare codon is one or more of: TTA, CTA, TCG, CCG, ACG, GCG, CGA, and CGG.
  • In other embodiments, the rare codon is one or more of: TTA, CTA, GTA, CGT, AGT, and CGA.
  • A “target expression system” refers to, without limitation, any in vivo or in vitro expression system that facilitates expression of the synthetic gene.
  • A target expression system can be a target organism, or a cell or part thereof. Exemplary target organisms that can be employed by the method are further described elsewhere herein.
  • A target expression system can be any in vitro expression system known in the art that allows for cell-free recombinant expression of the synthetic gene.
  • In vitro expression systems can support protein synthesis from DNA templates (transcription and translation) or from mRNA templates (translation only), and can be designed to carry out transcription and translation as two separate sequential reactions or concurrently as one reaction.
  • In some embodiments, the target expression system is a bacterial cell for in vitro expression and/or a plant cell for in vivo expression.
  • The method identifies and counts one or more counted motifs or added motifs in the native polynucleotide sequence.
  • A counted motif refers to a specific sequence, codon, or sequence pattern in the native polynucleotide sequence that is identified and counted, and subsequently not deleted from the primary strand of the optimized polynucleotide.
  • In some embodiments, an optimized polynucleotide sequence must comprise at least one more counted motif than the native polynucleotide sequence contains. In some embodiments, an optimized polynucleotide sequence must comprise at least 4 counted motifs.
  • In some embodiments, the number of counted motifs in the optimized sequence differs from the number in the native sequence (i.e., the former is greater than the latter).
  • A counted motif can be moved when forming the optimized sequence, so that the particular counted motif is present in the same number but at a different location.
  • In some embodiments, the location of a counted motif in the optimized polynucleotide sequence differs from its location in the native polynucleotide sequence.
  • In other embodiments, the locations of all counted motifs present in the optimized polynucleotide sequence are the same as their locations in the native polynucleotide sequence.
  • Counted motifs need not be present in the optimized sequence. Accordingly, in some embodiments, the optimized polynucleotide sequence does not comprise a counted motif.
  • In some embodiments, the counted motifs are selected from the group consisting of: AAGCAT, AATAAT, ATTAAT, AACCAA, ATACAT, ATATAA, AAAATA, ATTAAA, ATACTA, AATTAA, ATAAAA, AATACA, ATGAAA, and CATAAA.
  • In some embodiments, the counted or added motifs are selected from the group consisting of AATAAT, AACCAA, ATATAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA, and CATAAA.
  • Counted motifs can be scored by the method. Exemplary counted motifs are described in the Examples provided herein.
  • An added motif refers to a member of a subset of the counted motifs that is prioritized for addition; that is, when motifs are added to the optimized sequence, only the sequences designated as added motifs are considered.
  • Added motifs can be scored by the method.
  • The optimal set of added motifs is selected based on the score assigned by the method.
  • The optimal set of added motifs can comprise at least 4 counted motifs.
  • Counted motifs are added such that the number of counted motifs in the optimized sequence is the same as or greater than the number in the native polynucleotide sequence.
  • The optimal set of added motifs can refer to at least one required motif, or to the number of required motifs needed to give an equal or greater number of required motifs in the optimized polynucleotide sequence than in the native polynucleotide sequence. Exemplary added motifs are described in the Examples provided herein.
  • The method further identifies any avoided sequences that may be present in the native polynucleotide sequence. As used herein, an “avoided sequence” refers to a specific sequence or sequence pattern that is actively excluded, as much as possible, from both the primary strand and the reverse strand of the optimized sequence. In one embodiment, an avoided motif may be an ATTTA motif. In another embodiment, an avoided motif can be a restriction endonuclease site.
  • Modification or removal of restriction endonuclease sites can modulate gene expression in the target expression system and/or promote the incorporation and function of the synthetic gene in an expression cassette or recombinant vector where particular restriction endonuclease sites may be problematic.
  • An avoided motif can comprise polyadenylation sites, termination sites, RNA destabilizing sites, ATTTA motifs, exon-intron splice site signals, known transposon-like repeats, restriction enzyme recognition sites, known patterns with strong RNA secondary structure, or other well-characterized sequences that may be deleterious to gene expression.
  • In some cases, avoided motifs must be included in the optimized polynucleotide sequence for proper expression of the gene or for other reasons specific to the sequence being optimized. Otherwise, the number of avoided sequences in the optimized sequence should be limited or reduced to zero. Exemplary avoided motifs are described in the Examples provided herein.
  • A required motif refers to a specific sequence or sequence pattern of which at least one copy should be found in the optimized polynucleotide sequence.
  • In some embodiments, a required motif is an ATATAT motif. Exemplary required motifs are described in the Examples provided herein.
  • In some embodiments, the required motifs are selected from the group consisting of ATATAT, TTGTTT, TTTTGT, TGTTTT, TATATA, TATTTT, TTTTTT, ATTTTT, TTATTT, TTTATT, TAATAA, ATTTAT, TATATT, TTTTAT, ATATTT, TATTAT, TGTTTG, TTATAT, TGTAAT, AAATAA, AATTTT, TTTTTA, TAATTT, TTAATT, AAATTT, TTTGTT, ATTATT, ATTTTA, TTTAAT, and TTTTAA.
  • In some embodiments, a polynucleotide repeat refers to any stretch of a single repeated nucleotide, such as adenine (A), guanine (G), cytosine (C), or thymine (T).
  • A polynucleotide repeat can comprise at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, or at least 40 repeated nucleotides.
  • In some embodiments, an optimized polynucleotide sequence does not comprise any polynucleotide repeat of at least 6 repeated nucleotides. In other embodiments, an optimized polynucleotide sequence does not comprise any polynucleotide repeat of at least 5, at least 4, or at least 3 repeated nucleotides.
  • A spurious ORF refers to any unintended open reading frame other than the first open reading frame.
  • A spurious ORF of at least 70 nucleotides receives an unfavorable score from the method.
  • Spurious ORFs are further described in the Examples provided herein.
  • GC content is the percentage of nitrogenous bases in a DNA or RNA molecule that are either guanine (G) or cytosine (C). It expresses the proportion of G and C bases among all four bases, which also include adenine and thymine in DNA, or adenine and uracil in RNA.
  • DNA with low GC content is less stable than DNA with high GC content; however, the hydrogen bonds themselves do not contribute substantially to molecular stability, which is instead determined mainly by base-stacking interactions.
  • GC content varies among organisms; this variation is thought to result from differences in selection, mutational bias, and biased recombination-associated DNA repair.
  • The average GC content of human genomes ranges from 35% to 60% across 100-kb fragments, with a mean of 41%.
  • The GC content of yeast (Saccharomyces cerevisiae) is 38%, and that of another common model organism, Arabidopsis (Arabidopsis thaliana), is 36%.
  • The optimal codon of a host refers to the most frequently used codon for each amino acid in that host.
  • The optimal codon can be defined by one of multiple methods, including codon frequency tables or hidden Markov models. Related codon-usage measures include the frequency of optimal codons (Fop) (Ikemura T, 1981, J. Mol. Biol. 151 (3): 389-409), the relative codon adaptation (RCA) (Fox et al., 2010, DNA Res. 17 (3): 185-96), and the codon adaptation index (CAI) (Sharp et al., 1987, Nucleic Acids Research).
  • An “optimal sequence”, “optimized polynucleotide sequence”, or “optimized nucleic acid sequence” refers to a polynucleotide sequence that encodes the same polypeptide sequence as the native polynucleotide sequence and that has desirable properties, such as improved GC content or the absence of restriction enzyme sites, and/or is expressed with higher efficiency in a target expression system.
  • The optimized sequence can be referred to as a synthetic sequence.
  • The algorithm of the method is developed in silico, for example, as program code.
  • The algorithm can be based on whole-genome, partial-genome, or transcriptome sequences of the selected target expression system.
  • The algorithm is applied to the native polynucleotide sequence to determine the codons, sequence patterns, and/or avoided motifs that can be modified to optimize the gene for expression in the target expression system.
  • The algorithm follows these steps:
  • A new host is identified, and the polypeptide sequence is converted into a preliminary DNA sequence using the new host’s optimal codons for each amino acid (see Fig. 1A for the initial conditions).
  • The optimal set of added motifs (i.e., the set that achieves at least one more counted motif than the native sequence, with a minimum of four) is selected based on the score from step 4.
  • The GC content of the overall sequence is iteratively altered until it falls within the desired range for the new host. Regions of extreme GC content are identified using a sliding-window approach (in specific embodiments, the window is 4 amino acids wide). Within the extreme (high or low) regions of the sequence, all possible alternative codon combinations are tested in order from most- to least-preferred codons, and alterations are selected that move the overall GC content toward the new host’s range. This process is repeated until the GC content is at or near the center of the new host’s range.
  • Step 7. Avoided motifs are removed by sampling codons from most- to least-preferred for the corresponding amino acids until all avoided motifs have been removed, while preserving the native amino acid sequence.
  • Polynucleotide repeats longer than length N are removed by sampling codons from most- to least-preferred for the corresponding amino acids until all such repeats have been removed.
  • The algorithm repeats any necessary steps. For example, removing a polynucleotide repeat may reintroduce an avoided motif. To avoid infinite loops (cycling back and forth between competing edits), the method includes a codon “lock” feature, so that essential changes cannot be altered by subsequent steps.
  • The synthetic gene can be incorporated into an expression cassette designed for expression of the gene in the target expression system. Such expression cassettes may be further incorporated into appropriate recombinant DNA vectors.
  • The method further comprises introducing the expression cassette into a host cell. Upon introduction of the expression cassette, the host cell expresses the synthetic gene. Exemplary expression cassettes and host cells are described in further detail elsewhere herein.
  • Although the method is described in a particular order, it should not be construed that the method must be performed in the order set forth herein.
  • The steps of the method may be performed in any order based on the desired outcome of the method, the native polynucleotide sequence, and/or the selected target expression system.
  • The native polynucleotide sequence modified by the method can be any gene of interest and can be derived from any prokaryotic or eukaryotic organism.
  • The gene of interest can be one that is desirable for heterologous expression in a plant.
  • The gene may be plant-derived or may be derived from another organism.
  • Such genes of interest reflect the commercial markets and interests of those involved in the development of the crop. Crops and markets of interest change, and as developing nations open up world markets, new crops and technologies will also emerge. In addition, as our understanding of agronomic traits and characteristics such as yield and heterosis increases, the choice of genes for transformation will change accordingly.
  • General categories of genes of interest include, for example, those genes involved in information, such as zinc fingers, those involved in communication, such as kinases, and those involved in housekeeping, such as heat shock proteins.
  • Transgenes include genes encoding important traits for agronomics, insect resistance, disease resistance, herbicide resistance, sterility, grain characteristics, and commercial products. Genes of interest include, generally, those involved in oil, starch, carbohydrate, or nutrient metabolism, as well as those affecting kernel size, sucrose loading, and the like.
  • Agronomically important traits such as oil, starch, and protein content can be genetically altered in addition to using traditional breeding methods. Modifications include increasing the content of oleic acid and of saturated and unsaturated oils, increasing levels of lysine and sulfur, providing essential amino acids, and modifying starch. Hordothionin protein modifications are described in U.S. Patent Nos. 5,703,049, 5,885,801, 5,885,802, and 5,990,389, herein incorporated by reference. Another example is the lysine- and/or sulfur-rich seed protein encoded by the soybean 2S albumin described in U.S. Patent No. 5,850,016, and the chymotrypsin inhibitor from barley, described in Williamson et al. (1987) Eur. J. Biochem. 165:99-106, the disclosures of which are herein incorporated by reference.
  • Derivatives of the coding sequences can be made by site-directed mutagenesis to increase the level of preselected amino acids in the encoded polypeptide.
  • The gene encoding the barley high lysine polypeptide (BHL) is derived from barley chymotrypsin inhibitor, U.S. Application Serial No. 08/740,682, filed November 1, 1996, and WO 98/20133, the disclosures of which are herein incorporated by reference.
  • Other proteins include methionine-rich plant proteins such as those from sunflower seed (Lilley et al. (1989) Proceedings of the World Congress on Vegetable Protein Utilization in Human Foods and Animal Feedstuffs, ed. Applewhite (American Oil Chemists Society, Champaign, Illinois), pp. 497-502; herein incorporated by reference), corn (Pedersen et al. (1986) J. Biol. Chem. 261:6279; Kirihara et al. (1988) Gene 71:359; both of which are herein incorporated by reference), and rice.
  • Agronomically important genes also encode latex, Floury 2, growth factors, seed storage factors, and transcription factors.
  • Insect resistance genes may encode resistance to pests that have great yield drag, such as rootworm, cutworm, European Corn Borer, and the like.
  • Such genes include, for example, Bacillus thuringiensis toxin protein genes (U.S. Patent Nos. 5,366,892; 5,747,450; 5,736,514; 5,723,756; 5,593,881; and Geiser et al. (1986) Gene 48: 109); and the like.
  • Genes encoding disease resistance traits include detoxification genes, such as those against fumonisin (U.S. Patent No. 5,792,931); avirulence (avr) and disease resistance (R) genes (Jones et al. (1994) Science 266:789; Martin et al. (1993) Science 262:1432; and Mindrinos et al. (1994) Cell 78:1089); and the like.
  • Herbicide resistance traits may include genes coding for resistance to herbicides that act to inhibit the action of acetolactate synthase (ALS), in particular the sulfonylurea-type herbicides (e.g., the acetolactate synthase (ALS) gene containing mutations leading to such resistance, in particular the S4 and/or Hra20 mutations); genes coding for resistance to herbicides that act to inhibit the action of glutamine synthase, such as phosphinothricin or basta (e.g., the bar gene); glyphosate (e.g., the EPSPS gene and the GAT gene; see, for example, U.S. Publication No.
  • The bar gene encodes resistance to the herbicide basta.
  • The nptII gene encodes resistance to the antibiotics kanamycin and geneticin.
  • The ALS gene mutants encode resistance to the herbicide chlorsulfuron.
  • Sterility genes can also be encoded in an expression cassette and provide an alternative to physical detasseling. Examples of genes used in such ways include male tissue-preferred genes and genes with male sterility phenotypes such as QM, described in U.S. Patent No. 5,583,210. Other genes include kinases and those encoding compounds toxic to either male or female gametophytic development.
  • Exogenous products include plant enzymes and products as well as those from other sources including prokaryotes and other eukaryotes. Such products include enzymes, cofactors, hormones, and the like.
  • the level of proteins, particularly modified proteins having improved amino acid distribution to improve the nutrient value of the plant, can be increased. This is achieved by the expression of such proteins having enhanced amino acid content.
  • the synthetic gene made by the method can be incorporated into an expression cassette for expression in a host, or host cell or part thereof.
  • the expression cassette can be further incorporated into an appropriate recombinant vector.
  • the expression cassette may include 5’ and/or 3’ regulatory sequences operably linked to a polynucleotide. “Operably linked” is intended to mean a functional linkage between two or more elements. For example, an operable linkage between a polynucleotide of interest and a regulatory sequence (i.e., a promoter) is a functional link that allows for expression of the polynucleotide of interest. Operably linked elements may be contiguous or non-contiguous.
  • the cassette may additionally contain at least one additional gene to be co-transformed into the organism.
  • the additional gene(s) can be provided on multiple expression cassettes.
  • Such an expression cassette is provided with a plurality of restriction sites and/or recombination sites for insertion of the polynucleotides to be under the transcriptional regulation of the regulatory regions.
  • the expression cassette may additionally contain selectable marker genes.
  • the expression cassette may include, in the 5’-3’ direction of transcription, a transcriptional and translational initiation region (i.e., a promoter), a polynucleotide (i.e., the synthetic gene) encoding a polypeptide of interest (and optionally coding sequences for one or more linker peptides), and a transcriptional and translational termination region (i.e., termination region) functional in the host organism.
  • the regulatory regions (i.e., promoters, transcriptional regulatory regions, and translational termination regions) and/or the coding sequence for the polypeptide of interest may be native/analogous to the host cell or to each other.
  • the regulatory regions and/or the coding sequence for the polypeptide of interest may be heterologous to the host cell or to each other.
  • “heterologous” refers to a sequence that originates from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention.
  • a promoter operably linked to a heterologous polynucleotide is from a species different from the species from which the polynucleotide was derived, or, if from the same/analogous species, one or both are substantially modified from their original form and/or genomic locus, or the promoter is not the native promoter for the operably linked polynucleotide.
  • either a heterologous promoter or the native promoter sequence for the polypeptide of interest may be used. Such constructs can change the levels of polypeptide expression in the host, or cell or part thereof. Thus, the phenotype of the host, or cell or part thereof, can be altered.
  • the termination region may be native with the transcriptional initiation region, may be native with the operably linked coding sequence for the polypeptide of interest, may be native with the host, or may be derived from another source (i.e., foreign or heterologous) to the promoter, the coding sequence for the polypeptide of interest, the host, or any combination thereof. Selection of suitable termination regions is within the means of one of ordinary skill in the art. For plant hosts, convenient termination regions may include, but are not limited to, those available from the Ti plasmid of A. tumefaciens, such as the octopine synthase and nopaline synthase termination regions. See also Guerineau et al. (1991) Mol. Gen. Genet.
  • the expression cassettes may additionally contain 5’ leader sequences.
  • leader sequences can act to enhance translation.
  • Translation leaders are known in the art and include: picornavirus leaders, for example, EMCV leader (Encephalomyocarditis 5’ noncoding region) (Elroy-Stein et al. (1989) Proc. Natl. Acad. Sci. USA 86:6126-6130); potyvirus leaders, for example, TEV leader (Tobacco Etch Virus) (Gallie et al. (1995) Gene 165:233-238), MDMV leader (Maize Dwarf Mosaic Virus) (Kong et al.
  • the various DNA fragments may be manipulated, so as to provide for the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame.
  • adapters or linkers may be employed to join the DNA fragments or other manipulations may be involved to provide for convenient restriction sites, removal of superfluous DNA, removal of restriction sites, or the like.
  • in vitro mutagenesis, primer repair, restriction, annealing, resubstitutions, e.g., transitions and transversions may be involved.
  • a number of promoters can be used in the practice of the invention. The promoters can be selected based on the desired outcome.
  • the synthetic gene can be combined with constitutive, tissue-preferred, inducible, or other promoters for expression in the host organism.
  • suitable constitutive promoters for use in a plant host cell include, without limitation, the core promoter of the Rsyn7 promoter and other constitutive promoters disclosed in WO 99/43838 and U.S. Patent No. 6,072,050; the core CaMV 35S promoter (Odell et al. (1985) Nature 313:810-812); rice actin (McElroy et al. (1990) Plant Cell 2:163-171); ubiquitin (Christensen et al. (1989) Plant Mol. Biol. 12:619-632 and Christensen et al.
  • Wound-inducible promoters may respond to damage caused by insect feeding, and include potato proteinase inhibitor (pin II) gene (Ryan (1990) Ann. Rev. Phytopath. 28:425-449; Duan et al. (1996) Nature Biotechnology 14:494-498); wun1 and wun2, US Patent No. 5,428,148; win1 and win2 (Stanford et al. (1989) Mol. Gen. Genet. 215:200-208); systemin (McGurl et al.
  • pathogen-inducible promoters may be employed in the methods and nucleotide constructs of the present invention.
  • pathogen-inducible promoters include those from pathogenesis-related proteins (PR proteins), which are induced following infection by a pathogen; e.g., PR proteins, SAR proteins, beta-1,3-glucanase, chitinase, etc. See, for example, Redolfi et al. (1983) Neth. J. Plant Pathol. 89:245-254; Uknes et al. (1992) Plant Cell 4:645-656; and Van Loon (1985) Plant Mol. Virol. 4:111-116. See also WO 99/43819, herein incorporated by reference.
  • promoters that are expressed locally at or near the site of pathogen infection. See, for example, Marineau et al. (1987) Plant Mol. Biol. 9:335-342; Matton et al. (1989) Molecular Plant-Microbe Interactions 2:325-331; Somssich et al. (1986) Proc. Natl. Acad. Sci. USA 83:2427-2430; Somssich et al. (1988) Mol. Gen. Genet. 2:93-98; and Yang (1996) Proc. Natl. Acad. Sci. USA 93:14972-14977. See also, Chen et al. (1996) Plant J. 10:955-966; Zhang et al. (1994) Proc.
  • Tissue-preferred promoters can be utilized to target enhanced pesticidal protein expression within a particular plant tissue.
  • Tissue-preferred promoters include those discussed in Yamamoto et al. (1997) Plant J. 12(2):255-265; Kawamata et al. (1997) Plant Cell Physiol. 38(7):792-803; Hansen et al. (1997) Mol. Gen. Genet. 254(3):337-343; Russell et al. (1997) Transgenic Res. 6(2):157-168; Rinehart et al. (1996) Plant Physiol. 112(3):1331-1341; Van Camp et al. (1996) Plant Physiol. 112(2):525-535; Canevascini et al. (1996) Plant Physiol.
  • Root-preferred or root-specific promoters are known and can be selected from the many available from the literature or isolated de novo from various compatible species. See, for example, Hire et al. (1992) Plant Mol. Biol. 20(2):207-218 (soybean root-specific glutamine synthetase gene); Keller and Baumgartner (1991) Plant Cell 3(10):1051-1061 (root-specific control element in the GRP 1.8 gene of French bean); Sanger et al. (1990) Plant Mol. Biol. 14(3):433-443 (root-specific promoter of the mannopine synthase (MAS) gene of Agrobacterium tumefaciens); and Miao et al.
  • the expression cassette will comprise a selectable marker gene for the selection of transformed cells.
  • Selectable marker genes are utilized for the selection of transformed cells or tissues.
  • Marker genes include genes encoding antibiotic resistance, such as those encoding neomycin phosphotransferase II (NEO) and hygromycin phosphotransferase (HPT), as well as genes conferring resistance to herbicidal compounds, such as glufosinate ammonium, bromoxynil, imidazolinones, and 2,4-dichlorophenoxyacetate (2,4-D).
  • Additional examples of suitable selectable marker genes include, but are not limited to, genes encoding resistance to chloramphenicol (Herrera Estrella et al.
  • selectable marker genes are not meant to be limiting. Any selectable marker gene can be used in the present invention.
  • Expression cassettes comprising a synthetic gene can be introduced into a host cell for expression in a host, or cell or part thereof.
  • a “host, or cell or part thereof” refers to any organism, or cell, or part of that organism, that can be used as a suitable host for expressing the synthetic gene. It is understood that such a phrase refers not only to the particular host, or cell or part thereof, but also to the progeny or potential progeny thereof. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent, but are still included within the scope of the phrase as used herein.
  • introducing is intended to mean presenting to the host cell the synthetic gene, or the expression cassette comprising the synthetic gene, in such a manner that the synthetic gene gains access to the interior of a cell.
  • the methods of the invention do not depend on a particular method for introducing a synthetic gene or expression cassette into a host cell, only that the synthetic gene or expression cassette gains access to the interior of at least one cell of the host.
  • Methods for introducing a synthetic gene or an expression cassette into plants are known in the art including, but not limited to, stable transformation methods, transient transformation methods, and virus-mediated methods.
  • the synthetic gene is expressed in a prokaryotic host, or cell or part thereof, or a eukaryotic host, or cell or part thereof.
  • the host is an invertebrate host, or cell or part thereof, or a vertebrate host, or cell or part thereof.
  • the host, or cell or part thereof may be, but is not limited to, a bacterium, a fungus, yeast, a nematode, an insect, a fish, a plant, an avian, an animal, or a mammal.
  • Mammalian hosts, or cells or parts thereof, that are suitable for expression of the synthetic gene are known to those of ordinary skill in the art, and may include, but are not limited to, hamsters, mice, rats, rabbits, cats, dogs, bovine, goats, cows, pigs, horses, sheep, monkeys, or chimpanzees. Mammalian cells or mammalian parts may also be derived from humans, and the selection of such cells or parts would be known to those of ordinary skill in the art. The selection of suitable bacterial hosts for expression of a synthetic gene is known to those of ordinary skill in the art. In selecting bacterial hosts for expression, suitable hosts may include those shown to have, inter alia, good inclusion body formation capacity, low proteolytic activity, and overall robustness.
  • Bacterial hosts are generally available from a variety of sources including, but not limited to, the Bacterial Genetic Stock Center, Department of Biophysics and Medical Physics, University of California (Berkeley, Calif.); and the American Type Culture Collection (“ATCC”) (Manassas, Va.).
  • the selection of suitable yeast hosts for expression of a synthetic gene is known to those of ordinary skill in the art; such hosts may include, but are not limited to, ascosporogenous yeasts (Endomycetales), basidiosporogenous yeasts, and yeast belonging to the Fungi Imperfecti (Blastomycetes).
  • suitable hosts may include those shown to have, inter alia, good secretion capacity, low proteolytic activity, and overall vigor.
  • Yeast and other microorganisms are generally available from a variety of sources, including the Yeast Genetic Stock Center, Department of Biophysics and Medical Physics, University of California, Berkeley, California; and the American Type Culture Collection, Rockville, Maryland. Since the classification of yeast may change in the future, yeast shall be defined as described in Skinner et al., eds. (1980) Biology and Activities of Yeast (Soc. App. Bacteriol. Symp. Series No. 9).
  • the selection of suitable insect hosts for expression of a synthetic gene is known to those of ordinary skill in the art; such hosts may include, but are not limited to, Aedes aegypti, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and Trichoplusia ni.
  • Insect cells suitable for the expression of a synthetic gene include, but are not limited to, SF9 cells, and others also well known to those of ordinary skill in the art.
  • suitable hosts may include those shown to have, inter alia, good secretion capacity, low proteolytic activity, and overall robustness.
  • Insect hosts are generally available from a variety of sources including, but not limited to, the Insect Genetic Stock Center, Department of Biophysics and Medical Physics, University of California (Berkeley, Calif.); and the American Type Culture Collection (“ATCC”) (Manassas, Va.)
  • plant also includes plant cells, plant protoplasts, plant cell tissue cultures from which plants can be regenerated, plant calli, plant clumps, and plant cells that are intact in plants or parts of plants such as embryos, pollen, ovules, seeds, leaves, flowers, branches, fruit, kernels, ears, cobs, husks, stalks, roots, root tips, anthers, and the like. Progeny, variants, and mutants of the regenerated plants are also included, provided that these parts comprise the introduced polynucleotides.
  • any plant species may be utilized as a host, including, but not limited to, monocots and dicots.
  • plant species of interest include, but are not limited to, corn (Zea mays), Brassica sp. (e.g., B. napus, B. rapa, B. juncea), alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), sunflower (Helianthus annuus), safflower (Carthamus tinctorius), wheat (Triticum aestivum), soybean (Glycine max), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium barbadense, Gossypium hirsutum), sweet potato (Ipomoea batatus), cassava (Manihot esculenta), coffee (Coffea spp.),
  • Vegetables of interest include tomatoes (Lycopersicon esculentum), lettuce (e.g., Lactuca sativa), green beans (Phaseolus vulgaris), lima beans (Phaseolus limensis), peas (Lathyrus spp.), and members of the genus Cucumis such as cucumber (C. sativus), cantaloupe (C. cantalupensis), and musk melon (C. melo).
  • Ornamentals include azalea (Rhododendron spp.), hydrangea (Macrophylla hydrangea), hibiscus (Hibiscus rosa-sinensis), roses (Rosa spp.), tulips (Tulipa spp.), daffodils (Narcissus spp.), petunias (Petunia hybrida), carnation (Dianthus caryophyllus), poinsettia (Euphorbia pulcherrima), and chrysanthemum.
  • Conifers of interest include, for example, pines such as loblolly pine (Pinus taeda), slash pine (Pinus elliottii), ponderosa pine (Pinus ponderosa), lodgepole pine (Pinus contorta), and Monterey pine (Pinus radiata); Douglas-fir (Pseudotsuga menziesii); Western hemlock (Tsuga canadensis); Sitka spruce (Picea glauca); redwood (Sequoia sempervirens); true firs such as silver fir (Abies amabilis) and balsam fir (Abies balsamea); and cedars such as Western red cedar (Thuja plicata) and Alaska yellow-cedar (Chamaecyparis nootkatensis).
  • Hardwood trees can also be employed including ash, aspen, beech, basswood, birch, black cherry, black walnut, buckeye, American chestnut, cottonwood, dogwood, elm, hackberry, hickory, holly, locust, magnolia, maple, oak, poplar, red alder, redbud, royal paulownia, sassafras, sweetgum, sycamore, tupelo, willow, yellow-poplar.
  • the plants or cells, or parts thereof are crop plants (for example, corn, alfalfa, sunflower, Brassica, soybean, cotton, safflower, peanut, sorghum, wheat, millet, tobacco, sugarcane etc.).
  • turfgrasses such as, for example, annual bluegrass (Poa annua); annual ryegrass (Lolium multiflorum); Canada bluegrass (Poa compressa); Chewings fescue (Festuca rubra); colonial bentgrass (Agrostis tenuis); creeping bentgrass (Agrostis palustris); crested wheatgrass (Agropyron desertorum); fairway wheatgrass (Agropyron cristatum); hard fescue (Festuca trachyphylla); Kentucky bluegrass (Poa pratensis); orchardgrass (Dactylis glomerata); perennial ryegrass (Lolium perenne); red fescue (Festuca rubra); redtop (Agrostis alba); rough bluegrass (Poa trivialis); sheep fescue (Festuca ovina); smooth bromegrass (Bromus inermis); tall fescue (Festuca
  • Augustine grass (Stenotaphrum secundatum); zoysia grass (Zoysia spp.); Bahia grass (Paspalum notatum); carpet grass (Axonopus affinis); centipede grass (Eremochloa ophiuroides); kikuyu grass (Pennisetum clandestinum); seashore paspalum (Paspalum vaginatum); blue gramma (Bouteloua gracilis); buffalo grass (Buchloe dactyloides); sideoats gramma (Bouteloua curtipendula).
  • Plants of interest further include grain plants that provide seeds of interest, oil-seed plants, and leguminous plants.
  • Seeds of interest include grain seeds, such as corn, wheat, barley, rice, sorghum, rye, millet, etc.
  • Oil-seed plants include cotton, soybean, safflower, sunflower, Brassica, maize, alfalfa, palm, coconut, flax, castor, olive etc.
  • Leguminous plants include beans and peas. Beans include guar, locust bean, fenugreek, soybean, garden beans, cowpea, mungbean, lima bean, fava bean, lentils, chickpea, etc.
  • reagents useful in transfecting such hosts, for example calcium phosphate and DEAE-dextran or liposome formulations, are available from Stratagene Cloning Systems or Life Technologies Inc., Gaithersburg, Md. 20877, USA. Electroporation is also useful for transforming and/or transfecting cells and is well known in the art for transforming yeast, bacteria, insect cells, and vertebrate cells.
  • a successfully transformed host, or cell or part thereof, i.e., one that contains a synthetic gene, and which is expressing the encoded polypeptide can be identified using well-known techniques. For example, cells resulting from the introduction of an expression cassette can be grown to produce the polypeptide encoded by the synthetic gene. Cells can be harvested and lysed, and their DNA content examined for the presence of the synthetic gene using a method such as that described by Southern (1975) J. Mol. Biol. 98:503; or Berent et al. (1985) Biotech. 3:208. Alternatively, the presence of the encoded polypeptide in the supernatant can be detected using antibodies and methods known to those of ordinary skill in the art.
  • successful transformation can be confirmed by well-known immunological methods when the recombinant DNA is capable of directing the expression of the encoded polypeptide.
  • cells successfully transformed with an expression vector produce polypeptides displaying appropriate antigenicity.
  • Samples of cells suspected of being transformed may be harvested and assayed for the encoded polypeptide using suitable antibodies.
  • For stable transfection of a host, or cell or part thereof, it is known that, depending upon the expression vector and transfection technique used, only a small fraction of cells may integrate the foreign DNA into their genome.
  • a gene that encodes a selectable marker is generally introduced into the host, or cell or part thereof, along with the gene of interest.
  • selectable markers may include those which confer resistance to drugs, such as G418, hygromycin, and methotrexate.
  • a nucleic acid encoding a selectable marker can be introduced into a host, or cell or part thereof, on the same vector as that comprising the synthetic gene, or alternatively introduced on a separate vector.
  • a host, or cell, or part thereof, that is stably transfected with the introduced nucleic acid can be identified by drug selection.
  • a method for making a synthetic gene comprising:
  • step (c) selecting a target expression system, and creating a preliminary codon optimized polynucleotide sequence by selecting said target system’s optimal codons for each amino acid in said amino acid sequence in step (b);
  • the optimized polynucleotide sequence comprises the optimal sets of added motifs
  • one or more of the counted motifs is selected from a group consisting of AATAAT, AACCAA, ATATAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA, CATAAA, AATAAA, and AATCAA.
  • counted motifs are selected from one or more of: AAGCAT, AATAAT, ATTAAT, AACCAA, ATACAT, ATATAA, AAAATA, ATTAAA, ATACTA, AATTAA, ATAAAA, AATACA, ATGAAA, CATAAA.
  • the optimized polynucleotide sequence comprises at least one more counted motif than the native sequence, and the optimized polynucleotide sequence comprises a minimum of four counted motifs.
  • … a) … ATATAT, TAATTT, TTAATT, AAATTT, AAATAA, ATATTT, TTTGTT, TTGTTT, ATTATT, ATTTTA, TTTAAT, and TTTTAA; or b) ATATAT, TTGTTT, TTTTGT, TGTTTT, TATATA, TATTTT, TTTTTT, ATTTTT, TTATTT, TTTATT, TAATAA, ATTTAT, TATATT, TTTTAT, ATATTT, TATTAT, TGTTTG, TTATAT, TGTAAT, AAATAA, AATTTT, TTTTTA, TAATTT, TTAATT, AAATTT, TTTGTT, ATTATT, ATTTTA, TTTAAT, and TTTTAA.
  • once the new host is identified, the amino acid sequence is back-translated into a preliminary DNA sequence using the new host’s optimal codon choice for each amino acid (which can be defined by one of multiple methods, including codon frequency tables or hidden Markov models). See Figure 1A for the initial conditions.
  • the optimal set of added motifs (i.e., to achieve the goal of at least one more counted motif than the native sequence and a minimum of four) is selected based on the score from step 4.
  • the GC content of the overall sequence is iteratively altered until it falls within the desired range of the new host. Extreme GC regions are identified using a sliding window approach (where the window is 4 amino acids wide). Within the extreme (high or low) regions of the sequence, all possible alternative codon combinations are tested in the order of most- to least-preferred codons. Alterations are selected that will move the GC content towards the new host range. This process is repeated until the GC content is at the center of the new host range.
  • Polynucleotide repeats greater than length N are removed by sampling codons most- to least-preferred for the corresponding amino acids until all polynucleotide repeats greater than length N have been removed.
  • the algorithm repeats any necessary steps. For example, by removing a polynucleotide repeat, the algorithm may have added back an avoided motif. To avoid infinite loops (cycling back and forth between competing edits), a codon “lock” feature is included, so that essential changes can be made unalterable by subsequent steps.
  • the x-axis displays the amino acids corresponding to an AgBiome protein; the y-axis displays the codon selection rank of the new host (the bottom row consists of the most preferred codon for the new host, and higher rows display less-preferred codons in rank order).
  • Different amino acids can be coded for by as few as one or as many as 6 codons.
  • the horizontal line with multiple peaks indicates the codon selections.
  • the algorithm initializes with a flat horizontal line across the bottom (Fig. 1A) and proceeds to select less-preferred codons to improve GC %, etc. (Fig. 1B).
  • the method can optimize the Gene A from the native sequence set forth in SEQ ID NO: 1 to the optimized version set forth in SEQ ID NO: 2.
  • Figure 2 shows expression of the optimized form of Gene A. T0 plants were analyzed by Western blot with protein-specific antibodies. The positive control is the purified protein and the negative control is untransformed B104 tissue.
  • This experiment aimed to compare expression of the maize-optimized APG06396.5 (optimized with the new algorithm and passing QC), with and without a chloroplast target signal sequence, to expression of the native, non-optimized APG06396 in the N. benthamiana transient system.
  • APG06396 and APG06396.5 are as follows: >APG06396 (SEQ ID NO: 1, see above) ATGCATTCTGAAGATATTAAAGAAAAAACACTTACCTGGTTTAACTACATTACCAGTCCGGTAAATAATGAA GATGTATTTATGCGAAGCTCACAGGATATACTTGTTATGAATCCTGCGATAGCAGCTGCAACGCAAGAGTAT ATCGATGGAAATACTCACGATAGTCAGCTATTCAACACACCATCATCAGCCTCAAACGATGTTTGATGGC CTGCAAACCATTGTAAACCTTTGCCGTGTGTGCAATCAGGTTATAATGCACTTGATCCTAATGGAACCGGAAGT AAGGCGTATTTTACGAAATTTACTCAGAACATAGCAAATGTTCCGTGCCTGACGTTGTTGAGTGCGGAAACA AAAAATATTAAACAACAAAGCCATAATGCAGATGCTCATCAACTCATTTGTCGATGCTTTTGATGGGCTT ACACAAAGCGACCAGTCTAAGCATCCGTCATCCGTCCATAAT
  • the Agrobacterium strain EHA101 harboring pCAMBIA2301 containing a gene of interest was cultured in terrific broth (TB) medium containing kanamycin (50 µg/mL) and spectinomycin (100 µg/mL) with agitation at 225 rpm at 26-28 °C.
  • the re-suspended Agrobacterium cells were allowed to sit at room temperature for 1-3 hours before use.
  • the Agrobacterium mixture was infiltrated into the abaxial side of 3.5-week-old N. benthamiana leaves using a needleless syringe. The infiltrated regions were outlined with a color paint pen for downstream evaluation.
  • the infiltrated N. benthamiana was grown at 23 °C and analyzed for expression by Western blot after day post-infiltration.
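The repeat-removal and codon-lock steps described above can be illustrated with a minimal sketch. It is not the applicant's implementation and rests on several assumptions: a "polynucleotide repeat" is taken to mean a homopolymer run (one base repeated), the host codon ranking is supplied as a hypothetical dictionary from one-letter amino acid to codons ordered most- to least-preferred, and an edit is accepted when it shortens the total length of over-long runs. The names `remove_runs`, `run_excess`, and the toy `ranks` table are illustrative only.

```python
import re

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def run_excess(seq, pattern):
    """Total number of bases lying inside over-long homopolymer runs."""
    return sum(m.end() - m.start() for m in pattern.finditer(seq))


def remove_runs(codons, ranks, max_run, locked=()):
    """Break homopolymer runs longer than max_run with synonymous codon swaps.

    codons: coding sequence split into triplets.
    ranks:  hypothetical host table, amino acid -> codons, most- to least-preferred.
    locked: codon indices that earlier steps made unalterable (the codon "lock").
    Returns (new_codons, locked_indices).
    """
    codons = list(codons)
    locked = set(locked)
    # A base followed by at least max_run copies of itself is a run longer than max_run.
    pattern = re.compile(r"([ACGT])\1{%d,}" % max_run)
    while True:
        seq = "".join(codons)
        hit = pattern.search(seq)
        if hit is None:
            return codons, locked
        current = run_excess(seq, pattern)
        first, last = hit.start() // 3, (hit.end() - 1) // 3
        for idx in range(first, last + 1):
            if idx in locked:
                continue  # never touch a locked codon
            for alt in ranks[CODE[codons[idx]]]:  # most- to least-preferred
                trial = codons[:idx] + [alt] + codons[idx + 1:]
                if run_excess("".join(trial), pattern) < current:
                    codons = trial
                    locked.add(idx)  # lock the edit so later steps cannot undo it
                    break
            else:
                continue  # no synonymous codon helped at this position
            break  # an edit was made; rescan the whole sequence
        else:
            raise ValueError("run cannot be broken without changing a locked codon")


ranks = {"K": ["AAA", "AAG"]}  # toy preference: the host favours AAA for lysine
print(remove_runs(["AAA", "AAA", "AAA"], ranks, max_run=5))
# → (['AAG', 'AAG', 'AAA'], {0, 1})
```

Because each accepted edit strictly lowers `run_excess`, the loop always terminates; locking changed codons is one simple way to keep a later pass (for example, avoided-motif removal) from cycling back.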

Abstract

Provided herein is a method for generating a codon-optimized polynucleotide sequence for expressing a protein of interest in a target expression system. The method comprises steps including identifying and preserving certain motifs, identifying and removing certain other motifs, optimizing GC content, and removing undesired polynucleotide repeats.

Description

METHOD FOR DESIGNING SYNTHETIC NUCLEOTIDE SEQUENCES
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to U.S. Provisional Application No. 63/300,447, filed January 18, 2022, which is incorporated by reference herein in its entirety.
STATEMENT REGARDING THE SEQUENCE LISTING
The Sequence Listing associated with this application is provided in XML format in lieu of a paper copy, and is hereby incorporated by reference into the specification. The XML copy named A101100_1710WO_0618_0_Seq_List is 6,025 bytes in size, was created on January 18, 2023, and is being submitted electronically via USPTO Patent Center.
BACKGROUND
Optimization of the nucleic acid sequence of a gene is desirable in a number of circumstances. Genes that are cloned from one organism and then expressed in another of a different type (i.e., transgenes) often fail to express, or express poorly, in the target organism. Sequence optimization can correct this poor expression. In other cases, it may be desirable to alter or tune the expression of a specific gene in its native or recombinant context. Optimization may also be desirable not because of the target organism or system, but because of the technical manipulations used to construct or handle a gene. One example is the removal of restriction endonuclease sites in order to simplify the construction of a vector for generation of transgenic organisms. In many cases, optimization removes known sequences that have a negative impact on gene expression in the target organism or expression system. For example, some sequences can cause message termination in transcribed RNA, cause folding of the nascent RNA that hinders translation, or can target transcribed RNA for degradation. Removing these sequences can improve expression by stabilizing the RNA message and allowing more abundant translation into protein. Other sequences may impact messages through other mechanisms, for example, by directing splicing or trafficking of the RNA or by changing the rate of transcription.
However, not all sequences and sequence patterns that govern protein production from a gene have been discovered, and sequences which were previously believed to govern protein production do not always do so in every circumstance. For example, some sequences and sequence patterns behave differently in different organisms or even in closely related organisms, or behave in other complex ways based on context or other poorly recognized variables. Therefore, it would be advantageous to develop further methods for gene optimization that account for potential regulatory effects of sequences and sequence patterns found in genes and in systems in which genes are heterologously expressed.
SUMMARY OF THE INVENTION
Provided herein is a method for generating a codon-optimized polynucleotide sequence for expressing a protein of interest in a target expression system. The method comprises steps including identifying and preserving certain motifs, identifying and removing certain other motifs, optimizing GC content, and removing undesired polynucleotide repeats.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1A and Figure 1B show a visualization of an optimization solution. The x-axis displays the amino acids corresponding to an AgBiome protein; the y-axis displays the codon selection rank of the new host; different amino acids can be coded for by as few as one or as many as 6 codons. Fig. 1A shows the initial conditions; Fig. 1B shows the optimization solution.
Figure 2 shows expression of the optimized form of Gene A in T0 plants.
Figure 3 shows the Western blot reflecting the expression of APG06396 (not optimized) vs APG06396.5 (optimized) in transformed N. benthamiana. The constructs in bold are the optimized version of APG06396 (APG06396.5), while the constructs not in bold are the un-optimized or native versions (932 and APG06396).
DETAILED DESCRIPTION
Methods for making a synthetic gene are provided. The methods find use in optimizing a candidate nucleic acid sequence of a gene for expression in a target expression system. A target expression system can include, for example, a target organism or an in vitro expression system.
The utility of changing codon usage while optimizing genes for use in heterologous expression or transgene systems is well known in the art (Novoa et al., Trends in Genetics (2012) Vol. 28(11):574-581; Plotkin et al., Nature Reviews Genetics (2011) Vol. 12(1):32-42). There are many patterns observed in codon bias and other compositional patterns in expressed genes that may have effects on translation, message longevity, and transcription (U.S. Patent Nos. 5,380,831 and 5,436,391; Murray et al., Nucleic Acids Res. (1989) Vol. 17:477-498; Ji et al., BMC Bioinformatics (2007) Vol. 8(1):43; Graber et al., PNAS (1999) Vol. 96(24):14055-14060).
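Converting a native sequence to protein and back-translating it with a host's preferred codons (steps (b) and (c) of the method) can be sketched as follows. This is a minimal illustration assuming the standard genetic code and a codon usage frequency table as the definition of "optimal" codons; the hidden Markov model alternative mentioned in the summary is not shown. The usage values below are made up, and the function names are hypothetical.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def translate(seq):
    """Convert a coding sequence (DNA or RNA) to one-letter amino acids; '*' = stop."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def rank_codons(usage):
    """Order synonymous codons by host usage frequency, highest first.

    usage: codon -> frequency in the target host (e.g. from a codon usage table).
    Codons missing from usage are treated as frequency 0.
    """
    ranks = {}
    for codon, aa in CODE.items():
        ranks.setdefault(aa, []).append(codon)
    for codons in ranks.values():
        codons.sort(key=lambda c: usage.get(c, 0.0), reverse=True)
    return ranks


def back_translate(protein, ranks):
    """Preliminary optimized sequence: the host's top-ranked codon at every position."""
    return "".join(ranks[aa][0] for aa in protein)


# Toy, made-up usage values for illustration only.
usage = {"ATG": 1.0, "GCC": 0.40, "GCT": 0.30, "GCA": 0.20, "GCG": 0.10,
         "TAA": 0.50, "TGA": 0.30, "TAG": 0.20}
protein = translate("ATGGCTTAA")                    # "MA*"
print(back_translate(protein, rank_codons(usage)))  # ATGGCCTAA
```

Keeping the full ranked list per amino acid, rather than only the top codon, is what later steps need when they fall back to less-preferred codons.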
Additional sequence modifications are known to enhance gene expression in a cellular host. These include elimination of sequences encoding spurious polyadenylation signals, RNA destabilizing sites, termination signals, exon-intron splice site signals, transposon-like repeats, and other such well-characterized sequences that may be deleterious to gene expression. The G-C content of the sequence may be adjusted to levels average for a given cellular host, as calculated by reference to known genes expressed in the host cell. When possible, the sequence can also be modified to avoid predicted hairpin secondary mRNA structures.
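GC content and the sliding-window search for extreme GC regions (the summary describes a window 4 amino acids wide) can be sketched as below. This is an assumption-laden illustration: the window is taken to slide one codon at a time, and the host range (`low`, `high`) is a caller-supplied placeholder, not a value from the source. Only detection is shown; the codon substitutions that move GC toward the host range are omitted.

```python
def gc_content(seq):
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def extreme_gc_windows(seq, low, high, window_codons=4):
    """Start codon indices of windows whose GC fraction lies outside [low, high].

    The window is window_codons codons wide and slides one codon at a time;
    low/high stand in for the target host's GC range (supplied by the caller).
    """
    width = 3 * window_codons
    flagged = []
    for start in range(len(seq) // 3 - window_codons + 1):
        gc = gc_content(seq[3 * start:3 * start + width])
        if gc < low or gc > high:
            flagged.append(start)
    return flagged


seq = "GCC" * 4 + "AAA" * 4
print(gc_content(seq))                       # 0.5
print(extreme_gc_windows(seq, 0.30, 0.70))   # [0, 1, 3, 4]
```

Here the whole sequence averages 50% GC, yet four of the five windows fall outside a 30-70% range, which is why a local window check is useful in addition to the overall value.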
Furthermore, others have reported that removing termination and polyadenylation signals (deleterious sites) can be critical (Proudfoot, Genes and Development (2011) Vol. 25(17):1770-1782; Shen, Nucleic Acids Research (2008) Vol. 36(9):3150-3161). The AATAAA motif is the canonical terminator for eukaryotes, but there are others that are thought to be signals of varying strength in different groups of eukaryotic organisms (Graber et al., PNAS (1999) Vol. 96(24):14055-14060). Some methods include the identification of specific polyadenylation and RNA destabilizing sequences that potentially act as termination sites in a gene sequence and their removal for optimization of the gene sequence. For example, Fischhoff et al. (U.S. Patent No. 7,741,118) describes a method of gene optimization in which the number of select polyadenylation sequences is reduced in the optimized gene sequence.
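Locating motifs such as AATAAA, and the codons they overlap, is the first thing any motif-based edit needs. The sketch below is neutral about what happens next: in the present method AATAAA appears in a counted-motif list (it is preserved), whereas in the prior approaches above it would be removed. It assumes a forward-strand scan only, with the reading frame starting at position 0; the function names are hypothetical.

```python
import re


def motif_sites(seq, motifs):
    """All (start, motif) hits on the given strand, overlapping hits included."""
    seq = seq.upper()
    hits = []
    for motif in motifs:
        motif = motif.upper()
        # A zero-width lookahead reports every start position, even overlapping ones.
        for m in re.finditer("(?=%s)" % re.escape(motif), seq):
            hits.append((m.start(), motif))
    return sorted(hits)


def codons_touched(seq, motifs):
    """Indices of codons (frame starting at position 0) that overlap any motif hit."""
    touched = set()
    for start, motif in motif_sites(seq, motifs):
        touched.update(range(start // 3, (start + len(motif) - 1) // 3 + 1))
    return sorted(touched)


cds = "GCCAATAAAGGC"
print(motif_sites(cds, ["AATAAA", "ATTTA"]))     # [(3, 'AATAAA')]
print(codons_touched(cds, ["AATAAA", "ATTTA"]))  # [1, 2]
```

The touched codon indices are the only positions where a synonymous substitution can destroy (or, conversely, where one must be avoided to preserve) a given motif hit.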
The present method allows for optimization of a native polynucleotide sequence for use in expression of the corresponding polypeptide encoded by the native polynucleotide sequence. The method can be applied to the optimization of any DNA or RNA sequence.
Definitions
A method for making a synthetic gene for expression in a target expression system is provided. The method comprises identifying and counting one or more counted motifs in a native polynucleotide sequence. Such counted motifs are those sequences that should be preserved in the synthetic construct (i.e., the optimized version of the gene). The method further comprises identifying and removing “avoided motifs”, optimizing GC content, and identifying and removing polynucleotide repeats longer than a set length, as appropriate for the given gene.
In some embodiments, the method comprises:
(a) identifying and counting at least one counted motif in a native polynucleotide sequence;
(b) converting the native polynucleotide sequence to the corresponding amino acid sequence;
(c) selecting a target expression system, and creating a preliminary codon-optimized polynucleotide sequence by selecting the target system’s optimal codon for each amino acid in the amino acid sequence of step (b);
(d) generating an optimized polynucleotide sequence for use in expressing the amino acid sequence in the target system by repeatedly modifying at least one codon for at least one amino acid position, such that the criteria set forth in (i)-(v) are satisfied in order:
(i) the optimized polynucleotide sequence comprises the optimal set of added motifs;
(ii) extreme GC regions are identified and altered such that the GC content of the optimized polynucleotide sequence is within the desired range of the target system;
(iii) no avoided motif is present in the optimized polynucleotide sequence;
(iv) no polynucleotide repeat longer than a set length is present in the optimized polynucleotide sequence;
(v) the earliest of steps (i)-(iv) whose criterion has been made unmet by any later step is repeated;
(e) making a synthetic gene comprising the selected optimized polynucleotide sequence.
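Steps (d)(i)-(v) amount to a loop that always returns to the earliest criterion that is no longer met. The following is a minimal sketch of that control flow only, assuming each criterion can be expressed as a hypothetical `check`/`fix` pair of functions. The `max_rounds` bound is an assumption added as a simple safety net; the method itself relies on the codon "lock" feature to prevent endless back-and-forth between competing edits.

```python
from typing import Callable, List, Tuple

# A criterion is a (check, fix) pair: check(seq) reports whether the criterion
# holds; fix(seq) returns a modified sequence intended to satisfy it.
# Both are hypothetical placeholders for criteria (i)-(iv).
Criterion = Tuple[Callable[[str], bool], Callable[[str], str]]


def optimize(seq: str, criteria: List[Criterion], max_rounds: int = 1000) -> str:
    """Repeatedly apply the fix for the earliest unmet criterion, in list order.

    Returns once every criterion holds; raises if max_rounds (an assumed
    safety bound) is exhausted.
    """
    for _ in range(max_rounds):
        for check, fix in criteria:
            if not check(seq):
                seq = fix(seq)  # fix the earliest unmet criterion ...
                break           # ... then re-check starting from criterion (i)
        else:
            return seq          # loop finished without finding an unmet criterion
    raise RuntimeError("criteria did not converge within max_rounds")


if __name__ == "__main__":
    # Toy criteria on plain strings (NOT amino-acid preserving):
    # first no AATAAA, then no run of five Ts.
    toy = [
        (lambda s: "AATAAA" not in s, lambda s: s.replace("AATAAA", "AACAAA", 1)),
        (lambda s: "TTTTT" not in s, lambda s: s.replace("TTTTT", "TTCTT", 1)),
    ]
    print(optimize("GAATAAATTTTTG", toy))
```

With the toy criteria the loop fixes AATAAA first, re-checks criterion (i), then fixes the T run and returns `GAACAAATTCTTG`.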
As used herein, “optimization” refers to modifications of the nucleic acid sequence of a gene, or other nucleic acid sequence encoding a separate nucleic acid or protein, that modulate the expression of the gene when expressed in a selected target expression system. In some embodiments, optimization of the gene results in enhanced expression of the synthetic gene when expressed by the target expression system. Expression of the optimized gene can be enhanced by at least 5%, 10%, 15%, 20%, 25%, 50%, 75%, 100%, 200%, 500%, 1000%, or more, when compared to the expression of the native gene in the target expression system under similar conditions. In some embodiments, a gene can be optimized to achieve a desired level of expression, higher or lower than the expression of the native gene in the target expression system under similar conditions.
As used herein, “modifying” or “modification” refers to a change in the candidate gene nucleic acid sequence that can include, for example, substitutions, deletions, truncations, and/or insertions. In specific embodiments, modifications to the nucleic acid sequence, such as substitutions of codons or sequence patterns, or the removal of avoided motifs, do not alter the encoded amino acid sequence. In some embodiments, a modification is an essential modification. An essential modification is a modification that cannot be altered by subsequent modifications, so that the algorithm avoids infinite loops. In a specific embodiment, an essential modification comprises adding the optimal set of added motifs as set forth in (i). In another specific embodiment, an essential modification comprises optimizing GC content as set forth in (ii). In another specific embodiment, an essential modification comprises removing avoided motifs as set forth in (iii). In another specific embodiment, an essential modification comprises removing polynucleotide repeats longer than a set length as set forth in (iv).
As used herein, all polynucleotide sequences are written using the standard nucleic acid notation of the International Union of Pure and Applied Chemistry (IUPAC, Biochemistry (1970) Vol. 9: 4022-4027), in which adenine (A), thymine (T), guanine (G), and cytosine (C) are denoted by their initials, and such DNA sequences are equivalent to the corresponding RNA polynucleotide sequences. Therefore, “T” (thymine) in all sequences is equivalent to “U” (uracil). For example, the sequence AATAAA in a DNA coding strand also indicates the corresponding mRNA sequence AAUAAA.
As used herein, the use of the term “DNA”, “nucleic acid”, or “polynucleotide” is not intended to limit the present invention to polynucleotides comprising DNA. Those of ordinary skill in the art will recognize that polynucleotides can comprise ribonucleotides and combinations of ribonucleotides and deoxyribonucleotides. Such deoxyribonucleotides and ribonucleotides include both naturally occurring molecules and synthetic analogues. The polynucleotides of the invention also encompass all forms of sequences including, but not limited to, single-stranded forms, double-stranded forms, hairpins, stem-and-loop structures, and the like.
As used herein, a “native polynucleotide sequence” or “native sequence” optimized by the method can be any gene of interest and can encode any polypeptide of interest. Exemplary genes of interest that can be optimized by the method are further described elsewhere herein. The native polynucleotide sequence can be a modified sequence that does not exist in the native, or natural, form. In particular embodiments, the native polynucleotide sequence is the polynucleotide sequence prior to optimization.
The optimized polynucleotide sequence is generated in the methods disclosed herein by applying the algorithm to the native polynucleotide sequence. Modifications to the native polynucleotide sequence can be made in silico within sequence windows. Different parts of the statistical model can be applied concurrently or sequentially.
As used herein, “polynucleotide repeat” refers to a polynucleotide having tandemly linked copies of a repeated nucleotide unit. In some embodiments, the repeat unit consists of a single base pair. In other embodiments, the repeat unit consists of 2 base pairs. In other embodiments, the repeat unit consists of 3 base pairs, such as CAG, CGG, CTG, GAA, GCC, and GCG. In other embodiments, the repeat unit consists of 4 base pairs, such as CCTG. In other embodiments, the repeat unit consists of 5 base pairs, such as TGGAA. In other embodiments, the repeat unit consists of 6 base pairs, such as GGCCTG and GGGGCC. In other embodiments, the repeat unit consists of 7, 8, 9, 10, or 11 base pairs. In other embodiments, the repeat unit consists of 12 base pairs, such as CCCCGCCCCGCG (Handb Clin Neurol. 2018; 147: 105-123).
In some embodiments, the polynucleotide repeat comprises at least 3 repetitive nucleotides. In some embodiments, the polynucleotide repeat comprises at least 4 repetitive nucleotides. In some embodiments, the polynucleotide repeat comprises at least 5 repetitive nucleotides. In some embodiments, the polynucleotide repeat comprises at least 6 repetitive nucleotides. In some embodiments, the polynucleotide repeat comprises at least 7 repetitive nucleotides. In some embodiments, the polynucleotide repeat comprises at least 8 repetitive nucleotides. In some embodiments, the polynucleotide repeat comprises at least 9 repetitive nucleotides. In some embodiments, the polynucleotide repeat comprises at least 10 repetitive nucleotides.
As used herein, the term “rare codon” refers to a codon that is rarely or never used in a target organism or system. For example, when the target expression system is a soy plant or soy plant cell, a rare codon is one or more of: TTA, CTA, TCG, CCG, ACG, GCG, CGA, and CGG. In other embodiments, when the target expression system is a corn plant or corn plant cell, a rare codon is one or more of: TTA, CTA, GTA, CGT, AGT, and CGA.
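Flagging rare codons is an in-frame lookup against the host's rare-codon set. The sketch below uses the soy and corn sets listed above and assumes the coding sequence starts in frame 0; the `RARE_CODONS` dictionary and the function name are illustrative, not part of the described method.

```python
from typing import Dict, FrozenSet, List, Tuple

# Rare-codon sets as listed in the text for soy and corn.
RARE_CODONS: Dict[str, FrozenSet[str]] = {
    "soy": frozenset({"TTA", "CTA", "TCG", "CCG", "ACG", "GCG", "CGA", "CGG"}),
    "corn": frozenset({"TTA", "CTA", "GTA", "CGT", "AGT", "CGA"}),
}


def rare_codon_positions(cds: str, host: str) -> List[Tuple[int, str]]:
    """Return (codon_index, codon) for each in-frame rare codon of the host.

    Assumes cds is in frame 0; a trailing partial codon is ignored.
    """
    cds = cds.upper().replace("U", "T")  # T and U are treated as equivalent
    rare = RARE_CODONS[host]
    return [(i // 3, cds[i:i + 3])
            for i in range(0, len(cds) - len(cds) % 3, 3)
            if cds[i:i + 3] in rare]
```

For instance, GTA and AGT are flagged for corn but not for soy.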
A “target expression system” refers to, without limitation, any in vivo or in vitro expression system that facilitates expression of the synthetic gene. In one embodiment, a target expression system can be a target organism, or cell or part thereof. Exemplary target organisms that can be employed by the method are further described elsewhere herein. In another embodiment, a target expression system can be any in vitro expression system known in the art that allows for cell-free recombinant expression of the synthetic gene. In vitro expression systems can support protein synthesis from DNA templates (transcription and translation) or from mRNA templates (translation only), and can be designed to accomplish transcription and translation steps as two separate sequential reactions or concurrently as one reaction. In particular embodiments, the target expression system is a bacterial cell for in vitro expression and/or a plant cell for in vivo expression.
The method identifies and counts one or more counted motifs or added motifs in the native polynucleotide sequence. As used herein, “counted motif” refers to specific sequences, codons, or sequence patterns in the native polynucleotide sequence that are identified and counted, and that are subsequently not deleted from the primary strand of the optimized polynucleotide. In some embodiments, an optimized polynucleotide sequence must comprise at least one more counted motif than the native polynucleotide sequence contains (i.e., a greater number). In some embodiments, an optimized polynucleotide sequence must comprise at least 4 counted motifs. In particular embodiments, the number of counted motifs in the optimized sequence is not the same as the number of counted motifs in the native sequence (i.e., the former number is greater than the latter number). Moreover, a counted motif can be relocated in the optimized sequence, such that it is present in the same number but at a different location. In particular embodiments, the location of any counted motif in the optimized polynucleotide sequence is different from the location of the counted motif in the native polynucleotide sequence. In some embodiments, the location of all counted motifs present in the optimized polynucleotide sequence is the same as the location of the counted motifs in the native polynucleotide sequence. Counted motifs need not be present in the optimized sequence. Accordingly, in some embodiments, the optimized polynucleotide sequence does not comprise a counted motif.
In some embodiments, the counted motifs are selected from the group consisting of: AAGCAT, AATAAT, ATTAAT, AACCAA, ATACAT, ATATAA, AAAATA, ATTAAA, ATACTA, AATTAA, ATAAAA, AATACA, ATGAAA, and CATAAA. In particular embodiments, the counted or added motifs are selected from the group consisting of AATAAT, AACCAA, ATATAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA, and CATAAA.
In some embodiments, counted motifs can be scored by the method. Exemplary counted motifs are described in the Examples provided herein.
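Counting the listed motifs and checking the counted-motif goal (at least one more counted motif than the native sequence, and at least four in total) can be sketched as follows. The sketch assumes overlapping occurrences are counted and that only the primary strand is scanned; both are assumptions, since the text does not say how overlaps are treated.

```python
from typing import Dict, Iterable

# Counted motifs as listed in the text.
COUNTED_MOTIFS = ["AAGCAT", "AATAAT", "ATTAAT", "AACCAA", "ATACAT", "ATATAA",
                  "AAAATA", "ATTAAA", "ATACTA", "AATTAA", "ATAAAA", "AATACA",
                  "ATGAAA", "CATAAA"]


def count_motifs(seq: str, motifs: Iterable[str]) -> Dict[str, int]:
    """Count overlapping occurrences of each motif on the primary strand."""
    seq = seq.upper().replace("U", "T")
    return {m: sum(seq.startswith(m, i) for i in range(len(seq) - len(m) + 1))
            for m in motifs}


def meets_counted_goal(native: str, optimized: str, minimum: int = 4) -> bool:
    """True if optimized has more counted motifs than native and at least `minimum`."""
    n_native = sum(count_motifs(native, COUNTED_MOTIFS).values())
    n_optimized = sum(count_motifs(optimized, COUNTED_MOTIFS).values())
    return n_optimized >= max(n_native + 1, minimum)
```

Because overlaps are counted, the 9-nt sequence CATAAAATA already contains three counted motifs (CATAAA, ATAAAA, and AAAATA), which is still one short of the minimum of four.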
As used herein, “added motif” refers to a subset of the counted motifs that are prioritized for addition; that is, when motifs are added to the optimized sequence, only the sequences designated as added motifs are considered. In some embodiments, added motifs can be scored by the method. In some embodiments, the optimal set of added motifs is selected based on the score assigned by the method. In specific embodiments, the optimal set of added motifs can be at least 4 counted motifs. In some embodiments, counted motifs are added such that the number of counted motifs in the optimized sequence is the same as or greater than the number of counted motifs in the native polynucleotide sequence. The optimal set of added motifs can refer to at least one required motif, or to the number of required motifs needed to result in an equal or greater number of required motifs in the optimized polynucleotide sequence when compared to the native polynucleotide sequence. Exemplary added motifs are described in the Examples provided herein.
The method further identifies any avoided sequences that may be present in the native polynucleotide sequence. As used herein, “avoided sequence” refers to specific sequences or sequence patterns that are actively excluded, as much as possible, from both the primary strand and the reverse strand of the optimized sequence. In one embodiment, an avoided motif may be an ATTTA motif. In another embodiment, an avoided motif can be a restriction endonuclease site. Modification or removal of restriction endonuclease sites can modulate gene expression in the target expression system and/or promote the incorporation and function of the synthetic gene in an expression cassette or recombinant vector where particular restriction endonuclease sites may be problematic.
In further embodiments, an avoided motif can comprise polyadenylation sites, termination sites, RNA destabilizing sites, ATTTA motifs, exon-intron splice site signals, known transposon-like repeats, restriction enzyme recognition sites, known patterns with strong RNA secondary structures, or other well-characterized sequences that may be deleterious to gene expression. In specific embodiments, avoided motifs must be included in the optimized polynucleotide sequence for proper expression of the gene or for other reasons specific to the sequence being optimized. However, it is understood that the number of avoided sequences in the optimized sequence should be minimized, or such sequences excluded entirely. Exemplary avoided motifs are described in the Examples provided herein.
As used herein, “required motif” refers to specific sequences or sequence patterns of which at least one copy should be found in the optimized polynucleotide sequence. In some embodiments, a required motif is an ATATAT motif. Exemplary required motifs are described in the Examples provided herein. In some embodiments, the required motifs are selected from the group consisting of ATATAT, TTGTTT, TTTTGT, TGTTTT, TATATA, TATTTT, TTTTTT, ATTTTT, TTATTT, TTTATT, TAATAA, ATTTAT, TATATT, TTTTAT, ATATTT, TATTAT, TGTTTG, TTATAT, TGTAAT, AAATAA, AATTTT, TTTTTA, TAATTT, TTAATT, AAATTT, TTTGTT, ATTATT, ATTTTA, TTTAAT, and TTTTAA.
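Because avoided sequences are excluded from both the primary and the reverse strand, a scan has to look for each motif and for its reverse complement. The sketch below reports hits in primary-strand coordinates. ATTTA and AATAAA come from the text; GAATTC (the EcoRI site) is used only as an illustrative restriction site. A palindromic site such as GAATTC is reported once per strand.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_avoided(seq: str, motifs: List[str]) -> List[Tuple[str, str, int]]:
    """Return (motif, strand, start) for each avoided motif on either strand.

    '+' means the motif itself occurs on the primary strand; '-' means its
    reverse complement occurs there (i.e. the motif is on the reverse strand).
    Starts are 0-based primary-strand coordinates.
    """
    seq = seq.upper().replace("U", "T")
    hits = []
    for m in motifs:
        rc = reverse_complement(m)
        for i in range(len(seq) - len(m) + 1):
            window = seq[i:i + len(m)]
            if window == m:
                hits.append((m, "+", i))
            if window == rc:
                hits.append((m, "-", i))
    return hits


if __name__ == "__main__":
    print(find_avoided("GGATTTAGGCTAAATC", ["ATTTA", "AATAAA", "GAATTC"]))
```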
As used herein, “polynucleotide repeat” refers to any stretch of the same repeated nucleotide, which includes adenine (A), guanine (G), cytosine (C), and thymine (T). In some embodiments, a polynucleotide repeat comprises at least 2, at least 3, at least 4, at least 5, at least 10, at least 15, at least 20, at least 40 repeated nucleotides. In some embodiments, an optimized polynucleotide sequence does not comprise any polynucleotide repeat with at least 6 repeated nucleotides. In other embodiments, an optimized polynucleotide sequence does not comprise any polynucleotide repeat with at least 5 repeated nucleotides. In some embodiments, an optimized polynucleotide sequence does not comprise any polynucleotide repeat with at least 4 repeated nucleotides. In other embodiments, an optimized polynucleotide sequence does not comprise any polynucleotide repeat with at least 3 repeated nucleotides.
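Under this definition a polynucleotide repeat is a homopolymer run, so checking a limit such as "no run of 6 or more" reduces to grouping identical consecutive bases. In the sketch below, `max_len` plays the role of the user-set upper limit N (for example 4 or 5); the function names are illustrative.

```python
from itertools import groupby
from typing import List, Tuple


def homopolymer_runs(seq: str, min_len: int) -> List[Tuple[int, str, int]]:
    """Return (start, base, length) for every run of one base with length >= min_len."""
    runs, pos = [], 0
    for base, group in groupby(seq.upper()):
        n = len(list(group))
        if n >= min_len:
            runs.append((pos, base, n))
        pos += n
    return runs


def passes_repeat_limit(seq: str, max_len: int) -> bool:
    """True if no homopolymer run is longer than max_len."""
    return not homopolymer_runs(seq, max_len + 1)
```

For example, a limit of 5 (no run of 6 or more) accepts ACCCCCGTTTA, while a limit of 4 rejects it because of the five-C run.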
As used herein, “spurious ORF” refers to any unintended open reading frame other than the first (intended) open reading frame. In some embodiments, a spurious ORF with a length of at least 70 nucleotides receives a poor score in the method. Spurious ORFs are also described in the Examples provided herein.
As used herein, “GC content” refers to the percentage of nitrogenous bases in a DNA or RNA molecule that are either guanine (G) or cytosine (C). This measure indicates the proportion of G and C bases out of all bases, the remaining bases being adenine and thymine in DNA, or adenine and uracil in RNA. DNA with low GC content is less stable than DNA with high GC content; however, the hydrogen bonds themselves do not have a particularly significant impact on molecular stability, which is instead caused mainly by base-stacking interactions.
GC content varies among organisms; this variation is thought to arise from differences in selection, mutational bias, and biased recombination-associated DNA repair. The average GC content in human genomes ranges from 35% to 60% across 100-kb fragments, with a mean of 41%. The GC content of yeast (Saccharomyces cerevisiae) is 38%, and that of another common model organism, Arabidopsis (Arabidopsis thaliana), is 36%.
As used herein, the “optimal codon” of a host refers to the most frequently used codon for each amino acid in the host. The optimal codon can be defined by one of multiple methods, including codon frequency tables or hidden Markov models. Methods such as the frequency of optimal codons (Fop) (Ikemura, 1981, J. Mol. Biol. 151(3): 389-409), the relative codon adaptation (RCA) (Fox et al., 2010, DNA Res. 17(3): 185-96), or the codon adaptation index (CAI) (Sharp et al., 1987, Nucleic Acids Research 15(3): 1281-1295) are used to predict gene expression levels, while methods such as the effective number of codons (Nc) and Shannon entropy from information theory are used to measure codon usage evenness (Peden, 2005, Correspondence Analysis of Codon Usage). Multivariate statistical methods, such as correspondence analysis and principal component analysis, are widely used to analyze variations in codon usage among genes (Suzuki et al., 2008, DNA Res. 15(6): 357-65). Many computer programs implement the statistical analyses enumerated above, including CodonW, GCUA, and INCA.
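A spurious-ORF check can be sketched by scanning every reading frame except the intended one for ATG-to-stop stretches of at least 70 nucleotides. This sketch rests on several assumptions that the text does not spell out: the intended ORF is forward frame 0, so only forward frames 1 and 2 and all three reverse-strand frames are scanned; an ORF must end in a stop codon; and the reported length includes the stop codon.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
_COMP = str.maketrans("ACGT", "TGCA")


def _orfs_in_frame(seq: str, frame: int, min_len: int) -> List[Tuple[int, int]]:
    """(start, end) of ATG...stop ORFs in one frame; end is exclusive, stop included."""
    found, start = [], None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            if i + 3 - start >= min_len:
                found.append((start, i + 3))
            start = None
    return found


def spurious_orfs(cds: str, min_len: int = 70) -> List[Tuple[str, int, int, int]]:
    """(strand, frame, start, end) of ORFs >= min_len outside forward frame 0.

    Coordinates on the '-' strand refer to the reverse-complement sequence.
    """
    cds = cds.upper().replace("U", "T")
    rc = cds.translate(_COMP)[::-1]
    hits = []
    for strand, s, frames in (("+", cds, (1, 2)), ("-", rc, (0, 1, 2))):
        for f in frames:
            hits += [(strand, f, a, b) for a, b in _orfs_in_frame(s, f, min_len)]
    return hits
```

A 72-nt ATG...TAA stretch shifted one base into forward frame 1 is reported, while a 9-nt one is not.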
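GC content, and the sliding-window search for extreme GC regions used in the algorithm, can be sketched as below. A 12-nt window stepped by 3 nt corresponds to the 4-amino-acid window slid one codon at a time. The `low`/`high` thresholds are caller-supplied assumptions, since the desired range depends on the host.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Percentage of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def extreme_gc_windows(seq: str, low: float, high: float, window: int = 12,
                       step: int = 3) -> List[Tuple[int, float]]:
    """Return (start, gc_percent) for windows whose GC% lies outside [low, high].

    window=12 and step=3 mean a 4-codon window slid one codon at a time.
    """
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_content(seq[start:start + window])
        if not low <= gc <= high:
            hits.append((start, gc))
    return hits
```

With illustrative bounds of 30% and 70%, a sequence made of a GC block followed by an AT block is flagged everywhere except the window straddling the junction.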
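The codon-frequency-table approach can be made concrete with a short sketch. It tallies codon usage over a set of host coding sequences, takes the most frequent codon per amino acid as the optimal codon, and uses it to translate and back-translate a sequence, as in steps 2 and 3 of the algorithm. The sketch assumes the standard genetic code. Ties are broken by codon table order, which is an arbitrary choice, and the host input sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ..., GGG; '*' = stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AA))


def codon_counts(cds_list: Iterable[str]) -> Counter:
    """Tally in-frame codon usage over a set of host coding sequences."""
    counts: Counter = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        counts.update(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    return counts


def optimal_codons(counts: Counter) -> Dict[str, str]:
    """Map each amino acid (and '*') to its most frequently used codon."""
    best: Dict[str, str] = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence with the standard code."""
    cds = cds.upper().replace("U", "T")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def back_translate(protein: str, best: Dict[str, str]) -> str:
    """Preliminary sequence using the host's optimal codon at every position."""
    return "".join(best[aa] for aa in protein)
```

For example, if the host sequences use GCT and AAG more often than their synonyms, back-translating MKA* gives ATGAAGGCTTAA.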
As used herein, “optimized sequence” or “optimized polynucleotide sequence” or “optimized nucleic acid sequence” refers to a polynucleotide sequence that encodes the same polypeptide sequence as the native polynucleotide sequence, and which has desirable properties such as improved GC content, no restriction enzyme sites, and/or expresses with higher efficiency in a target expression system. As used herein the optimized sequence can be referred to as a synthetic sequence.
The algorithm of the method is implemented in silico, for example, as program code. The algorithm can be based on the whole-genome, partial-genome, or transcriptome sequences of the selected target expression system. The algorithm is applied to the native polynucleotide sequence to determine codons, sequence patterns, and/or avoided motifs that can be modified to optimize the gene for expression in the target expression system.
In a particular embodiment, the algorithm comprises the following steps:
1. Starting with a native polynucleotide sequence, the counted motifs within the sequence are counted;
2. The native DNA sequence is translated to the corresponding polypeptide sequence;
3. A new host is identified; the polypeptide sequence is converted into a preliminary DNA sequence using the new host’s optimal codons for each amino acid (See fig. 1A for the initial conditions.)
4. Taking into account all codons for each amino acid position, every potential location of counted motifs, added motifs, avoided motifs, required motifs and out-of-frame stop codons is identified and scored. The score prioritizes potential motif locations based on the impact on final percentage GC content, host codon preference, and the likelihood of contributing undesirable motifs, that is, discouraging overlap with avoided motifs.
5. The optimal set of added motifs (i.e., to achieve the goal of at least one more counted motif than the native sequence and a minimum of four) is selected based on the score from step 4.
6. The GC content of the overall sequence is iteratively altered until it falls within the desired range of the new host. Extreme GC regions are identified using a sliding window approach (in specific embodiments, the window is 4 amino acids wide). Within the extreme (high or low) regions of the sequence, all possible alternative codon combinations are tested in order from most- to least-preferred codons. Alterations are selected that move the overall GC content toward the new host range. This process is repeated until the GC content is at or near the center of the new host range.
7. Avoided motifs are removed by sampling codons for the corresponding amino acids, from most- to least-preferred, until all avoided motifs have been removed while preserving the native amino acid sequence.
8. Polynucleotide repeats longer than length N (an upper limit set by the user, usually 4 or 5 nucleotides) are removed by sampling codons for the corresponding amino acids, from most- to least-preferred, until all polynucleotide repeats longer than length N have been removed.
9. The algorithm repeats any necessary steps. For example, by removing a polynucleotide repeat, the algorithm may have added back an avoided motif. To avoid infinite loops (cycling back and forth between competing edits), the method includes a codon “lock” feature, so that essential changes can be made unalterable by subsequent steps.
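Steps 7 to 9 can be sketched as a left-to-right repair loop with a codon lock. This is a minimal sketch under stated assumptions, not the patented implementation. It assumes the standard genetic code; a caller-supplied, hypothetical `prefs` list of synonymous codons per amino acid ordered from most- to least-preferred; avoided motifs checked on both strands; and `max_run` standing in for N. GC balancing and motif addition are omitted. The acceptance rule keeps a swap only if the leftmost remaining violation moves strictly to the right, which guarantees termination. This rule is an assumption, and the locked-codon set mirrors the lock feature described above.

```python
import re
from typing import Dict, List, Optional, Set

BASES = "TCAG"
# Standard genetic code in TCAG order ('*' = stop).
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AA))
COMP = str.maketrans("ACGT", "TGCA")


def first_violation(seq: str, avoided: List[str], max_run: int) -> Optional[range]:
    """Span of the leftmost avoided motif (either strand) or homopolymer run
    longer than max_run; None when the sequence has neither."""
    spans = []
    for motif in avoided:
        for pattern in (motif, motif.translate(COMP)[::-1]):
            i = seq.find(pattern)
            if i >= 0:
                spans.append(range(i, i + len(pattern)))
    run = re.search(r"(.)\1{%d,}" % max_run, seq)  # one base + max_run more copies
    if run:
        spans.append(range(run.start(), run.end()))
    return min(spans, key=lambda r: r.start) if spans else None


def _swap_once(codons: List[str], locked: Set[int], span: range,
               prefs: Dict[str, List[str]], avoided: List[str], max_run: int) -> bool:
    """Try synonymous swaps at unlocked codons overlapping span, most- to
    least-preferred; keep and lock the first swap that moves the leftmost
    violation strictly right of span.start (or removes all violations)."""
    for k in range(span.start // 3, (span.stop - 1) // 3 + 1):
        if k in locked:
            continue
        for alt in prefs[CODON_TABLE[codons[k]]]:
            if alt == codons[k]:
                continue
            trial = codons[:k] + [alt] + codons[k + 1:]
            nxt = first_violation("".join(trial), avoided, max_run)
            if nxt is None or nxt.start > span.start:
                codons[k] = alt
                locked.add(k)  # essential change: later fixes may not undo it
                return True
    return False


def repair(codons: List[str], prefs: Dict[str, List[str]],
           avoided: List[str], max_run: int) -> List[str]:
    """Remove avoided motifs and long homopolymer runs, left to right."""
    codons, locked = list(codons), set()
    while (span := first_violation("".join(codons), avoided, max_run)) is not None:
        if not _swap_once(codons, locked, span, prefs, avoided, max_run):
            raise ValueError(f"cannot clear violation at {span.start}-{span.stop}")
    return codons


if __name__ == "__main__":
    # Hypothetical preference order: synonymous codons in table order.
    prefs: Dict[str, List[str]] = {}
    for codon, aa in CODON_TABLE.items():
        prefs.setdefault(aa, []).append(codon)
    print(repair(["GAT", "TTA", "AAA", "AAA", "GGC"], prefs, ["ATTTA"], max_run=4))
```

On the example, the sketch returns `['GAC', 'TTG', 'AAG', 'AAA', 'GGC']`, which still encodes DLKKG. The TTA to TTG swap only shortens the A run rather than removing it, a side effect of the simple progress rule. A fuller implementation would score alternatives as in step 4.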
The synthetic gene can be incorporated into an expression cassette designed for expression of the gene in the target expression system. Such expression cassettes may be further incorporated into appropriate recombinant DNA vectors. The method further comprises introduction of the expression cassette into a host cell. Upon introduction of the expression cassette, the host cell expresses the synthetic gene. Exemplary expression cassettes and host cells are described in further detail elsewhere herein.
Although the method is described in a particular order, it should not be construed that the method must be performed in the order set forth herein. The steps of the method may be performed in any order based on the desired outcome of the method, the native polynucleotide sequence, and/or the selected target expression system.
Target Genes of Interest
The native polynucleotide sequence that is modified by the method can be any gene of interest, and can be derived from any prokaryotic or eukaryotic organism.
In some embodiments, the gene of interest can be desirable for heterologous expression in a plant. The gene may be plant-derived or may be derived from another organism. Such genes of interest are reflective of the commercial markets and interests of those involved in the development of the crop. Crops and markets of interest change, and as developing nations open up world markets, new crops and technologies will also emerge. In addition, as our understanding of agronomic traits and characteristics such as yield and heterosis increases, the choice of genes for transformation will change accordingly. General categories of genes of interest include, for example, those genes involved in information, such as zinc fingers, those involved in communication, such as kinases, and those involved in housekeeping, such as heat shock proteins. More specific categories of transgenes, for example, include genes encoding important traits for agronomics, insect resistance, disease resistance, herbicide resistance, sterility, grain characteristics, and commercial products. Genes of interest include, generally, those involved in oil, starch, carbohydrate, or nutrient metabolism, as well as those affecting kernel size, sucrose loading, and the like.
Agronomically important traits such as oil, starch, and protein content can be genetically altered in addition to using traditional breeding methods. Modifications include increasing content of oleic acid, saturated and unsaturated oils, increasing levels of lysine and sulfur, providing essential amino acids, and also modification of starch. Hordothionin protein modifications are described in U.S. Patent Nos. 5,703,049, 5,885,801, 5,885,802, and 5,990,389, herein incorporated by reference. Another example is lysine and/or sulfur rich seed protein encoded by the soybean 2S albumin described in U.S. Patent No. 5,850,016, and the chymotrypsin inhibitor from barley, described in Williamson et al. (1987) Eur. J. Biochem. 165:99-106, the disclosures of which are herein incorporated by reference.
Derivatives of the coding sequences can be made by site-directed mutagenesis to increase the level of preselected amino acids in the encoded polypeptide. For example, the gene encoding the barley high lysine polypeptide (BHL) is derived from barley chymotrypsin inhibitor, U.S. Application Serial No. 08/740,682, filed November 1, 1996, and WO 98/20133, the disclosures of which are herein incorporated by reference. Other proteins include methionine-rich plant proteins such as from sunflower seed (Lilley et al. (1989) Proceedings of the World Congress on Vegetable Protein Utilization in Human Foods and Animal Feedstuffs, ed. Applewhite (American Oil Chemists Society, Champaign, Illinois), pp. 497-502; herein incorporated by reference); corn (Pedersen et al. (1986) J. Biol. Chem. 261:6279; Kirihara et al. (1988) Gene 71:359; both of which are herein incorporated by reference); and rice (Musumura et al. (1989) Plant Mol. Biol. 12:123, herein incorporated by reference). Other agronomically important genes encode latex, Floury 2, growth factors, seed storage factors, and transcription factors.
Insect resistance genes may encode resistance to pests that have great yield drag such as rootworm, cutworm, European Corn Borer, and the like. Such genes include, for example, Bacillus thuringiensis toxin protein genes (U.S. Patent Nos. 5,366,892; 5,747,450; 5,736,514; 5,723,756; 5,593,881; and Geiser et al. (1986) Gene 48:109); and the like. Genes encoding disease resistance traits include detoxification genes, such as against fumonisin (U.S. Patent No. 5,792,931); avirulence (avr) and disease resistance (R) genes (Jones et al. (1994) Science 266:789; Martin et al. (1993) Science 262:1432; and Mindrinos et al. (1994) Cell 78:1089); and the like.
Herbicide resistance traits may include genes coding for resistance to herbicides that act to inhibit the action of acetolactate synthase (ALS), in particular the sulfonylurea-type herbicides (e.g., the acetolactate synthase (ALS) gene containing mutations leading to such resistance, in particular the S4 and/or Hra mutations), genes coding for resistance to herbicides that act to inhibit action of glutamine synthase, such as phosphinothricin or basta (e.g., the bar gene); glyphosate (e.g., the EPSPS gene and the GAT gene; see, for example, U.S. Publication No. 20040082770 and WO 03/092360); or other such genes known in the art. The bar gene encodes resistance to the herbicide basta, the nptII gene encodes resistance to the antibiotics kanamycin and geneticin, and the ALS-gene mutants encode resistance to the herbicide chlorsulfuron.
Sterility genes can also be encoded in an expression cassette and provide an alternative to physical detasseling. Examples of genes used in such ways include male tissue-preferred genes and genes with male sterility phenotypes such as QM, described in U.S. Patent No. 5,583,210. Other genes include kinases and those encoding compounds toxic to either male or female gametophytic development.
The quality of grain is reflected in traits such as levels and types of oils, saturated and unsaturated, quality and quantity of essential amino acids, and levels of cellulose. In corn, modified hordothionin proteins are described in U.S. Patent Nos. 5,703,049, 5,885,801, 5,885,802, and 5,990,389.
Commercial traits can also be encoded on a gene or genes that could increase, for example, starch for ethanol production, or provide expression of proteins. Another important commercial use of transformed plants is the production of polymers and bioplastics such as described in U.S. Patent No. 5,602,321. Genes such as β-ketothiolase, PHBase (polyhydroxybutyrate synthase), and acetoacetyl-CoA reductase (see Schubert et al. (1988) J. Bacteriol. 170:5837-5847) facilitate expression of polyhydroxyalkanoates (PHAs).
Exogenous products include plant enzymes and products as well as those from other sources including prokaryotes and other eukaryotes. Such products include enzymes, cofactors, hormones, and the like. The level of proteins, particularly modified proteins having improved amino acid distribution to improve the nutrient value of the plant, can be increased. This is achieved by the expression of such proteins having enhanced amino acid content.
Expression Cassettes
The synthetic gene made by the method can be incorporated into an expression cassette for expression in a host, or host cell or part thereof. The expression cassette can be further incorporated into an appropriate recombinant vector. The expression cassette may include 5’ and/or 3’ regulatory sequences operably linked to a polynucleotide. “Operably linked” is intended to mean a functional linkage between two or more elements. For example, an operable linkage between a polynucleotide of interest and a regulatory sequence (i.e., a promoter) is a functional link that allows for expression of the polynucleotide of interest. Operably linked elements may be contiguous or non-contiguous. When used to refer to the joining of two polypeptide coding regions, by operably linked is intended that the coding regions are in the same reading frame. The cassette may additionally contain at least one additional gene to be co-transformed into the organism. Alternatively, the additional gene(s) can be provided on multiple expression cassettes. Such an expression cassette is provided with a plurality of restriction sites and/or recombination sites for insertion of the polynucleotides to be under the transcriptional regulation of the regulatory regions. The expression cassette may additionally contain selectable marker genes.
For example, the expression cassette may include in the 5 ’-3’ direction of transcription, a transcriptional and translational initiation region (i.e., a promoter), a polynucleotide (i.e., the synthetic gene) encoding a polypeptide of interest (and optionally coding sequences for one or more linker peptides), and a transcriptional and translational termination region (i.e., termination region) functional in the host organism. The regulatory regions (i.e., promoters, transcriptional regulatory regions, and translational termination regions) and/or the coding sequence for the polypeptide of interest may be native/analogous to the host cell or to each other. Alternatively, the regulatory regions and/or the coding sequence for the polypeptide of interest may be heterologous to the host cell or to each other. As used herein, “heterologous” is a sequence that originates from a foreign species, or, if from the same species, is substantially modified from its native form in composition and/or genomic locus by deliberate human intervention. For example, a promoter operably linked to a heterologous polynucleotide is from a species different from the species from which the polynucleotide was derived, or, if from the same/analogous species, one or both are substantially modified from their original form and/or genomic locus, or the promoter is not the native promoter for the operably linked polynucleotide. A heterologous promoter, or the native promoter sequence for the polypeptide of interest, may be used. Such constructs can change the levels of polypeptide expression in the host, or cell or part thereof. Thus, the phenotype of the host, or cell or part thereof, can be altered.
The termination region may be native with the transcriptional initiation region, may be native with the operably linked coding sequence for the polypeptide of interest, may be native with the host, or may be derived from another source (i.e., foreign or heterologous) to the promoter, the coding sequence for the polypeptide of interest, the host, or any combination thereof. Selection of suitable termination regions is within the means of one of ordinary skill in the art. For plant hosts, convenient termination regions may include, but are not limited to, those available from the Ti- plasmid of A. turn efaci ens, such as the octopine synthase and nopaline 30 synthase termination regions. See also Guerineau et al. (1991) Mol. Gen. Genet. 262: 141-144; Proudfoot (1991) Cell 64:671-674; Sanfacon et al. (1991) Genes Dev. 5: 141-149; Mogen et al. (1990) Plant Cell 2:1261- 1272; Munroe et al. (1990) Gene 91 : 151-158; Ballas et al. (1989) Nucleic Acids Res. 17:7891- 7903; and Joshi et al. (1987) Nucleic Acids Res. 15:9627-9639.
The expression cassettes may additionally contain 5’ leader sequences, which can act to enhance translation. Translation leaders are known in the art and include: picornavirus leaders, for example, the EMCV leader (Encephalomyocarditis 5’ noncoding region) (Elroy-Stein et al. (1989) Proc. Natl. Acad. Sci. USA 86:6126-6130); potyvirus leaders, for example, the TEV leader (Tobacco Etch Virus) (Gallie et al. (1995) Gene 165:233-238), the MDMV leader (Maize Dwarf Mosaic Virus) (Kong et al. (1988) Arch Virol 143:1791-1799), and human immunoglobulin heavy-chain binding polypeptide (BiP) (Macejak et al. (1991) Nature 353:90-94); the untranslated leader from the coat polypeptide mRNA of alfalfa mosaic virus (AMV RNA 4) (Jobling et al. (1987) Nature 325:622-625); the tobacco mosaic virus leader (TMV) (Gallie et al. (1989) in Molecular Biology of RNA, ed. Cech (Liss, New York), pp. 237-256); and the maize chlorotic mottle virus leader (MCMV) (Lommel et al. (1991) Virology 81:382-385). See also Della-Cioppa et al. (1987) Plant Physiol. 84:965-968.
In preparing the expression cassette, the various DNA fragments may be manipulated to provide the DNA sequences in the proper orientation and, as appropriate, in the proper reading frame. Toward this end, adapters or linkers may be employed to join the DNA fragments, or other manipulations may be used to provide convenient restriction sites, remove superfluous DNA, remove restriction sites, or the like. For this purpose, in vitro mutagenesis, primer repair, restriction, annealing, and resubstitutions, e.g., transitions and transversions, may be involved.
A number of promoters can be used in the practice of the invention, and they can be selected based on the desired outcome. The synthetic gene can be combined with constitutive, tissue-preferred, inducible, or other promoters for expression in the host organism. For example, suitable constitutive promoters for use in a plant host cell include, without limitation, the core promoter of the Rsyn7 promoter and other constitutive promoters disclosed in WO 99/43838 and U.S. Patent No. 6,072,050; the core CaMV 35S promoter (Odell et al. (1985) Nature 313:810-812); rice actin (McElroy et al. (1990) Plant Cell 2:163-171); ubiquitin (Christensen et al. (1989) Plant Mol. Biol. 12:619-632 and Christensen et al. (1992) Plant Mol. Biol. 18:675-689); pEMU (Last et al. (1991) Theor. Appl. Genet. 81:581-588); MAS (Velten et al. (1984) EMBO J. 3:2723-2730); the ALS promoter (U.S. Patent No. 5,659,026); and the like. Other constitutive promoters include, for example, those discussed in U.S. Patent Nos. 5,608,149; 5,608,144; 5,604,121; 5,569,597; 5,466,785; 5,399,680; 5,268,463; 5,608,142; and 6,177,611.
Depending on the desired outcome, it may be beneficial to express the gene from an inducible promoter, for example, a wound-inducible promoter. Wound-inducible promoters may respond to damage caused by insect feeding and include the potato proteinase inhibitor (pin II) gene (Ryan (1990) Ann. Rev. Phytopath. 28:425-449; Duan et al. (1996) Nature Biotechnology 14:494-498); wun1 and wun2 (U.S. Patent No. 5,428,148); win1 and win2 (Stanford et al. (1989) Mol. Gen. Genet. 215:200-208); systemin (McGurl et al. (1992) Science 225:1570-1573); WIP1 (Rohmeier et al. (1993) Plant Mol. Biol. 22:783-792; Eckelkamp et al. (1993) FEBS Letters 323:73-76); the MPI gene (Cordero et al. (1994) Plant J. 6(2):141-150); and the like, herein incorporated by reference.
Additionally, pathogen-inducible promoters may be employed in the methods and nucleotide constructs of the present invention. Such pathogen-inducible promoters include those from pathogenesis-related proteins (PR proteins), which are induced following infection by a pathogen, e.g., PR proteins, SAR proteins, beta-1,3-glucanase, chitinase, etc. See, for example, Redolfi et al. (1983) Neth. J. Plant Pathol. 89:245-254; Uknes et al. (1992) Plant Cell 4:645-656; and Van Loon (1985) Plant Mol. Virol. 4:111-116. See also WO 99/43819, herein incorporated by reference.
Of interest are promoters that are expressed locally at or near the site of pathogen infection. See, for example, Marineau et al. (1987) Plant Mol. Biol. 9:335-342; Matton et al. (1989) Molecular Plant-Microbe Interactions 2:325-331; Somssich et al. (1986) Proc. Natl. Acad. Sci. USA 83:2427-2430; Somssich et al. (1988) Mol. Gen. Genet. 2:93-98; and Yang (1996) Proc. Natl. Acad. Sci. USA 93:14972-14977. See also Chen et al. (1996) Plant J. 10:955-966; Zhang et al. (1994) Proc. Natl. Acad. Sci. USA 91:2507-2511; Warner et al. (1993) Plant J. 3:191-201; Siebertz et al. (1989) Plant Cell 1:961-968; U.S. Patent No. 5,750,386 (nematode-inducible); and the references cited therein. Of particular interest is the inducible promoter for the maize PRms gene, whose expression is induced by the pathogen Fusarium moniliforme (see, for example, Cordero et al. (1992) Physiol. Mol. Plant Path. 41:189-200).
Tissue-preferred promoters can be utilized to target enhanced pesticidal protein expression within a particular plant tissue. Tissue-preferred promoters include those discussed in Yamamoto et al. (1997) Plant J. 12(2):255-265; Kawamata et al. (1997) Plant Cell Physiol. 38(7):792-803; Hansen et al. (1997) Mol. Gen. Genet. 254(3):337-343; Russell et al. (1997) Transgenic Res. 6(2):157-168; Rinehart et al. (1996) Plant Physiol. 112(3):1331-1341; Van Camp et al. (1996) Plant Physiol. 112(2):525-535; Canevascini et al. (1996) Plant Physiol. 112(2):513-524; Yamamoto et al. (1994) Plant Cell Physiol. 35(5):773-778; Lam (1994) Results Probl. Cell Differ. 20:181-196; Orozco et al. (1993) Plant Mol. Biol. 23(6):1129-1138; Matsuoka et al. (1993) Proc. Natl. Acad. Sci. USA 90(20):9586-9590; and Guevara-Garcia et al. (1993) Plant J. 4(3):495-505. Such promoters can be modified, if necessary, for weak expression.
Root-preferred or root-specific promoters are known and can be selected from the many available in the literature or isolated de novo from various compatible species. See, for example, Hire et al. (1992) Plant Mol. Biol. 20(2):207-218 (soybean root-specific glutamine synthetase gene); Keller and Baumgartner (1991) Plant Cell 3(10):1051-1061 (root-specific control element in the GRP 1.8 gene of French bean); Sanger et al. (1990) Plant Mol. Biol. 14(3):433-443 (root-specific promoter of the mannopine synthase (MAS) gene of Agrobacterium tumefaciens); and Miao et al. (1991) Plant Cell 3(1):11-22 (full-length cDNA clone encoding cytosolic glutamine synthetase (GS), which is expressed in roots and root nodules of soybean). See also Bogusz et al. (1990) Plant Cell 2(7):633-641, where two root-specific promoters isolated from hemoglobin genes from the nitrogen-fixing nonlegume Parasponia andersonii and the related non-nitrogen-fixing nonlegume Trema tomentosa are described. The promoters of these genes were linked to a β-glucuronidase reporter gene and introduced into both the nonlegume Nicotiana tabacum and the legume Lotus corniculatus, and in both instances root-specific promoter activity was preserved. Leach and Aoyagi (1991) describe their analysis of the promoters of the highly expressed rolC and rolD root-inducing genes of Agrobacterium rhizogenes (see Plant Science (Limerick) 79(1):69-76). They concluded that enhancer and tissue-preferred DNA determinants are dissociated in those promoters. Teeri et al. (1989) used gene fusion to lacZ to show that the Agrobacterium T-DNA gene encoding octopine synthase is especially active in the epidermis of the root tip and that the TR2’ gene is root specific in the intact plant and stimulated by wounding in leaf tissue, an especially desirable combination of characteristics for use with an insecticidal or larvicidal gene (see EMBO J. 8(2):343-350).
The TR1’ gene fused to nptII (neomycin phosphotransferase II) showed similar characteristics. Additional root-preferred promoters include the VfENOD-GRP3 gene promoter (Kuster et al. (1995) Plant Mol. Biol. 29(4):759-772) and the rolB promoter (Capana et al. (1994) Plant Mol. Biol. 25(4):681-691). See also U.S. Patent Nos. 5,837,876; 5,750,386; 5,633,363; 5,459,252; 5,401,836; 5,110,732; and 5,023,179.
Generally, the expression cassette will comprise a selectable marker gene for the selection of transformed cells or tissues. Marker genes include genes encoding antibiotic resistance, such as those encoding neomycin phosphotransferase II (NEO) and hygromycin phosphotransferase (HPT), as well as genes conferring resistance to herbicidal compounds, such as glufosinate ammonium, bromoxynil, imidazolinones, and 2,4-dichlorophenoxyacetate (2,4-D). Additional examples of suitable selectable marker genes include, but are not limited to, genes encoding resistance to chloramphenicol (Herrera Estrella et al. (1983) EMBO J. 2:987-992); methotrexate (Herrera Estrella et al. (1983) Nature 303:209-213; and Meijer et al. (1991) Plant Mol. Biol. 16:807-820); streptomycin (Jones et al. (1987) Mol. Gen. Genet. 210:86-91); spectinomycin (Bretagne-Sagnard et al. (1996) Transgenic Res. 5:131-137); bleomycin (Hille et al. (1990) Plant Mol. Biol. 7:171-176); sulfonamide (Guerineau et al. (1990) Plant Mol. Biol. 15:127-136); bromoxynil (Stalker et al. (1988) Science 242:419-423); glyphosate (Shaw et al. (1986) Science 233:478-481; and U.S. Application Serial Nos. 10/004,357 and 10/427,692); and phosphinothricin (DeBlock et al. (1987) EMBO J. 6:2513-2518). See generally Yarranton (1992) Curr. Opin. Biotech. 3:506-511; Christopherson et al. (1992) Proc. Natl. Acad. Sci. USA 89:6314-6318; Yao et al. (1992) Cell 71:63-72; Reznikoff (1992) Mol. Microbiol. 6:2419-2422; Barkley et al. (1980) in The Operon, pp. 177-220; Hu et al. (1987) Cell 48:555-566; Brown et al. (1987) Cell 49:603-612; Figge et al. (1988) Cell 52:713-722; Deuschle et al. (1989) Proc. Natl. Acad. Sci. USA 86:5400-5404; Fuerst et al. (1989) Proc. Natl. Acad. Sci. USA 86:2549-2553; Deuschle et al. (1990) Science 248:480-483; Gossen (1993) Ph.D. Thesis, University of Heidelberg; Reines et al. (1993) Proc. Natl. Acad. Sci. USA 90:1917-1921; Labow et al. (1990) Mol. Cell. Biol. 10:3343-3356; Zambretti et al. (1992) Proc. Natl. Acad. Sci. USA 89:3952-3956; Baim et al. (1991) Proc. Natl. Acad. Sci. USA 88:5072-5076; Wyborski et al. (1991) Nucleic Acids Res. 19:4647-4653; Hillen and Wissman (1989) Topics Mol. Struc. Biol. 10:143-162; Degenkolb et al. (1991) Antimicrob. Agents Chemother. 35:1591-1595; Kleinschmidt et al. (1988) Biochemistry 27:1094-1104; Bonin (1993) Ph.D. Thesis, University of Heidelberg; Gossen et al. (1992) Proc. Natl. Acad. Sci. USA 89:5547-5551; Oliva et al. (1992) Antimicrob. Agents Chemother. 36:913-919; Hlavka et al. (1985) Handbook of Experimental Pharmacology, Vol. 78 (Springer-Verlag, Berlin); and Gill et al. (1988) Nature 334:721-724. Such disclosures are herein incorporated by reference.
The above list of selectable marker genes is not meant to be limiting. Any selectable marker gene can be used in the present invention.
Host Cells and Methods of Producing Host Cells
Expression cassettes comprising a synthetic gene can be introduced into a host cell for expression in a host, or cell or part thereof. As used herein, a “host, or cell or part thereof” refers to any organism, cell, or part of that organism that can be used as a suitable host for expressing the synthetic gene. It is understood that such a phrase refers not only to the particular host, or cell or part thereof, but also to the progeny or potential progeny thereof. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent, but are still included within the scope of the phrase as used herein.
As used herein, “introducing” is intended to mean presenting to the host cell the synthetic gene, or the expression cassette comprising the synthetic gene, in such a manner that the synthetic gene gains access to the interior of a cell. The methods of the invention do not depend on a particular method for introducing a synthetic gene or expression cassette into a host cell, only that the synthetic gene or expression cassette gains access to the interior of at least one cell of the host. Methods for introducing a synthetic gene or an expression cassette into plants are known in the art including, but not limited to, stable transformation methods, transient transformation methods, and virus-mediated methods.
In one example, the synthetic gene is expressed in a prokaryotic host, or cell or part thereof, or a eukaryotic host, or cell or part thereof. In another example, the host is an invertebrate host, or cell or part thereof, or a vertebrate host, or cell or part thereof. In another example, the host, or cell or part thereof, may be, but is not limited to, a bacterium, a fungus, a yeast, a nematode, an insect, a fish, a plant, an avian, an animal, or a mammal.
Mammalian hosts, or cells or parts thereof, that are suitable for expression of the synthetic gene are known to those of ordinary skill in the art, and may include, but are not limited to, hamsters, mice, rats, rabbits, cats, dogs, bovine, goats, cows, pigs, horses, sheep, monkeys, or chimpanzees. Mammalian cells or mammalian parts may also be derived from humans, and the selection of such cells or parts would be known to those of ordinary skill in the art. The selection of suitable bacterial hosts for expression of a synthetic gene is known to those of ordinary skill in the art. In selecting bacterial hosts for expression, suitable hosts may include those shown to have, inter alia, good inclusion body formation capacity, low proteolytic activity, and overall robustness. Bacterial hosts are generally available from a variety of sources including, but not limited to, the Bacterial Genetic Stock Center, Department of Biophysics and Medical Physics, University of California (Berkeley, Calif.); and the American Type Culture Collection (“ATCC”) (Manassas, Va.).
The selection of suitable yeast hosts for expression of a synthetic gene is known to those of ordinary skill in the art, and may include, but is not limited to, ascosporogenous yeasts (Endomycetales), basidiosporogenous yeasts, and yeasts belonging to Fungi Imperfecti (Blastomycetes). When selecting yeast hosts for expression, suitable hosts may include those shown to have, inter alia, good secretion capacity, low proteolytic activity, and overall vigor. Yeast and other microorganisms are generally available from a variety of sources, including the Yeast Genetic Stock Center, Department of Biophysics and Medical Physics, University of California, Berkeley, California; and the American Type Culture Collection, Rockville, Maryland. Since the classification of yeast may change in the future, yeast shall be defined as described in Skinner et al., eds. (1980) Biology and Activities of Yeast (Soc. App. Bacteriol. Symp. Series No. 9).
The selection of suitable insect hosts for expression of a synthetic gene is known to those of ordinary skill in the art, and may include, but is not limited to, Aedes aegypti, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and Trichoplusia ni. Insect cells suitable for the expression of a synthetic gene include, but are not limited to, Sf9 cells and others well known to those of ordinary skill in the art. In selecting insect hosts for expression, suitable hosts may include those shown to have, inter alia, good secretion capacity, low proteolytic activity, and overall robustness. Insect hosts are generally available from a variety of sources including, but not limited to, the Insect Genetic Stock Center, Department of Biophysics and Medical Physics, University of California (Berkeley, Calif.); and the American Type Culture Collection (“ATCC”) (Manassas, Va.).
The selection of suitable plant hosts for expression of a synthetic gene is known to those of ordinary skill in the art. As used herein, the term plant also includes plant cells, plant protoplasts, plant cell tissue cultures from which plants can be regenerated, plant calli, plant clumps, and plant cells that are intact in plants or parts of plants such as embryos, pollen, ovules, seeds, leaves, flowers, branches, fruit, kernels, ears, cobs, husks, stalks, roots, root tips, anthers, and the like. Progeny, variants, and mutants of the regenerated plants are also included, provided that these parts comprise the introduced polynucleotides.
In one example, any plant species may be utilized as a host, including, but not limited to, monocots and dicots. Examples of plant species of interest include, but are not limited to, corn (Zea mays), Brassica sp. (e.g., B. napus, B. rapa, B. juncea), alfalfa (Medicago sativa), rice (Oryza sativa), rye (Secale cereale), sorghum (Sorghum bicolor, Sorghum vulgare), millet (e.g., pearl millet (Pennisetum glaucum), proso millet (Panicum miliaceum), foxtail millet (Setaria italica), finger millet (Eleusine coracana)), sunflower (Helianthus annuus), safflower (Carthamus tinctorius), wheat (Triticum aestivum), soybean (Glycine max), tobacco (Nicotiana tabacum), potato (Solanum tuberosum), peanuts (Arachis hypogaea), cotton (Gossypium barbadense, Gossypium hirsutum), sweet potato (Ipomoea batatas), cassava (Manihot esculenta), coffee (Coffea spp.), coconut (Cocos nucifera), pineapple (Ananas comosus), citrus trees (Citrus spp.), cocoa (Theobroma cacao), tea (Camellia sinensis), banana (Musa spp.), avocado (Persea americana), fig (Ficus carica), guava (Psidium guajava), mango (Mangifera indica), olive (Olea europaea), papaya (Carica papaya), cashew (Anacardium occidentale), macadamia (Macadamia integrifolia), almond (Prunus amygdalus), sugar beets (Beta vulgaris), sugarcane (Saccharum spp.), oats, barley, vegetables, ornamentals, and conifers.
Vegetables of interest include tomatoes (Lycopersicon esculentum), lettuce (e.g., Lactuca sativa), green beans (Phaseolus vulgaris), lima beans (Phaseolus limensis), peas (Lathyrus spp.), and members of the genus Cucumis such as cucumber (C. sativus), cantaloupe (C. cantalupensis), and musk melon (C. melo). Ornamentals include azalea (Rhododendron spp.), hydrangea (Hydrangea macrophylla), hibiscus (Hibiscus rosa-sinensis), roses (Rosa spp.), tulips (Tulipa spp.), daffodils (Narcissus spp.), petunias (Petunia hybrida), carnation (Dianthus caryophyllus), poinsettia (Euphorbia pulcherrima), and chrysanthemum.
Conifers of interest include, for example, pines such as loblolly pine (Pinus taeda), slash pine (Pinus elliottii), ponderosa pine (Pinus ponderosa), lodgepole pine (Pinus contorta), and Monterey pine (Pinus radiata); Douglas-fir (Pseudotsuga menziesii); Western hemlock (Tsuga heterophylla); Sitka spruce (Picea sitchensis); redwood (Sequoia sempervirens); true firs such as silver fir (Abies amabilis) and balsam fir (Abies balsamea); and cedars such as Western red cedar (Thuja plicata) and Alaska yellow-cedar (Chamaecyparis nootkatensis). Hardwood trees can also be employed, including ash, aspen, beech, basswood, birch, black cherry, black walnut, buckeye, American chestnut, cottonwood, dogwood, elm, hackberry, hickory, holly, locust, magnolia, maple, oak, poplar, red alder, redbud, royal paulownia, sassafras, sweetgum, sycamore, tupelo, willow, and yellow-poplar.
In specific examples, the plants or cells, or parts thereof, are crop plants (for example, corn, alfalfa, sunflower, Brassica, soybean, cotton, safflower, peanut, sorghum, wheat, millet, tobacco, sugarcane, etc.).
Other plants of interest include turfgrasses such as, for example, annual bluegrass (Poa annua); annual ryegrass (Lolium multiflorum); Canada bluegrass (Poa compressa); Chewings fescue (Festuca rubra); colonial bentgrass (Agrostis tenuis); creeping bentgrass (Agrostis palustris); crested wheatgrass (Agropyron desertorum); fairway wheatgrass (Agropyron cristatum); hard fescue (Festuca trachyphylla); Kentucky bluegrass (Poa pratensis); orchardgrass (Dactylis glomerata); perennial ryegrass (Lolium perenne); red fescue (Festuca rubra); redtop (Agrostis alba); rough bluegrass (Poa trivialis); sheep fescue (Festuca ovina); smooth bromegrass (Bromus inermis); tall fescue (Festuca arundinacea); timothy (Phleum pratense); velvet bentgrass (Agrostis canina); weeping alkaligrass (Puccinellia distans); western wheatgrass (Agropyron smithii); Bermuda grass (Cynodon spp.); St. Augustine grass (Stenotaphrum secundatum); zoysia grass (Zoysia spp.); Bahia grass (Paspalum notatum); carpet grass (Axonopus affinis); centipede grass (Eremochloa ophiuroides); kikuyu grass (Pennisetum clandestinum); seashore paspalum (Paspalum vaginatum); blue grama (Bouteloua gracilis); buffalo grass (Buchloe dactyloides); and sideoats grama (Bouteloua curtipendula).
Plants of interest further include grain plants that provide seeds of interest, oil-seed plants, and leguminous plants. Seeds of interest include grain seeds, such as corn, wheat, barley, rice, sorghum, rye, millet, etc. Oil-seed plants include cotton, soybean, safflower, sunflower, Brassica, maize, alfalfa, palm, coconut, flax, castor, olive, etc. Leguminous plants include beans and peas. Beans include guar, locust bean, fenugreek, soybean, garden beans, cowpea, mungbean, lima bean, fava bean, lentils, chickpea, etc.
Methods for expressing a synthetic gene in a host, or cell or part thereof, are well known to those of ordinary skill in the art. Transformation of appropriate hosts with an expression cassette is accomplished by well-known methods. With regard to transformation of prokaryotic hosts, see, for example, Cohen et al. (1972) Proc. Natl. Acad. Sci. USA 69:2110 and Sambrook et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y. Transformation of yeast is described in Sherman et al. (1986) Methods In Yeast Genetics, A Laboratory Manual, Cold Spring Harbor, NY. The method of Beggs (1978) Nature 275:104-109 is also useful. With regard to vertebrates, reagents useful in transfecting such hosts, for example calcium phosphate and DEAE-dextran or liposome formulations, are available from Stratagene Cloning Systems, or Life Technologies Inc., Gaithersburg, Md. 20877, USA. Electroporation is also useful for transforming and/or transfecting cells and is well known in the art for transforming yeast, bacteria, insect cells and vertebrate cells.
A successfully transformed host, or cell or part thereof, i.e., one that contains a synthetic gene, and which is expressing the encoded polypeptide, can be identified using well-known techniques. For example, cells resulting from the introduction of an expression cassette can be grown to produce the polypeptide encoded by the synthetic gene. Cells can be harvested and lysed, and their DNA content examined for the presence of the synthetic gene using a method such as that described by Southern (1975) J. Mol. Biol. 98:503; or Berent et al. (1985) Biotech. 3:208. Alternatively, the presence of the encoded polypeptide in the supernatant can be detected using antibodies and methods known to those of ordinary skill in the art.
In addition to directly assaying for the presence of recombinant DNA, successful transformation can be confirmed by well-known immunological methods when the recombinant DNA is capable of directing the expression of the encoded polypeptide. For example, cells successfully transformed with an expression vector produce polypeptides displaying appropriate antigenicity. Samples of cells suspected of being transformed may be harvested and assayed for the encoded polypeptide using suitable antibodies. For stable transfection of a host, or cell or part thereof, it is known that, depending upon the expression vector and transfection technique used, only a small fraction of cells may integrate the foreign DNA into their genome. In order to identify and select these integrants, a gene that encodes a selectable marker (e.g., for resistance to antibiotics) is generally introduced into the host, or cell or part thereof, along with the gene of interest. For example, selectable markers may include those which confer resistance to drugs, such as G418, hygromycin, and methotrexate.
A nucleic acid encoding a selectable marker can be introduced into a host, or cell or part thereof, on the same vector as that comprising the synthetic gene, or alternatively introduced on a separate vector. A host, or cell, or part thereof, that is stably transfected with the introduced nucleic acid can be identified by drug selection.
Many modifications and other embodiments of the inventions set forth herein will come to mind to one skilled in the art to which these inventions pertain having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the inventions are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
Embodiments
1. A method for making a synthetic gene, said method comprising:
(a) identifying and counting at least one counted motif in a native polynucleotide sequence;
(b) converting the native polynucleotide sequence to the corresponding amino acid sequence;
(c) selecting a target expression system, and creating a preliminary codon optimized polynucleotide sequence by selecting said target system’s optimal codons for each amino acid in said amino acid sequence in step (b);
(d) generating an optimized polynucleotide sequence for use in the expression of the amino acid sequence in the said target system by repeating the process of modifying at least one codon for at least one amino acid position, such that the criteria set forth in (i)-(v) are met in order:
(i) the optimized polynucleotide sequence comprises the optimal sets of added motifs;
(ii) the extreme GC regions are identified and altered such that the GC content of the optimized polynucleotide sequence is within the desired range of said target system;
(iii) no avoided motif is present in the optimized polynucleotide sequence;
(iv) no polynucleotide repeats greater than a set number are present in the optimized polynucleotide sequence;
(v) the earliest of steps (i)-(iv) whose criterion has been made unmet by any later step is repeated;
(e) making a synthetic gene comprising the selected optimized polynucleotide sequence.
2. The method of embodiment 1, wherein an essential modification is not subject to subsequent modification in any of steps (i)-(v).
3. The method of embodiment 2, wherein the essential modification comprises adding optimal sets of added motifs as set forth in step (i), optimizing GC content as set forth in step (ii), removing avoided motifs as set forth in step (iii), or removing polynucleotide repeats greater than a set number as set forth in step (iv).
4. The method of any one of embodiments 1-3, wherein the synthetic gene is incorporated into an expression cassette.
5. The method of embodiment 4, wherein the expression cassette is introduced into the target expression system, and wherein the target expression system expresses the synthetic gene.
6. The method of any one of embodiments 1-5, wherein the target expression system is a target cell or a target organism.
7. The method of embodiment 6, wherein the target cell or target organism is a plant or plant cell.
8. The method of any one of embodiments 1-7, wherein the target expression system is a soybean plant, soybean plant cell, corn plant, or corn plant cell.
9. The method according to any of the preceding embodiments, wherein one or more of the counted motifs is selected from a group consisting of AATAAT, AACCAA, ATATAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA, CATAAA, AATAAA, and AATCAA.
10. The method according to any one of embodiments 1-9, wherein the counted motifs are selected from one or more of: AAGCAT, AATAAT, ATTAAT, AACCAA, ATACAT, ATATAA, AAAATA, ATTAAA, ATACTA, AATTAA, ATAAAA, AATACA, ATGAAA, and CATAAA.
11. The method according to any one of embodiments 1-8, wherein at least one of the added motifs is selected from a group consisting of AATAAT, AACCAA, ATATAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA, and CATAAA.
12. The method according to any one of embodiments 1-11, wherein at least one of the avoided motifs is selected from a group consisting of ATTTA, TAAAT, GTGCAG, CTGCAC, GGCGCGCC, AAGCTT, CTCGAG, GAGCTC, GGATCC, GGTACC, GTCGAC, MAGGTRAG, and YYYNCAGG.
13. The method according to any one of embodiments 1-12, wherein the one or more added motifs are scored using an algorithm.
14. The method according to embodiment 13, wherein the scoring further involves one or more required motifs.
15. The method according to embodiment 14, wherein at least one of the required motifs is selected from a list consisting of ATATAT, TTGTTT, TTTTGT, TGTTTT, TATATA, TATTTT, TTTTTT, ATTTTT, TTATTT, TTTATT, TAATAA, ATTTAT, TATATT, TTTTAT, ATATTT, TATTAT, TGTTTG, TTATAT, TGTAAT, AAATAA, AATTTT, TTTTTA, TAATTT, TTAATT, AAATTT, TTTGTT, ATTATT, ATTTTA, TTTAAT, and TTTTAA.
16. The method according to any one of embodiments 1-15, wherein extreme GC regions are identified using a sliding window having a width of 4 amino acids; wherein codon modification proceeds in the order of most- to least-preferred codons of the said target expression system; wherein the said modification process repeats until the GC content is at the center of the said target expression system’s range.
17. The method according to any one of embodiments 1-16, wherein the optimized polynucleotide sequence comprises at least one more counted motif than the native sequence, and wherein the optimized polynucleotide sequence comprises a minimum of four counted motifs.
18. The method according to any one of embodiments 1-17, wherein codon modification proceeds in the order of most- to least-preferred codons of the said target expression system.
19. The method according to any one of embodiments 1-18, wherein the optimal codons for said target expression system are determined using at least one method selected from codon frequency tables and hidden Markov models.
20. The method according to any one of embodiments 1-19, wherein the number of counted motifs in the optimized polynucleotide sequence is the same or greater than the number of counted motifs in the native polynucleotide sequence.
21. The method according to any one of embodiments 1-19, wherein the number of counted motifs in the optimized polynucleotide sequence is not the same as the number of counted motifs in the native polynucleotide sequence.
22. The method according to any one of embodiments 1-20, wherein the optimized polynucleotide sequence comprises at least one required motif.
23. The method according to any one of embodiments 1-19, wherein the target system is a corn plant or plant cell and wherein the number of required motifs in the optimized polynucleotide sequence is the same or greater than the number of required motifs in the native polynucleotide sequence.
24. The method according to embodiment 22 or 23, wherein the required motifs are selected from: a) ATATAT, TTGTTT, TTTTGT, TGTTTT, TATATA, TATTTT, TTTTTT, ATTTTT, TTATTT, TTTATT, TAATAA, ATTTAT, TATATT, TTTTAT, ATATTT, TATTAT, TGTTTG, TTATAT, TGTAAT, and AAATAA; or b) ATATAT, TTGTTT, TTTTGT, TGTTTT, TATATA, TATTTT, TTTTTT, ATTTTT, TTATTT, TTTATT, TAATAA, ATTTAT, TATATT, TTTTAT, ATATTT, TATTAT, TGTTTG, TTATAT, TGTAAT, AAATAA, AATTTT, TTTTTA, TAATTT, TTAATT, AAATTT, TTTGTT, ATTATT, ATTTTA, TTTAAT, and TTTTAA.
25. The method according to any one of embodiments 1-22, wherein the target system is a soybean plant or plant cell and wherein the number of required motifs in the optimized polynucleotide sequence is the same or greater than the number of required motifs in the native polynucleotide sequence.
26. The method according to embodiment 22 or 25, wherein the required motifs are a) ATTTTT, TATTTT, TTATTT, TTTATT, TTTTTT, TTTTAT, AATTTT, TTTTTA, ATATAT, TAATTT, TTAATT, AAATTT, AAATAA, ATATTT, TTTGTT, TTGTTT, ATTATT, ATTTTA, TTTAAT, and TTTTAA; or b) ATATAT, TTGTTT, TTTTGT, TGTTTT, TATATA, TATTTT, TTTTTT, ATTTTT, TTATTT, TTTATT, TAATAA, ATTTAT, TATATT, TTTTAT, ATATTT, TATTAT, TGTTTG, TTATAT, TGTAAT, AAATAA, AATTTT, TTTTTA, TAATTT, TTAATT, AAATTT, TTTGTT, ATTATT, ATTTTA, TTTAAT, and TTTTAA.
27. The method of any one of embodiments 1-21, wherein the optimized polynucleotide sequence does not comprise a required motif.
28. The method of any one of embodiments 1-27, wherein the location of any counted motif in the optimized polynucleotide sequence is different from the location of the counted motif in the native polynucleotide sequence.
29. The method of any one of embodiments 1-27, wherein said optimized polynucleotide sequence does not comprise a counted motif.
30. The method of any one of embodiments 1-29, wherein said optimized polynucleotide sequence does not comprise a rare codon.
31. The method of embodiment 30, wherein said target expression system is a corn plant or corn plant cell and said rare codon is one or more of: TTA, CTA, GTA, CGT, AGT, and CGA; or wherein said target expression system is a soy plant or soy plant cell and rare codon is selected from the group consisting of TTA, CTA, TCG, CCG, ACG, GCG, CGA, and CGG.
EXAMPLES
Example 1
The methods disclosed herein were implemented in a GeneOpt3 algorithm created for designing synthetic nucleotide sequences for expression of proteins in new hosts. The algorithm was implemented in Python. Key terms used herein and their definitions are listed in Table 1.
Table 1: Definitions of key terms
[Table 1 image not reproduced]
An example implementation of the algorithm proceeds as follows:
1. Starting from a native nucleotide sequence, the “counted motifs” (see Table 1) within the sequence are counted.
2. The native DNA sequence is translated into the corresponding amino acid sequence.
3. The new host is identified, and the amino acid sequence is back-translated into a preliminary DNA sequence using the new host’s optimal codon for each amino acid (which can be defined by one of multiple methods, including codon frequency tables or hidden Markov models). See Figure 1A for the initial conditions.
4. Taking into account all codons for each amino acid position, every potential location of counted motifs, added motifs, avoided motifs, required motifs and out-of-frame stop codons is identified and scored. The score prioritizes potential motif locations based on the impact on final GC%, host codon preference, and the likelihood of contributing undesirable motifs (i.e. discourages overlap with avoided motifs).
5. The optimal set of added motifs (i.e. to achieve the goal of at least one more counted motif than the native sequence and a minimum of four) is selected based on the score from step 4.
6. The GC content of the overall sequence is iteratively altered until it falls within the desired range for the new host. Extreme GC regions are identified using a sliding window approach (where the window is 4 amino acids wide). Within the extreme (high or low) regions of the sequence, all possible alternative codon combinations are tested in order from most- to least-preferred codons. Alterations that move the GC content toward the new host’s range are selected. This process is repeated until the GC content is at the center of the new host’s range.
7. Avoided motifs are removed by sampling codons, from most- to least-preferred, for the corresponding amino acids until all avoided motifs have been removed, while preserving the native amino acid sequence.
8. Polynucleotide repeats longer than N (a user-set upper limit, usually 4 or 5) are removed by sampling codons, from most- to least-preferred, for the corresponding amino acids until no polynucleotide repeat longer than N remains.
9. The algorithm repeats any steps as necessary. For example, removing a polynucleotide repeat may add back an avoided motif. To avoid infinite loops (cycling back and forth between competing edits), a codon “lock” feature is included so that essential changes cannot be altered by subsequent steps.
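Step 1 (counting motifs) and the later motif checks come down to finding every occurrence of a short pattern, including overlapping ones. The following is a minimal sketch under stated assumptions, not the GeneOpt3 code: it assumes motifs are matched on the coding strand only and that the degenerate letters in motifs such as MAGGTRAG and YYYNCAGG follow the standard IUPAC nucleotide codes. `motif_regex`, `find_motif`, and `count_motifs` are hypothetical helper names.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_regex(motif: str) -> re.Pattern:
    """Compile a (possibly degenerate) motif into a zero-width lookahead,
    so that overlapping occurrences are all reported."""
    body = "".join(IUPAC[ch] for ch in motif.upper())
    return re.compile(f"(?=({body}))")


def find_motif(seq: str, motif: str) -> list[int]:
    """0-based start positions of every occurrence of motif in seq."""
    return [m.start() for m in motif_regex(motif).finditer(seq.upper())]


def count_motifs(seq: str, motifs: list[str]) -> dict[str, int]:
    """Occurrence count for each motif, overlaps included."""
    return {motif: len(find_motif(seq, motif)) for motif in motifs}


if __name__ == "__main__":
    native_start = "ATGCATTCTGAAGATATTAAAGAAAAAACAC"  # first 31 nt of SEQ ID NO: 1
    print(count_motifs(native_start, ["ATTAAA", "AAAATA"]))  # {'ATTAAA': 1, 'AAAATA': 0}
    print(find_motif("CCAGGTAAG", "MAGGTRAG"))  # [1]
```

The lookahead trick is used because a plain `finditer` consumes matched characters and would miss the second copy in overlapping runs such as `AATAATAAT`.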
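Steps 2 and 3 amount to translating the native sequence and back-translating it with the host's top-ranked codon at every position. This minimal sketch assumes the host preference is supplied as a codon frequency table (one of the two options named in step 3; a hidden Markov model is not shown). The frequencies in the usage example are made-up toy values, not real corn or soybean codon usage, and `rank_codons` and `back_translate` are hypothetical names.

```python
from itertools import product

# Standard genetic code; product() enumerates codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence ('*' marks stop codons)."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def rank_codons(freq: dict[str, float]) -> dict[str, list[str]]:
    """Synonymous codons per amino acid, most- to least-preferred.

    Codons missing from freq count as 0; ties are broken alphabetically.
    """
    ranked: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        ranked.setdefault(aa, []).append(codon)
    for codons in ranked.values():
        codons.sort(key=lambda c: (-freq.get(c, 0.0), c))
    return ranked


def back_translate(protein: str, ranked: dict[str, list[str]]) -> str:
    """Preliminary sequence: the top-ranked codon at every position."""
    return "".join(ranked[aa][0] for aa in protein)


if __name__ == "__main__":
    # Toy frequencies for illustration only (not measured host codon usage).
    toy_freq = {"CAC": 0.6, "CAT": 0.4, "TCT": 0.25, "TCC": 0.2, "GAG": 0.7,
                "GAA": 0.3, "GAT": 0.55, "GAC": 0.45, "ATC": 0.5, "ATT": 0.35,
                "AAG": 0.8, "AAA": 0.2}
    protein = translate("ATGCATTCTGAAGATATTAAA")  # 'MHSEDIK'
    print(back_translate(protein, rank_codons(toy_freq)))  # ATGCACTCTGAGGATATCAAG
```

Keeping the full ranked list per amino acid (rather than only the best codon) is what lets later steps walk from most- to least-preferred alternatives.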
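Steps 4 and 5 enumerate every place a counted motif could be written with synonymous codons and score each site. The source names the score's ingredients (final GC%, host codon preference, overlap with avoided motifs) but not its formula, so the weighted sum below, including `w_rank`, `w_gc`, `w_avoid`, and `gc_target`, is a hypothetical stand-in. `target_motif_count` encodes the stated goal of at least one more counted motif than the native sequence and a minimum of four; choosing a non-conflicting set of sites is not shown.

```python
from itertools import product


def target_motif_count(native_count: int) -> int:
    """Step 5 goal: one more counted motif than the native sequence, minimum four."""
    return max(native_count + 1, 4)


def candidate_sites(codons, protein, ranked, motif, gc_target,
                    avoided=(), w_rank=1.0, w_gc=10.0, w_avoid=100.0):
    """Every way `motif` can be written using only synonymous codon swaps.

    Returns (score, start, new_codons) tuples, lowest (best) score first.
    Score = w_rank * total codon rank + w_gc * |overall GC - gc_target|
            + w_avoid * avoided-motif hits (illustrative weights only).
    """
    n, results = len(motif), []
    for start in range(len(codons) * 3 - n + 1):
        first, last = start // 3, (start + n - 1) // 3  # codons the site spans
        offset = start - 3 * first
        aas = protein[first:last + 1]
        for idx in product(*(range(len(ranked[aa])) for aa in aas)):
            span = [ranked[aa][i] for aa, i in zip(aas, idx)]
            if "".join(span)[offset:offset + n] != motif:
                continue
            trial = codons[:first] + span + codons[last + 1:]
            seq = "".join(trial)
            gc = sum(b in "GC" for b in seq) / len(seq)
            hits = sum(seq.startswith(a, i) for a in avoided for i in range(len(seq)))
            score = w_rank * sum(idx) + w_gc * abs(gc - gc_target) + w_avoid * hits
            results.append((score, start, trial))
    results.sort(key=lambda r: (r[0], r[1]))
    return results


if __name__ == "__main__":
    ranked = {"M": ["ATG"], "K": ["AAG", "AAA"]}  # hypothetical host ranking
    for score, start, trial in candidate_sites(["ATG", "AAG"], "MK", ranked, "ATGAAA", 0.5):
        print(round(score, 3), start, trial)  # 4.333 0 ['ATG', 'AAA']
```

Only the codons overlapped by a site are varied, so every candidate preserves the amino acid sequence by construction.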
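Step 6 scans a 4-amino-acid (12-nt) sliding window and rewrites extreme windows with alternative codon combinations tried from most- to least-preferred. This is a simplified sketch, assuming a hypothetical `ranked` table (codons per amino acid, most-preferred first) and a window-level target range `[lo, hi]`; unlike the described method, it stops once every window is in range rather than steering overall GC to the center of the host range.

```python
from itertools import product


def gc_fraction(seq: str) -> float:
    """Fraction of G + C bases (0.0 for an empty string)."""
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


def extreme_windows(codons: list[str], lo: float, hi: float, width: int = 4) -> list[int]:
    """Codon indices where a `width`-codon window has GC outside [lo, hi]."""
    return [start for start in range(len(codons) - width + 1)
            if not lo <= gc_fraction("".join(codons[start:start + width])) <= hi]


def fix_window(codons: list[str], protein: str, ranked: dict[str, list[str]],
               start: int, lo: float, hi: float, width: int = 4) -> bool:
    """Rewrite one window in place with the most-preferred synonymous
    combination whose GC lies in [lo, hi]; return False if none does."""
    aas = protein[start:start + width]
    # Increasing total rank (0 = most preferred) approximates "most- to least-preferred".
    for idx in sorted(product(*(range(len(ranked[aa])) for aa in aas)), key=sum):
        candidate = [ranked[aa][i] for aa, i in zip(aas, idx)]
        if lo <= gc_fraction("".join(candidate)) <= hi:
            codons[start:start + width] = candidate
            return True
    return False


def balance_gc(codons, protein, ranked, lo, hi, width=4, max_passes=50):
    """Fix the leftmost extreme window until none remain (or give up)."""
    codons = list(codons)
    for _ in range(max_passes):
        bad = extreme_windows(codons, lo, hi, width)
        if not bad:
            return codons
        if not fix_window(codons, protein, ranked, bad[0], lo, hi, width):
            raise ValueError(f"window at codon {bad[0]} cannot be brought into range")
    raise RuntimeError("GC balancing did not converge")


if __name__ == "__main__":
    ranked = {"K": ["AAA", "AAG"]}  # hypothetical: AAA ranked first
    print(balance_gc(["AAA"] * 6, "KKKKKK", ranked, lo=0.3, hi=0.7))
    # ['AAG', 'AAG', 'AAG', 'AAG', 'AAG', 'AAG']
```

Fixing one window can push a neighbouring window out of range, which is why the loop rescans after every edit and caps the number of passes.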
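Steps 7 to 9 share one mechanic: locate a violation, re-choose only the codons it overlaps, try synonymous options from most- to least-preferred, and never touch locked positions. This is a minimal sketch, assuming literal (non-degenerate) avoided motifs and reading "polynucleotide repeat" as a single-base run longer than N; that reading, the `repair` function, and the cost rule (total violating nucleotides must strictly drop, which guarantees termination) are assumptions rather than the GeneOpt3 logic.

```python
import re
from itertools import product
from typing import Callable

Span = tuple[int, int]  # [start, end) nucleotide coordinates
SpanFinder = Callable[[str], list[Span]]


def avoided_motif_spans(avoided: list[str]) -> SpanFinder:
    """Every (possibly overlapping) literal occurrence of an avoided motif."""
    def find(seq: str) -> list[Span]:
        return sorted((i, i + len(m)) for m in avoided
                      for i in range(len(seq)) if seq.startswith(m, i))
    return find


def homopolymer_spans(max_len: int) -> SpanFinder:
    """Single-base runs longer than max_len (assumed meaning of 'repeat')."""
    pattern = re.compile(r"([ACGT])\1{%d,}" % max_len)

    def find(seq: str) -> list[Span]:
        return [m.span() for m in pattern.finditer(seq)]
    return find


def repair(codons: list[str], protein: str, ranked: dict[str, list[str]],
           find_spans: SpanFinder, locked: frozenset[int] = frozenset()) -> list[str]:
    """Re-choose codons under the first violation until none remain.

    Candidates are ordered by total codon rank; the one with the lowest
    violation cost wins, ties going to the more preferred candidate.
    Locked codon positions keep their current codon.
    """
    def cost(cs: list[str]) -> int:
        return sum(end - start for start, end in find_spans("".join(cs)))

    codons = list(codons)
    while (current := cost(codons)) > 0:
        start, end = find_spans("".join(codons))[0]
        first, last = start // 3, (end - 1) // 3
        choices = [[codons[p]] if p in locked else ranked[protein[p]]
                   for p in range(first, last + 1)]
        orders = sorted(product(*(range(len(c)) for c in choices)), key=sum)
        candidates = [codons[:first] + [c[i] for c, i in zip(choices, idx)] + codons[last + 1:]
                      for idx in orders]
        best = min(candidates, key=cost)
        if cost(best) >= current:
            raise ValueError(f"no synonymous change clears {start}-{end}")
        codons = best
    return codons


if __name__ == "__main__":
    # Hypothetical, partial codon rankings for illustration.
    ranked = {"G": ["GGA", "GGC", "GGT", "GGG"], "S": ["TCC", "AGC", "TCT"],
              "K": ["AAA", "AAG"], "F": ["TTT", "TTC"]}
    bamhi = avoided_motif_spans(["GGATCC"])
    print(repair(["GGA", "TCC"], "GS", ranked, bamhi))                    # ['GGA', 'AGC']
    print(repair(["GGA", "TCC"], "GS", ranked, bamhi, frozenset({1})))     # ['GGC', 'TCC']
    print(repair(["AAA", "AAA", "TTT"], "KKF", ranked, homopolymer_spans(4)))  # ['AAG', 'AAA', 'TTT']
```

The second call shows the lock at work: with codon 1 locked, the GGATCC site is broken by changing the glycine codon instead of the serine codon.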
In an example optimization solution, such as that presented in Figure 1B, the x-axis displays the amino acids of an AgBiome protein, and the y-axis displays the new host’s codon selection rank (the bottom row is the most preferred codon for the new host, and higher rows show less-preferred codons in rank order). Different amino acids can be encoded by as few as one or as many as six codons. The horizontal line with multiple peaks indicates the codon selections. The algorithm initializes with a flat horizontal line across the bottom (Fig. 1A) and proceeds to select less-preferred codons to improve GC %, etc. (Fig. 1B).
As an example, the method can optimize Gene A from the native sequence set forth in SEQ ID NO: 1 to the optimized version set forth in SEQ ID NO: 2. Figure 2 shows expression of the optimized form of Gene A. T0 plants were analyzed by Western blot with protein-specific antibodies. The positive control is the purified protein, and the negative control is untransformed B104 tissue.
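The rank-based view in Figure 1 can be mirrored in a few lines: a solution is one rank per amino acid (0 = most-preferred codon), the initial state is all zeros (the flat line of Fig. 1A), and raising a rank selects a less-preferred codon (a peak in Fig. 1B). This is a minimal sketch assuming a hypothetical `ranked` table; `codons_from_ranks` and `render_ranks` are illustrative names, not part of GeneOpt3.

```python
def codons_from_ranks(protein: str, ranks: list[int],
                      ranked: dict[str, list[str]]) -> list[str]:
    """Decode a rank vector (0 = host's most-preferred codon) into codons."""
    if len(ranks) != len(protein):
        raise ValueError("one rank per amino acid is required")
    return [ranked[aa][r] for aa, r in zip(protein, ranks)]


def render_ranks(protein: str, ranks: list[int]) -> str:
    """Text version of the Figure 1 plot; the bottom row is rank 0."""
    rows = [f"{level} " + "".join("*" if r == level else " " for r in ranks)
            for level in range(max(ranks), -1, -1)]
    rows.append("  " + protein)
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical ranking for the first five residues of Gene A (MHSED).
    ranked = {"M": ["ATG"], "H": ["CAC", "CAT"], "S": ["TCT", "AGC"],
              "E": ["GAG", "GAA"], "D": ["GAT", "GAC"]}
    ranks = [0, 1, 0, 0, 1]
    print(codons_from_ranks("MHSED", ranks, ranked))  # ['ATG', 'CAT', 'TCT', 'GAG', 'GAC']
    print(render_ranks("MHSED", ranks))
```

Representing the solution as ranks rather than raw codons keeps every candidate synonymous by construction, since only codons from each amino acid's own list can be selected.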
SEQ ID NO: 1 - native nucleotide sequence ATGCATTCTGAAGATATTAAAGAAAAAACACTTACCTGGTTTAACTACATTACCAGTC CGGTAAATAATGAAGATGTATTTATGCGAAGCTCACAGGATATACTTGTTATGAATCC TGCGATAGCAGCTGCAACGCAAGAGTATATCGATGGAAATACTCACGATAGTCAGCTA TTCAACACACCATCATCAGCGCCTCAAACGATGTTTGATGGCCTGCAAACCATTGTAA ACCTTTGCCGTGTGCAATCAGGTTATAATGCACTTGATCCTAATGGAACCGGAAGTAA GGCGTATTTTACGAAATTTACTCAGAACATAGCAAATGTTCCGTGCCTGACGTTGTTGA GTGCGGAAACAAAAAATATTAAACAACAAAGCCATAATGCAGATGAGCTCATCAACT CATTTGTCGATGCTTTTGATGGGCTTACACAAAGCGACCAGTCTAAGATTAAGTCATCC GTAACCTCGTTAGTAAAAGCAGCCCTGAGCTATGCAAATGAAGATCAGAAATCATCAA ATTTCACGCAAAATATTTTGCAAACCGGTGATGATCAGGTGATATTCACGTTGTATGCC AGTACATTTGAAATTTCTTCGACCAAAAGCAAAGGTGTTATATCTTTTAAATCCGAATA CTCATTGCAGCAGGCGTTATATAGCCTTTCCCGCGCAAGCTGGGAACGGGTGAAAGAT CTGTTTGCTGAACAAGAAAAAACTACTATGGATCAGTGGTTAAATGATATGAAAACGC CACAGAAAAGTGGCAGTACAGTAAAAGCATTATGTTTAGAA
SEQ ID NO: 2 - optimized by GeneOpt3 algorithm
ATGCACTCTGAGGATATCAAGGAGAAGACACTGACATGGTTCAATTATATAAC ATCTCCAGTGAATAATGAGGATGTGTTCATGAGGTCATCTCAGGATATCCTGGTGATG AATCCAGCCATCGCCGCCGCCACACAGGAGTACATCGATGGCAATACACACGATTCTC AGCTGTTCAATACACCATCTTCTGCCCCACAGACAATGTTCGATGGCCTGCAGACAATC GTGAATCTGTGCCGCGTGCAATCTGGCTACAATGCCCTAGATCCTAATGGCACAGGCT CTAAGGCCTACTTCACAAAGTTCACACAAAATATCGCCAATGTGCCATGCCTGACACT GCTGTCTGCCGAGACAAAGAATATCAAGCAGCAGTCTCACAATGCCGATGAGCTGATT AATTCTTTCGTGGATGCCTTCGATGGCCTGACACAGTCTGATCAGTCAAAAATCAAGTC TTCTGTGACATCTCTGGTGAAGGCCGCCCTGTCTTACGCCAATGAGGATCAGAAGTCTT CTAATTTCACACAGAATATCCTGCAGACAGGCGATGATCAGGTGATCTTCACACTGTA CGCCTCTACATTCGAGATCTCTTCTACAAAGTCTAAGGGCGTGATCTCATTCAAGTCTG AGTACTCTCTGCAGCAGGCCCTGTACTCTCTGTCTAGGGCCTCTTGGGAGAGGGTGAA GGATCTGTTCGCCGAGCAGGAGAAGACAACAATGGATCAGTGGCTGAATGATATGAA AACACCACAGAAGTCTGGCTCTACAGTGAAGGCCCTGTGCCTGGAG
Example 2
This experiment compared expression of the maize-optimized APG06396.5 (optimized with the new algorithm and passing QC), with and without a chloroplast-targeting signal sequence, against expression of the native, non-optimized APG06396 in the N. benthamiana transient expression system. The sequences of APG06396 and APG06396.5 are as follows:
>APG06396 (SEQ ID NO: 1, see above)
ATGCATTCTGAAGATATTAAAGAAAAAACACTTACCTGGTTTAACTACATTACCAGTCCGGTAAATAATGAA GATGTATTTATGCGAAGCTCACAGGATATACTTGTTATGAATCCTGCGATAGCAGCTGCAACGCAAGAGTAT ATCGATGGAAATACTCACGATAGTCAGCTATTCAACACACCATCATCAGCGCCTCAAACGATGTTTGATGGC CTGCAAACCATTGTAAACCTTTGCCGTGTGCAATCAGGTTATAATGCACTTGATCCTAATGGAACCGGAAGT AAGGCGTATTTTACGAAATTTACTCAGAACATAGCAAATGTTCCGTGCCTGACGTTGTTGAGTGCGGAAACA AAAAATATTAAACAACAAAGCCATAATGCAGATGAGCTCATCAACTCATTTGTCGATGCTTTTGATGGGCTT ACACAAAGCGACCAGTCTAAGATTAAGTCATCCGTAACCTCGTTAGTAAAAGCAGCCCTGAGCTATGCAAAT GAAGATCAGAAATCATCAAATTTCACGCAAAATATTTTGCAAACCGGTGATGATCAGGTGATATTCACGTTG TATGCCAGTACATTTGAAATTTCTTCGACCAAAAGCAAAGGTGTTATATCTTTTAAATCCGAATACTCATTG CAGCAGGCGTTATATAGCCTTTCCCGCGCAAGCTGGGAACGGGTGAAAGATCTGTTTGCTGAACAAGAAAAA ACTACTATGGATCAGTGGTTAAATGATATGAAAACGCCACAGAAAAGTGGCAGTACAGTAAAAGCATTATGT TTAGAA
>APG06396.5 (SEQ ID NO: 3):
ATGCACTCTGAGGATATCAAGGAGAAGACACTGAGATGGTTCAATTACATCACATCTCCAGTGAATAATGAG GATGTGTTCATGCGATCATCTCAAGATATCCTGGTGATGAATCCAGCCATCGCCGCCGCCACACAGGAGTAC ATCGATGGCAATACACATGATTCTCAGCTGTTCAATACACCATCTTCTGCGCCACAGACAATGTTCGATGGG CTACAGACAATCGTGAACCTATGTCGCGTTCAATCTGGCTACAATGCCCTAGACCCAAATGGCACAGGCTCT AAGGCCTACTTCACAAAGTTCACACAGAATATCGCCAATGTGCCATGCCTGACACTGCTGTCAGCCGAGACC AAGAATATCAAGCAGCAGTCTCACAATGCCGATGAACTGATTAACTCTTTCGTGGATGCCTTCGATGGCCTG ACACAGTCTGATCAGTCTAAGATCAAGAGTAGCGTGACGTCTCTGGTGAAGGCCGCCTTGTCTTACGCCAAT GAGGATCAGAAGTCTAGCAATTTCACACAGAATATACTACAGACAGGCGATGATCAGGTGATCTTCACACTG TACGCCTCTACATTCGAGATATCTTCTACAAAGAGTAAGGGCGTGATCTCTTTCAAGAGCGAGTACTCTCTG CAGCAGGCCCTGTACTCTCTGTCACGCGCCTCTTGGGAGAGGGTGAAGGACCTGTTCGCCGAGCAGGAGAAG ACAACAATGGATCAGTGGCTGAACGATATGAAGACACCACAGAAGTCTGGCTCTACAGTGAAGGCCCTGTGC CTGGAA
The Agrobacterium strain EHA101 harboring pCAMBIA2301 containing a gene of interest was cultured in terrific broth (TB) medium containing kanamycin (50 µg/mL) and spectinomycin (100 µg/mL) with agitation at 225 rpm at 26-28 °C. After the culture reached an optical density at 600 nm (OD600) of 0.8-1.0, the Agrobacterium cells were collected by centrifugation at about 3,200 × g, re-suspended in infiltration buffer (10 mM MgCl2, 10 mM MES, 200 µM acetosyringone, pH 5.6), adjusted to an OD600 of 0.8-1.0, and used for syringe agroinfiltration. The re-suspended Agrobacterium cells were allowed to sit at room temperature for 1-3 hours before use. The Agrobacterium suspension was infiltrated into the abaxial side of 3.5-week-old N. benthamiana leaves using a needleless syringe. The infiltrated regions were outlined with a color paint pen for downstream evaluation. The infiltrated N. benthamiana plants were grown at 23 °C and analyzed for expression by Western blot after day post-infiltration.
As shown in Table 2 and Fig. 3, the constructs in bold carry the optimized version of APG06396 (APG06396.5); they share the same optimization and differ only in the signal sequence. For the un-optimized or native versions (932 and APG06396, not in bold), no corresponding expression was detected at the correct size. The band visible just below the correct size is a background band that was also detected in wild-type, untransformed N. benthamiana. As shown in Fig. 3, constructs harboring optimized sequences (bold) had higher expression than those harboring un-optimized sequences (not in bold).
Table 2.
[Table 2 image not reproduced]

Claims

That which is claimed is:
1. A method for making a synthetic gene, said method comprising:
(a) identifying and counting at least one counted motif in a native polynucleotide sequence;
(b) converting the native polynucleotide sequence to the corresponding amino acid sequence;
(c) selecting a target expression system, and creating a preliminary codon optimized polynucleotide sequence by selecting said target system’s optimal codons for each amino acid in said amino acid sequence in step (b);
(d) generating an optimized polynucleotide sequence for use in the expression of the amino acid sequence in the said target system by repeating the process of modifying at least one codon for at least one amino acid position, such that the criteria set forth in (i)-(v) are met in order:
(i) the optimized polynucleotide sequence comprises the optimal sets of added motifs;
(ii) the extreme GC regions are identified and altered such that the GC content of the optimized polynucleotide sequence is within the desired range of the said target system;
(iii) no avoided motif is present in the optimized polynucleotide sequence;
(iv) no polynucleotide repeats greater than a set number are present in the optimized polynucleotide sequence;
(v) repeating the earliest of steps (i)-(iv) whose criterion is made unmet by any step after it;
(e) making a synthetic gene comprising the selected optimized polynucleotide sequence.
2. The method of claim 1, wherein an essential modification is not subject to subsequent modification in any of steps (i)-(v).
3. The method of claim 2, wherein the essential modification comprises adding optimal sets of added motifs as set forth in step (i), optimizing GC content as set forth in step (ii), removing avoided motif as set forth in step (iii), or removing polynucleotide repeats greater than a set number as set forth in step (iv).
4. The method of any one of claims 1-3, wherein the synthetic gene is incorporated into an expression cassette.
5. The method of claim 4, wherein the expression cassette is introduced into the target expression system, and wherein the target expression system expresses the synthetic gene.
6. The method of any one of claims 1-5, wherein the target expression system is a target cell or a target organism.
7. The method of claim 6, wherein the target cell or target organism is a plant or plant cell.
8. The method of any one of claims 1-7, wherein the target expression system is a soybean plant, soybean plant cell, corn plant, or corn plant cell.
9. The method according to any of the preceding claims, wherein one or more of the counted motifs is selected from the group consisting of AATAAT, AACCAA, ATATAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA, CATAAA, AATAAA, and AATCAA.
10. The method according to any one of claims 1-9, wherein the counted motifs are selected from one or more of: AAGCAT, AATAAT, ATTAAT, AACCAA, ATACAT, ATATAA, AAAATA, ATTAAA, ATACTA, AATTAA, ATAAAA, AATACA, ATGAAA, CATAAA.
11. The method according to any one of claims 1-8, wherein at least one of the added motifs is selected from the group consisting of AATAAT, AACCAA, ATATAA, ATACTA, ATAAAA, ATGAAA, AAGCAT, ATTAAT, ATACAT, AAAATA, ATTAAA, AATTAA, AATACA, and CATAAA.
12. The method according to any one of claims 1-11, wherein at least one of the avoided motifs is selected from the group consisting of ATTTA, TAAAT, GTGCAG, CTGCAC, GGCGCGCC, AAGCTT, CTCGAG, GAGCTC, GGATCC, GGTACC, GTCGAC, MAGGTRAG, and YYYNCAGG.
13. The method according to any one of claims 1-12, wherein the one or more added motifs are scored using an algorithm.
14. The method according to claim 13, wherein the scoring further involves one or more required motifs.
15. The method according to claim 14, wherein at least one of the required motifs is selected from the group consisting of ATATAT, TTGTTT, TTTTGT, TGTTTT, TATATA, TATTTT, TTTTTT, ATTTTT, TTATTT, TTTATT, TAATAA, ATTTAT, TATATT, TTTTAT, ATATTT, TATTAT, TGTTTG, TTATAT, TGTAAT, AAATAA, AATTTT, TTTTTA, TAATTT, TTAATT, AAATTT, TTTGTT, ATTATT, ATTTTA, TTTAAT, and TTTTAA.
16. The method according to any one of claims 1-15, wherein extreme GC regions are identified using a sliding window having a width of 4 amino acids; wherein codon modification proceeds in the order of most- to least-preferred codons of the said target expression system; wherein the said modification process repeats until the GC content is at the center of the said target expression system’s range.
17. The method according to any one of claims 1-16, wherein optimized polynucleotide sequence comprises at least one more counted motif than the native sequence, and wherein the optimized polynucleotide sequence comprises a minimum of four counted motifs.
18. The method according to any one of claims 1-17, wherein codon modification proceeds in the order of most- to least-preferred codons of the said target expression system.
19. The method according to any one of claims 1-18, wherein the optimal codons for the said target expression system are determined using at least one of the methods comprising codon frequency tables and hidden Markov models.
20. The method according to any one of claims 1-19, wherein the number of counted motifs in the optimized polynucleotide sequence is the same or greater than the number of counted motifs in the native polynucleotide sequence.
21. The method according to any one of claims 1-19, wherein the number of counted motifs in the optimized polynucleotide sequence is not the same as the number of counted motifs in the native polynucleotide sequence.
22. The method according to any one of claims 1-20, wherein the optimized polynucleotide sequence comprises at least one required motif.
23. The method according to any one of claims 1-19, wherein the target system is a corn plant or plant cell and wherein the number of required motifs in the optimized polynucleotide sequence is the same or greater than the number of required motifs in the native polynucleotide sequence.
24. The method according to claim 22 or 23, wherein the required motifs are selected from a) ATATAT, TTGTTT, TTTTGT, TGTTTT, TATATA, TATTTT, TTTTTT, ATTTTT, TTATTT, TTTATT, TAATAA, ATTTAT, TATATT, TTTTAT, ATATTT, TATTAT, TGTTTG, TTATAT, TGTAAT, and AAATAA; or b) ATATAT, TTGTTT, TTTTGT, TGTTTT, TATATA, TATTTT, TTTTTT, ATTTTT, TTATTT, TTTATT, TAATAA, ATTTAT, TATATT, TTTTAT, ATATTT, TATTAT, TGTTTG, TTATAT, TGTAAT, AAATAA, AATTTT, TTTTTA, TAATTT, TTAATT, AAATTT, TTTGTT, ATTATT, ATTTTA, TTTAAT, and TTTTAA.
25. The method according to any one of claims 1-22, wherein the target system is a soybean plant or plant cell and wherein the number of required motifs in the optimized polynucleotide sequence is the same or greater than the number of required motifs in the native polynucleotide sequence.
26. The method according to claim 22 or 25, wherein the required motifs are a) ATTTTT, TATTTT, TTATTT, TTTATT, TTTTTT, TTTTAT, AATTTT, TTTTTA, ATATAT, TAATTT, TTAATT, AAATTT, AAATAA, ATATTT, TTTGTT, TTGTTT, ATTATT, ATTTTA, TTTAAT, and TTTTAA; or b) ATATAT, TTGTTT, TTTTGT, TGTTTT, TATATA, TATTTT, TTTTTT, ATTTTT, TTATTT, TTTATT, TAATAA, ATTTAT, TATATT, TTTTAT, ATATTT, TATTAT, TGTTTG, TTATAT, TGTAAT, AAATAA, AATTTT, TTTTTA, TAATTT, TTAATT, AAATTT, TTTGTT, ATTATT, ATTTTA, TTTAAT, and TTTTAA.
27. The method of any one of claims 1-21, wherein the optimized polynucleotide sequence does not comprise a required motif.
28. The method of any one of claims 1-27, wherein the location of any counted motif in the optimized polynucleotide sequence is different from the location of the counted motif in the native polynucleotide sequence.
29. The method of any one of claims 1-27, wherein said optimized polynucleotide sequence does not comprise a counted motif.
30. The method of any one of claims 1-29, wherein said optimized polynucleotide sequence does not comprise a rare codon.
31. The method of claim 30, wherein said target expression system is a corn plant or corn plant cell and said rare codon is one or more of: TTA, CTA, GTA, CGT, AGT, and CGA; or wherein said target expression system is a soy plant or soy plant cell and rare codon is selected from the group consisting of TTA, CTA, TCG, CCG, ACG, GCG, CGA, and CGG.
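The rare-codon and required-motif conditions of claims 23, 25, 30 and 31 can be checked mechanically on a finished sequence. This is a minimal sketch, assuming an in-frame coding sequence and counting required motifs with overlaps; the host labels "corn" and "soy" and the helper names are hypothetical, while the codon sets are copied from claim 31.

```python
# Rare codon sets as listed in claim 31.
RARE_CODONS = {
    "corn": frozenset({"TTA", "CTA", "GTA", "CGT", "AGT", "CGA"}),
    "soy": frozenset({"TTA", "CTA", "TCG", "CCG", "ACG", "GCG", "CGA", "CGG"}),
}


def rare_codon_positions(cds: str, host: str) -> list[int]:
    """0-based codon indices whose codon is rare for the given host."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    rare = RARE_CODONS[host]
    return [i // 3 for i in range(0, len(cds), 3) if cds[i:i + 3] in rare]


def count_overlapping(seq: str, motif: str) -> int:
    """Occurrences of motif in seq, overlaps included."""
    return sum(seq.startswith(motif, i) for i in range(len(seq)))


def required_motifs_kept(native: str, optimized: str, required: list[str]) -> bool:
    """True if the optimized sequence has at least as many required motifs as the native."""
    def total(seq: str) -> int:
        return sum(count_overlapping(seq.upper(), m) for m in required)
    return total(optimized) >= total(native)


if __name__ == "__main__":
    print(rare_codon_positions("ATGTTAAGTCGG", "corn"))  # [1, 2]
    print(rare_codon_positions("ATGTTAAGTCGG", "soy"))   # [1, 3]
```

Returning codon positions instead of a yes/no answer makes it straightforward to feed offending positions back into a codon-replacement step.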
PCT/US2023/060837 2022-01-18 2023-01-18 Method for designing synthetic nucleotide sequences WO2023141464A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263300447P 2022-01-18 2022-01-18
US63/300,447 2022-01-18

Publications (1)

Publication Number Publication Date
WO2023141464A1 true WO2023141464A1 (en) 2023-07-27

Family

ID=85384669

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2023/060837 WO2023141464A1 (en) 2022-01-18 2023-01-18 Method for designing synthetic nucleotide sequences

Country Status (1)

Country Link
WO (1) WO2023141464A1 (en)

Patent Citations (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5569597A (en) 1985-05-13 1996-10-29 Ciba Geigy Corp. Methods of inserting viral DNA into plant material
US5380831A (en) 1986-04-04 1995-01-10 Mycogen Plant Science, Inc. Synthetic insecticidal crystal protein gene
US5268463A (en) 1986-11-11 1993-12-07 Jefferson Richard A Plant promoter α-glucuronidase gene construct
US5608142A (en) 1986-12-03 1997-03-04 Agracetus, Inc. Insecticidal cotton plants
US5023179A (en) 1988-11-14 1991-06-11 Eric Lam Promoter enhancer element for gene expression in plant roots
US7741118B1 (en) 1989-02-24 2010-06-22 Monsanto Technology Llc Synthetic plant genes and method for preparation
US5110732A (en) 1989-03-14 1992-05-05 The Rockefeller University Selective gene expression in plants
US5466785A (en) 1990-04-12 1995-11-14 Ciba-Geigy Corporation Tissue-preferential promoters
US5723756A (en) 1990-04-26 1998-03-03 Plant Genetic Systems, N.V. Bacillus thuringiensis strains and their genes encoding insecticidal toxins
US5608149A (en) 1990-06-18 1997-03-04 Monsanto Company Enhanced starch biosynthesis in tomatoes
US5366892A (en) 1991-01-16 1994-11-22 Mycogen Corporation Gene encoding a coleopteran-active toxin
US5459252A (en) 1991-01-31 1995-10-17 North Carolina State University Root specific gene promoter
US5399680A (en) 1991-05-22 1995-03-21 The Salk Institute For Biological Studies Rice chitinase promoter
US5747450A (en) 1991-08-02 1998-05-05 Kubota Corporation Microorganism and insecticide
US5604121A (en) 1991-08-27 1997-02-18 Agricultural Genetics Company Limited Proteins with insecticidal properties against homopteran insects and their use in plant protection
US5750386A (en) 1991-10-04 1998-05-12 North Carolina State University Pathogen-resistant transgenic plants
US5436391A (en) 1991-11-29 1995-07-25 Mitsubishi Corporation Synthetic insecticidal gene, plants of the genus oryza transformed with the gene, and production thereof
US5428148A (en) 1992-04-24 1995-06-27 Beckman Instruments, Inc. N4 - acylated cytidinyl compounds useful in oligonucleotide synthesis
US5401836A (en) 1992-07-16 1995-03-28 Pioneer Hi-Bred International, Inc. Brassica regulatory sequence for root-specific or root-abundant gene expression
US5602321A (en) 1992-11-20 1997-02-11 Monsanto Company Transgenic cotton plants producing heterologous polyhydroxy(e) butyrate bioplastic
US5990389A (en) 1993-01-13 1999-11-23 Pioneer Hi-Bred International, Inc. High lysine derivatives of α-hordothionin
US5583210A (en) 1993-03-18 1996-12-10 Pioneer Hi-Bred International, Inc. Methods and compositions for controlling plant development
US5593881A (en) 1994-05-06 1997-01-14 Mycogen Corporation Bacillus thuringiensis delta-endotoxin
US5633363A (en) 1994-06-03 1997-05-27 Iowa State University, Research Foundation In Root preferential promoter
US5792931A (en) 1994-08-12 1998-08-11 Pioneer Hi-Bred International, Inc. Fumonisin detoxification compositions and methods
US5608144A (en) 1994-08-12 1997-03-04 Dna Plant Technology Corp. Plant group 2 promoters and uses thereof
US5736514A (en) 1994-10-14 1998-04-07 Nissan Chemical Industries, Ltd. Bacillus strain and harmful organism controlling agents
US5659026A (en) 1995-03-24 1997-08-19 Pioneer Hi-Bred International ALS3 promoter
US5885802A (en) 1995-06-02 1999-03-23 Pioneer Hi-Bred International, Inc. High methionine derivatives of α-hordothionin
US5885801A (en) 1995-06-02 1999-03-23 Pioneer Hi-Bred International, Inc. High threonine derivatives of α-hordothionin
US5837876A (en) 1995-07-28 1998-11-17 North Carolina State University Root cortex specific gene promoter
US5703049A (en) 1996-02-29 1997-12-30 Pioneer Hi-Bred Int'l, Inc. High methionine derivatives of α-hordothionin for pathogen-control
US5850016A (en) 1996-03-20 1998-12-15 Pioneer Hi-Bred International, Inc. Alteration of amino acid compositions in seeds
US6072050A (en) 1996-06-11 2000-06-06 Pioneer Hi-Bred International, Inc. Synthetic promoters
WO1998020133A2 (en) 1996-11-01 1998-05-14 Pioneer Hi-Bred International, Inc. Proteins with enhanced levels of essential amino acids
WO1999043838A1 (en) 1998-02-24 1999-09-02 Pioneer Hi-Bred International, Inc. Synthetic promoters
WO1999043819A1 (en) 1998-02-26 1999-09-02 Pioneer Hi-Bred International, Inc. Family of maize pr-1 genes and promoters
US6177611B1 (en) 1998-02-26 2001-01-23 Pioneer Hi-Bred International, Inc. Maize promoters
US20040082770A1 (en) 2000-10-30 2004-04-29 Verdia, Inc. Novel glyphosate N-acetyltransferase (GAT) genes
WO2003092360A2 (en) 2002-04-30 2003-11-13 Verdia, Inc. Novel glyphosate-n-acetyltransferase (gat) genes
US8740682B2 (en) 2009-04-20 2014-06-03 Capcom Co., Ltd. Game machine, program for realizing game machine, and method of displaying objects in game
EP3406708A1 (en) * 2011-04-15 2018-11-28 Dow AgroSciences LLC Synthetic genes
US20200407739A1 (en) * 2014-01-17 2020-12-31 Dow Agrosciences Llc Increased protein expression in plants
US20170362627A1 (en) * 2014-11-10 2017-12-21 Modernatx, Inc. Multiparametric nucleic acid optimization
US20170369890A1 (en) * 2014-12-22 2017-12-28 AgBiome, Inc. Methods for making a synthetic gene
EP3320097B1 (en) * 2015-07-06 2021-01-06 Genective Method for gene optimization
WO2017218727A1 (en) * 2016-06-15 2017-12-21 President And Fellows Of Harvard College Methods for rule-based genome design

Non-Patent Citations (113)

* Cited by examiner, † Cited by third party
Title
"Biochemistry", 1970, IUPAC, article "International Union of Pure and Applied Chemistry", pages: 4022 - 4027
BAIM ET AL., PROC. NATL. ACAD. SCI. USA, vol. 88, 1991, pages 5072 - 5076
BALLAS ET AL., NUCLEIC ACIDS RES., vol. 17, 1989, pages 7891 - 7903
BARKLEY ET AL., THE OPERON, 1980, pages 177 - 220
BEGGS, NATURE, vol. 275, 1978, pages 104 - 109
BERENT ET AL., BIOTECH, vol. 78, 1985, pages 208
BONIN: "Ph.D. Thesis", 1993, UNIVERSITY OF HEIDELBERG
BRETAGNE-SAGNARD ET AL., TRANSGENIC RES., vol. 5, 1996, pages 131 - 137
BROWN ET AL., CELL, vol. 49, 1987, pages 603 - 612
CANEVASCINI ET AL., PLANT PHYSIOL., vol. 112, no. 2, 1996, pages 1331 - 1341
CAPANA ET AL., PLANT MOL. BIOL., vol. 25, no. 4, 1994, pages 681 - 691
CELL, vol. 52, 1988, pages 713 - 722
CHEN ET AL., PLANT, vol. 1, no. 10, 1996, pages 955 - 966
CHRISTENSEN ET AL., PLANT MOL. BIOL., vol. 12, 1989, pages 619 - 632
CHRISTENSEN ET AL., PLANT MOL. BIOL., vol. 18, 1992, pages 675 - 689
CHRISTOPHERSON ET AL., PROC. NATL. ACAD. SCI. USA, vol. 89, 1992, pages 3952 - 3956
COHEN ET AL., PROC. NATL. ACAD. SCI. USA, vol. 69, 1972, pages 2110
CORDERO ET AL., PHYSIOL. MOL. PLANT PATH., vol. 41, 1992, pages 189 - 200
CORDERO ET AL., PLANT J, vol. 6, no. 2, 1994, pages 141 - 150
DEBLOCK ET AL., EMBO J, vol. 6, no. 2, 1987, pages 2513 - 2518
DEGENKOLB ET AL., ANTIMICROB. AGENTS CHEMOTHER., vol. 35, 1991, pages 1591 - 1595
DELLA-CIOPPA ET AL., PLANT PHYSIOL., vol. 84, 1987, pages 965 - 968
DEUSCHLE ET AL., SCIENCE, vol. 248, 1990, pages 480 - 483
DUAN ET AL., NATURE BIOTECHNOLOGY, vol. 14, 1996, pages 494 - 498
ECKELKAMP ET AL., FEBS LETTERS, vol. 323, 1993, pages 73 - 76
ELROY-STEIN ET AL., PROC. NATL. ACAD. SCI. USA, vol. 86, 1989, pages 2549 - 2553
ESTRELLA ET AL., NATURE, vol. 303, 1983, pages 209 - 213
FOX ET AL., DNA RES., vol. 17, no. 3, 2010, pages 185 - 96
GALLIE ET AL., GENE, vol. 165, 1995, pages 233 - 238
GALLIE ET AL., MOLECULAR BIOLOGY OF RNA, 1989, pages 237 - 256
GEISER ET AL., GENE, vol. 48, 1986, pages 109
GILL ET AL., NATURE, vol. 334, 1988, pages 721 - 724
GOSSEN ET AL., PROC. NATL. ACAD. SCI. USA, vol. 89, 1992, pages 5547 - 5551
GOSSEN, PH.D. THESIS, UNIVERSITY OF HEIDELBERG, 1993
GOULD N ET AL: "Computational tools and algorithms for designing customized synthetic genes", FRONTIERS IN BIOENGINEERING AND BIOTECHNOLOGY, FRONTIERS RESEARCH FOUNDATION, CH, vol. 2, 6 October 2014 (2014-10-06), pages 41 - 1, XP002755946, ISSN: 2296-4185, [retrieved on 20140101], DOI: 10.3389/FBIOE.2014.00041 *
GRABER ET AL., PNAS, vol. 96, no. 24, 1999, pages 14055 - 14060
GUERINEAU ET AL., MOL. GEN. GENET., vol. 262, 1991, pages 141 - 144
GUERINEAU ET AL., PLANT MOL. BIOL., vol. 15, no. 3, 1990, pages 127 - 136
GUEVARA-GARCIA ET AL., PLANT, vol. 4, no. 3, 1993, pages 495 - 505
HANDB. CLIN. NEUROL., vol. 147, 2018, pages 105 - 123
HANSEN ET AL., MOL. GEN. GENET., vol. 254, no. 3, 1997, pages 337 - 343
HERRERA ESTRELLA ET AL., EMBO J., vol. 1, no. 2, 1983, pages 987 - 992
HILLE ET AL., PLANT MOL. BIOL., vol. 7, 1990, pages 171 - 176
HILLEN AND WISSMAN, TOPICS MOL. STRUC. BIOL., vol. 10, 1989, pages 143 - 162
HIRE ET AL., PLANT MOL. BIOL., vol. 20, no. 2, 1992, pages 207 - 218
IKEMURA T, J. MOL. BIOL., no. 3, 1981, pages 389 - 409
JI ET AL., BMC BIOINFORMATICS, vol. 8, no. 1, 2007, pages 43
JOBLING ET AL., NATURE, vol. 325, 1987, pages 622 - 625
JONES ET AL., MOL. GEN. GENET., vol. 210, 1987, pages 86 - 91
JONES ET AL., SCIENCE, vol. 266, 1994, pages 789
JOSHI ET AL., NUCLEIC ACIDS RES., vol. 15, 1987, pages 9627 - 9639
KAWAMATA ET AL., PLANT CELL PHYSIOL., vol. 38, no. 7, 1997, pages 792 - 803
KELLER AND BAUMGARTNER, PLANT CELL, vol. 3, no. 10, 1991, pages 1051 - 1061
KIRIHARA ET AL., GENE, vol. 71, 1988, pages 359
KLEINSCHNIDT ET AL., BIOCHEMISTRY, vol. 27, 1988, pages 1094 - 1104
KONG ET AL., ARCH. VIROL., vol. 143, 1988, pages 1791 - 1799
KUSTER ET AL., PLANT MOL. BIOL., vol. 29, no. 4, 1995, pages 759 - 772
LABOW ET AL., MOL. CELL. BIOL., vol. 10, 1990, pages 3343 - 3356
LAM, RESULTS PROBL. CELL DIFFER., vol. 20, 1994, pages 181 - 196
LAST ET AL., THEOR. APPL. GENET., vol. 81, 1991, pages 581 - 588
LOMMEL ET AL., VIROLOGY, vol. 81, 1991, pages 382 - 385
MACEJAK ET AL., NATURE, vol. 353, 1991, pages 90 - 94
MARINEAU ET AL., PLANT MOL. BIOL., vol. 9, 1987, pages 335 - 342
MARTIN ET AL., SCIENCE, vol. 262, 1993, pages 1432
MATSUOKA ET AL., PROC. NATL. ACAD. SCI. USA, vol. 90, no. 20, 1993, pages 9586 - 9590
MATTON ET AL., MOLECULAR PLANT-MICROBE INTERACTIONS, vol. 2, 1989, pages 325 - 331
MCELROY ET AL., PLANT CELL, vol. 2, no. 7, 1990, pages 1261 - 1272
MCGURL ET AL., SCIENCE, vol. 225, 1992, pages 1570 - 1573
MEIJER ET AL., PLANT MOL. BIOL., vol. 16, 1991, pages 807 - 820
MINDRINOS ET AL., CELL, vol. 78, 1994, pages 1089
MUNROE ET AL., GENE, vol. 91, 1990, pages 151 - 158
MASUMURA ET AL., PLANT MOL. BIOL., vol. 12, 1989, pages 123 - 502
NOVOA ET AL., TRENDS IN GENETICS, vol. 28, no. 11, 2012, pages 574 - 581
ODELL ET AL., NATURE, vol. 313, 1985, pages 810 - 812
OLIVA ET AL., ANTIMICROB. AGENTS CHEMOTHER., vol. 36, 1992, pages 913 - 919
OROZCO ET AL., PLANT MOL. BIOL., vol. 23, no. 6, 1993, pages 1129 - 1138
PEDERSEN ET AL., J. BIOL. CHEM., vol. 261, 1986, pages 6279
PLANT SCIENCE (LIMERICK), vol. 79, no. 1, pages 69 - 76
PLOTKIN ET AL., NATURE REVIEWS. GENETICS, vol. 12, no. 1, 2011, pages 32 - 42
PROUDFOOT, GENES AND DEVELOPMENT, vol. 25, no. 17, 2011, pages 1770 - 1782
PROUDFOOT, CELL, vol. 64, 1991, pages 671 - 674
REDOLFI ET AL., NETH. J. PLANT PATHOL., vol. 89, 1983, pages 245 - 254
REINES ET AL., PROC. NATL. ACAD. SCI. USA, vol. 90, 1993, pages 1917 - 1921
REZNIKOFF, MOL. MICROBIOL., vol. 6, 1992, pages 2419 - 2422
ROHMEIER ET AL., PLANT MOL. BIOL., vol. 22, 1993, pages 783 - 792
RUSSELL ET AL., TRANSGENIC RES., vol. 6, no. 2, 1997, pages 157 - 168
RYAN, ANN. REV. PHYTOPATH., vol. 28, 1990, pages 425 - 449
SAMBROOK ET AL.: "Molecular Cloning, A Laboratory Manual", 1989, COLD SPRING HARBOR LABORATORY
SANFACON ET AL., GENES DEV., vol. 5, 1991, pages 141 - 149
SCHUBERT ET AL., J. BACTERIOL., vol. 170, 1988, pages 5837 - 5847
SHARP ET AL., NUCLEIC ACIDS RESEARCH, vol. 15, no. 3, 1987, pages 1281 - 1295
SHAW ET AL., SCIENCE, vol. 233, 1986, pages 478 - 481
SHEN, NUCLEIC ACIDS RESEARCH, vol. 36, no. 9, 2008, pages 3150 - 3161
SHERMAN ET AL.: "Methods In Yeast Genetics, A Laboratory Manual", 1986, COLD SPRING HARBOR
SIEBERTZ ET AL., PLANT CELL, vol. 1, 1989, pages 961 - 968
SKINNER ET AL.: "Biology and Activities of Yeast", SOC. APP. BACTERIOL. SYMP, no. 9, 1980
SOMSSICH ET AL., MOL. GEN. GENET., vol. 2, 1988, pages 93 - 98
SOMSSICH ET AL., PROC. NATL. ACAD. SCI. USA, vol. 83, 1986, pages 2427 - 2430
SOUTHERN, J. MOL. BIOL., vol. 98, 1975, pages 503
STALKER ET AL., SCIENCE, vol. 242, 1988, pages 419 - 423
STANFORD ET AL., MOL. GEN. GENET., vol. 215, 1989, pages 200 - 208
SUZUKI ET AL., DNA RES., vol. 15, no. 6, 2008, pages 357 - 65
UKNES ET AL., PLANT CELL, vol. 4, 1992, pages 645 - 656
VAN LOON, PLANT MOL. VIROL., vol. 4, 1985, pages 111 - 116
VELTEN ET AL., EMBO J., vol. 1, no. 3, 1984, pages 2723 - 2730
WILLIAMSON ET AL., EUR. J. BIOCHEM., vol. 165, 1987, pages 99 - 106
WYBORSKI ET AL., NUCLEIC ACIDS RES., vol. 19, 1991, pages 4647 - 4653
YAMAMOTO ET AL., PLANT J., vol. 12, no. 2, 1997, pages 255 - 265
YAMAMOTO, PLANT CELL PHYSIOL., vol. 35, no. 5, 1994, pages 773 - 778
YANG, PROC. NATL. ACAD. SCI. USA, vol. 93, 1996, pages 14972 - 14977
YAO ET AL., CELL, vol. 71, 1992, pages 63 - 72
YARRANTON, CURR. OPIN. BIOTECH., vol. 3, 1992, pages 506 - 511
ZHANG, PROC. NATL. ACAD. SCI. USA, vol. 91, 1994, pages 2507 - 2511


Legal Events

Code 121 (EP): The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 23707839; country of ref document: EP; kind code of ref document: A1.