WO2009005564A2 - Nucleotide sequences encoding cellulose- and hemicellulose-degrading enzymes with refined translational kinetics, and corresponding production method - Google Patents

Info

Publication number
WO2009005564A2
Authority
WO
WIPO (PCT)
Prior art keywords
nucleotides
replaced
amino acids
seq
codon
Prior art date
Application number
PCT/US2008/006379
Other languages
English (en)
Other versions
WO2009005564A3 (fr)
Inventor
Kirsty A. Salmon
David A. Roth
Wesley G. Hatfield
Yimeng Dou
Original Assignee
The Regents Of The University Of California
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by The Regents Of The University Of California
Publication of WO2009005564A2
Publication of WO2009005564A3

Classifications

    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 - Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 - Hydrolases (3)
    • C12N9/24 - Hydrolases (3) acting on glycosyl compounds (3.2)
    • C12N9/2402 - Hydrolases (3) acting on glycosyl compounds (3.2) hydrolysing O- and S- glycosyl compounds (3.2.1)
    • C12N9/2405 - Glucanases
    • C12N9/2434 - Glucanases acting on beta-1,4-glucosidic bonds
    • C12N9/2437 - Cellulases (3.2.1.4; 3.2.1.74; 3.2.1.91; 3.2.1.150)
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 - Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 - Recombinant DNA-technology
    • C12N15/11 - DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/52 - Genes encoding for enzymes or proenzymes
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 - Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/0004 - Oxidoreductases (1.)
    • C12N9/0055 - Oxidoreductases (1.) acting on diphenols and related substances as donors (1.10)
    • C12N9/0057 - Oxidoreductases (1.) acting on diphenols and related substances as donors (1.10) with oxygen as acceptor (1.10.3)
    • C12N9/0061 - Laccase (1.10.3.2)
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N - MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 - Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 - Hydrolases (3)
    • C12N9/24 - Hydrolases (3) acting on glycosyl compounds (3.2)
    • C12N9/2402 - Hydrolases (3) acting on glycosyl compounds (3.2) hydrolysing O- and S- glycosyl compounds (3.2.1)
    • C12N9/2477 - Hemicellulases not provided in a preceding group
    • C12N9/248 - Xylanases
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y - ENZYMES
    • C12Y302/00 - Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2)
    • C12Y302/01 - Glycosidases, i.e. enzymes hydrolysing O- and S-glycosyl compounds (3.2.1)
    • C12Y302/01004 - Cellulase (3.2.1.4), i.e. endo-1,4-beta-glucanase
    • C - CHEMISTRY; METALLURGY
    • C12 - BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y - ENZYMES
    • C12Y302/00 - Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2)
    • C12Y302/01 - Glycosidases, i.e. enzymes hydrolysing O- and S-glycosyl compounds (3.2.1)
    • C12Y302/01091 - Cellulose 1,4-beta-cellobiosidase (3.2.1.91)

Definitions

  • the present invention relates to refining the translational kinetics of an mRNA into polypeptide, and polypeptide-encoding nucleotide sequences which have refined translational properties.
  • Saccharomyces yeasts have proven to be safe, effective and user-friendly microorganisms for large-scale production of industrial ethanol from glucose-based feedstocks. Recently, efforts have been made to use cellulosic biomass as feedstock for producing ethanol.
  • the major fermentable sugars from hydrolysis of these feedstocks such as rice and wheat straw, sugarcane bagasse, corn stover, corn fibre, softwood, hardwood and grasses
  • lignin a major component of such feedstocks.
  • Lignin minimizes the accessibility of cellulose and hemicellulose to microbial enzymes.
  • lignin is generally associated with reduced digestibility of the overall plant biomass.
  • yeast and other microorganisms that can degrade cellulose, hemicellulose and lignin. Many such pathways have been identified in organisms such as white-rot fungi.
  • Some translational pauses result from the presence of particular codon pairs in the nucleotide sequence encoding the polypeptide to be translated. As provided herein, inappropriate or excessive translation pauses can reduce protein expression considerably. Further, the translational pausing properties of codon pairs vary from organism to organism. As a result, exogenous expression of genes foreign to the expression organism can lead to inefficient translation and poor expression. Even when the gene is translated in a sufficiently efficient manner that recoverable quantities of the translation product are produced, the protein is often inactive, insoluble, aggregated, or otherwise different in properties from the native protein. Thus, removing inappropriate or excessive translation pause structures coded for by specific di-codon nucleotide sequences in the open reading frame (ORF) can improve protein expression.
  • ORF open reading frame
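The di-codon (codon pair) view of an ORF described above can be sketched in a few lines. This is only an illustration: the function name `codon_pairs` and the toy ORF are assumptions, not part of the patent. It lists every in-frame pair of adjacent codons with 1-based, inclusive nucleotide coordinates, the same convention the text uses (for example "nucleotides 94-99").

```python
from typing import Iterator, Tuple


def codon_pairs(orf: str) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, hexamer) for each in-frame codon pair.

    Coordinates are 1-based and inclusive. Adjacent pairs overlap by one
    codon, so an ORF of n codons contains n - 1 codon pairs.
    """
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    for i in range(0, len(orf) - 5, 3):
        yield i + 1, i + 6, orf[i:i + 6]


# Hypothetical 7-codon ORF used only for illustration (not SEQ ID NO: 1).
toy_orf = "ATGGGCCAATTTGATATCTAA"
pairs = list(codon_pairs(toy_orf))
for start, end, hexamer in pairs:
    print(f"{hexamer} (nucleotides {start}-{end})")
```

Because consecutive pairs share a codon, changing one codon affects two codon pairs at once, which is worth keeping in mind when several replacements are combined.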
  • hydrolysis enzyme-encoding nucleotide sequences with refined translational kinetics and methods of designing and synthesizing the same.
  • a hydrolysis enzyme-encoding nucleotide sequence wherein the encoded sequence has amino acid sequence identity with an original hydrolysis enzyme polypeptide, and wherein predicted translation pauses in the expression organism have been removed or reduced by replacing original codon pairs with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the resultant hydrolysis enzyme-encoding nucleotide is predicted to be translated rapidly along its entire length.
  • Expression of the resultant hydrolysis enzyme-encoding nucleotide is predicted to result in improved protein expression levels in cases where inappropriate or excessive translation pauses reduce protein expression.
  • expression of the resultant hydrolysis enzyme-encoding nucleotide is predicted to result in improved levels of active and/or natively folded polypeptide expression products in cases where inappropriate or excessive translation pauses cause expression of inactive, insoluble or aggregated enzyme.
  • hydrolysis enzyme-encoding nucleotide sequences wherein the encoded sequence has amino acid sequence identity with an original hydrolysis enzyme-encoding nucleotide sequence and is adapted for expression in a heterologous host organism, wherein at least 1, 2, or 3 codon pairs of the original sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein.
  • the at least three codon pairs of the original sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein.
  • the host organism is not human, E. coli or S. cerevisiae.
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2, wherein at least 3 codon pairs of SEQ ID NO: 1 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: CCCTCT (nucleotides 463-468); GGCCAA (nucleotides 94-99); CAGTTT (nucleotides 565-570); GATATC (nucleotides 703-708); GTGGAA (nucleotides 691-696); GGATTT (nucleotides 1192-1197); GGTATT (nucleotides 1198-1203).
  • CCCTCT nucleotides 463-468
  • GGCCAA nucleotides 94-99
  • CAGTTT nucleotides 565-570
  • GATATC nucleotides 703-708
  • GTGGAA nucleotides 691-696
  • GGATTT nucleotides 1192-1197
  • GGTATT nucleotides 1198-1203
  • CCCTCT nucleotides 463-468 replaced with CCTTCT
  • GGCCAA nucleotides 94-99 replaced with GGTCAA
  • CAGTTT nucleotides 565-570 replaced with CAATTT
  • GATATC nucleotides 703-708 replaced with GACATT
  • GTGGAA nucleotides 691-696 replaced with GTTGAA
  • GGATTT nucleotides 1192-1197 replaced with GGTTTC
  • GGTATT nucleotides 1198-1203 replaced with GGAATT.
  • the nucleotide sequence is optimized for expression in S. cerevisiae.
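As a rough check on the replacements listed for this embodiment, the sketch below confirms that each listed codon pair and its replacement encode the same two amino acids under the standard genetic code. The helper names (`translate`, `is_silent`, `replace_pair`) and the toy sequence are assumptions for illustration; SEQ ID NO: 1 is not reproduced here. The sketch only tests for identical amino acids and does not model the conservative substitutions the text also allows.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acids."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def is_silent(original: str, replacement: str) -> bool:
    """True if two codon pairs (hexamers) encode identical amino acids."""
    return (len(original) == len(replacement) == 6
            and translate(original) == translate(replacement))


def replace_pair(seq: str, start: int, original: str, replacement: str) -> str:
    """Swap one codon pair at a 1-based start position after sanity checks."""
    i = start - 1
    if seq[i:i + 6] != original:
        raise ValueError(f"expected {original} at nucleotides {start}-{start + 5}")
    if not is_silent(original, replacement):
        raise ValueError("replacement changes the encoded amino acids")
    return seq[:i] + replacement + seq[i + 6:]


# Original -> replacement hexamers as listed in the text above.
REPLACEMENTS = [
    ("CCCTCT", "CCTTCT"), ("GGCCAA", "GGTCAA"), ("CAGTTT", "CAATTT"),
    ("GATATC", "GACATT"), ("GTGGAA", "GTTGAA"), ("GGATTT", "GGTTTC"),
    ("GGTATT", "GGAATT"),
]
all_silent = all(is_silent(o, r) for o, r in REPLACEMENTS)

# Hypothetical toy sequence; the position 4 is illustrative only.
toy = "ATGGGCCAATAA"
edited = replace_pair(toy, 4, "GGCCAA", "GGTCAA")
print(all_silent, edited, translate(toy) == translate(edited))
```

Checking the existing hexamer before substitution guards against off-by-one coordinate errors, which are easy to make with 1-based nucleotide numbering.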
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2, wherein at least 3 codon pairs of SEQ ID NO: 1 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: CTCGGT (nucleotides 760-765); ATTGCC (nucleotides 631-636); GACAGC (nucleotides 1285-1290); GTCTGG (nucleotides 88-93); GTCTGG (nucleotides 1246-1251); TTGCTG (nucleotides 1231-1236); GTGGTG (nucleotides 571-576); ACGCTG (nucleotides 22-27); ACGCTG (nucleotides 31-36); GACTGG (nucleotides 1168-1173); GCCGGA (nucleotides 559-564); CTGGTG (nucleotides 748-753).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: CTCGGT (nucleotides 760-765) replaced with CTGGGT; ATTGCC (nucleotides 631-636) replaced with ATTGCG; GACAGC (nucleotides 1285-1290) replaced with GACTCT; GTCTGG (nucleotides 88-93) replaced with GTTTGG; GTCTGG (nucleotides 1246-1251) replaced with GTTTGG; TTGCTG (nucleotides 1231-1236) replaced with CTGCTG; GTGGTG (nucleotides 571-576) replaced with GTTGTT; ACGCTG (nucleotides 22-27) replaced with ACCCTC; ACGCTG (nucleotides 3
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2, wherein at least 3 codon pairs of SEQ ID NO: 1 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: CAGTTT (nucleotides 565-570); TTTGAC (nucleotides 1303-1308); TCGTTT (nucleotides 1240-1245); GGCCAA (nucleotides 94-99); AAGAAT (nucleotides 541-546); AAGAAT (nucleotides 934-939); GCCAAA (nucleotides 649-654); GTCAAG (nucleotides 1252-1257); GGTATT (nucleotides 1198-1203); ATCAAC (nucleotides 808-813); GGCCAT (nucleotides 865-870); CTTCCA (nucleotides 835-840); GATATC (nucleotides 703-708); TCGTTG (nucleotides 1228-1233).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: CAGTTT (nucleotides 565-570) replaced with CAATTT; TTTGAC (nucleotides 1303-1308) replaced with TTTGAT; TCGTTT (nucleotides 1240-1245) replaced with TCTTTT; GGCCAA (nucleotides 94-99) replaced with GGACAA; AAGAAT (nucleotides 541-546) replaced with AAAAAT; AAGAAT (nucleotides 934-939) replaced with AAAAAC; GCCAAA (nucleotides 649-654) replaced with GCTAAA; GTCAAG (nucleotides 1252-1257) replaced with GTTAAA; GGTATT
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2, wherein at least 3 codon pairs of SEQ ID NO: 1 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: GGCCAA (nucleotides 94-99); CAGTTT (nucleotides 565-570); GATATC (nucleotides 703-708); TATTTG (nucleotides 853-858); GGCCAT (nucleotides 865-870); TCGTTG (nucleotides 1228-1233); TTTGTC (nucleotides 1243-1248); TTCCAA (nucleotides 1363-1368).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: GGCCAA (nucleotides 94-99) replaced with GGTCAA; CAGTTT (nucleotides 565-570) replaced with CAATTC; GATATC (nucleotides 703-708) replaced with GACATT; TATTTG (nucleotides 853-858) replaced with TATTTA; GGCCAT (nucleotides 865-870) replaced with GGACAT; TCGTTG (nucleotides 1228-1233) replaced with TCTTTA; TTTGTC (nucleotides 1243-1248) replaced with TTCGTT; TTCCAA (nucleotides 1363-1368) replaced with TTCCAG.
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2, wherein at least 3 codon pairs of SEQ ID NO: 1 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: GTGCCT (nucleotides 55-60); GCCAAT (nucleotides 370-375); GCTATT (nucleotides 406-411); GCCGGA (nucleotides 559-564); GCCAAT (nucleotides 778-783); TTGGCA (nucleotides 967-972); AAGCTG (nucleotides 1051-1056); GCTATT (nucleotides 1066-1071); GCCAAT (nucleotides 1084-1089); ACCGGA (nucleotides 1147-1152); ACCGGA (nucleotides 1189-1194); GGTATT (nucleotides 1198-1203); GACAGC (nucleotides 1285-1290); GATGCC (nucleotides 1327-1332); GCCTTG (nucleotides 1285-1290
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: GTGCCT (nucleotides 55-60) replaced with GTTCCG; GCCAAT (nucleotides 370-375) replaced with GCTAAT; GCTATT (nucleotides 406-411) replaced with GCCATT; GCCGGA (nucleotides 559-564) replaced with GCTGGT; GCCAAT (nucleotides 778-783) replaced with GCGAAT; TTGGCA (nucleotides 967-972) replaced with TTGGCT; AAGCTG (nucleotides 1051-1056) replaced with AAATTG; GCTATT (nucleotides 1066-1071) replaced with GCCATT; GCCAAT (nucleotides
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2 and is adapted for expression in a heterologous host organism, and wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein.
  • the codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein, wherein a highly-overrepresented codon pair is a codon pair that has a translational kinetics value greater than 5, or 3, or 2.5, or 2 times the standard deviation of translational kinetics values for the host organism.
  • the host organism is not human, E. coli or S. cerevisiae.
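The "highly-overrepresented" criterion above (a translational kinetics value greater than 5, 3, 2.5 or 2 times the standard deviation of values for the host organism) can be expressed as a simple threshold test. The following is a minimal sketch under stated assumptions: the function name, the use of the population standard deviation, and every numeric value in `toy_values` are hypothetical; the patent's actual per-organism values are not given in this excerpt.

```python
from statistics import pstdev
from typing import Dict, Set


def highly_overrepresented(values: Dict[str, float], k: float = 3.0) -> Set[str]:
    """Return codon pairs whose value exceeds k times the standard deviation.

    `values` maps each codon pair (hexamer) to its translational kinetics
    value for one host organism. Population standard deviation is an
    assumption; the text does not say which estimator is intended.
    """
    sd = pstdev(list(values.values()))
    return {pair for pair, value in values.items() if value > k * sd}


# Hypothetical values for illustration only (not measured data).
toy_values = {
    "GGCCAA": 8.0,
    "GGTCAA": -1.0,
    "CAGTTT": -1.0,
    "CAATTT": -2.0,
    "GATATC": -2.0,
    "GACATT": -2.0,
}
print(highly_overrepresented(toy_values, k=2.0))
print(highly_overrepresented(toy_values, k=3.0))
```

A smaller multiplier `k` flags more codon pairs for replacement, so the choice among the listed multipliers trades the number of edits against how strictly predicted pauses are removed.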
  • a cellobiohydrolase-encoding nucleotide sequence having at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2 and is adapted for expression in a heterologous host organism, wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein, and wherein the host organism is selected from the following: Pichia pastoris; Oryctolagus cuniculus (rabbit); Macaca fascicularis (Long-tailed monkey); M. mulatta (Monkey); E.
  • an expression system comprising an expression vector in a host organism, wherein the expression vector includes the nucleotide sequence of the embodiments provided herein, operably linked to an expression control sequence.
  • a system for degrading cellulose comprising one or more host organisms that collectively include nucleotide sequences operably encoding the following enzymes: endo-1,4-β-glucanase, exo-1,4-β-D-glucanase, and β-D-glucosidase; wherein the enzymes are heterologous to the one or more host organisms, and wherein transcriptional kinetics of each of the nucleotide sequences encoding the enzymes has been modified to replace at least three codon pairs present in the original sequence for each enzyme, wherein the at least three replaced codon pairs are predicted to cause a translational pause in the host organism, and wherein said modification results in silent permutation or conservative amino acid substitution of said at least three codon pairs.
  • the one or more host organisms are selected from the group consisting of: Saccharomyces cerevisiae, Pichia pastoris, Escherichia coli, Bombyx mori, Spodoptera frugiperda, Drosophila melanogaster, Kluyveromyces lactis, Zymomonas mobilis and Schizosaccharomyces pombe.
  • each encoded enzyme has at least a 75% amino acid sequence identity with the original sequence of the enzyme.
  • the exo-1,4-β-D-glucanase retains at least 75% of the enzymatic activity of wild-type TrCBH-II (SEQ ID NO: 2) under normal physiological conditions.
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 1 and which encode amino acids 27-62 of SEQ ID NO: 2 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 27-62 of SEQ ID NO: 2 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 27-62 when expressed in the native organism.
  • no replacement codon encoding amino acids 27-62 of SEQ ID NO: 2 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the wild type codon pair TCCAAC when expressed in the native organism.
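The z-score limits in this embodiment amount to two comparisons: a per-pair limit (replacement z score no more than a given percentage of the wild-type z score) and a region-wide cap derived from the mean or median of the five highest wild-type z scores. The sketch below illustrates both. All function names and all z-score values are hypothetical, and it assumes positive reference z scores, since a percentage of a negative score would invert the comparison.

```python
from statistics import mean, median
from typing import List, Sequence


def within_relative_limit(replacement_z: float, wild_type_z: float,
                          percent: float = 150.0) -> bool:
    """True if the replacement z score is at most `percent` of the wild-type one."""
    return replacement_z <= wild_type_z * percent / 100.0


def top5_reference(wild_type_zs: Sequence[float], use_median: bool = False) -> float:
    """Mean (or median) of the five highest wild-type z scores in a region."""
    top = sorted(wild_type_zs, reverse=True)[:5]
    return median(top) if use_median else mean(top)


def exceeding_cap(replacement_zs: Sequence[float], wild_type_zs: Sequence[float],
                  percent: float = 150.0, use_median: bool = False) -> List[float]:
    """Return replacement z scores above `percent` of the top-five reference."""
    cap = top5_reference(wild_type_zs, use_median) * percent / 100.0
    return [z for z in replacement_zs if z > cap]


# Hypothetical z scores for illustration only.
wild_type = [4.0, 3.0, 2.0, 1.0, 0.0, -1.0, -2.0]
replacements = [1.0, 3.5, 2.9]
print(top5_reference(wild_type))            # reference value for the region
print(exceeding_cap(replacements, wild_type, percent=150.0))
print(within_relative_limit(3.0, 2.0, percent=150.0))
```

An empty list from `exceeding_cap` would correspond to the condition that no replacement codon pair in the region exceeds the chosen percentage of the reference.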
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO:1 and which encode amino acids 107-471 of SEQ ID NO: 2 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 107-471 of SEQ ID NO: 2 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 107-471 when expressed in the native organism.
  • no replacement codon encoding amino acids 107-471 of SEQ ID NO: 2 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the wild type codon pair GCAAAG when expressed in the native organism.
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 27-471 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 2 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO:1 and which encode amino acids 62-107 of SEQ ID NO: 2 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 62-107 of SEQ ID NO: 2 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 62-107 when expressed in the native organism.
  • At least one replacement codon encoding amino acids 62-107 of SEQ ID NO: 2 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the wild type codon pair TCTACT when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 26, wherein at least 3 codon pairs of SEQ ID NO: 25 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: GATATC (nucleotides 1474 - 1479); TTGAAT (nucleotides 802 - 807); ATCAAG (nucleotides 1477 - 1482); GCCAAG (nucleotides 526 - 531).
  • GATATC nucleotides 1474 - 1479
  • TTGAAT nucleotides 802 - 807
  • ATCAAG nucleotides 1477 - 1482
  • GCCAAG nucleotides 526 - 531.
  • at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • At least 3 of the following codon pair replacements have been made: GATATC (nucleotides 1474 - 1479) replaced with GATATA; TTGAAT (nucleotides 802 - 807) replaced with TTAAAT; ATCAAG (nucleotides 1477 - 1482) replaced with ATAAAA; GCCAAG (nucleotides 526 - 531) replaced with GCAAAA.
  • the nucleotide sequence is optimized for expression in S. cerevisiae.
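Two of the codon pairs listed for this embodiment overlap: GATATC (nucleotides 1474 - 1479) and ATCAAG (nucleotides 1477 - 1482) share the codon at 1477 - 1479. When several replacements are applied together, the shared codon has to receive the same new value from both. The sketch below, with the hypothetical helper `merge_replacements`, checks this for the replacements listed above; it is an illustrative consistency check, not a procedure described in the patent.

```python
from typing import Dict, List, Tuple

Replacement = Tuple[int, str, str]  # (1-based start, original hexamer, new hexamer)


def merge_replacements(replacements: List[Replacement]) -> Dict[int, str]:
    """Merge codon-pair replacements into a map of codon start -> new codon.

    Raises ValueError if two overlapping codon pairs assign different
    replacement codons to the codon they share.
    """
    codons: Dict[int, str] = {}
    for start, _original, new in replacements:
        for offset in (0, 3):
            position = start + offset
            codon = new[offset:offset + 3]
            # setdefault keeps the first assignment; a mismatch is a conflict.
            if codons.setdefault(position, codon) != codon:
                raise ValueError(
                    f"conflicting codons at nucleotides {position}-{position + 2}: "
                    f"{codons[position]} vs {codon}"
                )
    return codons


# Replacements as listed in the text for the S. cerevisiae embodiment.
S_CEREVISIAE = [
    (1474, "GATATC", "GATATA"),
    (802, "TTGAAT", "TTAAAT"),
    (1477, "ATCAAG", "ATAAAA"),
    (526, "GCCAAG", "GCAAAA"),
]
merged = merge_replacements(S_CEREVISIAE)
print(merged[1477])  # codon shared by the two overlapping pairs
```

For the listed replacements both overlapping pairs assign ATA to nucleotides 1477 - 1479, so they can be applied together without conflict.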
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 26, wherein at least 3 codon pairs of SEQ ID NO: 25 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: TTCCTC (nucleotides 1405 - 1410); ATCCTC (nucleotides 892 - 897); TTCCAG (nucleotides 190 - 195); TTCCAG (nucleotides 265 - 270); GACAGC (nucleotides 1360 - 1365); TTCCCG (nucleotides 544 - 549); CAGGCG (nucleotides 457 - 462); GCGGCA (nucleotides 589 - 594); TTCCGC (nucleotides 1327 - 1332).
  • TTCCTC nucleotides 1405 - 1410 replaced with TTCCTG
  • ATCCTC nucleotides 892 - 897 replaced with ATCCTG
  • TTCCAG nucleotides 190 - 195 replaced with TTCCAA
  • TTCCAG nucleotides 265 - 270 replaced with TTTCAG
  • GACAGC nucleotides 1360 - 1365 replaced with GATTCT
  • TTCCCG nucleotides 544 - 549 replaced with TTCCCA
  • CAGGCG nucleotides 457 - 462 replaced with CAAGCG
  • GCGGCA nucleotides 589 -
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 26, wherein at least 3 codon pairs of SEQ ID NO: 25 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: GATATC (nucleotides 1474 - 1479); ATCAAG (nucleotides 1477 - 1482); TTCAAC (nucleotides 1051 - 1056); ATCAAC (nucleotides 205 - 210); ATCAAC (nucleotides 571 - 576); ATCAAC (nucleotides 880 - 885); ATCAAC (nucleotides 1078 - 1083).
  • GATATC nucleotides 1474 - 1479
  • ATCAAG nucleotides 1477 - 1482
  • TTCAAC nucleotides 1051 - 1056
  • ATCAAC nucleotides 205 - 210
  • ATCAAC nucleotides 571 - 576
  • ATCAAC nucleotides 880 - 885
  • ATCAAC nucleotides 1078 - 1083
  • At least 3 of the following codon pair replacements have been made: GATATC (nucleotides 1474 - 1479) replaced with GACATT; ATCAAG (nucleotides 1477 - 1482) replaced with ATTAAA; TTCAAC (nucleotides 1051 - 1056) replaced with TTTAAT; ATCAAC (nucleotides 205 - 210) replaced with ATTAAT; ATCAAC (nucleotides 571 - 576) replaced with ATTAAT; ATCAAC (nucleotides 880 - 885) replaced with ATTAAT; ATCAAC (nucleotides 1078 - 1083) replaced with ATTAAT.
  • the nucleotide sequence is optimized for expression in P. pastoris.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 26, wherein at least 3 codon pairs of SEQ ID NO: 25 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: AAGAAG (nucleotides 175 - 180); TTCCAT (nucleotides 349 - 354); GCCAAG (nucleotides 526 - 531); TTCCAT (nucleotides 1426 - 1431); GATATC (nucleotides 1474 - 1479).
  • AAGAAG nucleotides 175 - 180
  • TTCCAT nucleotides 349 - 354
  • GCCAAG nucleotides 526 - 531
  • TTCCAT nucleotides 1426 - 1431
  • GATATC nucleotides 1474 - 1479.
  • at least 3 of the following codon pair replacements have been made: AAGAAG (nucleotides 175 - 180) replaced with AAAAAG; TTCCAT (nucleotides 349
  • the nucleotide sequence is optimized for expression in K. lactis.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 26, wherein at least 3 codon pairs of SEQ ID NO: 25 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: TCCGGT (nucleotides 7 - 12); ATCGGG (nucleotides 64 - 69); CACAGC (nucleotides 385 - 390); GCCAAG (nucleotides 526 - 531); AAGCTG (nucleotides 529 - 534); CGCTAT (nucleotides 643 - 648); GTCGAT (nucleotides 727 - 732); AACAGC (nucleotides 739 - 744); GATGCC (nucleotides 916 - 921); GCACCG (nucleotides 940
  • GTGCCT nucleotides 1000 - 1005
  • GTCGAT nucleotides 1027 - 1032
  • GCAGGG nucleotides 1165 - 1170
  • CACAGC nucleotides 1192 - 1197
  • GACAGC nucleotides 1360 - 1365.
  • at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • TCCGGT codon pair replacements
  • ATCGGG nucleotides 64 - 69
  • CACAGC nucleotides 385 - 390
  • CATTCT CATTCT
  • GCCAAG nucleotides 526 - 531
  • AAGCTG nucleotides 529 - 534
  • AAATTG AAATTG
  • CGCTAT nucleotides 643 - 648
  • GTCGAT nucleotides 727 - 732
  • AACAGC nucleotides 739 - 744
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 26 and is adapted for expression in a heterologous host organism, and wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein.
  • the codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein, wherein a highly-overrepresented codon pair is a codon pair that has a translational kinetics value greater than 5, or 3, or 2.5, or 2 times the standard deviation of translational kinetics values for the host organism.
  • the host organism is not human, E. coli or S. cerevisiae.
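The threshold test described above can be sketched in Python as follows. This is only an illustration, not the applicants' implementation: the function name, the example values, and the choice of the population standard deviation are all assumptions made for the sketch.

```python
from statistics import pstdev


def highly_overrepresented(tk_values, multiple=2.0):
    """Return the codon pairs whose translational kinetics value exceeds
    `multiple` times the standard deviation of all values for the host.

    tk_values: dict mapping a six-nucleotide codon pair to its value.
    multiple:  5, 3, 2.5 or 2 in the text above.
    """
    threshold = multiple * pstdev(tk_values.values())
    return {pair for pair, value in tk_values.items() if value > threshold}


# Hypothetical values for a handful of codon pairs (not taken from the source).
example_values = {
    "TTCCCC": 9.0,
    "CTTTCC": 1.0,
    "GACCGT": -1.0,
    "TTTCCA": -2.0,
    "GATAGA": 0.5,
    "TTGTCT": -0.5,
}

flagged = highly_overrepresented(example_values, multiple=2.0)
```

A larger `multiple` makes the test stricter, so fewer codon pairs are flagged as candidates for replacement.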
  • a laccase-encoding nucleotide sequence having at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 26 and is adapted for expression in a heterologous host organism, wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein, and wherein the host organism is selected from the following: Pichia pastoris; Oryctolagus cuniculus (rabbit); Macaca fascicularis (Long-tailed monkey); M. mulatta (Monkey); E.
  • an expression system comprising an expression vector in a host organism, wherein the expression vector includes the nucleotide sequence of the embodiments provided herein, operably linked to an expression control sequence.
  • a system for metabolizing lignin comprising one or more host organisms that collectively include nucleotide sequences operably encoding the following enzymes: laccase, Mn-dependent peroxidase, and lignin peroxidase; wherein the enzymes are heterologous to the one or more host organisms, and wherein transcriptional kinetics of each of the nucleotide sequences encoding the enzymes has been modified to replace at least three codon pairs present in the original sequence for each enzyme, wherein the at least three replaced codon pairs are predicted to cause a translational pause in the host organism, and wherein said modification results in silent permutation or conservative amino acid substitution of said at least three codon pairs.
  • the one or more host organisms are selected from the group consisting of: Saccharomyces cerevisiae, Pichia pastoris, Escherichia coli, Bombyx mori, Spodoptera frugiperda, Drosophila melanogaster, Kluyveromyces lactis, Zymomonas mobilis and Schizosaccharomyces pombe.
  • each encoded enzyme has at least a 75% amino acid sequence identity with the original sequence of the enzyme.
  • the laccase retains at least 75% of the enzymatic activity of wild-type LCC (SEQ ID NO: 26) under normal physiological conditions.
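The 75% amino acid sequence identity criterion used throughout can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned to equal length with `-` as the gap character, and it counts every non-double-gap column in the denominator. Both choices are assumptions, since the text above does not fix an alignment method or an identity formula, and the sequences shown are invented.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of aligned columns with identical residues.

    Both inputs must be pre-aligned strings of equal length; '-' marks a gap.
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def meets_identity_threshold(aligned_a, aligned_b, threshold=75.0):
    """True when the identity is at least `threshold` percent."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Invented eight-residue peptides differing at two positions.
identity = percent_identity("MKTAYIAK", "MKTAYLAR")
```

In practice an alignment tool would produce the aligned strings; this sketch only shows the final counting step.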
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 26 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO:1 and which encode amino acids 28-152 of SEQ ID NO: 26 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism. In certain aspects, no replacement codon encoding amino acids 28-152 of SEQ ID NO: 26 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 28-152 when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 26 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO:1 and which encode amino acids 161-305 of SEQ ID NO: 26 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 161-305 of SEQ ID NO: 26 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 161-305 when expressed in the native organism.
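The two z score criteria stated above (a per-pair ratio limit, and a limit relative to the five highest wild-type z scores in the region) can be expressed as a small Python sketch. The function names and the numbers are hypothetical, the mean is used where the text allows either the mean or the median, and the sketch assumes positive z scores so that percentage comparisons behave intuitively.

```python
from statistics import mean


def within_ratio(replacement_z, wild_type_z, max_percent=150.0):
    """True if the replacement pair's z score in the host is no more than
    `max_percent` of the wild-type pair's z score in the native organism."""
    return replacement_z <= wild_type_z * max_percent / 100.0


def top5_reference(wild_type_z_scores):
    """Mean of the five highest wild-type z scores in the region."""
    return mean(sorted(wild_type_z_scores, reverse=True)[:5])


def no_replacement_exceeds(replacement_z_scores, wild_type_z_scores, percent=150.0):
    """True if no replacement z score is more than `percent` of the
    top-five reference value."""
    limit = top5_reference(wild_type_z_scores) * percent / 100.0
    return all(z <= limit for z in replacement_z_scores)


# Hypothetical z scores for one region (not taken from the source).
wild_type_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
replacement_scores = [7.0, 2.0]
```

Swapping `mean` for `statistics.median` in `top5_reference` gives the median variant mentioned in the text.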
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 26 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO:1 and which encode amino acids 364-493 of SEQ ID NO: 26 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 364-493 of SEQ ID NO: 26 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 364-493 when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 26 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO:1 and which encode amino acids 1-28 of SEQ ID NO: 26 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 1-28 of SEQ ID NO: 26 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 1-28 when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 26 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO:1 and which encode amino acids 152-161 of SEQ ID NO: 26 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 152-161 of SEQ ID NO: 26 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 152-161 when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 26 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO:1 and which encode amino acids 305-364 of SEQ ID NO: 26 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 305-364 of SEQ ID NO: 26 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 305-364 when expressed in the native organism.
  • a lignin peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-372 of wild-type lignin peroxidase as set forth in SEQ ID NO: 50, wherein at least 3 codon pairs of SEQ ID NO: 49 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: CTTTCC (nucleotides 901 - 906); CTTTCT (nucleotides 19 - 24); GACCGT (nucleotides 547 - 552); TTCCCC (nucleotides 301 - 306); TTCCCC (nucleotides 730 - 735); TTCCCC (nucleotides 988 - 993); TTCCCC (nucleotides 1051 - 1056).
  • CTTTCC nucleotides 901 - 906
  • CTTTCT nucleotides 19 - 24
  • GACCGT nucleotides 547 - 552
  • TTCCCC nucleotides 301 - 306
  • TTCCCC nucleotides 730 - 735
  • TTCCCC nucleotides 988 - 993
  • TTCCCC nucleotides 1051 - 1056.
  • At least 3 of the following codon pair replacements have been made: CTTTCC (nucleotides 901 - 906) replaced with TTGTCT; CTTTCT (nucleotides 19 - 24) replaced with TTGTCT; GACCGT (nucleotides 547 - 552) replaced with GATAGA; TTCCCC (nucleotides 301 - 306) replaced with TTTCCA; TTCCCC (nucleotides 730 - 735) replaced with TTTCCA; TTCCCC (nucleotides 988 - 993) replaced with TTTCCA; TTCCCC (nucleotides 1051 - 1056) replaced with TTTCCA.
  • the nucleotide sequence is optimized for expression in S. cerevisiae.
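The replacements listed above are described as encoding identical amino acids. A minimal Python sketch can check that property for the four distinct original/replacement pairs in that list. It assumes the standard genetic code, and the helper names are illustrative only.

```python
# Standard genetic code, built in TCAG order (an assumption: the source does
# not state which translation table applies).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_pair(pair):
    """Translate a six-nucleotide codon pair into two amino acid letters."""
    if len(pair) != 6:
        raise ValueError("a codon pair must be six nucleotides long")
    return CODON_TABLE[pair[:3]] + CODON_TABLE[pair[3:]]


def is_silent(original, replacement):
    """True if both codon pairs encode the same two amino acids."""
    return translate_pair(original) == translate_pair(replacement)


# Distinct (original, replacement) pairs from the list above.
REPLACEMENTS = [
    ("CTTTCC", "TTGTCT"),
    ("CTTTCT", "TTGTCT"),
    ("GACCGT", "GATAGA"),
    ("TTCCCC", "TTTCCA"),
]

all_silent = all(is_silent(old, new) for old, new in REPLACEMENTS)
```

A conservative (non-identical) substitution would fail `is_silent` and would need a separate similarity check, which this sketch does not attempt.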
  • a lignin peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-372 of wild-type lignin peroxidase as set forth in SEQ ID NO: 50, wherein at least 3 codon pairs of SEQ ID NO: 49 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: CTTTCC (nucleotides 901 - 906); TTCCTC (nucleotides 700 - 705); CTCGAC (nucleotides 340 - 345); CTTTCT (nucleotides 19 - 24); TTCCAG (nucleotides 880 - 885); GTCTGG (nucleotides 595 - 600); TTCCCG (nucleotides 1042 - 1047); ATCGCC (nucleotides 229 - 234); ATCGCC (nucleotides 373 - 378).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: CTTTCC (nucleotides 901 - 906) replaced with CTGTCT; TTCCTC (nucleotides 700 - 705) replaced with TTCTTG; CTCGAC (nucleotides 340 - 345) replaced with CTGGAC; CTTTCT (nucleotides 19 - 24) replaced with CTGTCT; TTCCAG (nucleotides 880 - 885) replaced with TTCCAA; GTCTGG (nucleotides 595 - 600) replaced with GTTTGG; TTCCCG (nucleotides 1042 - 1047) replaced with TTCCCA; ATCGCC (nucleotides 229 - 234) replaced with ATTGC
  • a lignin peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-372 of wild-type lignin peroxidase as set forth in SEQ ID NO: 50, wherein at least 3 codon pairs of SEQ ID NO: 49 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: TTCAAG (nucleotides 7 - 12); ATCAAC (nucleotides 922 - 927); GACGAA (nucleotides 343 - 348); CTTTCC (nucleotides 901 - 906).
  • TTCAAG nucleotides 7 - 12
  • ATCAAC nucleotides 922 - 927
  • GACGAA nucleotides 343 - 348
  • CTTTCC nucleotides 901 - 906
  • TTCAAG nucleotides 7 - 12
  • ATCAAC nucleotides 922 - 927 replaced with ATTAAT
  • GACGAA nucleotides 343 - 348
  • CTTTCC nucleotides 901 - 906 replaced with TTGTCT.
  • the nucleotide sequence is optimized for expression in P. pastoris.
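Each codon pair above is identified by a six-nucleotide range. As a sketch, the following Python function converts such a range into the two 1-based codon (amino acid) positions it covers. It assumes the coding sequence starts at nucleotide 1 in reading frame 1, which the listed ranges are consistent with but which is not stated explicitly here. The function name is illustrative.

```python
def codon_pair_indices(start_nt, end_nt):
    """Map a 1-based, inclusive nucleotide range covering one codon pair
    to the 1-based indices of its two codons."""
    if end_nt - start_nt != 5:
        raise ValueError("a codon pair spans exactly six nucleotides")
    if (start_nt - 1) % 3 != 0:
        raise ValueError("range does not start on a codon boundary")
    first_codon = (start_nt - 1) // 3 + 1
    return first_codon, first_codon + 1


# Ranges taken from the list above.
ranges = [(7, 12), (922, 927), (343, 348), (901, 906)]
positions = [codon_pair_indices(start, end) for start, end in ranges]
```

The validation steps also catch ranges that cannot describe a codon pair, such as one that is out of frame or not six nucleotides long.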
  • a lignin peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-372 of wild-type lignin peroxidase as set forth in SEQ ID NO: 50, wherein at least 3 codon pairs of SEQ ID NO: 49 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: CTTTCT (nucleotides 19 - 24 ); TTTGTC (nucleotides 25 - 30 ); TTCCCC (nucleotides 301 - 306 ); GACCGT (nucleotides 547 - 552 ); TTCCCC (nucleotides 730 - 735 ); CTTTCC (nucleotides 901 - 906 ); TTCCCC (nucleotides 988 - 993 ); TTCCCC (nucleotides 1051 - 1056 ).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: CTTTCT (nucleotides 19 - 24 ) replaced with TTGTCT; TTTGTC (nucleotides 25 - 30 ) replaced with TTCGTT; TTCCCC (nucleotides 301 - 306 ) replaced with TTCCCT; GACCGT (nucleotides 547 - 552 ) replaced with GATAGA; TTCCCC (nucleotides 730 - 735 ) replaced with TTCCCT; CTTTCC (nucleotides 901 - 906 ) replaced with TTGTCT; TTCCCC (nucleotides 988 - 993 ) replaced with TTTCCT; TTCCCC (nucleotides 1988 - 993 ) replaced with TTTC
  • a lignin peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-372 of wild-type lignin peroxidase as set forth in SEQ ID NO: 50, wherein at least 3 codon pairs of SEQ ID NO: 49 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: CTTTCT (nucleotides 19 - 24 ); ACGGCT (nucleotides 184 - 189 ); CTGACC (nucleotides 211 - 216 ); GCCCGT (nucleotides 376 - 381 ); ATCGGT (nucleotides 424 - 429 ); CTGACC (nucleotides 604 - 609 ); AAGGCT (nucleotides 865 - 870 ); CTTTCC (nucleotides 901 - 906 ); CCCGGA (nucleotides 1063 - 1068 ).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: CTTTCT (nucleotides 19 - 24 ) replaced with TTGTCT; ACGGCT (nucleotides 184 - 189 ) replaced with ACCGCT; CTGACC (nucleotides 211 - 216 ) replaced with TTGACC; GCCCGT (nucleotides 376 - 381 ) replaced with GCTCGT; ATCGGT (nucleotides 424 - 429 ) replaced with ATTGGA; CTGACC (nucleotides 604 - 609 ) replaced with TTGACA; AAGGCT (nucleotides 865 - 870 ) replaced with AAAGCC; CTTTCC (nucleot
  • a lignin peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-372 of wild-type lignin peroxidase as set forth in SEQ ID NO: 50 and is adapted for expression in a heterologous host organism, and wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein.
  • the codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein, wherein a highly-overrepresented codon pair is a codon pair that has a translational kinetics value greater than 5, or 3, or 2.5, or 2 times the standard deviation of translational kinetics values for the host organism.
  • the host organism is not human, E. coli or S. cerevisiae.
  • a lignin peroxidase-encoding nucleotide sequence having at least a 75% amino acid sequence identity with amino acids 1-372 of wild-type lignin peroxidase as set forth in SEQ ID NO: 50 and is adapted for expression in a heterologous host organism, wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein, and wherein the host organism is selected from the following: Pichia pastoris; Oryctolagus cuniculus (rabbit); Macaca fascicularis (Long-tailed monkey); M.
  • an expression system comprising an expression vector in a host organism, wherein the expression vector includes the nucleotide sequence of the embodiments provided herein, operably linked to an expression control sequence.
  • a system for metabolizing lignin comprising one or more host organisms that collectively include nucleotide sequences operably encoding the following enzymes: laccase, Mn-dependent peroxidase, and lignin peroxidase; wherein the enzymes are heterologous to the one or more host organisms, and wherein transcriptional kinetics of each of the nucleotide sequences encoding the enzymes has been modified to replace at least three codon pairs present in the original sequence for each enzyme, wherein the at least three replaced codon pairs are predicted to cause a translational pause in the host organism, and wherein said modification results in silent permutation or conservative amino acid substitution of said at least three codon pairs.
  • the one or more host organisms are selected from the group consisting of: Saccharomyces cerevisiae, Pichia pastoris, Escherichia coli, Bombyx mori, Spodoptera frugiperda, Drosophila melanogaster, Kluyveromyces lactis, Zymomonas mobilis and Schizosaccharomyces pombe.
  • each encoded enzyme has at least a 75% amino acid sequence identity with the original sequence of the enzyme.
  • the lignin peroxidase retains at least 75% of the enzymatic activity of wild-type LIP (SEQ ID NO: 50) under normal physiological conditions.
  • a lignin peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-372 of wild-type lignin peroxidase as set forth in SEQ ID NO: 50 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 49 and which encode amino acids 46-287 of SEQ ID NO: 50 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 46-287 of SEQ ID NO: 50 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 46-287 when expressed in the native organism.
  • a lignin peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-372 of wild-type lignin peroxidase as set forth in SEQ ID NO: 50 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 49 and which encode amino acids 1-46 of SEQ ID NO: 50 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 1-46 of SEQ ID NO: 50 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 1-46 when expressed in the native organism.
  • a Mn-dependent peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-364 of wild-type Mn-dependent peroxidase as set forth in SEQ ID NO: 74, wherein at least 3 codon pairs of SEQ ID NO: 73 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: TTCCCC (nucleotides 130 - 135); TTCCCC (nucleotides 721 - 726); TTCCCC (nucleotides 979 - 984); TTCCCC (nucleotides 1033 - 1038); GCCAAG (nucleotides 247 - 252). In some such nucleotide sequences, at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • TTCCCC nucleotides 130 - 135 replaced with TTTCCG
  • TTCCCC nucleotides 721 - 726 replaced with TTCCCA
  • TTCCCC nucleotides 979 - 984 replaced with TTTCCG
  • TTCCCC nucleotides 1033 - 1038 replaced with TTCCCA
  • GCCAAG nucleotides 247 - 252 replaced with GCGAAG.
  • the nucleotide sequence is optimized for expression in S. cerevisiae.
  • a Mn-dependent peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-364 of wild-type Mn-dependent peroxidase as set forth in SEQ ID NO: 74, wherein at least 3 codon pairs of SEQ ID NO: 73 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: ATTGCC (nucleotides 289 - 294); CAGGCG (nucleotides 358 - 363); CAGGCG (nucleotides 850 - 855); CAGGCG (nucleotides 1012 - 1017); CTCTCC (nucleotides 991 - 996); ATCGCC (nucleotides 244
  • ATCGCC nucleotides 370 - 375
  • ATCGCC nucleotides 610 - 615.
  • at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • At least 3 of the following codon pair replacements have been made: ATTGCC (nucleotides 289 - 294) replaced with ATCGCT; CAGGCG (nucleotides 358 - 363) replaced with CAGGCT; CAGGCG (nucleotides 850 - 855) replaced with CAGGCT; CAGGCG (nucleotides 1012 - 1017) replaced with CAGGCT; CTCTCC (nucleotides 991 - 996) replaced with CTGTCT; ATCGCC (nucleotides 244 - 249) replaced with ATTGCG; ATCGCC (nucleotides 370 - 375) replaced with ATCGCT; ATCGCC (nucleotides 610 - 615) replaced with ATTGCT.
  • the nucleotide sequence is optimized for expression in E. coli.
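To illustrate how position-specific replacements of this kind might be applied and then counted against the "at least 3" requirement, here is a short Python sketch. The toy sequence and its positions are invented for the example and are not SEQ ID NO: 73. Only the three original/replacement hexamers are borrowed from the list above, and the function names are hypothetical.

```python
def apply_replacement(seq, start_nt, original, replacement):
    """Replace the codon pair at 1-based position `start_nt`, after checking
    that the expected original hexamer is actually present there."""
    i = start_nt - 1
    if seq[i:i + 6] != original:
        raise ValueError(f"expected {original} at nucleotide {start_nt}")
    return seq[:i] + replacement + seq[i + 6:]


def count_replacements_made(candidate, replacements):
    """Count how many listed replacements are present in `candidate`."""
    return sum(
        candidate[start - 1:start + 5] == new
        for start, _old, new in replacements
    )


# Toy coding sequence: ATG, three codon pairs, stop codon (hypothetical).
TOY = "ATG" + "ATTGCC" + "CAGGCG" + "CTCTCC" + "TAA"
# (start nucleotide in TOY, original pair, replacement pair)
TOY_REPLACEMENTS = [
    (4, "ATTGCC", "ATCGCT"),
    (10, "CAGGCG", "CAGGCT"),
    (16, "CTCTCC", "CTGTCT"),
]

edited = TOY
for start, old, new in TOY_REPLACEMENTS:
    edited = apply_replacement(edited, start, old, new)

meets_minimum = count_replacements_made(edited, TOY_REPLACEMENTS) >= 3
```

Because every replacement is the same length as the pair it replaces, later positions are unaffected by earlier edits, so the edits can be applied in any order as long as the ranges do not overlap.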
  • a Mn-dependent peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-364 of wild-type Mn-dependent peroxidase as set forth in SEQ ID NO: 74, wherein at least 2 codon pairs of SEQ ID NO: 73 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are as follows: TTCAAG (nucleotides 7 - 12 ); GACGAG (nucleotides 340 - 345 ); ACCAAG (nucleotides 532 - 537 ); GAGCTG (nucleotides 670
  • TCTCCC nucleotides 757 - 762
  • GTCAAC nucleotides 841 - 846
  • TTCAAG nucleotides 871 - 876 .
  • at least 2 of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • TTCAAG nucleotides 7 - 12
  • GACGAG nucleotides 340 - 345
  • ACCAAG nucleotides 532 - 537
  • GAGCTG nucleotides 670 - 675
  • TCTCCC nucleotides 757 - 762
  • GTCAAC nucleotides 841 - 846
  • GTTAAT TTCAAG
  • TTCAAG nucleotides 871 - 876
  • a Mn-dependent peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-364 of wild-type Mn-dependent peroxidase as set forth in SEQ ID NO: 74, wherein at least 2 codon pairs of SEQ ID NO: 73 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 2 codon pairs to be replaced are as follows: TTCCCC (nucleotides 130 - 135 ); GCCAAG (nucleotides 247 - 252 ); TTCCCC (nucleotides 721 - 726 ); TTCCCC (nucleotides 979 - 984 ); TTCCCC (nucleotides 1033 - 1038 ). In some such nucleotide sequences, at least 2 of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • TTCCCC nucleotides 130 - 135 replaced with TTTCCA
  • GCCAAG nucleotides 247 - 252 replaced with GCTAAA
  • TTCCCC nucleotides 721 - 726 replaced with TTTCCA
  • TTCCCC nucleotides 979 - 984 replaced with TTTCCA
  • TTCCCC nucleotides 1033 - 1038 replaced with TTCCCT.
  • the nucleotide sequence is optimized for expression in K. lactis.
  • a Mn-dependent peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-364 of wild-type Mn-dependent peroxidase as set forth in SEQ ID NO: 74, wherein at least 2 codon pairs of SEQ ID NO: 73 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 2 codon pairs to be replaced are as follows: GCCAAG (nucleotides 247 - 252 ); GCCGGT (nucleotides 412 - 417 ); ATCGGT (nucleotides 421 - 426 ); GATGCC (nucleotides 556 - 561 ); GGAACG (nucleotides 646 - 651 ); CCCGGA (nucleotides 1054 - 1059 ). In some such nucleotide sequences, at least 2 of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • GCCAAG nucleotides 247 - 252
  • GCCGGT nucleotides 412 - 417
  • ATCGGT nucleotides 421 - 426
  • ATAGGT nucleotides 421 - 426
  • GATGCC nucleotides 556 - 561
  • GATGCT nucleotides 556 - 561
  • GGAACG nucleotides 646 - 651
  • the nucleotide sequence is optimized for expression in Z. mobilis.
  • a Mn-dependent peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-364 of wild-type Mn-dependent peroxidase as set forth in SEQ ID NO: 74 and is adapted for expression in a heterologous host organism, and wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein.
  • the codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein, wherein a highly-overrepresented codon pair is a codon pair that has a translational kinetics value greater than 5, or 3, or 2.5, or 2 times the standard deviation of translational kinetics values for the host organism.
  • the host organism is not human, E. coli or S.cerevisiae.
  • a Mn-dependent peroxidase-encoding nucleotide sequence having at least a 75% amino acid sequence identity with amino acids 1-364 of wild-type Mn-dependent peroxidase as set forth in SEQ ID NO: 74 and is adapted for expression in a heterologous host organism, wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein, and wherein the host organism is selected from the following: Pichia pastoris; Oryctolagus cuniculus (rabbit); Macaca fascicularis (Long-tailed monkey); M.
  • an expression system comprising an expression vector in a host organism, wherein the expression vector includes the nucleotide sequence of the embodiments provided herein, operably linked to an expression control sequence.
  • a system for metabolizing lignin comprising one or more host organisms that collectively include nucleotide sequences operably encoding the following enzymes: laccase, Mn-dependent peroxidase, and lignin peroxidase; wherein the enzymes are heterologous to the one or more host organisms, and wherein transcriptional kinetics of each of the nucleotide sequences encoding the enzymes has been modified to replace at least three codon pairs present in the original sequence for each enzyme, wherein the at least three replaced codon pairs are predicted to cause a translational pause in the host organism, and wherein said modification results in silent permutation or conservative amino acid substitution of said at least three codon pairs.
  • the one or more host organisms are selected from the group consisting of: Saccharomyces cerevisiae, Pichia pastoris, Escherichia coli, Bombyx mori, Spodoptera frugiperda, Drosophila melanogaster and Schizosaccharomyces pombe.
  • each encoded enzyme has at least a 75% amino acid sequence identity with the original sequence of the enzyme.
  • the Mn-dependent peroxidase retains at least 75% of the enzymatic activity of wild-type MnP (SEQ ID NO: 74) under normal physiological conditions.
  • a Mn-dependent peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-364 of wild-type Mn-dependent peroxidase as set forth in SEQ ID NO: 74 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 73 and which encode amino acids 45-284 of SEQ ID NO: 74 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 45-284 of SEQ ID NO: 74 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 45-284 when expressed in the native organism.
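The two z-score conditions above can be sketched as simple comparisons. This is a hedged illustration, not the method used to derive the z scores themselves: it assumes the z scores are already available as positive numbers (percent-of-value comparisons are ambiguous for negative values), and the function names are hypothetical.

```python
from statistics import mean, median


def within_percent_of_wild_type(z_replacement_host: float,
                                z_wild_type_native: float,
                                percent: float = 150.0) -> bool:
    """True if the replacement pair's z score in the host is no more than
    `percent` of the wild-type pair's z score in the native organism."""
    return z_replacement_host <= (percent / 100.0) * z_wild_type_native


def top_five_reference(wild_type_z_scores, use_median: bool = False) -> float:
    """Mean (or median) of the five highest wild-type z scores in a region."""
    top = sorted(wild_type_z_scores, reverse=True)[:5]
    return median(top) if use_median else mean(top)


def no_replacement_exceeds(replacement_z_scores, wild_type_z_scores,
                           percent: float = 200.0,
                           use_median: bool = False) -> bool:
    """True if no replacement z score is more than `percent` of the reference."""
    limit = (percent / 100.0) * top_five_reference(wild_type_z_scores, use_median)
    return all(z <= limit for z in replacement_z_scores)
```

The `percent` argument corresponds to the alternative limits listed in the text (400%, 300%, 200%, 150% or 100%).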
  • a Mn-dependent peroxidase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-364 of wild-type Mn-dependent peroxidase as set forth in SEQ ID NO: 74 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 73 and which encode amino acids 1-45 of SEQ ID NO: 74 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 1-45 of SEQ ID NO: 74 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 1-45 when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-619 of wild-type laccase as set forth in SEQ ID NO: 98, wherein at least 3 codon pairs of SEQ ID NO: 97 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: GGGTTC (nucleotides 1246 - 1251); GCAAGA (nucleotides 1834 - 1839); TTGAAC (nucleotides 1540 - 1545); TCTCCA (nucleotides 193 - 198); GACCGT (nucleotides 694 - 699); TTCCCC (nucleotides 1795 - 1800); GCCAAG (nucleotides 763 - 768); GCCAAG (nucleotides 1585 - 1590).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: GGGTTC (nucleotides 1246 - 1251) replaced with GGTTTT; GCAAGA (nucleotides 1834 - 1839) replaced with GCTAGA; TTGAAC (nucleotides 1540 - 1545) replaced with TTAAAT; TCTCCA (nucleotides 193 - 198) replaced with TCACCA; GACCGT (nucleotides 694 - 699) replaced with GATAGA; TTCCCC (nucleotides 1795 - 1800) replaced with TTTCCA; GCCAAG (nucleotides 763
  • the nucleotide sequence is optimized for expression in S. cerevisiae.
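Whether a listed replacement encodes identical amino acids can be checked by translating both hexamers. The sketch below assumes the standard genetic code (the text does not state which translation table applies to a given host) and uses hypothetical helper names; the six replacement pairs are the complete ones listed above.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_silent(wild_type_pair: str, replacement_pair: str) -> bool:
    """True if both codon pairs encode the same two amino acids."""
    return translate(wild_type_pair) == translate(replacement_pair)


# (wild-type codon pair, replacement codon pair) as listed in the text.
REPLACEMENTS = [
    ("GGGTTC", "GGTTTT"), ("GCAAGA", "GCTAGA"), ("TTGAAC", "TTAAAT"),
    ("TCTCCA", "TCACCA"), ("GACCGT", "GATAGA"), ("TTCCCC", "TTTCCA"),
]
all_silent = all(is_silent(wt, repl) for wt, repl in REPLACEMENTS)
```

A conservative amino acid substitution would fail `is_silent` and would need a separate similarity test, which is not sketched here.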
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-619 of wild-type laccase as set forth in SEQ ID NO: 98, wherein at least 3 codon pairs of SEQ ID NO: 97 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: CTGGTG (nucleotides 877 - 882); CTCGAC (nucleotides 1240 - 1245); ATCCTC (nucleotides 1462 - 1467); CTCGGC (nucleotides 652 - 657); CTCGGC (nucleotides 952 - 957); GTCTGG (nucleotides 1252 - 1257); GACAGC (nucleotides 940 - 945); AGCCAG (nucleotides 1495 - 1500); TTCCCG (nucleotides 661 - 666); ATTGCC (nucleotides 16 - 21); ATTGCC (nucleotides 1651 - 1656); CTCGGT (nucleotides 58 - 63); CTCGGT (nucleotides 1465 - 1470); GCCTGG (nucleotides 1654 - 1659); TCGCTG (nucleotides 874 - 879); GTGATG (nucleotides 1312 - 1317); TTCCGC (nucleotides 1609 - 1614).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: CTGGTG (nucleotides 877 - 882) replaced with CTGGTT; CTCGAC (nucleotides 1240 - 1245) replaced with CTGGAC; ATCCTC (nucleotides 1462 - 1467) replaced with ATCCTG; CTCGGC (nucleotides 652 - 657) replaced with CTGGGT; CTCGGC (nucleotides 952 - 957) replaced with CTGGGT; GTCTGG (nucleotides 1252
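The nucleotide ranges in these lists can be mapped to codon positions, which also shows when two listed codon pairs share a codon (for example, nucleotides 874 - 879 and 877 - 882). The sketch assumes 1-based, inclusive coordinates with the reading frame starting at nucleotide 1; that numbering convention is an assumption, and the function names are hypothetical.

```python
def codon_pair_indices(start_nt: int, end_nt: int) -> tuple:
    """Return the 1-based indices of the two codons spanned by a six-nucleotide
    range given in 1-based, inclusive coordinates."""
    if end_nt - start_nt != 5:
        raise ValueError("a codon pair spans exactly six nucleotides")
    if (start_nt - 1) % 3 != 0:
        raise ValueError("range does not start on a codon boundary")
    first = (start_nt - 1) // 3 + 1
    return first, first + 1


def shares_codon(range_a: tuple, range_b: tuple) -> bool:
    """True if two codon-pair ranges have at least one codon in common."""
    codons_a = set(codon_pair_indices(*range_a))
    codons_b = set(codon_pair_indices(*range_b))
    return bool(codons_a & codons_b)
```

Under these assumptions, nucleotides 877 - 882 correspond to codons 293 and 294, and nucleotides 874 - 879 to codons 292 and 293, so replacing one pair changes the context of the other.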
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-619 of wild-type laccase as set forth in SEQ ID NO: 98, wherein at least 3 codon pairs of SEQ ID NO: 97 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: AAACTG (nucleotides 403 - 408); TTCAAC (nucleotides 202 - 207); TTCAAC (nucleotides 751 - 756); ATCAAC (nucleotides 208 - 213); ATCAAC (nucleotides 397 - 402); ATCAAC (nucleotides 616 - 621); ATCAAC (nucleotides 841 - 846); ATCAAC (nucleotides 1276 - 1281); ATCAAC (nucleotides 1282 - 1287); GTCAAG (nucleotides 1828 - 1833); GGGTTC (nucleotides 1246 - 1251); TTGAAC (nucleotides 1540 - 1545); TTTGAC (nucleotides 1513 - 1518).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: AAACTG (nucleotides 403 - 408) replaced with AAATTA; TTCAAC (nucleotides 202 - 207) replaced with TTTAAC; TTCAAC (nucleotides 751 - 756) replaced with TTTAAT; ATCAAC (nucleotides 208 - 213) replaced with ATTAAT; ATCAAC (nucleotides 397 - 402) replaced with ATTAAT; ATCAAC (nucleotides 616 - 621) replaced with ATTAAC; ATCAAC (nucleotides 841 - 846) replaced with ATTAAT; ATCAAC (nucleotides 1276 - 1281) replaced with ATTA
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-619 of wild-type laccase as set forth in SEQ ID NO: 98, wherein at least 3 codon pairs of SEQ ID NO: 97 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: GACCGT (nucleotides 694 - 699); GCCAAG (nucleotides 763 - 768); AAGAAG (nucleotides 820 - 825); TTCCAA (nucleotides 865 - 870); GGTACC (nucleotides 1048
  • GGGTTC nucleotides 1246 - 1251
  • GTGTTT nucleotides 1510 - 1515
  • TTGAAC nucleotides 1540 - 1545
  • GCCAAG nucleotides 1585 - 1590
  • AAGAAG nucleotides 1735 - 1740
  • TTCCCC nucleotides 1795 - 1800.
  • at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • At least 3 of the following codon pair replacements have been made: GACCGT (nucleotides 694 - 699) replaced with GACAGA; GCCAAG (nucleotides 763 - 768) replaced with GCTAAA; AAGAAG (nucleotides 820 - 825) replaced with AAAAAG; TTCCAA (nucleotides 865 - 870) replaced with TTTCAG; GGTACC (nucleotides 1048
  • nucleotide sequence is optimized for expression in K. lactis.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-619 of wild-type laccase as set forth in SEQ ID NO: 98, wherein at least 3 codon pairs of SEQ ID NO: 97 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: GCCAAG (nucleotides 763 - 768); GACAGC (nucleotides 940 - 945); AACAGC (nucleotides 1198 - 1203); GCCTTT (nucleotides 1414 - 1419); GCCAAG (nucleotides 1585 - 1590); GCCTTT (nucleotides 1741 - 1746). In some such nucleotide sequences, at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • GCCAAG nucleotides 763 - 768
  • GACAGC nucleotides 940 - 945
  • AACAGC nucleotides 1198 - 1203
  • GCCTTT nucleotides 1414 - 1419
  • GCCAAG nucleotides 1585 - 1590
  • GCCTTT nucleotides 1741 - 1746
  • the nucleotide sequence is optimized for expression in Z. mobilis.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-619 of wild-type laccase as set forth in SEQ ID NO: 98 and is adapted for expression in a heterologous host organism, and wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein.
  • the codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein, wherein a highly-overrepresented codon pair is a codon pair that has a translational kinetics value greater than 5, or 3, or 2.5, or 2 times the standard deviation of translational kinetics values for the host organism.
  • the host organism is not human, E. coli or S. cerevisiae.
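The definition of a highly-overrepresented codon pair above amounts to a threshold test against the spread of values in the host. A minimal sketch follows; it assumes the translational kinetics values for the host are supplied as a list of numbers and uses the population standard deviation, which is an assumption because the text does not say which estimator is intended. The function name is hypothetical.

```python
from statistics import pstdev


def is_highly_overrepresented(value: float, host_values, multiple: float = 3.0) -> bool:
    """True if `value` exceeds `multiple` times the standard deviation of the
    translational kinetics values observed for the host organism."""
    return value > multiple * pstdev(host_values)
```

The `multiple` argument corresponds to the alternatives given in the text (5, 3, 2.5 or 2).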
  • a laccase-encoding nucleotide sequence having at least a 75% amino acid sequence identity with amino acids 1-619 of wild-type laccase as set forth in SEQ ID NO: 98 and adapted for expression in a heterologous host organism, wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein, and wherein the host organism is selected from the following: Pichia pastoris; Oryctolagus cuniculus (rabbit); Macaca fascicularis (Long-tailed monkey); M. mulatta (Monkey); E.
  • an expression system comprising an expression vector in a host organism, wherein the expression vector includes the nucleotide sequence of the embodiments provided herein, operably linked to an expression control sequence.
  • a system for metabolizing lignin comprising one or more host organisms that collectively include nucleotide sequences operably encoding the following enzymes: laccase, Mn-dependent peroxidase, and lignin peroxidase; wherein the enzymes are heterologous to the one or more host organisms, and wherein transcriptional kinetics of each of the nucleotide sequences encoding the enzymes has been modified to replace at least three codon pairs present in the original sequence for each enzyme, wherein the at least three replaced codon pairs are predicted to cause a translational pause in the host organism, and wherein said modification results in silent permutation or conservative amino acid substitution of said at least three codon pairs.
  • the one or more host organisms are selected from the group consisting of: Saccharomyces cerevisiae, Pichia pastoris, Escherichia coli, Bombyx mori, Spodoptera frugiperda, Drosophila melanogaster, Kluyveromyces lactis, Zymomonas mobilis and Schizosaccharomyces pombe.
  • each encoded enzyme has at least a 75% amino acid sequence identity with the original sequence of the enzyme.
  • the laccase retains at least 75% of the enzymatic activity of wild-type LCC (SEQ ID NO: 98) under normal physiological conditions.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-619 of wild-type laccase as set forth in SEQ ID NO: 98 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 97 and which encode amino acids 90-212 of SEQ ID NO: 98 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 90-212 of SEQ ID NO: 98 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 90-212 when expressed in the native organism.
  • no replacement codon encoding amino acids 90-212 of SEQ ID NO: 98 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the wild type codon pair GTCAAC when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-619 of wild-type laccase as set forth in SEQ ID NO: 98 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 97 and which encode amino acids 216-367 of SEQ ID NO: 98 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 216-367 of SEQ ID NO: 98 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 216-367 when expressed in the native organism.
  • no replacement codon encoding amino acids 216-367 of SEQ ID NO: 98 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the wild type codon pair GCCGAC when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-619 of wild-type laccase as set forth in SEQ ID NO: 98 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 97 and which encode amino acids 426-570 of SEQ ID NO: 98 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 426-570 of SEQ ID NO: 98 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 426-570 when expressed in the native organism.
  • no replacement codon encoding amino acids 426-570 of SEQ ID NO: 98 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the wild type codon pair TTCCGC when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-619 of wild-type laccase as set forth in SEQ ID NO: 98 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 97 and which encode amino acids 1-90 of SEQ ID NO: 98 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 1-90 of SEQ ID NO: 98 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 1-90 when expressed in the native organism. In certain aspects, at least one replacement codon encoding amino acids 1-90 of SEQ ID NO: 98 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the wild type codon pair GGTGGT when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-619 of wild-type laccase as set forth in SEQ ID NO: 98 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 97 and which encode amino acids 212-216 of SEQ ID NO: 98 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 212-216 of SEQ ID NO: 98 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 212-216 when expressed in the native organism.
  • At least one replacement codon encoding amino acids 212-216 of SEQ ID NO: 98 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the wild type codon pair GCCAAC when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-619 of wild-type laccase as set forth in SEQ ID NO: 98 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 97 and which encode amino acids 367-426 of SEQ ID NO: 98 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 367-426 of SEQ ID NO: 98 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 367-426 when expressed in the native organism.
  • At least one replacement codon encoding amino acids 367-426 of SEQ ID NO: 98 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the wild type codon pair CTCGAC when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 122, wherein at least 3 codon pairs of SEQ ID NO: 121 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: TTGAAA (nucleotides 235 - 240); CTTTCT (nucleotides 670 - 675); TTTGCC (nucleotides 778 - 783); TTCCCC (nucleotides 1240 - 1245); ATCAAG (nucleotides 625 - 630); GCCAAG (nucleotides 529 - 534).
  • TTGAAA nucleotides 235 - 240
  • CTTTCT nucleotides 670 - 675
  • TTTGCC nucleotides 778 - 783
  • TTCCCC nucleotides 1240 - 1245
  • ATCAAG nucleotides 625 - 630
  • GCCAAG nucleotides 529 - 534
  • TTGAAA nucleotides 235 - 240
  • CTTTCT nucleotides 670 - 675 replaced with TTGTCT
  • TTTGCC nucleotides 778 - 783
  • TTCCCC nucleotides 1240 - 1245
  • ATCAAG nucleotides 625 - 630 replaced with ATTAAA
  • GCCAAG nucleotides 529 - 534 replaced with GCTAAA.
  • the nucleotide sequence is optimized for expression in S. cerevisiae.
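Applying a listed replacement to a coding sequence can be sketched as a coordinate-checked substitution. The example assumes 1-based, inclusive nucleotide coordinates and verifies that the wild-type hexamer is actually present before replacing it. The function name and the 15-nucleotide toy sequence are hypothetical and are not taken from SEQ ID NO: 121; only the pair GCCAAG replaced with GCTAAA comes from the text.

```python
def apply_replacements(sequence: str, replacements) -> str:
    """Apply (start, end, wild_type, replacement) edits to a DNA string.

    Coordinates are 1-based and inclusive. Each wild-type hexamer is checked
    against the original sequence before its replacement is written.
    """
    result = sequence
    for start, end, wild_type, replacement in replacements:
        if sequence[start - 1:end] != wild_type:
            raise ValueError(f"expected {wild_type} at nucleotides {start} - {end}")
        if len(replacement) != len(wild_type):
            raise ValueError("replacement must have the same length as the wild type")
        result = result[:start - 1] + replacement + result[end:]
    return result


# Toy sequence (hypothetical): ATG | GCC AAG | TTG | TAA
toy = "ATGGCCAAGTTGTAA"
edited = apply_replacements(toy, [(4, 9, "GCCAAG", "GCTAAA")])
```

Checking the wild-type hexamer first guards against off-by-one coordinate errors, which are easy to introduce when positions are transcribed from a list.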
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 122, wherein at least 3 codon pairs of SEQ ID NO: 121 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: TTCCTC (nucleotides 1405 - 1410); CTCGAC (nucleotides 1432 - 1437); CTTTCT (nucleotides 670 - 675); TTTGCC (nucleotides 778 - 783); ATCCTC (nucleotides 1126 - 1131); ACGCTG (nucleotides 502 - 507); TTCCAG (nucleotides 10 - 15); TTCCAG (nucleotides 193 - 198); TTCCAG (nucleotides 268 - 273); GTGGTG (nucleotides 139 - 144); GTCAGC (nucleotides 106 - 111); GTCAGC (nucleotides 1339 - 1344); AGCCAG (nucleotides 814 - 819); GCCGGG (nucleotides 1405
  • TTCCTC nucleotides 1405 - 1410 replaced with TTCCTG
  • CTCGAC nucleotides 1432 - 1437 replaced with CTGGAT
  • CTTTCT nucleotides 670 - 675 replaced with CTGTCT
  • TTTGCC nucleotides 778 - 783 replaced with TTCGCT
  • ATCCTC nucleotides 1126 - 1131 replaced with ATTCTG
  • ACGCTG nucleotides 502 - 507 replaced with ACCCTC
  • TTCCAG nucleotides 10 - 15 replaced with TTTCAG
  • TTCCAG nucleotides 193 - 198
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 122, wherein at least 3 codon pairs of SEQ ID NO: 121 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: ATCAAG (nucleotides 625 - 630); TTTGCC (nucleotides 778 - 783); TTGAAA (nucleotides 235 - 240); TTCAAC (nucleotides 1051 - 1056); TTCAAC (nucleotides 1057 - 1062); ATCAAC (nucleotides 739 - 744); ATCAAC (nucleotides 1078 - 1083); GGTATC (nucleotides 148 - 153).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: ATCAAG (nucleotides 625 - 630) replaced with ATTAAA; TTTGCC (nucleotides 778 - 783) replaced with TTTGCA; TTGAAA (nucleotides 235 - 240) replaced with TTAAAA; TTCAAC (nucleotides 1051 - 1056) replaced with TTTAAT; TTCAAC (nucleotides 1057 - 1062) replaced with TTTAAC; ATCAAC (nucleotides 739 - 744) replaced with ATTAAT; ATCAAC (nucleotides 1078 - 1083) replaced with ATTAAT; GGTATC (nucleotides 148 - 153) replaced with
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 122, wherein at least 3 codon pairs of SEQ ID NO: 121 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: GGTATC (nucleotides 148 - 153); TTGAAA (nucleotides 235 - 240); GCCAAG (nucleotides 529 - 534); TTCCCA (nucleotides 547 - 552); CTTTCT (nucleotides 670 - 675); TTTGCC (nucleotides 778 - 783); TTTGCT (nucleotides 871 - 876); TTTGTC (nucleotides 1093 - 1098); TTCCCC (nucleotides 1240 - 1245); TTTGCT (nucleotides 1444 - 1449). In some such nucleotide sequences, at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • GGTATC nucleotides 148 - 153
  • TTGAAA nucleotides 235 - 240
  • GCCAAG nucleotides 529 - 534
  • TTCCCA nucleotides 547 - 552
  • CTTTCT nucleotides 670 - 675
  • TTTGCC nucleotides 778 - 783
  • TTCGCT nucleotides 871 - 876
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 122, wherein at least 3 codon pairs of SEQ ID NO: 121 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: GGTATC (nucleotides 148 - 153 ); GCAGGG (nucleotides 370 - 375 ); GCCAAG (nucleotides 529 - 534 ); ATCAAT (nucleotides 574 - 579 ); GCACCG (nucleotides 604 - 609 ); TTGGCA (nucleotides 616 - 621 ); ATCAAT (nucleotides 883 - 888 ); GTGCCT (nucleotides 1000 - 1005 ); GCGGCT (nucleotides 1144 - 1149 ); GCCAAT (nucleotides 1225 - 1230 ).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: GGTATC (nucleotides 148 - 153 ) replaced with GGCATT; GCAGGG (nucleotides 370 - 375 ) replaced with GCTGGA; GCCAAG (nucleotides 529 - 534 ) replaced with GCTAAA; ATCAAT (nucleotides 574 - 579 ) replaced with ATTAAT; GCACCG (nucleotides 604 - 609 ) replaced with GCCCCA; TTGGCA (nucleotides 616 - 621 ) replaced with TTGGCT; ATCAAT (nucleotides 883 - 888 ) replaced with ATAAAT; GTGCCT
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 122 and is adapted for expression in a heterologous host organism, and wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein.
  • the codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein, wherein a highly-overrepresented codon pair is a codon pair that has a translational kinetics value greater than 5, or 3, or 2.5, or 2 times the standard deviation of translational kinetics values for the host organism.
  • the host organism is not human, E. coli or S. cerevisiae.
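The threshold test described above (a codon pair counts as highly-overrepresented when its translational kinetics value exceeds a chosen multiple of the standard deviation for the host) can be sketched as follows. This is a minimal illustration only: the function name, the example values, and the use of the population standard deviation are assumptions, not taken from the specification.

```python
from statistics import pstdev


def flag_overrepresented(pair_values, k=2.0):
    """Return the codon pairs whose value exceeds k times the standard deviation.

    pair_values maps a six-nucleotide codon pair to its translational kinetics
    value for one host organism. k is the multiple (for example 5, 3, 2.5 or 2).
    Assumption: the population standard deviation of all supplied values is used.
    """
    sd = pstdev(list(pair_values.values()))
    return {pair for pair, value in pair_values.items() if value > k * sd}


# Hypothetical values for illustration; these are not data from the specification.
example_values = {"GGTATC": 6.0, "TTGAAA": 0.5, "GCCAAG": -1.0, "TTCCCA": 0.0}
flagged = flag_overrepresented(example_values, k=2.0)
```

Raising `k` makes the test stricter, so fewer codon pairs are flagged as candidates for replacement.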
  • a laccase-encoding nucleotide sequence having at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 122 and is adapted for expression in a heterologous host organism, wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein, and wherein the host organism is selected from the following: Pichia pastoris; Oryctolagus cuniculus (rabbit); Macaca fascicularis (Long-tailed monkey); M. mulatta (Monkey); E.
  • an expression system comprising an expression vector in a host organism, wherein the expression vector includes the nucleotide sequence of the embodiments provided herein, operably linked to an expression control sequence.
  • a system for metabolizing lignin comprising one or more host organisms that collectively include nucleotide sequences operably encoding the following enzymes: laccase, Mn-dependent peroxidase, and lignin peroxidase; wherein the enzymes are heterologous to the one or more host organisms, and wherein transcriptional kinetics of each of the nucleotide sequences encoding the enzymes has been modified to replace at least three codon pairs present in the original sequence for each enzyme, wherein the at least three replaced codon pairs are predicted to cause a translational pause in the host organism, and wherein said modification results in silent permutation or conservative amino acid substitution of said at least three codon pairs.
  • the one or more host organisms are selected from the group consisting of: Saccharomyces cerevisiae, Pichia pastoris, Escherichia coli, Bombyx mori, Spodoptera frugiperda, Drosophila melanogaster, Kluyveromyces lactis, Zymomonas mobilis and Schizosaccharomyces pombe.
  • each encoded enzyme has at least a 75% amino acid sequence identity with the original sequence of the enzyme.
  • the laccase retains at least 75% of the enzymatic activity of wild-type LCC (SEQ ID NO: 122) under normal physiological conditions.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 122 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 121 and which encode amino acids 29-153 of SEQ ID NO: 122 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 29-153 of SEQ ID NO: 122 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 29-153 when expressed in the native organism.
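The two z score conditions above can be expressed as simple checks. The sketch below is illustrative only: the function names are hypothetical, the z scores must be supplied by the caller, and it assumes positive z scores so that a percentage of a reference value is meaningful.

```python
from statistics import mean, median


def top5_reference(wild_type_z, use_median=False):
    """Mean (or median) of the five highest wild-type z scores in a region."""
    top = sorted(wild_type_z, reverse=True)[:5]
    return median(top) if use_median else mean(top)


def within_ratio(replacement_z, wild_type_z, max_ratio=1.5):
    """True if the replacement z score is no more than max_ratio (150% by
    default) of the z score of the wild-type codon pair it replaces."""
    return replacement_z <= max_ratio * wild_type_z


def all_below_cap(replacement_zs, wild_type_zs, cap=2.0, use_median=False):
    """True if no replacement z score exceeds cap (for example 4.0, 3.0, 2.0,
    1.5 or 1.0) times the top-five reference of the wild-type region."""
    reference = top5_reference(wild_type_zs, use_median)
    return all(z <= cap * reference for z in replacement_zs)
```

Here the wild-type z scores are those computed for the native organism and the replacement z scores are those computed for the heterologous host, as the text specifies.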
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 122 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 121 and which encode amino acids 162-306 of SEQ ID NO: 122 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 162-306 of SEQ ID NO: 122 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 162-306 when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 122 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 121 and which encode amino acids 364-493 of SEQ ID NO: 122 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 364-493 of SEQ ID NO: 122 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 364-493 when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 122 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 121 and which encode amino acids 1-30 of SEQ ID NO: 122 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 1-30 of SEQ ID NO: 122 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 1-30 when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 122 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 121 and which encode amino acids 153-162 of SEQ ID NO: 122 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 153-162 of SEQ ID NO: 122 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 153-162 when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 122 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 121 and which encode amino acids 306-364 of SEQ ID NO: 122 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 306-364 of SEQ ID NO: 122 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 306-364 when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO:146, wherein at least 3 codon pairs of SEQ ID NO: 145 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: CTTTCC (nucleotides 397 - 402); TTGAAG (nucleotides 235 - 240); GGGTTC (nucleotides 868 - 873); ATCAAA (nucleotides 625 - 630); ACTTTG (nucleotides 502 - 507); GACCGT (nucleotides 187 - 192); GGCCAA (nucleotides 148 - 153); AGCGAT (nucleotides 1546 - 1551).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: CTTTCC (nucleotides 397 - 402) replaced with CTGTCT; TTGAAG (nucleotides 235 - 240) replaced with CTGAAA; GGGTTC (nucleotides 868 - 873) replaced with GGTTTC; ATCAAA (nucleotides 625 - 630) replaced with ATCAAA; ACTTTG (nucleotides 502 - 507) replaced with ACCCTG; GACCGT (nucleotides 187 - 192) replaced with GACCGT; GGCCAA (nucleotides 148 - 153) replaced with GGTCAA; AGCGAT (nucleotides 1546 - 1551) replaced
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 146, wherein at least 3 codon pairs of SEQ ID NO: 145 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: GCCAGC (nucleotides 811 - 816); CTTTCC (nucleotides 397 - 402); TTCCTC (nucleotides 1405 - 1410); ATCCTC (nucleotides 895 - 900); TTCCAG (nucleotides 10 - 15); TTCCAG (nucleotides 193 - 198); TTCCAG (nucleotides 268 - 273); TTCCAG (nucleotides 1378 - 1383); CTCTCT (nucleotides 670 - 675); GTCAGC (nucleotides 106
  • GTCAGC nucleotides 1339 - 1344
  • AGCCAG nucleotides 814 - 819
  • TTCCCG nucleotides 547 - 552
  • ATTGCC nucleotides 169 - 174
  • GATCTC nucleotides 1549 - 1554
  • CTCGGT nucleotides 583 - 588
  • TTCCGC nucleotides 655
  • TTCCGC nucleotides 1327 - 1332
  • TTCTGG nucleotides 379 - 384
  • CTCTCC nucleotides 22 - 27.
  • at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • GCCAGC (nucleotides 811 - 816) replaced with GCTTCT; CTTTCC (nucleotides 397 - 402) replaced with CTGTCT; TTCCTC (nucleotides 1405 - 1410) replaced with TTCCTG; ATCCTC (nucleotides 895 - 900) replaced with ATTCTG; TTCCAG (nucleotides 10 - 15) replaced with TTCCAA; TTCCAG (nucleotides 193 - 198) replaced with TTTCAG; TTCCAG (nucleotides 268 - 273) replaced with TTTCAG; TTCCAG (nucleotides 1378 - 1383) replaced with TTCCAA; CTCTCT (nucleotides 670 - 675) replaced with CTGTCT; GTCAGC (nucleotides 106 - 111
  • nucleotide sequence is optimized for expression in E. coli.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 146, wherein at least 3 codon pairs of SEQ ID NO: 145 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: AAACTG (nucleotides 532 - 537); TTCAAC (nucleotides 1051 - 1056); ATCAAC (nucleotides 307 - 312); ATCAAC (nucleotides 1078 - 1083); ATCAAA (nucleotides 625 - 630); GGCCGT (nucleotides 1006 - 1011); GGGTTC (nucleotides 868 - 873); GGCCAA (nucleotides 148 - 153); CTTTCC (nucleotides 397 - 402).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: AAACTG (nucleotides 532 - 537) replaced with AAATTG; TTCAAC (nucleotides 1051 - 1056) replaced with TTTAAT; ATCAAC (nucleotides 307 - 312) replaced with ATTAAT; ATCAAC (nucleotides 1078
  • nucleotide sequence is optimized for expression in P. pastoris.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 146, wherein at least 3 codon pairs of SEQ ID NO: 145 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: GGCCAA (nucleotides 148 - 153 ); GACCGT (nucleotides 187 - 192 ); TTGAAG (nucleotides 235 - 240 ); CTTTCC (nucleotides 397 - 402 ); ATCAAA (nucleotides 625 - 630 ); GGGTTC (nucleotides 868 - 873 ); GGCCGT (nucleotides 1006 - 1011 ); TTTGCT (nucleotides 1444 - 1449 ).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: GGCCAA (nucleotides 148 - 153 ) replaced with GGTCAA; GACCGT (nucleotides 187 - 192 ) replaced with GATAGA; TTGAAG (nucleotides 235 - 240 ) replaced with TTAAAA; CTTTCC (nucleotides 397 - 402 ) replaced with TTGTCT; ATCAAA (nucleotides 625 - 630 ) replaced with ATTAAA; GGGTTC (nucleotides 868 - 873 ) replaced with GGTTTC; GGCCGT (nucleotides 1006
  • nucleotide sequence is optimized for expression in K. lactis.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO:146, wherein at least 3 codon pairs of SEQ ID NO: 145 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the at least 3 codon pairs to be replaced are selected from the following: AGCCGT (nucleotides 124 - 129 ); GCCGGT (nucleotides 172 - 177 ); GGCCCC (nucleotides 295 - 300 ); TCCGGT (nucleotides 328 - 333 ); GCAGGG (nucleotides 370
  • CACAGC nucleotides 388 - 393
  • CTCTAT nucleotides 469 - 474
  • ACTTTG nucleotides 502 - 507
  • ATCAAT nucleotides 574 - 579
  • GCGGCT nucleotides 607 - 612
  • GATGCC nucleotides 808 - 813
  • GCCAAT nucleotides 844 - 849
  • GCCGGT nucleotides 874 - 879
  • GTGCCT nucleotides 1000 - 1005
  • GCCAAT nucleotides 1225 - 1230
  • GATGCC nucleotides 1435 - 1440 .
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: AGCCGT (nucleotides 124 - 129 ) replaced with TCTCGT; GCCGGT (nucleotides 172 - 177 ) replaced with GCTGGT; GGCCCC (nucleotides 295 - 300 ) replaced with GGACCT; TCCGGT (nucleotides 328 - 333 ) replaced with TCTGGT; GCAGGG (nucleotides 370 - 375 ) replaced with GCTGGT; CACAGC (nucleotides 388 - 393 ) replaced with CATTCT; CTCTAT (nucleotides 469 - 474 ) replaced with TTGTAT; ACTTTG
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 146 and is adapted for expression in a heterologous host organism, and wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein.
  • the codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein, wherein a highly-overrepresented codon pair is a codon pair that has a translational kinetics value greater than 5, or 3, or 2.5, or 2 times the standard deviation of translational kinetics values for the host organism.
  • the host organism is not human, E. coli or S. cerevisiae.
  • a laccase-encoding nucleotide sequence having at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 146 and is adapted for expression in a heterologous host organism, wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein, and wherein the host organism is selected from the following: Pichia pastoris; Oryctolagus cuniculus (rabbit); Macaca fascicularis (Long-tailed monkey); M. mulatta (Monkey); E.
  • an expression system comprising an expression vector in a host organism, wherein the expression vector includes the nucleotide sequence of the embodiments provided herein, operably linked to an expression control sequence.
  • a system for metabolizing lignin comprising one or more host organisms that collectively include nucleotide sequences operably encoding the following enzymes: laccase, Mn-dependent peroxidase, and lignin peroxidase; wherein the enzymes are heterologous to the one or more host organisms, and wherein transcriptional kinetics of each of the nucleotide sequences encoding the enzymes has been modified to replace at least three codon pairs present in the original sequence for each enzyme, wherein the at least three replaced codon pairs are predicted to cause a translational pause in the host organism, and wherein said modification results in silent permutation or conservative amino acid substitution of said at least three codon pairs.
  • the one or more host organisms are selected from the group consisting of: Saccharomyces cerevisiae, Pichia pastoris, Escherichia coli, Bombyx mori, Spodoptera frugiperda, Drosophila melanogaster, Kluyveromyces lactis, Zymomonas mobilis and Schizosaccharomyces pombe.
  • each encoded enzyme has at least a 75% amino acid sequence identity with the original sequence of the enzyme.
  • the laccase retains at least 75% of the enzymatic activity of wild-type LCC (SEQ ID NO: 146) under normal physiological conditions.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 146 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 145 and which encode amino acids 29-153 of SEQ ID NO: 146 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 29-153 of SEQ ID NO: 146 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 29-153 when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 146 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 145 and which encode amino acids 162-306 of SEQ ID NO: 146 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 162-306 of SEQ ID NO: 146 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 162-306 when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 146 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 145 and which encode amino acids 364-493 of SEQ ID NO: 146 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 364-493 of SEQ ID NO: 146 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 364-493 when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 146 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 145 and which encode amino acids 1-29 of SEQ ID NO: 146 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 1-29 of SEQ ID NO: 146 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 1-29 when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 146 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 145 and which encode amino acids 153-162 of SEQ ID NO: 146 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 153-162 of SEQ ID NO: 146 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 153-162 when expressed in the native organism.
  • a laccase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-518 of wild-type laccase as set forth in SEQ ID NO: 146 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 145 and which encode amino acids 306-364 of SEQ ID NO: 146 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 306-364 of SEQ ID NO: 146 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 306-364 when expressed in the native organism.
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-497 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 170, wherein at least 3 of the following codon pairs of SEQ ID NO: 169 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof: TTGAAC (nucleotides 421 - 426 ); GCCAAG (nucleotides 496 - 501 ); GATATC (nucleotides 643 - 648 ); AAGAAA (nucleotides 859 - 864 ); GCCAAG (nucleotides 1243 - 1248 ); ATCAAG (nucleotides 1264 - 1269 ); GGTATT (nucleotides 1411 - 1416 ).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: TTGAAC (nucleotides 421 - 426 ) replaced with TTAAAT; GCCAAG (nucleotides 496 - 501 ) replaced with GCTAAA; GATATC (nucleotides 643 - 648 ) replaced with GACATT; AAGAAA (nucleotides 859 - 864 ) replaced with AAAAAG; GCCAAG (nucleotides 1243 - 1248 ) replaced with GCTAAG; ATCAAG (nucleotides 1264 - 1269 ) replaced with ATTAAA; GGTATT (nucleotides 1411 - 1416 ) replaced with GGAATA.
  • the following codon pair replacements have been made: TTGAAC (nucleo
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-497 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 170, wherein at least 3 of the following codon pairs of SEQ ID NO: 169 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof: CTCTCC (nucleotides 274 - 279 ); GACAGC (nucleotides 520 - 525 ); AGCCAG (nucleotides 523 - 528 ); GACTGG (nucleotides 787
  • TTCCAG nucleotides 934 - 939
  • GCCAGC nucleotides 1441 - 1446 .
  • at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • At least 3 of the following codon pair replacements have been made: CTCTCC (nucleotides 274 - 279 ) replaced with TTATCT; GACAGC (nucleotides 520 - 525 ) replaced with GATTCT; AGCCAG (nucleotides 523 - 528 ) replaced with TCTCAA; GACTGG (nucleotides 787 - 792 ) replaced with GATTGG; TTCCAG (nucleotides 934 - 939 ) replaced with TTCCAG; GCCAGC (nucleotides 1441 - 1446 ) replaced with GCTTCG.
  • the nucleotide sequence is optimized for expression in E. coli.
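The requirement that a replacement codon pair encode "identical amino acids" can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code applies to the host; the helper names `translate_pair` and `is_silent` are illustrative and do not come from the source. The pairs are those listed in the embodiment above, omitting the TTCCAG entry whose listed replacement is identical to the wild type.

```python
# Standard genetic code, with codons enumerated over bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_pair(hexamer: str) -> str:
    """Translate a six-nucleotide codon pair into two one-letter amino acids."""
    if len(hexamer) != 6:
        raise ValueError("a codon pair must be six nucleotides long")
    return CODON_TABLE[hexamer[:3]] + CODON_TABLE[hexamer[3:]]


def is_silent(wild_type: str, replacement: str) -> bool:
    """Return True if both codon pairs encode the same two amino acids."""
    return translate_pair(wild_type) == translate_pair(replacement)


# (wild-type pair, replacement pair) as listed in the embodiment above.
PAIRS = [
    ("CTCTCC", "TTATCT"),
    ("GACAGC", "GATTCT"),
    ("AGCCAG", "TCTCAA"),
    ("GACTGG", "GATTGG"),
    ("GCCAGC", "GCTTCG"),
]

for wild_type, replacement in PAIRS:
    print(wild_type, "->", replacement, "silent:", is_silent(wild_type, replacement))
```

A check for conservative substitutions would need an additional, explicitly chosen grouping of amino acids, which this sketch does not attempt.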
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-497 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 170, wherein at least 3 of the following codon pairs of SEQ ID NO: 169 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof: TTGAAC (nucleotides 421 - 426 ); GATATC (nucleotides 643 - 648 ); AAGAAA (nucleotides 859 - 864 ); ATCAAC (nucleotides 901 - 906 ); TTCAAG (nucleotides 1057 - 1062 ); ATCAAG (nucleotides 1264 - 1269 ); GGTATT (nucleotides 1411 - 1416 ).
  • at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • TTGAAC nucleotides 421 - 426
  • GATATC (nucleotides 643 - 648 ) replaced with GACATT
  • AAGAAA (nucleotides 859 - 864 ) replaced with AAAAAG
  • ATCAAC nucleotides 901 - 906
  • TTCAAG nucleotides 1057 - 1062
  • ATCAAG nucleotides 1264 - 1269
  • GGTATT nucleotides 1411 - 1416 replaced with GGAATT.
  • the nucleotide sequence is optimized for expression in P. pastoris.
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-497 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 170, wherein at least 3 of the following codon pairs of SEQ ID NO: 169 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof: TTTGTC (nucleotides 286 - 291 ); TTGAAC (nucleotides 421 - 426 ); GCCAAG (nucleotides 496 - 501 ); GATATC (nucleotides 643 - 648 ); AAGAAA (nucleotides 859 - 864 ); AAGAAG (nucleotides 1060 - 1065 ); GCCAAG (nucleotides 1243 - 1248 ).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: TTTGTC (nucleotides 286 - 291 ) replaced with TTCGTT; TTGAAC (nucleotides 421 - 426 ) replaced with TTAAAT; GCCAAG (nucleotides 496 - 501 ) replaced with GCTAAA; GATATC (nucleotides 643 - 648 ) replaced with GACATT; AAGAAA (nucleotides 859 - 864 ) replaced with AAAAAG; AAGAAG (nucleotides 1060 - 1065 ) replaced with AAAAAG; GCCAAG (nucleotides 1243 - 1248 ) replaced with GCTAAA.
  • TTTGTC nucleotides 286 - 291
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-497 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 170, wherein at least 3 of the following codon pairs of SEQ ID NO: 169 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof: ACATGG (nucleotides 46 - 51 ); AACAGC (nucleotides 136 - 141 ); AACAGC (nucleotides 268 - 273 ); CTTTAC (nucleotides 325 - 330 ); GCCAAG (nucleotides 496 - 501 ); GACAGC (nucleotides 520 - 525 ); ATCAAT (nucleotides 550 - 555 ); CTCGAT (nucleotides 847 - 852
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: ACATGG (nucleotides 46 - 51 ) replaced with ACCTGG; AACAGC (nucleotides 136 - 141 ) replaced with AATAGT; AACAGC (nucleotides 268
  • nucleotide sequence is optimized for expression in Z.
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-497 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 170 and is adapted for expression in a heterologous host organism, and wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein.
  • the codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein, wherein a highly-overrepresented codon pair is a codon pair that has a translational kinetics value greater than 5, or 3, or 2.5, or 2 times the standard deviation of translational kinetics values for the host organism.
  • the host organism is not human, E. coli or S. cerevisiae.
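The definition of a "highly-overrepresented" codon pair given above (a translational kinetics value greater than 5, 3, 2.5 or 2 times the standard deviation of the values for the host) amounts to a simple threshold test. The sketch below illustrates it under stated assumptions: the values in `toy_values` are invented for illustration and are not taken from any host organism, and the population standard deviation is used because the source does not say which estimator applies.

```python
from statistics import pstdev


def highly_overrepresented(values: dict[str, float], k: float = 3.0) -> set[str]:
    """Return the codon pairs whose value exceeds k times the standard deviation.

    `values` maps each codon pair (a hexamer) to its translational kinetics
    value in the host organism; k is one of the multipliers named in the text
    (5, 3, 2.5 or 2).
    """
    sd = pstdev(list(values.values()))
    return {pair for pair, value in values.items() if value > k * sd}


# Hypothetical values, for illustration only.
toy_values = {
    "GCCAAG": 6.0,
    "TTGAAC": 0.5,
    "GATATC": -0.5,
    "AAGAAA": 0.0,
    "ATCAAG": -1.0,
    "GGTATT": 1.0,
}

print(highly_overrepresented(toy_values, k=2))
```

A lower multiplier flags more codon pairs as candidates for replacement; a higher one restricts replacement to the most extreme pairs.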
  • a cellobiohydrolase-encoding nucleotide sequence having at least a 75% amino acid sequence identity with amino acids 1-497 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 170 and is adapted for expression in a heterologous host organism, wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein, and wherein the host organism is selected from the following: Pichia pastoris; Oryctolagus cuniculus (rabbit); Macaca fascicularis (Long-tailed monkey); M.
  • an expression system comprising an expression vector in a host organism, wherein the expression vector includes the nucleotide sequence of the embodiments provided herein, operably linked to an expression control sequence.
  • a system for degrading cellulose comprising one or more host organisms that collectively include nucleotide sequences operably encoding the following enzymes: endo-1,4-β-glucanase, exo-1,4-β-D-glucanase, and β-D-glucosidase; wherein the enzymes are heterologous to the one or more host organisms, and wherein transcriptional kinetics of each of the nucleotide sequences encoding the enzymes has been modified to replace at least three codon pairs present in the original sequence for each enzyme, wherein the at least three replaced codon pairs are predicted to cause a translational pause in the host organism, and wherein said modification results in silent permutation or conservative amino acid substitution of said at least three codon pairs.
  • the one or more host organisms are selected from the group consisting of: Saccharomyces cerevisiae, Pichia pastoris, Escherichia coli, Bombyx mori, Spodoptera frugiperda, Drosophila melanogaster, Kluyveromyces lactis, Zymomonas mobilis and Schizosaccharomyces pombe.
  • each encoded enzyme has at least a 75% amino acid sequence identity with the original sequence of the enzyme.
  • the exo-1,4-β-D-glucanase retains at least 75% of the enzymatic activity of wild-type TrCBH-I (SEQ ID NO: 170) under normal physiological conditions.
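The "at least a 75% amino acid sequence identity" condition used in these embodiments can be illustrated with a small calculation. This is a sketch only: it assumes the two protein sequences have already been aligned (gaps written as "-") and counts identity over all aligned columns, which is one of several conventions in use; the source does not specify which convention or alignment method applies. The sequences in the example are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue.

    Both inputs must come from the same pairwise alignment and therefore have
    equal length. Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
identity = percent_identity("MKTAYIAK", "MKTAYLAR")
print(identity, identity >= 75.0)
```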
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-497 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 170 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 169 and which encode amino acids 465-493 of SEQ ID NO: 170 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 465-493 of SEQ ID NO: 170 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 465-493 when expressed in the native organism.
  • no replacement codon encoding amino acids 465-493 of SEQ ID NO: 170 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the wild type codon pair ATTGGC when expressed in the native organism.
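The z-score ceilings above compare every replacement codon pair against a percentage of the mean or median of the five highest wild-type z scores in the same region. A minimal sketch of that comparison follows; the function names and all z-score values are hypothetical, and how the z scores themselves are computed is outside the scope of this example.

```python
from statistics import mean, median


def top_five_reference(wild_type_z: list[float], use_median: bool = False) -> float:
    """Mean (or median) of the five highest wild-type z scores in the region."""
    top = sorted(wild_type_z, reverse=True)[:5]
    return median(top) if use_median else mean(top)


def within_cap(
    replacement_z: list[float],
    wild_type_z: list[float],
    percent: float,
    use_median: bool = False,
) -> bool:
    """True if no replacement z score exceeds `percent` of the reference value."""
    cap = percent / 100.0 * top_five_reference(wild_type_z, use_median)
    return all(z <= cap for z in replacement_z)


# Hypothetical z scores, for illustration only.
wild_type = [4.0, 3.0, 2.0, 1.0, 0.0, -1.0, 5.0]
print(within_cap([2.5, 1.0], wild_type, percent=100))
print(within_cap([3.5], wild_type, percent=100))
print(within_cap([3.5], wild_type, percent=150))
```

The same helper, with the comparison reversed, would express the "at least one replacement is more than X%" conditions used for regions where a pause is to be retained.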
  • a cellobiohydrolase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-497 of wild-type cellobiohydrolase as set forth in SEQ ID NO: 170 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 169 and which encode amino acids 435-464 of SEQ ID NO: 170 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 435-464 of SEQ ID NO: 170 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 435-464 when expressed in the native organism.
  • At least one replacement codon encoding amino acids 62-107 of SEQ ID NO: 170 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the wild type codon pair CCTACC when expressed in the native organism.
  • an endoglucanase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-335 of wild-type endoglucanase as set forth in SEQ ID NO: 182, wherein at least 3 of the following codon pairs of SEQ ID NO: 181 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof: CAGTTT (nucleotides 445 - 450 ); CAGTAC (nucleotides 571 - 576 ); CAGTAC (nucleotides 685 - 690 ); AAGGGC (nucleotides 793 - 798 ); GAGTTT (nucleotides 808 - 813 ).
  • at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: CAGTTT (nucleotides 445 - 450 ) replaced with CAATTT; CAGTAC (nucleotides 571 - 576 ) replaced with CAATAT; CAGTAC (nucleotides 685 - 690 ) replaced with CAATAT; AAGGGC (nucleotides 793 - 798 ) replaced with AAGGGA; GAGTTT (nucleotides 808 - 813 ) replaced with GAATTT.
  • the nucleotide sequence is optimized for expression in S. cerevisiae.
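The replacements above are specified by 1-based, inclusive nucleotide coordinates. The sketch below shows one way such a replacement might be applied while confirming that the expected wild-type codon pair is actually present at the stated position. It assumes coordinates are counted from the first nucleotide of the coding sequence, so that each pair starts on a codon boundary (consistent with the coordinates listed in this section). Because SEQ ID NO: 181 is not reproduced here, `TOY` is a short invented sequence and its coordinates are illustrative only.

```python
def replace_codon_pair(seq: str, start: int, end: int, expected: str, replacement: str) -> str:
    """Replace the codon pair at 1-based inclusive positions start..end.

    Raises ValueError if the span is not six nucleotides, is out of frame,
    or does not contain the expected wild-type hexamer.
    """
    if end - start + 1 != 6 or len(replacement) != 6:
        raise ValueError("a codon pair spans exactly six nucleotides")
    if (start - 1) % 3 != 0:
        raise ValueError("codon pair does not start on a codon boundary")
    found = seq[start - 1:end]
    if found != expected:
        raise ValueError(f"expected {expected} at {start}-{end}, found {found}")
    return seq[:start - 1] + replacement + seq[end:]


# Invented toy sequence: ATG | CAGTTT | GGT | CAGTAC | TAA
TOY = "ATGCAGTTTGGTCAGTACTAA"

# (start, end, wild-type pair, replacement pair); the pairs follow the list
# above, the coordinates are specific to the toy sequence.
EDITS = [
    (4, 9, "CAGTTT", "CAATTT"),
    (13, 18, "CAGTAC", "CAATAT"),
]

optimized = TOY
for start, end, wild_type, replacement in EDITS:
    optimized = replace_codon_pair(optimized, start, end, wild_type, replacement)

print(optimized)
```

Since every replacement has the same length as the pair it replaces, coordinates stay valid after each edit and the edits can be applied in any order.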
  • an endoglucanase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-335 of wild-type endoglucanase as set forth in SEQ ID NO: 182, wherein at least 3 of the following codon pairs of SEQ ID NO: 181 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof: CTCGGC (nucleotides 7 - 12 ); AGCCAG (nucleotides 142 - 147 ); CTGGCA (nucleotides 301 - 306 ); GATCTC (nucleotides 307 - 312 ); TTCCAG (nucleotides 415 - 420 ); TTCTGG (nucleotides 424 - 429 ); GCCGGA (nucleotides 556 - 561 ); GTCTGG (nucleotides 886
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: CTCGGC (nucleotides 7 - 12 ) replaced with CTGGGT; AGCCAG (nucleotides 142 - 147 ) replaced with AGCCAA; CTGGCA (nucleotides 301 - 306 ) replaced with CTCGCG; GATCTC (nucleotides 307 - 312 ) replaced with GACCTG; TTCCAG (nucleotides 415 - 420 ) replaced with TTCCAA; TTCTGG (nucleotides 424 - 429 ) replaced with TTTTGG; GCCGGA (nucleotides 556 - 561 ) replaced with GCGGGT; GTCTGG (nucleot
  • an endoglucanase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-335 of wild-type endoglucanase as set forth in SEQ ID NO: 182, wherein at least 3 of the following codon pairs of SEQ ID NO: 181 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof: GGCTCT (nucleotides 10 - 15 ); ACCAAG (nucleotides 82 - 87 ); CTTCCA (nucleotides 151 - 156 ); GGCTCT (nucleotides 280 - 285 ); CAGTTT (nucleotides 445 - 450 ); CACGAT (nucleotides 493 - 498 ); AAGAAG (nucleotides 790
  • at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • GGCTCT nucleotides 10 - 15
  • ACCAAG nucleotides 82 - 87
  • CTTCCA nucleotides 151 - 156
  • GGCTCT nucleotides 280 - 285
  • CAGTTT nucleotides 445 - 450
  • CACGAT nucleotides 493 - 498
  • an endoglucanase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-335 of wild-type endoglucanase as set forth in SEQ ID NO: 182, wherein at least 3 of the following codon pairs of SEQ ID NO: 181 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof: GGCTCT (nucleotides 10 - 15 ); ACCAAG (nucleotides 82 - 87 ); CTTCCA (nucleotides 151 - 156 ); GGCTCT (nucleotides 280 - 285 ); CAGTTT (nucleotides 445 - 450 ); CACGAT (nucleotides 493 - 498 ); AAGAAG (nucleotides 790
  • at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • GGCTCT nucleotides 10 - 15
  • ACCAAG nucleotides 82 - 87
  • CTTCCA nucleotides 151 - 156
  • GGCTCT nucleotides 280 - 285
  • CAGTTT nucleotides 445 - 450
  • CACGAT nucleotides 493 - 498
  • an endoglucanase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-335 of wild-type endoglucanase as set forth in SEQ ID NO: 182, wherein at least 3 of the following codon pairs of SEQ ID NO: 181 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof: TCCGGT (nucleotides 124 - 129 ); GTCGAT (nucleotides 358 - 363 ); GCCGGA (nucleotides 556 - 561 ); GGGGCA (nucleotides 604 - 609 ); GCATGG (nucleotides 607 - 612 ).
  • at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: TCCGGT (nucleotides 124 - 129 ) replaced with TCTGGT; GTCGAT (nucleotides 358 - 363 ) replaced with GTTGAT; GCCGGA (nucleotides 556 - 561 ) replaced with GCTGGT; GGGGCA (nucleotides 604 - 609 ) replaced with GGCGCG; GCATGG (nucleotides 607 - 612 ) replaced with GCGTGG.
  • the nucleotide sequence is optimized for expression in Z. mobilis.
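Some of the listed codon pairs overlap, for example GGGGCA (nucleotides 604 - 609) and GCATGG (nucleotides 607 - 612), which share the codon GCA. This follows from reading codon pairs as a sliding window of two adjacent codons that advances one codon at a time. The sketch below enumerates pairs that way; the function name and the toy sequence are illustrative assumptions, not part of the source.

```python
from typing import Iterator


def codon_pairs(seq: str) -> Iterator[tuple[int, int, str]]:
    """Yield (start, end, hexamer) for each pair of adjacent codons.

    Coordinates are 1-based and inclusive, counted from the first nucleotide
    of the coding sequence. Trailing nucleotides that do not complete a codon
    are ignored.
    """
    n_codons = len(seq) // 3
    for i in range(n_codons - 1):
        offset = 3 * i
        yield offset + 1, offset + 6, seq[offset:offset + 6]


# Toy fragment containing the codons GGG, GCA and TGG.
for start, end, hexamer in codon_pairs("GGGGCATGG"):
    print(start, end, hexamer)
```

Because neighbouring pairs share a codon, changing one codon alters two codon pairs at once, so the pairs on both sides of an edited codon are worth re-evaluating after a replacement.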
  • the codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein, wherein a highly-overrepresented codon pair is a codon pair that has a translational kinetics value greater than 5, or 3, or 2.5, or 2 times the standard deviation of translational kinetics values for the host organism.
  • the host organism is not human, E. coli or S. cerevisiae.
  • an endoglucanase-encoding nucleotide sequence having at least a 75% amino acid sequence identity with amino acids 1-335 of wild-type endoglucanase as set forth in SEQ ID NO: 182 and is adapted for expression in a heterologous host organism, wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein, and wherein the host organism is selected from the following: Pichia pastoris; Oryctolagus cuniculus (rabbit); Macaca fascicularis (Long-tailed monkey); M.
  • an expression system comprising an expression vector in a host organism, wherein the expression vector includes the nucleotide sequence of the embodiments provided herein, operably linked to an expression control sequence.
  • a system for degrading cellulose comprising one or more host organisms that collectively include nucleotide sequences operably encoding the following enzymes: endo-1,4-β-glucanase, exo-1,4-β-D-glucanase, and β-D-glucosidase; wherein the enzymes are heterologous to the one or more host organisms, and wherein transcriptional kinetics of each of the nucleotide sequences encoding the enzymes has been modified to replace at least three codon pairs present in the original sequence for each enzyme, wherein the at least three replaced codon pairs are predicted to cause a translational pause in the host organism, and wherein said modification results in silent permutation or conservative amino acid substitution of said at least three codon pairs.
  • the one or more host organisms are selected from the group consisting of: Saccharomyces cerevisiae, Pichia pastoris, Escherichia coli, Bombyx mori, Spodoptera frugiperda, Drosophila melanogaster, Kluyveromyces lactis, Zymomonas mobilis and Schizosaccharomyces pombe.
  • each encoded enzyme has at least a 75% amino acid sequence identity with the original sequence of the enzyme.
  • the endo-1,4-β-glucanase retains at least 75% of the enzymatic activity of wild-type endoglucanase (SEQ ID NO: 182) under normal physiological conditions.
  • an endoglucanase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-335 of wild-type endoglucanase as set forth in SEQ ID NO: 182 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 181 and which encode amino acids 32-276 of SEQ ID NO: 182 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 32-276 of SEQ ID NO: 182 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 32-276 when expressed in the native organism.
  • no replacement codon encoding amino acids 32-276 of SEQ ID NO: 182 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the wild type codon pair with the highest z score when expressed in the native organism.
  • an endoglucanase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-335 of wild-type endoglucanase as set forth in SEQ ID NO: 182 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 181 and which encode amino acids 1-32 of SEQ ID NO: 182 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 1-32 of SEQ ID NO: 182 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 1-32 when expressed in the native organism.
  • At least one replacement codon encoding amino acids 1-32 of SEQ ID NO: 182 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the wild type codon pair with the highest z score when expressed in the native organism.
  • a xylanase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-225 of wild-type xylanase as set forth in SEQ ID NO: 194, wherein at least 3 of the following codon pairs of SEQ ID NO: 193 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof: AGTGAC (nucleotides 58 - 63 ); AAGGGC (nucleotides 148 - 153 ); GCAAGA (nucleotides 172 - 177 ); GACCAA (nucleotides 406 - 411 ); AGCGGT (nucleotides 442 - 447 ); TTGAAT (nucleotides 493 - 498 ).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: AGTGAC (nucleotides 58 - 63 ) replaced with TCTGAT; AAGGGC (nucleotides 148 - 153 ) replaced with AAAGGT; GCAAGA (nucleotides 172 - 177 ) replaced with GCTAGA; GACCAA (nucleotides 406 - 411 ) replaced with GATCAA; AGCGGT (nucleotides 442 - 447 ) replaced with TCTGGA; TTGAAT (nucleotides 493 - 498 ) replaced with TTAAAC.
  • the nucleotide sequence is optimized for expression in S. cerevisiae.
  • a xylanase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-225 of wild-type xylanase as set forth in SEQ ID NO: 194, wherein at least 3 of the following codon pairs of SEQ ID NO: 193 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof: GGCTGG (nucleotides 25 - 30 ); CTGGAA (nucleotides 91 - 96 ); GGCGGT (nucleotides 127 - 132 ); GGCTGG (nucleotides 151 - 156 ); CTCGGC (nucleotides 352 - 357 ); TACTGG (nucleotides 412 - 417 ); CGCCAG (nucleotides 424 - 429 ); ACCAGC (nucleotides 4
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: GGCTGG (nucleotides 25 - 30 ) replaced with GGTTGG; CTGGAA (nucleotides 91 - 96 ) replaced with CTGGAG; GGCGGT (nucleotides 127 - 132 ) replaced with GGCGGC; GGCTGG (nucleotides 151 - 156 ) replaced with GGTTGG; CTCGGC (nucleotides 352 - 357 ) replaced with CTGGGT; TACTGG (nucleotides 412 - 417 ) replaced with TATTGG; CGCCAG (nucleotides 424 - 429 ) replaced with CGTCAG; ACCAGC (nucleotides 4
  • a xylanase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-225 of wild-type xylanase as set forth in SEQ ID NO: 194, wherein at least 3 of the following codon pairs of SEQ ID NO: 193 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof: CACGAT (nucleotides 31 - 36 ); AGTGAC (nucleotides 58 - 63 ); GAGTAT (nucleotides 259 - 264 ); AACTTT (nucleotides 277 - 282 ); GTCAAC (nucleotides 370 - 375 ); GTCAAC (nucleotides 499 - 504 ).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: CACGAT (nucleotides 31 - 36 ) replaced with CATGAT; AGTGAC (nucleotides 58 - 63 ) replaced with TCTGAT; GAGTAT (nucleotides 259 - 264 ) replaced with GAATAT; AACTTT (nucleotides 277 - 282 ) replaced with AATTTC; GTCAAC (nucleotides 370 - 375 ) replaced with GTTAAT; GTCAAC (nucleotides 499 - 504 ) replaced with GTGAAT.
  • the nucleotide sequence is optimized for expression in P. pastoris.
  • a xylanase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-225 of wild-type xylanase as set forth in SEQ ID NO: 194, wherein at least 3 of the following codon pairs of SEQ ID NO: 193 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof: GGCTGG (nucleotides 25 - 30 ); GGCTGG (nucleotides 151 - 156 ); GCAAGA (nucleotides 172 - 177 ); GGTGTT (nucleotides 193 - 198 ); AACTTT (nucleotides 277 - 282 ); GACCAA (nucleotides 406 - 411 ); GGTACC (nucleotides 445 - 450 ); TTGAAT (nucleotides
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: GGCTGG (nucleotides 25 - 30 ) replaced with GGTTGG; GGCTGG (nucleotides 151 - 156 ) replaced with GGTTGG; GCAAGA (nucleotides 172 - 177 ) replaced with GCTAGA; GGTGTT (nucleotides 193 - 198 ) replaced with GGTGTT; AACTTT (nucleotides 277 - 282 ) replaced with AATTTC; GACCAA (nucleotides 406 - 411 ) replaced with GATCAA; GGTACC (nucleotides 445 - 450 ) replaced with GGTACA; TTGAAT (nucleotides 4
  • a xylanase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-225 of wild-type xylanase as set forth in SEQ ID NO: 194, wherein at least 3 of the following codon pairs of SEQ ID NO: 193 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof: GAAGGC (nucleotides 94 - 99 ); GCAAGA (nucleotides 172 - 177 ); AACAGC (nucleotides 214 - 219 ); ACCTAT (nucleotides 286 - 291 ); TCCGGT (nucleotides 301 - 306 ); GCAACG (nucleotides 529 - 534 ); GGCTAT (nucleotides 553 - 558 ).
  • At least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3 of the following codon pair replacements have been made: GAAGGC (nucleotides 94 - 99 ) replaced with GAAGGA; GCAAGA (nucleotides 172 - 177 ) replaced with GCTCGT; AACAGC (nucleotides 214 - 219 ) replaced with AATTCT; ACCTAT (nucleotides 286 - 291 ) replaced with ACGTAT; TCCGGT (nucleotides 301 - 306 ) replaced with TCTGGT; GCAACG (nucleotides 529 - 534 ) replaced with GCCACC; GGCTAT (nucleotides 553 - 558 ) replaced with GGTTAT.
  • the nucleotide sequence is optimized for expression in Z. mobilis.
  • a xylanase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-225 of wild-type xylanase as set forth in SEQ ID NO: 194 and is adapted for expression in a heterologous host organism, and wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein.
  • the codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein, wherein a highly-overrepresented codon pair is a codon pair that has a translational kinetics value greater than 5, or 3, or 2.5, or 2 times the standard deviation of translational kinetics values for the host organism.
  • the host organism is not human, E. coli or S. cerevisiae.
  • a xylanase-encoding nucleotide sequence having at least a 75% amino acid sequence identity with amino acids 1-225 of wild-type xylanase as set forth in SEQ ID NO: 194 and is adapted for expression in a heterologous host organism, wherein at least three codon pairs of the wild-type sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein, and wherein the host organism is selected from the following: Pichia pastoris; Oryctolagus cuniculus (rabbit); Macaca fascicularis (Long-tailed monkey); M.
  • an expression system comprising an expression vector in a host organism, wherein the expression vector includes the nucleotide sequence of the embodiments provided herein, operably linked to an expression control sequence.
  • a xylanase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-225 of wild-type xylanase as set forth in SEQ ID NO: 194 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 193 and which encode amino acids 31-221 of SEQ ID NO: 194 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is no more than 150% of the z score for the wild type codon pair when expressed in the native organism.
  • no replacement codon encoding amino acids 31-221 of SEQ ID NO: 194 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 31-221 when expressed in the native organism.
  • no replacement codon encoding amino acids 31-221 of SEQ ID NO: 194 has a z score for expression in the heterologous host that is more than 400%, or 300%, or 200%, or 150% or 100% of the wild type codon pair with the highest z score when expressed in the native organism.
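The percentage limits recited above compare replacement z scores against a wild-type reference value. The sketch below illustrates that comparison under stated assumptions: the z scores are taken as already computed by some other means, the reference value is assumed to be positive, and the function names and sample numbers are hypothetical.

```python
from statistics import mean, median


def top_five_reference(wild_type_z, use_median=False):
    """Mean (or median) of the five highest wild-type codon pair z scores."""
    top_five = sorted(wild_type_z, reverse=True)[:5]
    return median(top_five) if use_median else mean(top_five)


def highest_reference(wild_type_z):
    """z score of the wild-type codon pair with the highest z score."""
    return max(wild_type_z)


def within_limit(replacement_z, reference, percent):
    """True if no replacement z score exceeds `percent` % of the reference.

    Assumes a positive reference; a negative one would invert the comparison.
    """
    limit = reference * percent / 100.0
    return all(z <= limit for z in replacement_z)


# Made-up z scores, for illustration only.
wild_type = [6.2, 5.1, 4.8, 4.0, 3.9, 1.2, -0.5]
replacement = [2.0, 1.5, -0.3]

reference = top_five_reference(wild_type)
passes_100 = within_limit(replacement, reference, 100)
passes_40 = within_limit(replacement, reference, 40)
```

With these sample values the replacements satisfy the 100% limit but not a 40% limit, since 2.0 exceeds 40% of the top-five mean.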
  • a xylanase-encoding nucleotide sequence wherein the encoded sequence has at least a 75% amino acid sequence identity with amino acids 1-225 of wild-type xylanase as set forth in SEQ ID NO: 194 and is adapted for expression in a heterologous host organism, wherein at least 1, 2 or 3 codon pairs present in SEQ ID NO: 193 and which encode amino acids 1-31 of SEQ ID NO: 194 have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof, and wherein at least one replacement codon pair is predicted to be equally or more likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism is at least 75% of the z score for the wild type codon pair when expressed in the native organism.
  • at least one replacement codon encoding amino acids 1-31 of SEQ ID NO: 194 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the mean or median of the five highest z scores of the wild type codon pairs encoding amino acids 1-31 when expressed in the native organism.
  • at least one replacement codon encoding amino acids 1-31 of SEQ ID NO: 194 has a z score for expression in the heterologous host that is more than 200%, or 100%, or 75%, or 50% or 40% of the wild type codon pair with the highest z score when expressed in the native organism.
  • isolated polynucleotides comprising the nucleotide sequence of SEQ ID NOs: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 171, 173, 175, 177, 179, 183, 185, 187, 189, 191, 195, 197, 199, 201 or 203.
  • isolated polypeptides encoded by any of the nucleotide sequences provided herein, provided that the amino acid sequence of said polypeptide is not SEQ ID NO: 2, 26, 50, 74, 98, 122, 146, 170, 182 or 194.
  • expression systems comprising: an expression vector in a host organism, wherein the expression vector includes any of the polynucleotides provided herein operably linked to an expression control sequence. Also provided herein are expression systems, comprising: an expression vector in a host organism, wherein the expression vector includes two or more polynucleotides provided herein, each polynucleotide being operably linked to the same or different expression control sequences.
  • expression systems for degrading cellulose comprising: one or more host organisms that collectively include polynucleotides operably encoding the following enzymes: endo-1,4-β-glucanase, exo-1,4-β-D-glucanase, and β-D-glucosidase; wherein the enzymes are heterologous to the one or more host organisms, and wherein translational kinetics of each of the polynucleotides encoding the enzymes has been modified to replace at least three codon pairs present in the original sequence for each enzyme, wherein the at least three codon pairs are predicted to cause a translational pause in the host organism, and wherein said modification results in silent permutation or conservative amino acid substitution of said at least three codon pairs.
  • expression systems for metabolizing lignin comprising: one or more host organisms that collectively include polynucleotides operably encoding the following enzymes: laccase, Mn-dependent peroxidase, and lignin peroxidase; wherein the enzymes are heterologous to the one or more host organisms, and wherein translational kinetics of each of the DNA sequences encoding the enzymes has been modified to replace at least three codon pairs present in the original sequence for each enzyme, wherein the at least three codon pairs are predicted to cause a translational pause in the host organism, and wherein said modification results in silent permutation or conservative amino acid substitution of said at least three codon pairs.
  • one or more of said polynucleotides comprises the nucleotide sequence of SEQ ID NOs: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 171, 173, 175, 177, 179, 183, 185, 187, 189 or 191.
  • Some such systems comprise two or more polynucleotides comprising the nucleotide sequence of SEQ ID NOs: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 171, 173, 175, 177, 179, 183, 185, 187, 189 or 191.
  • one or more of said polynucleotides comprises the nucleotide sequence of SEQ ID NOs: 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165 or 167.
  • Some such systems comprise two or more polynucleotides comprising the nucleotide sequence of SEQ ID NOs: 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165 or 167.
  • the one or more host organisms are selected from the group consisting of: Saccharomyces cerevisiae, Pichia pastoris, Escherichia coli, Bombyx mori, Spodoptera frugiperda, Drosophila melanogaster, Kluyveromyces lactis, Zymomonas mobilis and Schizosaccharomyces pombe.
  • each encoded enzyme has at least a 75% amino acid sequence identity with the original sequence of said enzyme.
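The 75% identity threshold can be illustrated with a simple calculation over two sequences that have already been aligned. This is a sketch only: the specification does not state which alignment method or identity denominator is intended, so the choice below (matches divided by alignment length, gaps counted as mismatches) and the name `percent_identity` are assumptions.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: identity = matching non-gap columns / total alignment columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


def meets_identity_threshold(aligned_a, aligned_b, threshold=75.0):
    """True if the aligned pair reaches at least `threshold` percent identity."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Under this definition, a four-residue alignment with one substitution sits exactly at the 75% boundary and passes the "at least 75%" test.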
  • each encoded enzyme retains at least 75% of the enzymatic activity of the corresponding wild-type polypeptide (SEQ ID NO: 2, 26, 50, 74, 98, 122, 146, 170, 182 or 194) under normal physiological conditions.
  • cells comprising any of the polynucleotides provided herein.
  • the cell expresses the polypeptide encoded by said polynucleotide.
  • Also provided herein are methods of introducing a polynucleotide into a host cell comprising: providing a host cell; and contacting said host cell with any of the polynucleotides provided herein under conditions that permit the polynucleotide to be introduced into the host cell.
  • Also provided herein are methods of expressing a polypeptide comprising: providing a cell comprising any of the polynucleotides provided herein; and placing the cell under conditions that permit the cell to express the polypeptide encoded by the DNA sequence, whereby said encoded polypeptide is expressed by said cell.
  • Also provided herein are methods of hydrolyzing a carbohydrate comprising: providing a carbohydrate comprising at least one glycosidic bond; providing a polypeptide encoded by any of the polynucleotides provided herein; and contacting said carbohydrate with said polypeptide under conditions that permit said polypeptide to hydrolyze at least one covalent bond of said carbohydrate, whereby at least one covalent bond of said carbohydrate is hydrolyzed.
  • integrable polynucleotides for modifying an endogenous nucleotide sequence in a cell comprising: a removable selectable marker cassette comprising a selectable marker flanked by a 5' site-specific recombinase recognition site and a 3' site-specific recombinase recognition site, wherein said removable selectable marker cassette is flanked by a 5' nucleic acid sequence with homology to an endogenous sequence and a 3' nucleic acid sequence with homology to an endogenous sequence.
  • integrable polynucleotides further comprise a heterologous nucleic acid flanked by said 5' nucleic acid sequence with homology to an endogenous sequence and said 3' nucleic acid sequence with homology to an endogenous sequence.
  • the heterologous nucleic acid comprises a sequence encoding a polypeptide.
  • the heterologous nucleic acid comprises a regulatory sequence.
  • the sequence encoding a polypeptide is operatively linked to said regulatory sequence.
  • the regulatory sequence comprises a promoter sequence and a terminator sequence.
  • the heterologous nucleic acid comprises a polynucleotide in accordance with any of the polynucleotides provided herein. In some embodiments, the heterologous nucleic acid encodes a polypeptide that degrades cellulose and/or lignin.
  • the heterologous nucleic acid comprises SEQ ID NOs: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 171, 173, 175, 177, 179, 183, 185, 187, 189, 191, 195, 197, 199, 201 or 203.
  • the selectable marker can be selected for or can be selected against. In some such integrable polynucleotides, the selectable marker can be selected for and can be selected against. In some such integrable polynucleotides, the selectable marker is selected from the group consisting of URA3, TRP1, CAN1, KlURA3, CYH2, LYS2 and MET15. In some such integrable polynucleotides, the nucleic acid sequence with homology to an endogenous sequence comprises a genomic repetitive element. In some such integrable polynucleotides, the nucleic acid sequence with homology to an endogenous sequence comprises Ty1 DNA or Ty3 DNA.
  • the site-specific recombinase recognition site comprises a loxP sequence. In some such integrable polynucleotides, the site-specific recombinase recognition site comprises a frt sequence. In some such integrable polynucleotides, the integrable polynucleotide comprises a PCR product.
  • cells comprising any of the integrable polynucleotides provided herein. Some such cells comprise a gene encoding a site-specific recombinase. In some such cells, the site-specific recombinase comprises a CRE recombinase or a FLP recombinase. Some such cells are S. cerevisiae cells.
  • Also provided herein are methods of modifying an endogenous sequence in a cell comprising: providing a cell with at least one of the integrable polynucleotides provided; and selecting for a cell comprising said at least one integrable polynucleotide integrated into the genome of the cell. Some such methods further comprise excising at least one selectable marker from said at least one cell comprising said at least one integrable polynucleotide integrated therein; and selecting for a cell in which said at least one selectable marker has been excised. In some such methods, the excising of said selectable marker comprises providing said cell with a site-specific recombinase.
  • the site-specific recombinase comprises a CRE recombinase or a FLP recombinase. In some such methods, the site-specific recombinase is expressed from an endogenous gene or from a heterologous nucleic acid.
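The integrate-then-excise workflow can be pictured as operations on an ordered list of elements. The following is a rough model, not the method of the specification: the `Element` type, the function names, and the placement of the heterologous nucleic acid upstream of the marker cassette are all assumptions made for illustration. The only biological fact relied on is that recombination between two directly repeated recognition sites removes the intervening DNA and leaves a single site behind.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    kind: str  # "homology", "payload", "site" or "marker" (hypothetical labels)
    name: str


def build_integrable(payload, marker="URA3", site="loxP"):
    """Assemble a hypothetical integrable polynucleotide, listed 5' to 3'."""
    return [
        Element("homology", "5' homology arm"),
        Element("payload", payload),
        Element("site", site),
        Element("marker", marker),
        Element("site", site),
        Element("homology", "3' homology arm"),
    ]


def excise_marker(construct):
    """Model site-specific recombination: drop everything between the two
    recognition sites plus one site, leaving a single site in place."""
    site_positions = [i for i, e in enumerate(construct) if e.kind == "site"]
    if len(site_positions) != 2:
        raise ValueError("expected exactly two recombinase recognition sites")
    first, second = site_positions
    return construct[: first + 1] + construct[second + 1 :]


integrated = build_integrable("codon-pair-modified cellulase gene")
after_excision = excise_marker(integrated)
```

After excision the model retains the heterologous nucleic acid next to a single recognition site, and the freed marker could then be reused for a further round of integration.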
  • the providing a cell with at least one integrable polynucleotide comprises providing a cell with a plurality of integrable polynucleotides, wherein said plurality of integrable polynucleotides comprises at least a first integrable polynucleotide comprising a first selectable marker and a second integrable polynucleotide comprising a second selectable marker.
  • the plurality comprises 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more different integrable polynucleotides.
  • cells comprising an endogenous sequence modified by any of such methods provided herein.
  • the modified endogenous sequence comprises an insertion, a deletion or a mutation.
  • cells comprising a removable selectable marker cassette integrated into said cell comprising a selectable marker flanked by a 5' site-specific recombinase recognition site and a 3' site-specific recombinase recognition site; and a heterologous nucleic acid integrated into said cell, wherein said removable selectable marker is juxtaposed to said heterologous nucleic acid.
  • cells comprising: a heterologous nucleic acid integrated into said cell, and a site-specific recombinase recognition site integrated into said cell, wherein said site-specific recombinase recognition site is juxtaposed to said heterologous nucleic acid.
  • the site-specific recombinase recognition site comprises a loxP or frt sequence.
  • the cell is an S. cerevisiae cell.
  • the heterologous nucleic acid comprises a polynucleotide in accordance with any of the polynucleotides provided herein. In some such cells, the heterologous nucleic acid encodes a polypeptide that degrades cellulose and/or lignin.
  • the heterologous nucleic acid comprises SEQ ID NOs: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 171, 173, 175, 177, 179, 183, 185, 187, 189, 191, 195, 197, 199, 201 or 203.
  • Figure 1 depicts a graphical display of z scores of translational kinetics values for codon pair utilization in T. reesei of nucleic acid sequences encoding the cellobiohydrolase-II enzyme of T. reesei (TrCBH-II), plotted as a function of codon pair position.
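A profile of z scores against codon pair position, of the kind the figures display, can be assembled by walking the coding sequence one codon at a time. The sketch below assumes a lookup table of precomputed z scores keyed by six-nucleotide codon pair; that table, its values, and the names `codon_pairs` and `z_profile` are hypothetical, and how the z scores themselves are derived is outside the scope of this example.

```python
def codon_pairs(cds):
    """Split a coding sequence into overlapping pairs of adjacent codons."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    codons = [cds[i : i + 3] for i in range(0, len(cds), 3)]
    return list(zip(codons, codons[1:]))


def z_profile(cds, z_table, default=0.0):
    """Return (position, z score) tuples, with positions numbered from 1.

    `z_table` maps a six-nucleotide codon pair to a host-specific z score;
    pairs missing from the table receive `default`.
    """
    return [
        (position, z_table.get(first + second, default))
        for position, (first, second) in enumerate(codon_pairs(cds), start=1)
    ]


# Illustrative values only, not taken from the specification.
example_table = {"ATGGCT": 3.5, "GCTAAA": -1.0}
profile = z_profile("ATGGCTAAATAA", example_table)
```

The resulting list of (position, z score) tuples can be passed to any plotting tool to draw a native-versus-modified comparison for a given host.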
  • Figures 2-6 depict effects of Translational Engineering™ on protein expression levels. Each of Figures 2-6 depicts graphical displays of z scores of translational kinetics values for codon pair utilization of nucleic acid sequences encoding TrCBH-II, plotted as a function of codon pair position.
  • Figure 2A depicts a graphical display of the S. cerevisiae expression of the native nucleic acid sequence encoding the TrCBH-II protein.
  • Figure 2B depicts a graphical display of the S. cerevisiae expression of a nucleic acid sequence encoding the TrCBH-II which has been modified to eliminate codon pairs that are predicted to cause a translational pause in S. cerevisiae.
  • Figure 3A depicts a graphical display of the E. coli expression of the native nucleic acid sequence encoding the TrCBH-II protein.
  • Figure 3B depicts a graphical display of the E. coli expression of a nucleic acid sequence encoding the TrCBH-II which has been modified to eliminate codon pairs that are predicted to cause a translational pause in E. coli.
  • Figure 4A depicts a graphical display of the P. pastoris expression of the native nucleic acid sequence encoding the TrCBH-II protein.
  • Figure 4B depicts a graphical display of the P. pastoris expression of a nucleic acid sequence encoding the TrCBH-II which has been modified to eliminate codon pairs that are predicted to cause a translational pause in P. pastoris.
  • Figure 5A depicts a graphical display of the K. lactis expression of the native nucleic acid sequence encoding the TrCBH-II protein.
  • Figure 5B depicts a graphical display of the K. lactis expression of a nucleic acid sequence encoding the TrCBH-II which has been modified to eliminate codon pairs that are predicted to cause a translational pause in K. lactis.
  • Figure 6A depicts a graphical display of the Z. mobilis expression of the native nucleic acid sequence encoding the TrCBH-II protein.
  • Figure 6B depicts a graphical display of the Z. mobilis expression of a nucleic acid sequence encoding the TrCBH-II which has been modified to eliminate codon pairs that are predicted to cause a translational pause in Z. mobilis.
  • Figures 7-11 depict effects of Translational Engineering™ on protein expression levels.
  • Each of Figures 7-11 depicts graphical displays of z scores of translational kinetics values for codon pair utilization of nucleic acid sequences encoding the laccase enzyme of P. sanguineus (LCC), plotted as a function of codon pair position.
  • Figure 7A depicts a graphical display of the S. cerevisiae expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 7B depicts a graphical display of the S. cerevisiae expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in S. cerevisiae.
  • Figure 8A depicts a graphical display of the E. coli expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 8B depicts a graphical display of the E. coli expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in E. coli.
  • Figure 9A depicts a graphical display of the P. pastoris expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 9B depicts a graphical display of the P. pastoris expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in P. pastoris.
  • Figure 10A depicts a graphical display of the K. lactis expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 10B depicts a graphical display of the K. lactis expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in K. lactis.
  • Figure 11A depicts a graphical display of the Z. mobilis expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 11B depicts a graphical display of the Z. mobilis expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in Z. mobilis.
  • Figures 12-16 depict effects of Translational Engineering™ on protein expression levels.
  • Each of Figures 12-16 depicts graphical displays of z scores of translational kinetics values for codon pair utilization of nucleic acid sequences encoding the lignin peroxidase enzyme of T. versicolor (LIP), plotted as a function of codon pair position.
  • Figure 12A depicts a graphical display of the S. cerevisiae expression of the native nucleic acid sequence encoding the LIP protein.
  • Figure 12B depicts a graphical display of the S. cerevisiae expression of a nucleic acid sequence encoding the LIP which has been modified to eliminate codon pairs that are predicted to cause a translational pause in S. cerevisiae.
  • Figure 13A depicts a graphical display of the E. coli expression of the native nucleic acid sequence encoding the LIP protein.
  • Figure 13B depicts a graphical display of the E. coli expression of a nucleic acid sequence encoding the LIP which has been modified to eliminate codon pairs that are predicted to cause a translational pause in E. coli.
  • Figure 14A depicts a graphical display of the P. pastoris expression of the native nucleic acid sequence encoding the LIP protein.
  • Figure 14B depicts a graphical display of the P. pastoris expression of a nucleic acid sequence encoding the LIP which has been modified to eliminate codon pairs that are predicted to cause a translational pause in P. pastoris.
  • Figure 15A depicts a graphical display of the K. lactis expression of the native nucleic acid sequence encoding the LIP protein.
  • Figure 15B depicts a graphical display of the K. lactis expression of a nucleic acid sequence encoding the LIP which has been modified to eliminate codon pairs that are predicted to cause a translational pause in K. lactis.
  • Figure 16A depicts a graphical display of the Z. mobilis expression of the native nucleic acid sequence encoding the LIP protein.
  • Figure 16B depicts a graphical display of the Z. mobilis expression of a nucleic acid sequence encoding the LIP which has been modified to eliminate codon pairs that are predicted to cause a translational pause in Z. mobilis.
  • Figures 17-21 depict effects of Translational Engineering™ on protein expression levels.
  • Each of Figures 17-21 depicts graphical displays of z scores of translational kinetics values for codon pair utilization of nucleic acid sequences encoding the Mn-dependent peroxidase enzyme of T. versicolor (MnP), plotted as a function of codon pair position.
  • Figure 17A depicts a graphical display of the S. cerevisiae expression of the native nucleic acid sequence encoding the MnP protein.
  • Figure 17B depicts a graphical display of the S. cerevisiae expression of a nucleic acid sequence encoding the MnP which has been modified to eliminate codon pairs that are predicted to cause a translational pause in S. cerevisiae.
  • Figure 18A depicts a graphical display of the E. coli expression of the native nucleic acid sequence encoding the MnP protein.
  • Figure 18B depicts a graphical display of the E. coli expression of a nucleic acid sequence encoding the MnP which has been modified to eliminate codon pairs that are predicted to cause a translational pause in E. coli.
  • Figure 19A depicts a graphical display of the P. pastoris expression of the native nucleic acid sequence encoding the MnP protein.
  • Figure 19B depicts a graphical display of the P. pastoris expression of a nucleic acid sequence encoding the MnP which has been modified to eliminate codon pairs that are predicted to cause a translational pause in P. pastoris.
  • Figure 20A depicts a graphical display of the K. lactis expression of the native nucleic acid sequence encoding the MnP protein.
  • Figure 20B depicts a graphical display of the K. lactis expression of a nucleic acid sequence encoding the MnP which has been modified to eliminate codon pairs that are predicted to cause a translational pause in K. lactis.
  • Figure 21A depicts a graphical display of the Z. mobilis expression of the native nucleic acid sequence encoding the MnP protein.
  • Figure 21B depicts a graphical display of the Z. mobilis expression of a nucleic acid sequence encoding the MnP which has been modified to eliminate codon pairs that are predicted to cause a translational pause in Z. mobilis.
  • Figure 22 depicts a graphical display of z scores of translational kinetics values for codon pair utilization in N. crassa of nucleic acid sequences encoding the laccase enzyme of N. crassa (LCC), plotted as a function of codon pair position.
  • Figures 23-27 depict effects of Translational Engineering™ on protein expression levels.
  • Each of Figures 23-27 depicts graphical displays of z scores of translational kinetics values for codon pair utilization of nucleic acid sequences encoding LCC, plotted as a function of codon pair position.
  • Figure 23A depicts a graphical display of the S. cerevisiae expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 23B depicts a graphical display of the S. cerevisiae expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in S. cerevisiae.
  • Figure 24A depicts a graphical display of the E. coli expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 24B depicts a graphical display of the E. coli expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in E. coli.
  • Figure 25A depicts a graphical display of the P. pastoris expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 25B depicts a graphical display of the P. pastoris expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in P. pastoris.
  • Figure 26A depicts a graphical display of the K. lactis expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 26B depicts a graphical display of the K. lactis expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in K. lactis.
  • Figure 27A depicts a graphical display of the Z. mobilis expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 27B depicts a graphical display of the Z. mobilis expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in Z. mobilis.
  • Figures 28-32 depict effects of Translational Engineering™ on protein expression levels.
  • Each of Figures 28-32 depicts graphical displays of z scores of translational kinetics values for codon pair utilization of nucleic acid sequences encoding the laccase enzyme of P. cinnabarinus (LCC), plotted as a function of codon pair position.
  • Figure 28A depicts a graphical display of the S. cerevisiae expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 28B depicts a graphical display of the S. cerevisiae expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in S. cerevisiae.
  • Figure 29A depicts a graphical display of the E. coli expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 29B depicts a graphical display of the E. coli expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in E. coli.
  • Figure 30A depicts a graphical display of the P. pastoris expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 30B depicts a graphical display of the P. pastoris expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in P. pastoris.
  • Figure 31A depicts a graphical display of the K. lactis expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 31B depicts a graphical display of the K. lactis expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in K. lactis.
  • Figure 32A depicts a graphical display of the Z. mobilis expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 32B depicts a graphical display of the Z. mobilis expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in Z. mobilis.
  • Figures 33-37 depict effects of Translational Engineering™ on protein expression levels.
  • Each of Figures 33-37 depicts graphical displays of z scores of translational kinetics values for codon pair utilization of nucleic acid sequences encoding the laccase enzyme of P. coccineus (LCC), plotted as a function of codon pair position.
  • Figure 33A depicts a graphical display of the S. cerevisiae expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 33B depicts a graphical display of the S. cerevisiae expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in S. cerevisiae.
  • Figure 34A depicts a graphical display of the E. coli expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 34B depicts a graphical display of the E. coli expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in E. coli.
  • Figure 35A depicts a graphical display of the P. pastoris expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 35B depicts a graphical display of the P. pastoris expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in P. pastoris.
  • Figure 36A depicts a graphical display of the K. lactis expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 36B depicts a graphical display of the K. lactis expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in K. lactis.
  • Figure 37A depicts a graphical display of the Z. mobilis expression of the native nucleic acid sequence encoding the LCC protein.
  • Figure 37B depicts a graphical display of the Z. mobilis expression of a nucleic acid sequence encoding the LCC which has been modified to eliminate codon pairs that are predicted to cause a translational pause in Z. mobilis.
  • Figure 38 depicts a graphical display of z scores of translational kinetics values for codon pair utilization in T. reesei of nucleic acid sequences encoding the cellobiohydrolase-I enzyme of T. reesei (TrCBH-I), plotted as a function of codon pair position.
  • Figures 39-43 depict effects of Translational Engineering™ on protein expression levels.
  • Each of Figures 39-43 depicts graphical displays of z scores of translational kinetics values for codon pair utilization of nucleic acid sequences encoding TrCBH-I, plotted as a function of codon pair position.
  • Figure 39A depicts a graphical display of the S. cerevisiae expression of the native nucleic acid sequence encoding the TrCBH-I protein.
  • Figure 39B depicts a graphical display of the S. cerevisiae expression of a nucleic acid sequence encoding the TrCBH-I which has been modified to eliminate codon pairs that are predicted to cause a translational pause in S. cerevisiae.
  • Figure 40A depicts a graphical display of the E. coli expression of the native nucleic acid sequence encoding the TrCBH-I protein.
  • Figure 40B depicts a graphical display of the E. coli expression of a nucleic acid sequence encoding the TrCBH-I which has been modified to eliminate codon pairs that are predicted to cause a translational pause in E. coli.
  • Figure 41A depicts a graphical display of the P. pastoris expression of the native nucleic acid sequence encoding the TrCBH-I protein.
  • Figure 41B depicts a graphical display of the P. pastoris expression of a nucleic acid sequence encoding the TrCBH-I which has been modified to eliminate codon pairs that are predicted to cause a translational pause in P. pastoris.
  • Figure 42A depicts a graphical display of the K. lactis expression of the native nucleic acid sequence encoding the TrCBH-I protein.
  • Figure 42B depicts a graphical display of the K. lactis expression of a nucleic acid sequence encoding the TrCBH-I which has been modified to eliminate codon pairs that are predicted to cause a translational pause in K. lactis.
  • Figure 43A depicts a graphical display of the Z. mobilis expression of the native nucleic acid sequence encoding the TrCBH-I protein.
  • Figure 43B depicts a graphical display of the Z. mobilis expression of a nucleic acid sequence encoding the TrCBH-I which has been modified to eliminate codon pairs that are predicted to cause a translational pause in Z. mobilis.
  • Figures 44-48 depict effects of Translational eEngineeringTM on protein expression levels.
  • Each of Figures 1-3 depict graphical displays of z scores of translational kinetics values for codon pair utililization of nucleic acid sequences encoding the endoglucanase enzyme of T. aurantiacus (EGl), plotted as a function of codon pair position.
  • EGl T. aurantiacus
  • Figure 44A depicts a graphical display of the S. cerevisiae expression of the native nucleic acid sequence encoding the EG1 protein.
  • Figure 44B depicts a graphical display of the S. cerevisiae expression of a nucleic acid sequence encoding the EG1 which has been modified to eliminate codon pairs that are predicted to cause a translational pause in S. cerevisiae.
  • Figure 45A depicts a graphical display of the E. coli expression of the native nucleic acid sequence encoding the EG1 protein.
  • Figure 45B depicts a graphical display of the E. coli expression of a nucleic acid sequence encoding the EG1 which has been modified to eliminate codon pairs that are predicted to cause a translational pause in E. coli.
  • Figure 46A depicts a graphical display of the P. pastoris expression of the native nucleic acid sequence encoding the EG1 protein.
  • Figure 46B depicts a graphical display of the P. pastoris expression of a nucleic acid sequence encoding the EG1 which has been modified to eliminate codon pairs that are predicted to cause a translational pause in P. pastoris.
  • Figure 47A depicts a graphical display of the K. lactis expression of the native nucleic acid sequence encoding the EG1 protein.
  • Figure 47B depicts a graphical display of the K. lactis expression of a nucleic acid sequence encoding the EG1 which has been modified to eliminate codon pairs that are predicted to cause a translational pause in K. lactis.
  • Figure 48A depicts a graphical display of the Z. mobilis expression of the native nucleic acid sequence encoding the EG1 protein.
  • Figure 48B depicts a graphical display of the Z. mobilis expression of a nucleic acid sequence encoding the EG1 which has been modified to eliminate codon pairs that are predicted to cause a translational pause in Z. mobilis.
  • Figures 49-53 depict effects of Translation Engineering™ on protein expression levels.
  • Each of Figures 1-3 depicts graphical displays of z scores of translational kinetics values for codon pair utilization of nucleic acid sequences encoding the xylanase enzyme of T. lanuginosus (XynA), plotted as a function of codon pair position.
  • Figure 49A depicts a graphical display of the S. cerevisiae expression of the native nucleic acid sequence encoding the XynA protein.
  • Figure 49B depicts a graphical display of the S. cerevisiae expression of a nucleic acid sequence encoding the XynA which has been modified to eliminate codon pairs that are predicted to cause a translational pause in S. cerevisiae.
  • Figure 50A depicts a graphical display of the E. coli expression of the native nucleic acid sequence encoding the XynA protein.
  • Figure 50B depicts a graphical display of the E. coli expression of a nucleic acid sequence encoding the XynA which has been modified to eliminate codon pairs that are predicted to cause a translational pause in E. coli.
  • Figure 51A depicts a graphical display of the P. pastoris expression of the native nucleic acid sequence encoding the XynA protein.
  • Figure 51B depicts a graphical display of the P. pastoris expression of a nucleic acid sequence encoding the XynA which has been modified to eliminate codon pairs that are predicted to cause a translational pause in P. pastoris.
  • Figure 52A depicts a graphical display of the K. lactis expression of the native nucleic acid sequence encoding the XynA protein.
  • Figure 52B depicts a graphical display of the K. lactis expression of a nucleic acid sequence encoding the XynA which has been modified to eliminate codon pairs that are predicted to cause a translational pause in K. lactis.
  • Figure 53A depicts a graphical display of the Z. mobilis expression of the native nucleic acid sequence encoding the XynA protein.
  • Figure 53B depicts a graphical display of the Z. mobilis expression of a nucleic acid sequence encoding the XynA which has been modified to eliminate codon pairs that are predicted to cause a translational pause in Z. mobilis.
  • Figure 54A depicts a graphical display of the Z. mobilis expression of the native nucleic acid sequence encoding the XynA protein.
  • Figure 54B depicts a graphical display of the Z. mobilis expression of a nucleic acid sequence encoding the XynA which has been modified to eliminate codon pairs that are predicted to cause a translational pause in Z. mobilis.
  • Biomass is the earth's most attractive alternative among fuel sources and most sustainable energy resource and is reproduced by the bioconversion of carbon dioxide.
  • Ethanol produced from biomass is today the most widely used biofuel when blended with gasoline.
  • the use of biofuels can significantly reduce the accumulation of greenhouse gas.
  • Ethanol is just one example of the uses of biomass harvesting using industrial enzymes. The technologies associated with biomass harvesting are similarly applicable in the production of other biofuels, fine chemicals as well as other diverse applications.
  • a variety of highly specialized microorganisms have evolved to produce enzymes that either synergistically or in complexes can carry out the complete hydrolysis of cellulose.
  • the anaerobic bacteria Clostridium thermocellum and Clostridium cellulovorans and the filamentous fungus Trichoderma reesei are known as cellulolytic and xylanolytic microorganisms.
  • the bacteria C. thermocellum and C. cellulovorans produce a cellulosome complex consisting of cellulase and hemicellulase organized on the cell surface (Doi and Tamaru (2001) Chem. Rec. 1:24-32; Shoham et al. (1999) Trends Microbiol. 7:275-281).
  • In T. reesei, three types of cellulolytic enzymes are secreted extracellularly, including five endoglucanases (EG [EC 3.2.1.4]) (Okada et al. (1998) Appl. Environ. Microbiol. 64:555-563), two cellobiohydrolases (CBH [EC 3.2.1.91]) (Henrissat et al. (1985) Bio/Technology 3:722-726; Teeri et al. (1987) Gene 51:43-52), and two β-glucosidases (BGL [EC 3.2.1.21]) (Chen et al. (1992) Biochim. Biophys.
  • Endoglucanases act randomly against the amorphous region of the cellulose chain to produce reducing and nonreducing ends for cellobiohydrolases, which produce cellobiose from reducing or nonreducing ends of crystalline cellulose.
  • Exoglucanase enzymes, including CBH-I and CBH-II, liberate the disaccharide D-cellobiose from 1,4-β-glucans.
  • Cellulose chains are thus efficiently degraded to soluble cellobiose and cellooligosaccharides by the endo-exo synergism of EG and CBH (Henrissat et al. (1985) Bio/Technology 3:722-726).
  • the predominant polysaccharide in the primary cell wall of biomass is cellulose, the second most abundant is hemicellulose, and the third is pectin.
  • the secondary cell wall, produced after the cell has stopped growing, also contains polysaccharides and is strengthened through polymeric lignin covalently cross-linked to hemicellulose.
  • Cellulose is a homopolymer of anhydrocellobiose and thus a linear β-(1-4)-D-glucan, while hemicelluloses include a variety of compounds, such as xylans, xyloglucans, arabinoxylans, and mannans in complex branched structures with a spectrum of substituents.
  • cellulose is found in plant tissue primarily as an insoluble crystalline matrix of parallel glucan chains. Hemicelluloses usually hydrogen bond to cellulose, as well as to other hemicelluloses, which helps stabilize the cell wall matrix.
  • DNA constructs encoding cellulase enzymes are known in the art.
  • U.S. Patent No. 5,686,593 relates to cellulose- or hemicellulose-degrading enzymes that are derivable from a fungus other than Trichoderma or Phanerochaete, and which comprise a carbohydrate binding domain homologous to a terminal A region of T. reesei cellulases.
  • Lignocellulosic biomass is composed predominantly of cellulose, hemicellulose, and lignin.
  • Lignin is a complex, highly cross-linked polyphenolic heteropolymer, and is naturally resistant to chemical and biologic conversion.
  • An economical biomass-to-ethanol process critically depends on the rapid and efficient conversion of all of the sugars present in both its cellulose and hemicellulose fractions.
  • Although cellulose and hemicellulose are readily degraded by fungal and bacterial pathways, lignin is extremely recalcitrant. Furthermore, because of its cross-linking with the other cell wall components, lignin minimizes the accessibility of cellulose and hemicellulose to microbial enzymes. Hence, lignin is generally associated with reduced digestibility of the overall plant biomass.
  • White rot fungi are believed to be the most effective lignin-degrading microbes in nature. These white-rot fungi secrete one or more of three extracellular enzymes that are essential for lignin degradation. They are often referred to as lignin-modifying enzymes or LMEs.
  • the three enzymes comprise two glycosylated heme-containing peroxidases, lignin peroxidase (LIP) and Mn-dependent peroxidase (MNP), and a copper-containing phenoloxidase, laccase (LCC).
  • Laccases are copper-containing oxidase enzymes that are found in many plants, fungi and microorganisms. Laccases are enzymatically active on phenols and similar molecules and perform a one-electron oxidation. Laccases can be polymeric and the enzymatically active form can be a dimer or trimer.
  • The enzymatic activity of Mn-dependent peroxidase (MnP) is dependent on Mn²⁺. Without being bound by theory, it has been suggested that the main role of this enzyme is to oxidize Mn²⁺ to Mn³⁺ (Glenn et al. (1986) Arch. Biochem. Biophys. 251:688-696). Subsequently, phenolic substrates are oxidized by the Mn³⁺ generated.
  • Lignin peroxidase is an extracellular heme-containing enzyme that catalyses the oxidative depolymerization of dilute solutions of polymeric lignin in vitro.
  • Some of the substrates of LiP, most notably 3,4-dimethoxybenzyl alcohol (veratryl alcohol, VA), are active redox compounds that have been shown to act as redox mediators.
  • VA is a secondary metabolite produced at the same time as LiP by ligninolytic cultures of P.
  • hydrolysis enzymes do not express well in host organisms such as E. coli or S. cerevisiae. Accordingly, provided herein are hydrolysis enzyme-encoding nucleotide sequences and methods of making the same for improved expression of hydrolysis enzymes.
  • Some translational pauses result from the presence of particular codon pairs in the nucleotide sequence encoding the polypeptide to be translated. As provided herein, inappropriate or excessive translation pauses can reduce protein expression considerably. Further, the translational pausing properties of codon pairs vary from organism to organism. As a result, exogenous expression of genes foreign to the expression organism can lead to inefficient translation. Even when the gene is translated in a sufficiently efficient manner that recoverable quantities of the translation product are produced, the protein is often inactive, insoluble, aggregated, or otherwise different in properties from the native protein. Thus, removing inappropriate or excessive translation pauses can improve protein expression.
  • the pause(s) can serve to facilitate proper polypeptide folding, post-translational modification, re-organization/folding at protein domain boundaries, or other steps toward arriving at the native, active wild type protein. Accordingly, in some embodiments provided herein, one or more pauses that are predicted to be present in native translation of hydrolysis enzymes is/are preserved in a modified hydrolysis enzyme-encoding polynucleotide provided in accordance with the teachings herein.
  • a codon pair in the modified hydrolysis enzyme-encoding polynucleotide can be selected to have a predicted translational kinetics value that is at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, or 99% that of the native codon pair whose predicted pause is to be preserved; further, the codon pair in the modified hydrolysis enzyme-encoding polynucleotide can be selected to be located within 30, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2 or 1 codons of the native codon pair whose predicted pause is to be preserved.
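The two preservation criteria above (a minimum fraction of the native kinetics value, and a maximum distance in codons) can be sketched as a simple check. This is a minimal illustration only: the function name, the list-of-values representation, and the default thresholds are hypothetical, and it assumes that larger positive values indicate a stronger predicted pause.

```python
def pause_is_preserved(native_index, native_value, modified_values,
                       min_fraction=0.9, max_distance=5):
    """Check whether a predicted native pause is preserved.

    native_index    -- position (in codon pairs) of the native pause
    native_value    -- predicted kinetics value of the native codon pair
    modified_values -- predicted kinetics value for each codon pair
                       position of the modified sequence (hypothetical input)
    min_fraction    -- e.g. 0.9 for "at least 90% of the native value"
    max_distance    -- e.g. 5 for "within 5 codons of the native pair"
    """
    # Window of positions that are close enough to the native pause.
    start = max(0, native_index - max_distance)
    stop = min(len(modified_values), native_index + max_distance + 1)
    required = min_fraction * native_value
    # Preserved if any codon pair in the window is strong enough.
    return any(value >= required for value in modified_values[start:stop])


modified = [0.2, 0.1, 4.6, 0.3, 0.0]
print(pause_is_preserved(3, 5.0, modified))                  # pause kept nearby
print(pause_is_preserved(3, 5.0, modified, max_distance=0))  # not at same spot
```

In this sketch the percentage and distance values listed in the text map directly onto `min_fraction` and `max_distance`.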
  • Translation Engineering™ refers to a process used to modify the translational kinetics of a polypeptide-encoding nucleic acid sequence.
  • Translation Engineering™ can be applied to modify the translational kinetics of a polypeptide-encoding nucleic acid sequence when expressed in its native organism.
  • this process alters the polypeptide-encoding nucleic acid sequence to optimize codon usage and codon pair usage in the organism in which the polypeptide-encoding nucleic acid sequence is expressed.
  • sequence modifications can be made to place or prevent restriction sites in the sequence, eliminate strong RNA secondary structures and avoid inadvertent Shine-Dalgarno sequences.
  • Translation Engineering™ involves modifying the translational kinetics of a polypeptide-encoding nucleic acid sequence by removing, preserving, and/or inserting translational pauses into the polypeptide-encoding nucleic acid sequence.
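As a rough illustration of the sequence checks mentioned above, the following sketch scans a coding sequence for unwanted motifs. The motif list is an assumption chosen for the example (the EcoRI recognition site GAATTC and the Shine-Dalgarno consensus AGGAGG); a real design would use the restriction sites relevant to the cloning strategy. Only the given strand is scanned, and RNA secondary structure is not addressed here.

```python
def find_motifs(dna, motifs):
    """Return {motif name: [0-based start positions]} for each motif."""
    dna = dna.upper()
    hits = {}
    for name, motif in motifs.items():
        positions = []
        start = dna.find(motif)
        while start != -1:
            positions.append(start)
            # Continue one base further so overlapping hits are found too.
            start = dna.find(motif, start + 1)
        hits[name] = positions
    return hits


# Example motifs (assumed for illustration, not taken from the source).
MOTIFS = {
    "EcoRI site": "GAATTC",
    "Shine-Dalgarno-like": "AGGAGG",
}

print(find_motifs("atgGAATTCaaaAGGAGGttGAATTC", MOTIFS))
```

Positions reported by such a scan could then be removed by synonymous codon changes, provided the change does not introduce a new predicted pause.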
  • hydrolysis enzyme-encoding nucleotide sequences with refined translational kinetics and methods of making same.
  • a hydrolysis enzyme-encoding DNA sequence wherein the encoded sequence has amino acid sequence identity with wild-type hydrolysis enzyme, and wherein predicted translation pauses in the expression organism have been removed or reduced by replacing input-sequence codon pairs with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the resultant hydrolysis enzyme-encoding nucleotide is predicted to be translated rapidly along its entire length.
  • expression of the resultant hydrolysis enzyme-encoding nucleotide is predicted to result in improved protein expression levels in cases where inappropriate or excessive translation pauses reduce protein expression.
  • expression of the resultant hydrolysis enzyme-encoding nucleotide is predicted to result in improved levels of active and/or natively folded polypeptide expression in cases where inappropriate or excessive translation pauses cause expression of inactive, insoluble or aggregated hydrolysis enzyme.
  • expression of the resultant hydrolysis enzyme-encoding nucleotide is predicted to result in improved levels of active and/or natively folded polypeptide expression in cases where one or more predicted pauses are preserved from the native expression profile or are added to preserve expression of active and/or soluble hydrolysis enzyme.
  • the hydrolysis enzyme-encoding nucleotide sequences provided herein allow for one or more of the following results: higher expression levels; higher enzymatic activity; greater protein stability and resistance to degradation; and increased solubility.
  • hydrolysis enzyme refers to the enzymes encoded by the nucleotide sequences provided herein, and includes cellobiohydrolase-II, laccase, lignin peroxidase, Mn-dependent peroxidase, cellobiohydrolase-I, endoglucanase and xylanase enzymes.
  • nucleic acid sequences encoding the cellobiohydrolase-II enzyme of T. reesei are provided.
  • the nucleotide sequences provided herein include the native sequence from T. reesei shown in the sequence listing (SEQ ID NO: 1) which encodes the TrCBH-II amino acid sequence (SEQ ID NO: 2).
  • nucleic acid sequences encoding the laccase enzyme of P. sanguineus are provided.
  • the nucleotide sequences provided herein include the native sequence from P. sanguineus shown in the sequence listing (SEQ ID NO: 25) which encodes the LCC amino acid sequence (SEQ ID NO: 26).
  • nucleic acid sequences encoding the lignin peroxidase enzyme of T. versicolor are provided.
  • the nucleotide sequences provided herein include the native sequence from T. versicolor shown in the sequence listing (SEQ ID NO: 49) which encodes the LIP amino acid sequence (SEQ ID NO: 50).
  • nucleic acid sequences encoding the Mn-dependent peroxidase enzyme of T. versicolor (MnP) are provided.
  • the nucleotide sequences provided herein include the native sequence from T. versicolor shown in the sequence listing (SEQ ID NO: 73) which encodes the MnP amino acid sequence (SEQ ID NO: 74).
  • nucleic acid sequences encoding the laccase enzyme of N. crassa are provided.
  • the nucleotide sequences provided herein include the native sequence from N. crassa shown in the sequence listing (SEQ ID NO: 97) which encodes the LCC amino acid sequence (SEQ ID NO: 98).
  • nucleic acid sequences encoding the laccase enzyme of P. cinnabarinus are provided.
  • the nucleotide sequences provided herein include the native sequence from P. cinnabarinus shown in the sequence listing (SEQ ID NO: 121) which encodes the LCC amino acid sequence (SEQ ID NO: 122).
  • nucleic acid sequences encoding the laccase enzyme of P. coccineus are provided.
  • the nucleotide sequences provided herein include the native sequence from P. coccineus shown in the sequence listing (SEQ ID NO: 145) which encodes the LCC amino acid sequence (SEQ ID NO: 146).
  • nucleic acid sequences encoding the cellobiohydrolase-I enzyme of T. reesei are provided.
  • the nucleotide sequences provided herein include the native sequence from T. reesei shown in the sequence listing (SEQ ID NO: 169) which encodes the TrCBH-I amino acid sequence (SEQ ID NO: 170).
  • nucleic acid sequences encoding the endoglucanase enzyme of T. aurantiacus are provided.
  • the nucleotide sequences provided herein include the native sequence from T. aurantiacus shown in the sequence listing (SEQ ID NO: 181) which encodes the EG1 amino acid sequence (SEQ ID NO: 182).
  • nucleic acid sequences encoding the xylanase enzyme of T. lanuginosus are provided.
  • the nucleotide sequences provided herein include the native sequence from T. lanuginosus shown in the sequence listing (SEQ ID NO: 193) which encodes the XynA amino acid sequence (SEQ ID NO: 194).
  • nucleic acid sequences encoding hydrolysis enzymes with refined translational kinetics for expression in S. cerevisiae (SEQ ID NOS: 3, 27, 51, 75, 99, 123, 147, 171, 183 and 195), E. coli (SEQ ID NOS: 9, 33, 57, 81, 105, 129, 153, 173, 185 and 197), P. pastoris (SEQ ID NOS: 15, 39, 63, 87, 111, 135, 159, 175, 187 and 199), K. lactis (SEQ ID NOS: 21, 45, 69, 93, 117, 141, 165, 177, 189 and 201).
  • nucleotide sequences may be added 3' or 5' of any nucleic acid, for example, to facilitate hybridization of PCR primers, to add cloning restriction sites or other sites that facilitate cloning and/or expression. Accordingly, provided in the sequence listing are nucleic acid sequences with additional 5' and 3' cloning and/or PCR sequences, and which encode hydrolysis enzymes with refined translational kinetics for expression in S.
  • Also provided are hydrolysis enzyme amino acid sequences encoded by the nucleotide sequences with refined translational kinetics described herein.
  • hydrolysis enzyme nucleic acid sequences with refined translational kinetics SEQ ID NOS: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 123, 125, 127, 129, 131, 133, 135, 137, 139, 141, 143, 147, 149, 151, 153, 155, 157
  • hydrolysis enzyme-encoding DNA sequences wherein the encoded sequence has amino acid sequence identity with an original hydrolysis enzyme polypeptide and is adapted for expression in a heterologous host organism, wherein at least three codon pairs of the original sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein.
  • the at least three codon pairs of the original sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein.
  • the host organism is not human, E. coli or S. cerevisiae.
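To make the notion of an over-represented codon pair concrete, the sketch below scores each adjacent codon pair in a set of coding sequences with a z-like value comparing observed and expected counts. This is a simplified stand-in, not the method of the source or of the incorporated publication: it assumes codons occur independently (expected count = product of codon frequencies times the number of pairs) and uses a Poisson-style standard deviation. The function name is hypothetical.

```python
from collections import Counter
from math import sqrt


def codon_pair_z_scores(coding_sequences):
    """Return {(codon1, codon2): z} for every adjacent codon pair observed."""
    codon_counts = Counter()
    pair_counts = Counter()
    for seq in coding_sequences:
        usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
        codons = [seq[i:i + 3] for i in range(0, usable, 3)]
        codon_counts.update(codons)
        pair_counts.update(zip(codons, codons[1:]))

    total_codons = sum(codon_counts.values())
    total_pairs = sum(pair_counts.values())
    scores = {}
    for (first, second), observed in pair_counts.items():
        # Expected count if the two codons were chosen independently.
        expected = (codon_counts[first] / total_codons) \
            * (codon_counts[second] / total_codons) * total_pairs
        scores[(first, second)] = (observed - expected) / sqrt(expected)
    return scores


scores = codon_pair_z_scores(["AAAGGGAAAGGG"])
for pair, z in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(z, 3))
```

Pairs with large positive scores would be the candidates for "highly-overrepresented" under this toy model; a meaningful analysis would need a genome-scale set of coding sequences from the host organism.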
  • a laccase nucleotide sequence encodes a polypeptide having laccase activity.
  • Laccase and like terms refer to the enzymes involved in the oxidative depolymerization of lignin.
  • a method for measuring laccase activity is exemplified by a known method in which an enzymatic reaction is carried out using 2,6-dimethoxyphenol (DMP) as a substrate and 2,2',6,6'-demethoxydiphenoquinone absorbance at 468 nm is monitored by spectrophotometry, as described in de Jong et al. ((1992) Mycol. Res. 96:1098-1104), hereby incorporated by reference in its entirety.
  • a cellobiohydrolase nucleotide sequence encodes a polypeptide having cellobiohydrolase activity.
  • Cellobiohydrolase, exoglucanase, exo-1,4-β-D-glucanase and like terms refer to enzymes that catalyze the hydrolysis of a glucoside bond in a polysaccharide or an oligosaccharide containing D-glucose subunits bonded through β-1,4 bonds, to release cellobiose, a disaccharide in which D-glucose is bonded through a β-1,4 bond.
  • a method for measuring the cellobiohydrolase activity is exemplified by a known method in which an enzymatic reaction is carried out using phosphoric acid-swollen cellulose as a substrate and the existence of cellobiose in the reaction is confirmed by thin-layer silica gel chromatography, as described in U.S. Patent No. 6,566,113, hereby incorporated by reference in its entirety.
  • a lignin peroxidase nucleotide sequence encodes a polypeptide having lignin peroxidase activity.
  • Lignin peroxidase, diarylpropane peroxidase, ligninase and like terms refer to the enzymes involved in the oxidative depolymerization of lignin.
  • a method for measuring lignin peroxidase activity is exemplified by a known method in which an enzymatic reaction is carried out and veratryl alcohol absorbance at 310 nm is monitored by spectrophotometry, as described by Linko and Haapala ((1993) Biotechnol. Techniques 7:75-80), hereby incorporated by reference in its entirety.
  • a Mn-dependent peroxidase nucleotide sequence encodes a polypeptide having Mn-dependent peroxidase activity.
  • Mn-dependent peroxidase and like terms refer to the enzymes involved in the oxidative depolymerization of lignin.
  • a method for measuring Mn-dependent peroxidase activity is exemplified by a known method in which an enzymatic reaction is carried out and production of oxidized 3-methyl-2-benzothiazolinone hydrazone hydrochloride (MBTH) plus 3-dimethylaminobenzoic acid (DMAB) absorbance at 590 nm is monitored by spectrophotometry, as described in Daniel et al.
  • an endoglucanase nucleotide sequence encodes an endo-1,4-β-glucanase polypeptide having endo-1,4-β-glucanase activity.
  • Endoglucanase and like terms refer to the enzymes involved in the enzymatic hydrolysis of a glucoside bond in a polysaccharide or an oligosaccharide containing D-glucose subunits bonded through β-1,4 bonds, to release cellobiose, a disaccharide in which D-glucose is bonded through a β-1,4 bond.
  • Endoglucanases act randomly against the amorphous region of the cellulose chain to produce reducing and nonreducing ends for cellobiohydrolases, which produce cellobiose from reducing or nonreducing ends of crystalline cellulose.
  • a xylanase nucleotide sequence encodes a xylanase polypeptide having xylanase activity.
  • Xylanase and like terms refer to a class of enzymes which degrade the linear polysaccharide beta-1,4-xylan into xylose, thus breaking down hemicellulose, which is a major component of the cell wall of plants.
  • polynucleotides provided herein encode polypeptides that have hydrolysis activity.
  • a hydrolysis enzyme-encoding polynucleotide comprising any of the DNA sequences provided herein can be transcribed and the resulting RNA translated to produce a polypeptide with hydrolysis enzyme activity.
  • nucleotide sequence is used to refer to any polynucleotide sequence.
  • DNA sequence is used herein to refer to the nucleotide sequences presented herein.
  • RNA equivalent nucleotide sequences are also described by DNA sequences presented herein.
  • an equivalent RNA sequence can be substituted for a DNA sequence by a T to U substitution (i.e., replacing thymine in the DNA sequence with uracil in the RNA sequence).
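The T to U substitution described above amounts to a single string replacement. A minimal sketch (the function name is hypothetical):

```python
def dna_to_rna(dna):
    """Return the RNA equivalent of a DNA sequence (thymine -> uracil)."""
    return dna.upper().replace("T", "U")


print(dna_to_rna("ATGGCTTAA"))  # AUGGCUUAA
```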
  • the hydrolysis enzyme-encoding DNA sequence is adapted for expression in a heterologous host organism.
  • a DNA sequence that has been adapted for expression is a DNA sequence that has been inserted into an expression vector or otherwise modified to contain regulatory elements necessary for expression of the DNA in the host cell, positioned in such a manner as to permit expression of the DNA in the host cell.
  • regulatory elements required for expression include promoter sequences, transcription initiation sequences and, optionally, enhancer sequences.
  • a DNA sequence may be inserted into a plasmid vector adapted for expression in a bacterial cell, such as E. coli, or a eukaryotic cell, such as S. cerevisiae or other yeast, or any other host organism.
  • a heterologous host organism is an organism used to express DNA, RNA or protein that is foreign to the host organism. In certain aspects, the host organism is not human, E. coli or S. cerevisiae.
  • polynucleotides provided herein also encode polypeptides that have other lignin-metabolizing activities such as a lignin peroxidase and a Mn-dependent peroxidase activity.
  • translational kinetics of an mRNA into polypeptide can be changed in order to achieve any of a variety of expression profiles. For example, translational kinetics of an mRNA into polypeptide can be changed in order to remove some or all translational pauses. In another example, translational kinetics of an mRNA into polypeptide can be changed in order to replace some or all translational pauses predicted to occur within an autonomous folding unit of a nascent protein. In another example, translational kinetics of an mRNA into polypeptide can be changed in order to replace some or all over-represented codon pairs.
  • a pause or translation slowing codon pair can queue ribosomes back to the beginning of the coding sequence, thereby inhibiting further ribosome attachment to the message which can result in down-regulation of protein expression levels as the rate of translation initiation readily saturates and the slowest translation step time becomes rate limiting. It is also proposed herein that the presence of a pause or translational slowing codon pair can stall or detach a ribosome. It is also proposed herein that the presence of a pause or translational slowing codon pair can expose naked mRNA, which is then subject to message degradation.
  • Organism-specific codon usage and codon pair usage, and the presence of organism-specific pause sites result in gene translation that is highly adapted to the original host organism.
  • ribosomal pausing sites that may be functional in a human cell will typically be scrambled, random, or not appropriate or not recognized in the proper context in a bacterium or other non-native host.
  • a heterologous cDNA or synthetic polynucleotide has a random but high probability of inadvertently encoding a pause site somewhere, often leading to protein expression and/or activity failure.
  • Methods for refining translational kinetics of an mRNA into polypeptide can be performed according to any method known in the art, as exemplified in U.S. Patent Publication No. 2008/0046192, published on February 21, 2008, which is incorporated by reference herein in its entirety.
  • a polypeptide-encoding nucleotide can be designed to be predicted to be translated rapidly along its entire length.
  • some polypeptide-encoding nucleotides provided herein are those that have been engineered to remove all predicted pauses. Expression of such a polypeptide-encoding nucleotide can result in improved protein expression levels and improved levels of active and/or natively folded polypeptide expression.
  • a test of translation pausing or slowing as a result of codon pair usage can be performed by comparing a series of genes that have random pauses with modified genes where codon pairs predicted to cause translational pauses are replaced. Unmodified genes moved from their source organism and expressed in a heterologous host can have an altered set of codon pairs predicted to cause a translational pause or ribosomal slowing (e.g., an altered set of over-represented codon pairs), resulting in altered configuration and location of presumed pause sites.
  • translational kinetics of an mRNA into hydrolysis enzyme-encoding polypeptide can be changed in order to remove some or all translational pauses or replace other codon pairs that cause translational slowing, message instability and degradation, and poor protein translation, expression, and functional properties. While not intending to be limited to the following, it is believed that, for at least some proteins, reduction or elimination of translational pauses can serve to increase the expression level and/or quality and characteristics of the protein. Accordingly, by removing some or all translational pauses or replacing other codon pairs that cause translational slowing, the expression levels and/or quality of an expressed protein can be increased.
  • hydrolysis enzyme-encoding nucleotide sequences provided herein allow for one or more of the following results: higher expression levels, higher enzymatic activity, greater protein stability, resistance to degradation, and increased solubility compared to the original native gene when expressed in a heterologous host.
  • hydrolysis enzyme-encoding nucleotide sequences that have been modified to have one or more translational pauses or slowing sites removed by modifying one or more codon pairs to a corresponding codon pair that is less likely to cause a translational pause or slowing. While in some embodiments it is preferred to replace all codon pairs predicted to cause a translational pause or slowing, in other embodiments, it is sufficient to replace a subset of codon pairs predicted to cause a translational pause or slowing. For example, expression levels can be increased by replacing at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more codon pairs predicted to cause a translational pause or slowing.
  • At least 10%, 20%, 30%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% of codon pairs predicted to cause a translational pause or slowing are replaced by, for example, substituting different codon pairs that encode the same amino acids.
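The replacement of pause-predicted codon pairs by synonymous pairs can be sketched as follows. This is an illustrative greedy pass, not the procedure of the source: the `pause_score` dictionary and `threshold` are hypothetical inputs (pairs missing from the dictionary are treated as score 0.0), the standard genetic code is assumed, and the input is assumed to be an uppercase coding sequence whose length is a multiple of three.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate a coding sequence into one-letter amino acids."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def replace_pause_pairs(dna, pause_score, threshold):
    """Greedy left-to-right pass over adjacent codon pairs.

    When a pair scores above the threshold, the second codon is swapped
    for the synonymous codon with the lowest score next to the (already
    settled) first codon. The encoded amino acids never change.
    """
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    for i in range(len(codons) - 1):
        first, second = codons[i], codons[i + 1]
        if pause_score.get((first, second), 0.0) <= threshold:
            continue
        codons[i + 1] = min(
            SYNONYMS[CODON_TO_AA[second]],
            key=lambda cand: pause_score.get((first, cand), 0.0),
        )
    return "".join(codons)


scores = {("GCT", "GCT"): 5.0, ("GCT", "GCC"): 0.5}   # hypothetical values
modified = replace_pause_pairs("ATGGCTGCTAAA", scores, threshold=3.0)
print(modified, translate(modified))
```

Changing only the second codon of each flagged pair keeps the start codon fixed and lets the next iteration re-evaluate the pair formed with the following codon. A global optimization over all positions, or replacing only a chosen percentage of flagged pairs, would be alternative designs.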
  • translational kinetics of an mRNA into polypeptide can be changed in order to remove some or all translational pauses predicted to occur within an autonomous folding unit of a protein.
  • an autonomous folding unit of a protein refers to an element of the overall protein structure that is self-stabilizing and often folds independently of the rest of the protein chain. Such autonomous folding units typically correspond to a protein domain.
  • expression of a gene in a heterologous host organism can result in translational pauses located in regions that inhibit protein expression and/or protein folding.
  • preserving or inserting a translational pause in a region predicted to separate autonomous folding units of a protein can result in improved folding and/or solubility of expressed proteins.
  • methods of changing translational kinetics of an mRNA into polypeptide by preserving, relative to native, or inserting one or more translational pauses in one or more regions predicted to separate autonomous folding units of a protein, thereby improving the folding and/or solubility of the expressed protein.
  • one step can include identifying predicted autonomous folding units of a protein.
  • Methods for identifying predicted autonomous folding units of a protein or protein domains are known in the art, and include alignment of amino acid sequences with protein sequences having known structures, and threading amino acid sequences against template protein domain databases. Such methods can employ any of a variety of software algorithms in searching any of a variety of databases known in the art for predicting the location of protein domains. The results of such methods will typically include an identification of the amino acids predicted to be present in a particular domain, and also can include an identification of the domain itself, and an identification of the secondary structural element, if any, in which each amino acid sequence of a domain is located.
  • in some instances it is not possible to modify the polypeptide-encoding nucleotide sequence to remove a translational pause not present in the expression profile of the polypeptide in the native host organism. For example, there may be no codon pairs that are not predicted to cause a translational pause or slowing and that encode a corresponding pair of amino acids. In such instances, several options are available: the codon pair that is least likely to cause a translational pause or slowing can be selected; an amino acid insertion, deletion or mutation can be introduced to yield a codon pair that is not predicted to cause a translational pause or slowing; or no change is made.
  • One option in a computational method is to request human input in order to resolve the issue.
  • the computational method may, for example, involve the use of a computer that is programmed to request human input.
  • the computer may be programmed to make a selection, or combination of selections, such that multiple genes, or Ordered Gene Sets or small permutation libraries are designed and synthetically produced for use in expression analysis.
  • when an amino acid insertion, deletion or mutation is made in order to change translational kinetics, it is preferable to select a change that is predicted not to substantially influence the final three-dimensional structure of the protein and/or the activity of the protein.
  • Such an amino acid insertion, deletion or mutation can include, for example, a conservative amino acid substitution such as the conservative substitutions shown in Table 1.
  • the substitutions shown are based on amino acid physical-chemical properties, and as such, are independent of organism.
  • the conservative amino acid substitution is a substitution listed under the heading of exemplary substitutions.
  • codon pairs predicted to cause a translational pause or slowing are treated equally
  • one or more different threshold levels can be established for differential treatment of codon pairs, where codon pairs above a highest threshold are the codon pairs most likely to cause a translational pause or slowing, and succeedingly lower codon pair threshold-based groups correspond to succeedingly lower likelihoods of the respective codon pairs causing a translational pause or slowing.
  • different numbers or percentages of codon pairs can be replaced for each of these different threshold-based groups. For example, 95% or more codon pairs above a highest threshold level can be replaced, while 90% or less of all codon pairs between that level and an intermediate threshold level are replaced.
  • codon pairs likely to cause a translational pause or slowing can be segregated into two or more different threshold-based groups, three or more different threshold-based groups, four or more different threshold-based groups, five or more different threshold-based groups, six or more different threshold-based groups, or more. Discussion of specific thresholds is provided elsewhere herein; however, typically the higher the threshold, the higher the likelihood of a translational pause or slowing caused by a codon pair with a translational kinetics value greater than the threshold. In embodiments in which codon pairs likely to cause a translational pause or slowing can be segregated into two or more different threshold-based groups, different numbers or percentages of codon pairs can be replaced for each codon pair group.
  • codon pairs above a highest threshold are replaced, while the same or a lower percentage of codon pairs are replaced from codon pair groups corresponding to one or more lower thresholds.
  • all codon pairs above a highest threshold are replaced, while a codon pair above an intermediate threshold is replaced only if the codon pair is located within an autonomous folding unit.
  • all codon pairs above a highest threshold are replaced, while a codon pair above an intermediate threshold is replaced only if the codon pair can be replaced without requiring a change in the encoded polypeptide sequence.
  • all codon pairs above a highest threshold are replaced, while a codon pair above a first higher intermediate threshold is replaced only if the codon pair can be replaced without changing the encoded polypeptide sequence or with only a conservative change to the encoded polypeptide sequence, while a codon pair above a second lower intermediate threshold is replaced only if the codon pair can be replaced without requiring any change in the encoded polypeptide sequence.
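The tiered rule just described (replace everything above the highest threshold, and apply progressively stricter conditions at lower thresholds) can be sketched as a small decision function. The threshold values 3.0, 2.0 and 1.5 and the labels `"none"`, `"conservative"` and `"nonconservative"` are illustrative assumptions, not values fixed by the source.

```python
def should_replace(z, change, highest=3.0, upper_mid=2.0, lower_mid=1.5):
    """Decide whether a codon pair is replaced under a three-tier policy.

    z      -- translational kinetics value of the codon pair (assumed to be
              expressed in standard deviations above the mean)
    change -- effect of the best available replacement on the encoded
              polypeptide: "none", "conservative" or "nonconservative"
    """
    if z > highest:
        # Pairs above the highest threshold are always replaced.
        return True
    if z > upper_mid:
        # First (higher) intermediate tier: no change or a conservative change.
        return change in ("none", "conservative")
    if z > lower_mid:
        # Second (lower) intermediate tier: only silent replacements.
        return change == "none"
    return False
```

Other embodiments described above, such as conditioning the intermediate tier on location within an autonomous folding unit, would swap the `change` argument for a different flag.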
  • an evaluation method can be used that determines the degree to which a codon pair should be replaced according to the translational kinetics value of the codon pair, where the degree to which the codon pair should be replaced can be counterbalanced by any of a variety of user-determined factors such as, for example, presence of the codon pair within or between autonomous folding units, and degree of change to the encoded polypeptide sequence.
  • a translational kinetics value of a codon pair is a representation of the degree to which it is expected that a codon pair is associated with a translational pause. Methods of determining the translational kinetics value of a codon pair are discussed elsewhere herein. Such translational kinetics values can be normalized to facilitate comparison of translational kinetics values between species. In some embodiments, the translational value can be the degree of over-representation of a codon pair. An over-represented codon pair is a codon pair which is present in a protein-encoding sequence in higher abundance than would be expected if all codon pairs were statistically randomly abundant.
  • a codon pair predicted to cause a translational pause or slowing is a codon pair whose likelihood of causing a translational pause or slowing is at least one standard deviation above the mean translational kinetics value, where a particular translational kinetics value above the mean translational kinetics value in this context refers to a translational kinetics value indicative of a greater likelihood of causing translational pausing or slowing, relative to a mean translational kinetics value, and is not strictly limited to a particular mathematical relationship (e.g., greater than the mean) since the depiction of propensity to cause a translational pause by a translational kinetics value can be selected to be negative or positive, based on the selected implementation by one skilled in the art.
  • over-represented codon pairs may be graphically displayed as a positive function in a SpeedPlot™, as depicted in Figure 1, where a positive deflection or peak above a selected threshold describes a translational pause or slowing at the exact nucleotide location as defined by the abscissa.
  • a threshold for the translational kinetics value of codon pairs that are predicted to cause a translational pause or slowing can be set in accordance with the method and level of stringency desired by one skilled in the art.
  • a threshold value can be set to 5, or 3, or 2, or 1.5 standard deviations or more above the mean.
  • Typical threshold values can be at least 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 3, 3.5, 4, 4.5 and 5 or more standard deviations above the mean.
  • a plurality of thresholds can be applied in the herein-provided methods in segregating codon pairs into a plurality of groups. Each threshold of such a plurality can be a different value selected from 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 3, 3.5, 4, 4.5 and 5 or more standard deviations above the mean.
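Expressing thresholds in standard deviations above the mean amounts to converting raw translational kinetics values into z scores and then counting how many thresholds each value meets. The following is a minimal sketch under the assumptions that the population standard deviation is used and that larger values indicate a greater pause propensity; the function names and the default thresholds are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Convert raw translational kinetics values to standard deviations
    from the mean (population standard deviation assumed)."""
    mu = mean(values)
    sigma = pstdev(values)  # raises ZeroDivisionError below if all values are equal
    return [(v - mu) / sigma for v in values]


def threshold_group(z, thresholds=(1.5, 2.0, 3.0)):
    """Return how many of the thresholds a z score meets or exceeds.

    0 means the pair is not predicted to pause; the highest group number
    corresponds to the pairs most likely to cause a pause or slowing.
    """
    return sum(1 for t in thresholds if z >= t)


raw = [0, 0, 0, 0, 10]
groups = [threshold_group(z) for z in z_scores(raw)]
```

If the chosen implementation represents pause propensity with negative values, as the text allows, the comparison in `threshold_group` would be reversed.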
  • translational kinetics of an mRNA into polypeptide can be changed to add or retain one or more translational pauses predicted to occur before, after or within an autonomous folding unit of a protein, or between autonomous folding units. While not intending to be limited to the following, it is proposed that translational pauses are present in wild type genes in order to slow translation of a nascent polypeptide subsequent to translation of a protein domain, thus providing time for acquisition of secondary and at least partial tertiary structure in the domain prior to further downstream translation and reorganization or reconfiguration of the growing polypeptide or domain. By modifying the translational kinetics of complex multi-domain proteins it may be possible to experimentally alter the time each domain has available to organize.
  • Folding of a heterologously-expressed gene having two or more independent domains can be altered by the presence of pause sites between the domains. Refolding studies indicate that the time it takes for a protein to settle into its final configuration may take longer than the translation of the protein. Pausing may allow each domain to partially organize and commit to a particular, independent fold. Other co-translational events, such as those associated with co-factors, protein subunits, protein complexes, membranes, chaperones, secretion, or proteolysis complexes, also can depend on the kinetics of the emerging nascent polypeptide. Pauses can be introduced by engineering one codon pair predicted to cause a translational pause or slowing, or two or more such codon pairs into the sequence to facilitate these co-translational interactions.
  • typically a translational pause is preserved, which refers to maintaining the same codon pair for a polypeptide-encoding nucleotide sequence that is expressed in the native host organism, or, when the polypeptide-encoding nucleotide sequence is heterologously expressed, changing the codon pair as appropriate to have a translational kinetics value comparable to or closest to the translational kinetics value of the native codon pair in the native host organism.
  • proximal codon pairs can be selected to be replaced in order to introduce a translational pause or slowing.
  • one of the 1, 2, 3, 4 or 5 most proximal codon pairs upstream (5' of the desired pause site) or one of the 1, 2, 3, 4 or 5 most proximal codon pairs downstream (3' of the desired pause site) can be chosen for replacement to introduce the translational pause or slowing.
  • the selected codon pair for replacement to introduce the translational pause or slowing is the codon pair closest to the originally desired codon pair location of the translational pause or slowing, provided the desired translational pause or slowing can be attained (e.g., 1 codon pair upstream or downstream is typically selected instead of 2 codon pairs upstream or downstream, provided the desired translational pause or slowing can be attained).
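Selecting the closest usable codon pair within five positions of the desired pause site is a nearest-neighbor search. The sketch below assumes codon pairs are indexed by position and that `candidates` is a hypothetical, precomputed set of positions where a pause-inducing replacement is attainable; preferring the upstream position on a tie is also an assumption, since the source does not specify a tie-break.

```python
def nearest_pause_site(desired, candidates, max_offset=5):
    """Return the candidate codon-pair position closest to `desired`.

    desired    -- index of the codon pair where the pause is ideally placed
    candidates -- set of indices where a pause-inducing replacement is attainable
    max_offset -- how many codon pairs upstream or downstream to consider
    Returns None when no candidate lies within range.
    """
    if desired in candidates:
        return desired
    for offset in range(1, max_offset + 1):
        # Upstream (5') is checked before downstream (3') at equal distance.
        for position in (desired - offset, desired + offset):
            if position in candidates:
                return position
    return None
```

For example, with the desired site at position 10 and attainable sites at 7 and 12, the function returns 12 because it is two codon pairs away rather than three.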
  • a translational pause or slowing can be introduced by selecting a replacement codon pair encoding a conservative amino acid substitution, such as the conservative substitutions shown in Table 1.
  • replacement of a proximal codon pair to introduce a translational pause or slowing is preferred over replacement of a codon pair resulting in a change in the encoded amino acid sequence.
  • graphical displays of translational kinetics values of one or more proteins can be used to provide information to assist in the selection of a translational pause or slowing to preserve or insert in a redesigned polypeptide-encoding nucleotide sequence.
  • graphical displays of translational kinetics values can permit, for example, alignment of homologous proteins from different species and an identification, based on this alignment, of predicted translational pause or slowing sites that are conserved in the aligned proteins.
  • Such predicted translational pause or slowing sites can be preserved or inserted in a redesigned polypeptide-encoding nucleotide sequence.
  • regions between autonomous folding units in one or more proteins within a particular species can be graphically examined for the presence or absence of predicted pause sites.
  • Such graphical display methods can result in an identification of a region between autonomous folding units in which a translational pause or slowing is desirably preserved in a redesigned polypeptide-encoding sequence.
  • Methods for identifying and selecting conserved translational pauses can be performed according to any method known in the art, as exemplified in U.S. Patent Publication No. 2007/0298503, published on December 27, 2007, and U.S. Patent Publication No. 2007/0275399, published on November 29, 2007.
  • the codon pair translation kinetics values can be compared with a database of related gene sequences and conserved pause sites can be identified.
  • a synthetic gene can be designed wherein at least one conserved pause site is maintained to provide a synthetic gene with modified translation kinetics.
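Identifying conserved pause sites from a set of related gene sequences can be pictured as scanning aligned columns of translational kinetics values. This is only a sketch under stated assumptions: the input is a hypothetical list of equal-length score lists, one per aligned homolog, with `None` marking alignment gaps, and a site counts as conserved when a chosen fraction of the homologs meets the threshold. The cited publications may define conservation differently.

```python
def conserved_pause_sites(aligned_scores, threshold=2.0, min_fraction=1.0):
    """Return aligned positions where enough homologs show a predicted pause.

    aligned_scores -- list of per-homolog score lists of equal length;
                      None marks a gap in the alignment
    threshold      -- score at or above which a pause is predicted
    min_fraction   -- fraction of homologs that must show the pause
    """
    n_homologs = len(aligned_scores)
    sites = []
    for position, column in enumerate(zip(*aligned_scores)):
        hits = sum(1 for z in column if z is not None and z >= threshold)
        if hits / n_homologs >= min_fraction:
            sites.append(position)
    return sites


# Made-up scores for three aligned homologs.
alignment = [
    [0.1, 2.5, 3.0, None],
    [0.0, 2.1, 0.5, 2.2],
    [1.0, 4.0, 2.9, 2.6],
]
strict = conserved_pause_sites(alignment)
relaxed = conserved_pause_sites(alignment, min_fraction=0.6)
```

The positions returned are the ones a redesign would try to keep as pause sites in the synthetic gene.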
  • codon pairs are associated with translational pauses, and can thereby influence translational kinetics of an mRNA into polypeptide.
  • the methods of changing translational kinetics provided herein will typically be performed by modifying or designing one or more nucleotide sequences encoding a polypeptide to be expressed.
  • methods of modifying a gene or designing a synthetic nucleotide sequence encoding the polypeptide encoded by the gene are collectively referred to herein as redesigning a polypeptide-encoding gene sequence or redesigning a polypeptide-encoding nucleotide sequence.
  • Also included in the various embodiments provided herein are redesigned gene sequences encoding polypeptides that are not identical to the original gene.
  • a hydrolysis enzyme-encoding DNA sequence wherein the encoded sequence has at least a 50%, 60%, 70%, 75%, 80%, 85%, and more typically at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% amino acid sequence identity to the wild type hydrolysis polypeptide sequence as set forth in SEQ ID NO: 2, 26, 50, 74, 98, 122, 146, 170, 182 or 194.
  • At least 1, 2 or 3 codon pairs of a polynucleotide sequence encoding the hydrolysis enzyme have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • at least 3, or 4, or 5, or 6 or more of the specified codon pairs have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the DNA sequence is optimized for expression in S. cerevisiae, E. coli, P. pastoris, K. lactis or Z. mobilis.
  • a hydrolysis enzyme-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode a functional domain of the hydrolysis enzyme have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for functional domains are known in the art.
  • the replacement codon pairs are predicted to be less likely to cause a translational pause in the heterologous host organism relative to the respective wild type codon pair when expressed in the heterologous host organism. That is, the embodiments in which one or more codon pairs encoding amino acids of a functional domain of one of the encoded polypeptides provided herein have been replaced include embodiments in which the nucleotide sequence encoding the functional domain is changed to increase the predicted translational kinetics of translation of the functional domain. As provided herein, incomplete translation, improper folding, or other protein expression shortcomings can result from the presence of one or more translational pauses in a heterologously-expressed polypeptide. In some embodiments, removal of one or more of these pauses can increase the speed of translation of the functional domain, and thereby increase the quantity of protein produced and/or increase the amount of stable, properly folded, active, and/or soluble protein produced.
  • the replacement codons (i.e., the codons added as replacements for the wild type codons) are typically predicted to be less likely to cause a translational pause.
  • the replacement codon can have a translational kinetics value in the heterologous host organism that is 95%, 90%, 85%, 80%, 75%, 70%, or less, than the translational kinetics value of the wild type codon pair when expressed in the heterologous host organism.
  • the replacement codon is selected to have a translational kinetics value similar to the translational kinetics value of the wild type codon pair in the native organism.
  • the z score of at least one replacement codon pair when expressed in the heterologous host organism can be no more than 250%, 200%, 150%, 125% or 100% of the z score for the wild type codon pair when expressed in the native organism.
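The two percentage criteria above compare a replacement codon pair against different references: the wild type pair in the heterologous host when a pause is being removed, and the wild type pair in the native organism when native kinetics are being matched. A minimal sketch of both checks follows. It assumes positive scores (percentage comparisons behave unintuitively for negative values), and the function names and default cut-offs are hypothetical picks from the ranges listed.

```python
def reduces_pause(replacement_in_host, wild_type_in_host, max_fraction=0.95):
    """True if the replacement pair's value in the heterologous host is at
    most `max_fraction` (e.g. 0.95, 0.90, ... 0.70) of the wild type pair's
    value in that same host."""
    return replacement_in_host <= max_fraction * wild_type_in_host


def within_native_z(replacement_z_in_host, wild_type_z_in_native, max_percent=150):
    """True if the replacement pair's z score in the heterologous host is no
    more than `max_percent` (e.g. 250, 200, 150, 125 or 100) of the wild
    type pair's z score in the native organism."""
    return replacement_z_in_host <= (max_percent / 100) * wild_type_z_in_native
```

In a redesign loop, `reduces_pause` would filter candidates inside functional domains, while `within_native_z` would guard positions where the native translational kinetics are to be approximated.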
  • a hydrolysis enzyme-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the region between domains of the hydrolysis enzyme, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for the domains are known in the art and are described in detail below.
  • a cellobiohydrolase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the cellulose binding domain of the cellobiohydrolase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for cellulose binding domains are known in the art.
  • the cellulose binding domain includes at least amino acids 35-58, 30-61 or 27-62.
  • a cellobiohydrolase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the glycosyl hydrolase domain of the cellobiohydrolase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for glycosyl hydrolase domains are known in the art.
  • the glycosyl hydrolase domain includes at least amino acids 124-437, 115-450 or 107-471.
  • a laccase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the Cu-oxidase-3 domain of the laccase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for Cu-oxidase-3 domains are known in the art.
  • the Cu-oxidase-3 domain includes at least amino acids 29-151 or 28-152.
  • a laccase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the Cu-oxidase domain of the laccase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for Cu-oxidase domains are known in the art.
  • the Cu-oxidase domain includes at least amino acids 162-304 or 161-305.
  • a laccase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the Cu-oxidase-2 domain of the laccase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for Cu-oxidase-2 domains are known in the art.
  • the Cu-oxidase-2 domain includes at least amino acids 365-492 or 364-493.
  • a lignin peroxidase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the haem peroxidase domain of the lignin peroxidase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for haem peroxidase domains are known in the art.
  • the haem peroxidase domain includes at least amino acids 47-286 or 46-287.
  • a Mn-dependent peroxidase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the haem peroxidase domain of the Mn-dependent peroxidase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for haem peroxidase domains are known in the art.
  • the haem peroxidase domain includes at least amino acids 46-283 or 45-284.
  • a laccase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the Cu-oxidase-3 domain of the laccase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for Cu-oxidase-3 domains are known in the art.
  • the Cu-oxidase-3 domain includes at least amino acids 91-211 or 90-212.
  • a laccase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the Cu-oxidase domain of the laccase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for Cu-oxidase domains are known in the art.
  • the Cu-oxidase domain includes at least amino acids 217-366 or 216-367.
  • a laccase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the Cu-oxidase-2 domain of the laccase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for Cu-oxidase-2 domains are known in the art.
  • the Cu-oxidase-2 domain includes at least amino acids 427-569 or 426-570.
  • a laccase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the Cu-oxidase-3 domain of the laccase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for Cu-oxidase-3 domains are known in the art.
  • the Cu-oxidase-3 domain includes at least amino acids 30-152 or 29-153.
  • a laccase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the Cu-oxidase domain of the laccase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for Cu-oxidase domains are known in the art.
  • the Cu-oxidase domain includes at least amino acids 163-305 or 162-306.
  • a laccase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the Cu-oxidase-2 domain of the laccase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for Cu-oxidase-2 domains are known in the art.
  • the Cu-oxidase-2 domain includes at least amino acids 365-492 or 364-493.
  • a laccase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the Cu-oxidase-3 domain of the laccase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for Cu-oxidase-3 domains are known in the art.
  • the Cu-oxidase-3 domain includes at least amino acids 30-152 or 29-153.
  • a laccase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the Cu-oxidase domain of the laccase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for Cu-oxidase domains are known in the art.
  • the Cu-oxidase domain includes at least amino acids 163-305 or 162-306.
  • a laccase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the Cu-oxidase-2 domain of the laccase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for Cu-oxidase-2 domains are known in the art.
  • the Cu-oxidase-2 domain includes at least amino acids 365-492 or 364-493.
  • a cellobiohydrolase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the cellulose binding domain of the cellobiohydrolase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for cellulose binding domains are known in the art.
  • the cellulose binding domain includes at least amino acids 465-493.
  • a cellobiohydrolase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the glycosyl hydrolase domain of the cellobiohydrolase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for glycosyl hydrolase domains are known in the art.
  • the glycosyl hydrolase domain includes at least amino acids 1-434.
  • an endoglucanase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the endoglucanase domain of the endoglucanase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for endoglucanase domains are known in the art.
  • the endoglucanase domain includes at least amino acids 32-276.
  • a xylanase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the glycosyl hydrolase domain of the xylanase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for glycosyl hydrolase domains are known in the art. In the case of the xylanase of SEQ ID NO: 193, the glycosyl hydrolase domain includes at least amino acids 31-221.
  • a cellobiohydrolase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the region between the cellulose binding domain and the glycosyl hydrolase domain of the cellobiohydrolase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for the cellulose binding domain and glycosyl hydrolase domain are described hereinabove.
  • a laccase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the region between the N-terminus and the Cu-oxidase-3 domain of the laccase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for the Cu-oxidase-3 domain are described hereinabove.
  • a laccase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the region between the Cu-oxidase-3 and the Cu-oxidase domain of the laccase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for the Cu-oxidase domain are described hereinabove.
  • a laccase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the region between the Cu-oxidase and the Cu-oxidase-2 domain of the laccase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for the Cu-oxidase-2 domain are described hereinabove.
  • a lignin peroxidase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the region between the N-terminus and the haem peroxidase domain of the lignin peroxidase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for the haem peroxidase domain are described hereinabove.
  • a Mn-dependent peroxidase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the region between the N-terminus and the haem peroxidase domain of the Mn-dependent peroxidase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for the haem peroxidase domain are described hereinabove.
  • a cellobiohydrolase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the region between the cellulose binding domain and the glycosyl hydrolase domain of the cellobiohydrolase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for the cellulose binding domain and glycosyl hydrolase domain are described hereinabove.
  • an endoglucanase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the region between the N-terminus and the endoglucanase domain of the endoglucanase enzyme have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for the endoglucanase domain are described hereinabove.
  • a xylanase-encoding DNA sequence adapted for expression in a heterologous host organism, wherein at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more codon pairs present in wild-type nucleotide sequence and which encode the region between the N-terminus and the glycosyl hydrolase domain of the xylanase, have been replaced with different codon pairs encoding identical amino acids or conservative amino acid substitutions thereof.
  • the conserved amino acid sequence pattern and domain boundaries for the glycosyl hydrolase domain are described hereinabove.
  • polypeptide-encoding nucleotide sequence provided herein to modify the translational kinetics of the polypeptide-encoding nucleotide sequence, where the polypeptide-encoding nucleotide sequence is altered such that one or more codon pairs have a decreased likelihood of causing a translational pause or slowing relative to the unaltered polypeptide-encoding nucleotide sequence.
  • one or more nucleotides of a polypeptide-encoding nucleotide sequence can be changed such that a codon pair containing the changed nucleotides has a translational kinetics value indicative of a decreased likelihood of causing a translational pause or slowing relative to the unchanged polypeptide-encoding nucleotide sequence.
  • the redesigned polypeptide-encoding nucleotide sequence need not possess a high degree of identity to the polypeptide-encoding nucleotide sequence of the original gene. In some embodiments, the redesigned polypeptide-encoding nucleotide sequence will have at least 50%, 60%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% nucleotide identity with the polypeptide-encoding nucleotide sequence of the original gene.
  • an original gene refers to a gene for which codon pair refinement is to be performed; such original genes can be, for example, wild type genes, native genes, naturally occurring mutant genes, other mutant genes such as site-directed mutant genes or engineered or completely synthetic genes.
  • the polynucleotide sequence will be completely synthetic, and will bear much lower identity with the original gene, e.g., no more than 90%, 80%, 70%, 60%, 50%, 40%, or lower.
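  • As a rough illustration of the identity percentages discussed above, the following Python sketch computes nucleotide identity between an original and a redesigned coding sequence of equal length (synonymous codon replacement preserves length). The function name and the example sequences are hypothetical and are not taken from the sequence listing; sequences that differ by insertions or deletions would first need an alignment step, which is not shown.

```python
def percent_identity(original: str, redesigned: str) -> float:
    """Percentage of positions carrying the same nucleotide (equal-length input only)."""
    if len(original) != len(redesigned):
        raise ValueError("sequences must have equal length; align them first")
    if not original:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(original.upper(), redesigned.upper()))
    return 100.0 * matches / len(original)


# Hypothetical example: Met-Leu-Lys encoded with two different codon choices.
original = "ATGCTGAAA"
redesigned = "ATGTTAAAG"
print(round(percent_identity(original, redesigned), 1))
```

  • In this toy case three of nine nucleotides differ while the encoded amino acids are unchanged, which is the kind of trade-off the identity ranges above describe.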
  • the resulting sequence can be designed to: (1) reduce or eliminate translational problems caused by inappropriate ribosome pausing, such as those caused by over-represented codon pairs or other codon pairs with translational values predictive of a translational pause; (2) have codon usage refined to avoid over-reliance on rare codons; (3) reduce in number or remove particular restriction sites, splice sites, internal Shine-Dalgarno sequences, or other sites that may cause problems in cloning or in interactions with the host organism; or (4) have controlled RNA secondary structure to avoid detrimental translational termination effects, translation initiation effects, or RNA processing, which can arise from, for example, RNA self-hybridization.
  • this sequence also can be designed to avoid oligonucleotides that mis-hybridize, resulting in genes that can be assembled from refined oligonucleotides that by thermodynamic necessity only pair up in the desired manner, using methods known in the art, as exemplified in U.S. Patent Publication No. 2005/0106590, which is hereby incorporated by reference in its entirety.
  • polypeptide-encoding nucleotide sequence it is not possible to modify the polypeptide-encoding nucleotide sequence to suitably modify the translational kinetics of the mRNA into polypeptide without modifying the amino acid sequence of the encoded polypeptide.
  • an amino acid insertion, deletion or mutation can be introduced to yield a codon pair that is not predicted to cause a translational pause or slowing; or no change is made.
  • the change is preferably predicted to not substantially influence the final three-dimensional structure of the protein and/or the activity of the protein.
  • Such non-identical polypeptides can vary by containing one or more insertions, deletions and/or mutations.
  • polypeptide sequence can vary according to the purpose of the change; typically, such a change results in a polypeptide that is at least 50%, 60%, 70%, 75%, 80%, 85%, and more typically at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% identical to the wild type polypeptide sequence.
  • the sequence of the polynucleotide can be generated, optionally in conjunction with optimization of a plurality of parameters where one such parameter can be codon pair usage, where the resultant polynucleotide can be prepared by assembly of a plurality of oligonucleotides sufficiently small to be synthesized by known oligonucleotide synthetic methods.
  • Methods known in the art for optimizing multiple parameters in synthetic nucleotide sequences can be applied to optimizing the parameters recited in the present claims. Such methods may advantageously include those exemplified in U.S. Patent App. Publication No. 2005/0106590, U.S. Patent App. Publication No. 2007/0009928, and R. H.
  • an exemplary method for generating a sequence can also include dividing the desired sequence into a plurality of partially overlapping segments; optimizing the melting temperatures of the overlapping regions of each segment to disfavor hybridization to the overlapping segments which are non-adjacent in the desired sequence; allowing the overlapping regions of single stranded segments which are adjacent to one another in the desired sequence to hybridize to one another under conditions which disfavor hybridization of non-adjacent segments; and filling in, ligating, or repairing the gaps between the overlapping regions, thereby forming a double-stranded DNA with the desired sequence.
  • This process can be performed manually or can be automated, e.g., in a general purpose digital computer.
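  • The segmentation step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of the cited publications: the segment length, overlap length, and example sequence are hypothetical, and the melting temperature is estimated with the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a coarse approximation for short oligonucleotides. Strand orientation and the optimization of the overlaps are not modeled.

```python
def split_overlapping(seq, seg_len, overlap):
    """Split seq into segments of seg_len that share `overlap` bases with the next one."""
    if not 0 < overlap < seg_len:
        raise ValueError("overlap must be positive and shorter than seg_len")
    step = seg_len - overlap
    segments, start = [], 0
    while True:
        segments.append(seq[start:start + seg_len])
        if start + seg_len >= len(seq):
            break  # the last segment reaches the end of the sequence
        start += step
    return segments


def wallace_tm(oligo):
    """Wallace-rule estimate of melting temperature in degrees Celsius (assumption)."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def overlap_tms(segments, overlap):
    """Estimated Tm of each overlap shared by adjacent segments."""
    return [wallace_tm(seg[-overlap:]) for seg in segments[:-1]]


# Hypothetical 20-nt sequence, 10-nt segments, 4-nt overlaps (far shorter than real oligos).
segments = split_overlapping("ATGCGTACGTTAGCCGATAA", seg_len=10, overlap=4)
print(segments)
print(overlap_tms(segments, overlap=4))
```

  • A design procedure of the kind described would adjust the sequence or the segment boundaries until the overlap melting temperatures are similar to one another and unfavorable for non-adjacent pairings.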
  • the search of possible codon assignments is mapped into an anytime branch and bound computerized algorithm developed for biological applications.
  • a synthetic nucleotide sequence for the polynucleotides provided herein, where the synthetic nucleotide sequence also is typically designed to have desirable translational kinetics properties, such as the removal of some or all codon pairs predicted to result in a translational pause or slowing.
  • Such design methods include determining a set of partially overlapping segments with optimized melting temperatures, and determining the translational kinetics of the synthetic sequence, where if it is desired to change the translational kinetics of the synthetic gene, the sequences of the overlapping segments are modified and refined in order to approximate the desired translational kinetics while still possessing acceptable hybridization properties. In some embodiments, this process is performed iteratively.
  • a criterion is established for selecting codon pairs having high translational kinetics values to be replaced with codon pairs having lower translational kinetics values unless a codon pair of this group is the site of a planned pause.
  • the top 1%, 1.5%, 2%, 2.5%, 3%, 3.5%, 4%, 4.5%, 5%, 5.5%, 6%, 6.5%, 7%, 7.5%, 8%, 8.5%, 9%, 9.5%, or 10% of codon pairs ranked by translational kinetics values can be replaced by codon pairs having lower translational kinetics values, such as translational kinetics value below a user defined level that can be, for example, a translational kinetics value equal to or below the translational kinetics values of codon pairs not in the top selected percentage, unless a codon pair of this group is the site of a planned pause (in which case it is not necessarily replaced).
  • all codon pairs above a user-selected translational kinetics value such as more than 5, 4.5, 4, 3.5, 3, 2.5 or 2 standard deviations above the mean translational kinetics value can be replaced by codon pairs having lower translational kinetics values, such as translational kinetics value below a user defined level that can be, for example, a translational kinetics value that is 4, 3.5, 3, 2.5, 2, 1.5 or 1 standard deviations less than the mean translational kinetics value, unless a codon pair of this group is the site of a planned pause (in which case it is not necessarily replaced).
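  • The threshold-based selection criterion above can be illustrated with a short Python sketch. It assumes that translational kinetics values are already available as z scores (standard deviations from the mean) keyed by six-nucleotide codon pair; the function name, the example values, and the treatment of unknown pairs as zero are all hypothetical choices made for illustration.

```python
def flag_codon_pairs(cds, z_scores, threshold=3.0, planned_pauses=()):
    """Return (index, codon_pair) for pairs whose z score exceeds the threshold.

    Pairs whose index is listed in planned_pauses are left alone, mirroring
    the 'site of a planned pause' exception.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    flagged = []
    for idx in range(len(codons) - 1):
        pair = codons[idx] + codons[idx + 1]
        # Unknown pairs are treated as unremarkable (z = 0); this is an assumption.
        if z_scores.get(pair, 0.0) > threshold and idx not in planned_pauses:
            flagged.append((idx, pair))
    return flagged


# Hypothetical z scores for the three codon pairs of a four-codon sequence.
z = {"ATGGCT": 0.4, "GCTGCA": 4.2, "GCAAAA": 3.6}
print(flag_codon_pairs("ATGGCTGCAAAA", z, threshold=3.0))
print(flag_codon_pairs("ATGGCTGCAAAA", z, threshold=3.0, planned_pauses={2}))
```

  • A percentile criterion (for example, the top 5% of ranked pairs) could be implemented the same way by first converting the chosen percentage into a cutoff value.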
  • polynucleotide sequence design methods provided herein can be employed where a plurality of properties of the polynucleotide sequences can be refined in addition to codon pair usage properties, where such properties can include, but are not limited to, melting temperature gap between oligonucleotides of synthetic gene, average codon usage, average codon pair chi-squared (e.g., z score), worst codon usage, worst codon pair (e.g., z score), maximum usage in adjacent codons, Shine-Dalgarno sequence (for E. coli expression), occurrences of 5 consecutive G's or 5 consecutive C's, occurrences of 6 consecutive A's or 6 consecutive T's, long exactly repeated subsequences, cloning restriction sites, user-prohibited sequences (e.g., other restriction sites), codon usage of a specific codon above user-specified limit, and out-of-frame stop codons (framecatchers).
  • additional properties that can be considered in a process of designing a polynucleotide sequence include, but are not limited to, occurrences of RNA splice sites, occurrences of polyA sites, and occurrence of ribosome binding sequence.
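  • Several of the properties listed above are simple pattern checks. As a minimal sketch, the following Python function reports homopolymer runs such as 5 or more consecutive G's or C's and 6 or more consecutive A's or T's; the function name and the example sequence are hypothetical.

```python
import re


def find_runs(seq, bases, min_len):
    """Return (start, run) for each run of at least min_len identical bases from `bases`."""
    pattern = re.compile("|".join(f"{b}{{{min_len},}}" for b in bases))
    return [(m.start(), m.group()) for m in pattern.finditer(seq.upper())]


# Hypothetical sequence containing one run of six G's and one run of six A's.
seq = "AT" + "G" * 6 + "C" + "A" * 6 + "TTT"
print(find_runs(seq, "GC", 5))  # runs of 5 or more G's or C's
print(find_runs(seq, "AT", 6))  # runs of 6 or more A's or T's
```

  • The same approach extends to restriction sites or other user-prohibited sequences by supplying the corresponding patterns.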
  • a process of designing a polynucleotide sequence can include constraints including, but not limited to, minimum melting temperature gap between oligonucleotides of synthetic gene, minimum average codon usage, maximum average codon pair chi-squared (z score), minimum absolute codon usage, maximum absolute codon pair (z score), minimum maximum usage in adjacent codons, no Shine-Dalgarno sequence (for E.
  • additional constraints can include, but are not limited to, minimum occurrences of RNA splice sites, minimum occurrences of polyA sites, and occurrence of ribosome binding sequence.
  • a process of designing a polynucleotide sequence can include preferences including, but not limited to, prefer high average codon usage, prefer low average codon pair chi-squared, prefer larger melting temperature gap, prefer more out of frame stop codons (framecatchers), and optionally prefer evenly distributed codon usage.
  • Any of a variety of nucleotide sequence refinement/optimization methods known in the art can be used to refine the polynucleotide sequence according to the codon pair usage properties, and according to any of the additional properties specifically described above, or other properties that are refined in nucleotide sequence redesign methods known in the art.
  • a branch and bound method is employed to refine the polynucleotide sequence according to codon pair usage properties and at least one additional property, such as codon usage.
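  • To make the branch and bound idea concrete, here is a small depth-first sketch in Python. It is not the anytime algorithm referred to in this description: it considers only a codon pair score (no codon usage or other properties), and the synonymous codon table and the scores are hypothetical. The bound is the sum of the best achievable score at every remaining codon junction, so a partial assignment is abandoned as soon as it cannot beat the best complete sequence found so far.

```python
from itertools import product


def branch_and_bound(peptide, codons_for, pair_score):
    """Choose one codon per residue to minimize the summed codon pair score."""
    n = len(peptide)
    # Best possible score at each junction (i, i+1), ignoring neighboring choices.
    min_junction = [
        min(pair_score(a, b)
            for a, b in product(codons_for[peptide[i]], codons_for[peptide[i + 1]]))
        for i in range(n - 1)
    ]
    # suffix_bound[i]: optimistic total for junctions i .. n-2.
    suffix_bound = [0.0] * n
    for i in range(n - 2, -1, -1):
        suffix_bound[i] = min_junction[i] + suffix_bound[i + 1]

    best = {"cost": float("inf"), "codons": None}

    def extend(chosen, cost):
        i = len(chosen)
        if i == n:
            if cost < best["cost"]:
                best["cost"], best["codons"] = cost, list(chosen)
            return
        for codon in codons_for[peptide[i]]:
            new_cost = cost + (pair_score(chosen[-1], codon) if chosen else 0.0)
            if new_cost + suffix_bound[i] >= best["cost"]:
                continue  # prune: this branch cannot improve on the incumbent
            chosen.append(codon)
            extend(chosen, new_cost)
            chosen.pop()

    extend([], 0.0)
    return "".join(best["codons"]), best["cost"]


# Hypothetical synonymous codons and codon pair scores (higher = more pause-prone).
codons_for = {"M": ["ATG"], "K": ["AAA", "AAG"], "L": ["CTG", "TTA"]}
scores = {("ATG", "AAA"): 2.5, ("ATG", "AAG"): 0.5, ("AAA", "CTG"): -1.0,
          ("AAA", "TTA"): 0.0, ("AAG", "CTG"): 3.0, ("AAG", "TTA"): 0.5}
print(branch_and_bound("MKL", codons_for, lambda a, b: scores[(a, b)]))
```

  • Additional properties such as codon usage could be folded into the cost as weighted terms, provided the bound remains optimistic for every term.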
  • the methods provided herein can further include analyzing at least a portion of the candidate polynucleotide sequence in frame shift, and selecting codons for the candidate polynucleotide sequence such that stop codons are added to at least one said frame shift.
  • the generating step further includes analyzing at least a portion of the candidate polynucleotide sequence in frame shift, and selecting codons for the candidate polynucleotide sequence such that one or more stop codons in one, two or three reading frames are added downstream of polypeptide-encoding region of the nucleotide sequence.
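  • The frame-shift analysis described above can be illustrated by counting stop codons in the two shifted reading frames. This Python sketch only performs the counting; selecting codons so that more out-of-frame stops appear would be a separate step. The stop codons are those of the standard genetic code, and the example sequence is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def out_of_frame_stops(cds):
    """Count stop codons in the +1 and +2 frames of a coding sequence."""
    counts = {}
    for shift in (1, 2):
        frame = cds[shift:].upper()
        counts[shift] = sum(
            frame[i:i + 3] in STOP_CODONS for i in range(0, len(frame) - 2, 3)
        )
    return counts


# Hypothetical sequence: ATG ATA GCT TGA in frame; the +1 frame reads TGA TAG CTT.
print(out_of_frame_stops("ATGATAGCTTGA"))
```

  • A design routine could use these counts as a preference, favoring synonymous choices that raise the number of out-of-frame stops without introducing pause-prone codon pairs.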
  • methods for redesigning a polypeptide-encoding gene for expression in a host organism, by providing a data set representative of codon pair translational kinetics for the host organism which includes translational kinetics values of the codon pairs utilized by the host organism, providing a desired polypeptide sequence for expression in the host organism, and generating a polynucleotide sequence encoding the polypeptide sequence by analyzing candidate nucleotides to select, where possible, codon pairs that are predicted not to cause a translational pause in the host organism, with reference to the data set, thereby providing a candidate polynucleotide sequence encoding the desired polypeptide.
  • Also provided herein are methods for redesigning a polypeptide-encoding gene for expression in a host organism by providing a first data set representative of codon pair translational kinetics for the host organism which includes translational kinetics values of the codon pairs utilized by the host organism, providing a second data set representative of at least one additional desired property of the synthetic gene, providing a desired polypeptide sequence for expression in the host organism, and generating a polynucleotide sequence encoding the polypeptide sequence by analyzing candidate nucleotides to select, where possible, both (i) codon pairs that are predicted not to cause a translational pause in the host organism, with reference to the first data set, and (ii) nucleotides that provide a desired property, with reference to the second data set, thereby providing a candidate polynucleotide sequence encoding the desired polypeptide.
  • a branch and bound method is employed to refine the polypeptide-encoding nucleotide sequence according to codon pair usage properties of the first data set and according to the properties of the second data set.
  • the second data set contains codon preferences representative of codon usage by the host organism, including the most common codons used by the host organism for a given amino acid.
  • a hydrolysis enzyme-encoding DNA sequence wherein the encoded sequence has at least a 50%, 60%, 70%, 75%, 80%, 85%, and more typically at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% amino acid sequence identity to the wild type hydrolysis enzyme polypeptide sequence as set forth in the sequence listing.
  • the polynucleotide provided herein is adapted for expression in a heterologous host organism.
  • a heterologous host organism is an organism used to express DNA, RNA or protein that is foreign to the host organism.
  • the host organism is not human, E. coli or S. cerevisiae.
  • At least 1, 2 or 3 codon pairs of the original sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein.
  • the at least three codon pairs of the original sequence that are predicted to cause a translational pause in the host organism are highly-overrepresented codon pairs therein and have been replaced with codon pairs that are not highly-overrepresented therein.
  • a highly-overrepresented codon pair is a codon pair that has a translational kinetics value greater than a designated threshold, wherein a threshold value can be at least 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 3, 3.5, 4, 4.5 or 5 or more standard deviations above the mean translational kinetics value.
  • a hydrolysis enzyme-encoding DNA sequence having at least a 75% sequence identity with an original hydrolysis enzyme polypeptide sequence as set forth in the sequence listing and adapted for expression in a heterologous host organism, wherein at least three codon pairs of the original sequence that are predicted to cause a translational pause in the host organism have been replaced with codon pairs that are predicted to be less likely to cause a translational pause therein, and wherein the host organisms are selected from the following: Pichia pastoris; Oryctolagus cuniculus (rabbit); Macaca fascicularis (Long-tailed monkey); M. mulatta (Monkey); E. coli K12 W3110; E. coli UTI89; E.
  • the methods provided herein can include analyzing the candidate polynucleotide sequence to confirm that no codon pairs are predicted to cause a translational pause in the host organism by more than a designated threshold.
  • the likelihood that a particular codon pair will cause translational pausing or slowing in an organism can be represented by a translational kinetics value.
  • the translational kinetics value can be expressed in any of a variety of manners in accordance with the guidance provided herein. In one example, a translational kinetics value can be expressed in terms of the mean translational kinetics value and the corresponding standard deviation for all codon pairs in an organism.
  • the translational kinetics value for a particular codon pair can be expressed in terms of the number of standard deviations that separate the translational kinetics value of the codon pair from the mean translational kinetics value.
  • a threshold value can be at least 1, 1.25, 1.5, 1.75, 2, 2.25, 2.5, 3, 3.5, 4, 4.5 or 5 or more standard deviations above the mean translational kinetics value.
  • the methods provided herein also include generating a candidate nucleotide sequence according to codon usage.
  • As is known in the art, different organisms can have different preferences for the three-nucleotide codon sequence encoding a particular amino acid. As a result, translation can often be improved by using the most common three-nucleotide codon sequence encoding a particular amino acid.
  • some methods provided herein also include generating a candidate nucleotide sequence such that codon utilization is non-randomly biased in favor of codons most commonly used by the host organism. Codon usage preferences are known in the art for a variety of organisms and methods for selecting the more commonly used codons are well known in the art.
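  • As a simple illustration of biasing codon utilization toward the host's most common codons, the following Python sketch back-translates a peptide by picking, for each amino acid, the codon with the highest relative usage. The usage table is hypothetical and does not represent measured values for any organism; a real design would draw on a published codon usage table for the chosen host.

```python
def most_common_codons(peptide, usage):
    """Back-translate using the most frequently used codon for each amino acid."""
    return "".join(max(usage[aa], key=usage[aa].get) for aa in peptide)


# Hypothetical relative codon usage (fractions per amino acid).
usage = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.70, "AAG": 0.30},
    "L": {"CTG": 0.50, "TTA": 0.15, "CTC": 0.35},
}
print(most_common_codons("MKL", usage))
```

  • On its own this greedy choice ignores codon pair effects, which is why the methods described here weigh it against translational kinetics values.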
  • the methods of redesigning a polypeptide-encoding nucleotide sequence are based on a plurality of properties, where a conflict in the preferred nucleotide sequence arising from the plurality of properties is determined in order to optimize the predicted translational kinetics. That is, when the plurality of properties being optimized would lead to more than one possible nucleotide sequence depending on which property is to be accorded more weight, typically, the conflict is resolved by selecting the nucleotide sequence predicted to be translated more rapidly, for example, due to fewer predicted translational pauses.
  • the methods of redesigning a polypeptide-encoding nucleotide sequence are based on a plurality of properties, where a conflict in the preferred nucleotide sequence arising from the plurality of properties is determined in order to optimize codon pair usage preferences. That is, when the plurality of properties being optimized would lead to more than one possible nucleotide sequence depending on which property is to be accorded more weight, typically, codon pair usage will be accorded more weight in order to resolve the conflict between the more than one possible nucleotide sequences.
  • the methods provided herein can include identifying at least one instance of a conflict between selecting common codons and avoiding codon pairs predicted to cause a translational pause; in such instances, the conflict is resolved in favor of avoiding codon pairs predicted to cause a translational pause.
  • Some embodiments provided herein include generating a candidate polynucleotide sequence encoding the polypeptide sequence, the candidate polynucleotide sequence having a non-random codon pair usage, such that the codon pairs encoding any particular pair of amino acids have the lowest translational kinetics values.
  • the candidate polynucleotide sequence encoding the polypeptide sequence is generated and/or altered such that the encoded amino acid sequence is not altered.
  • the candidate polynucleotide sequence encoding the polypeptide sequence is generated and/or altered such that the three dimensional structure of the encoded polypeptide is not substantially altered.
  • the candidate polynucleotide sequence encoding the polypeptide sequence is generated and/or altered such that no more than conservative amino acid changes are made to the encoded polypeptide.
  • the methods provided herein can further include a step of refining or altering the candidate polynucleotide sequence in accordance with a second nucleotide sequence property to be refined.
  • the methods further include generating or refining a candidate polynucleotide sequence encoding a polypeptide sequence such that the candidate polynucleotide sequence has a non-random codon usage, where the most common codons used by the host organism are over-represented in the candidate polynucleotide sequence.
  • the methods can include refining or altering the candidate polynucleotide sequence in accordance with any of a variety of additional properties provided herein, including but not limited to, melting temperature gap between oligonucleotides of synthetic gene, Shine-Dalgarno sequence, occurrences of 5 consecutive G's or 5 consecutive C's, occurrences of 6 consecutive A's or 6 consecutive T's, long exactly repeated subsequences, cloning restriction sites, or any other user-prohibited sequences. Further, any of a variety of combinations of these properties can be additionally included in the nucleotide sequence refinement methods provided herein.
  • the method provided herein can further include an evaluation step in which after the candidate polynucleotide sequence is altered, the sequence is compared with at least a portion of a data set of a property against which the sequence was refined.
  • the candidate nucleotide sequence can be compared to each property considered in the refinement, and, if the values for all properties are deemed to be acceptable or desired, no further sequence alteration is required. If the values for fewer than all properties are deemed to be acceptable or desired, the candidate nucleotide sequence can be subjected to further sequence alteration and evaluation.
  • sequence alteration steps of methods provided herein can be performed iteratively. That is, one or more steps of altering the nucleotide sequence can be performed, and the candidate nucleotide sequence can be evaluated to determine whether or not further sequence alteration is necessary and/or desirable. These steps can be repeated until values for all properties are deemed to be acceptable or desired, or until no further improvement can be achieved.
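  • The alternation of alteration and evaluation steps described above can be sketched as a simple loop in Python. This is an illustrative greedy procedure under stated assumptions, not the refinement method of this description: only one property (a codon pair score) is evaluated, one codon is changed per round, and the synonym table, scores, and threshold are hypothetical.

```python
def total_score(codons, pair_score):
    """Sum of codon pair scores over all adjacent codons."""
    return sum(pair_score(a, b) for a, b in zip(codons, codons[1:]))


def refine_iteratively(codons, synonyms, pair_score, threshold, max_rounds=50):
    """Repeat evaluate-then-alter until all pairs are acceptable or nothing improves."""
    codons = list(codons)
    for _ in range(max_rounds):
        # Evaluation step: junctions whose score is still above the threshold.
        flagged = [i for i in range(len(codons) - 1)
                   if pair_score(codons[i], codons[i + 1]) > threshold]
        if not flagged:
            break  # all values are acceptable
        best_trial, best_total = None, total_score(codons, pair_score)
        # Alteration step: try synonymous swaps at either codon of a flagged pair.
        for i in flagged:
            for pos in (i, i + 1):
                for alt in synonyms.get(codons[pos], ()):
                    trial = codons[:pos] + [alt] + codons[pos + 1:]
                    trial_total = total_score(trial, pair_score)
                    if trial_total < best_total:
                        best_trial, best_total = trial, trial_total
        if best_trial is None:
            break  # no further improvement can be achieved
        codons = best_trial
    return codons


# Hypothetical synonyms and codon pair scores; unlisted pairs score 0.
synonyms = {"AAA": ["AAG"], "AAG": ["AAA"], "CTG": ["TTA"], "TTA": ["CTG"]}
scores = {("ATG", "AAA"): 3.0, ("ATG", "AAG"): 0.5,
          ("AAG", "CTG"): 2.2, ("AAG", "TTA"): 0.5}
print(refine_iteratively(["ATG", "AAA", "CTG"], synonyms,
                         lambda a, b: scores.get((a, b), 0.0), threshold=2.0))
```

  • In this toy run the first round fixes one flagged pair but creates another, and the second round resolves it, which shows why a repeated evaluation step is useful.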
  • the methods and sequences provided herein include determination and use of translational kinetics values for codon pairs. As provided herein, such a translational kinetics value can be calculated and/or empirically measured, and the final translational kinetics value used in graphical displays and methods of predicting translational kinetics can be a refined value resultant from two or more types of codon pair translational kinetics information.
  • codon pair translational kinetics information that can be used in refining or replacing a translational kinetics value for a codon pair include, for example, values of observed versus expected codon pair frequencies in a particular organism, normalized values of observed versus expected codon pair frequencies in a particular organism, the degree to which observed versus expected codon pair frequency values are conserved in related proteins across two or more species, the degree to which observed versus expected codon pair frequency values are conserved at predicted pause sites such as boundaries between autonomous folding units in related proteins across two or more species, the degree to which codon pairs are conserved at predicted pause sites across different proteins in the same species, and empirical measurement of translational kinetics for a codon pair.
  • the values of observed versus expected codon pair frequencies in a host organism can be determined by any of a variety of methods known in the art for statistically evaluating observed occurrences relative to expected occurrences. Regardless of the statistical method used, this typically involves obtaining codon sequence data for the organism, for example, on a gene-by-gene basis. In some embodiments, the analysis is focused only on the coding regions of the genome. Because the analysis is a statistical one, a large database is preferred. Initially, the total number of codons is determined and the number of times each of the 61 non-terminating codons appears is determined.
  • the expected frequency of each of the 3721 ($61^2$) possible non-terminating codon pairs is calculated, typically by multiplying together the frequencies with which each of the component codons appears.
  • This frequency analysis can be carried out on a global basis, analyzing all of the sequences in the database together; however, it is typically done on a local basis, analyzing each sequence individually. This will tend to minimize the statistical effect of an unusually high proportion of rare codons in a sequence.
  • the expected number of occurrences of each codon pair is calculated by, for example, multiplying the expected frequency by the number of pairs in the sequence. This information can then be added to a global table, and each next succeeding sequence can be analyzed in like manner.
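  • The per-sequence ("local") tally described above can be sketched in Python as follows. For each coding sequence, codon frequencies are computed within that sequence, the expected count of each codon pair is the product of the two codon frequencies times the number of pairs in the sequence, and both observed and expected counts are added to global tables. The function name and example sequence are hypothetical, and only a trailing stop codon is removed.

```python
from collections import Counter, defaultdict

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def tally_sequence(cds, observed, expected):
    """Add one sequence's observed and expected codon pair counts to the global tables."""
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if codons and codons[-1] in STOP_CODONS:
        codons.pop()  # analyze non-terminating codons only
    n_pairs = len(codons) - 1
    if n_pairs < 1:
        return
    freq = {c: k / len(codons) for c, k in Counter(codons).items()}
    for a, b in zip(codons, codons[1:]):
        observed[(a, b)] += 1
    for a, fa in freq.items():
        for b, fb in freq.items():
            expected[(a, b)] += fa * fb * n_pairs


observed, expected = Counter(), defaultdict(float)
tally_sequence("ATGAAAATGAAATAA", observed, expected)  # hypothetical sequence
print(dict(observed))
print(dict(expected))
```

  • Calling `tally_sequence` once per gene accumulates the global table; the expected counts for each sequence sum to its number of codon pairs.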
  • the values of observed versus expected codon pair frequencies are chi-squared values, such as chi-squared 2 (chisq2) values or chi-squared 3 (chisq3) values.
  • Methods for calculating chi-squared values can be performed according to any method known in the art, as exemplified in U.S. Patent No. 5,082,767, which is incorporated by reference herein in its entirety.
  • $\mathrm{chisq2} = (\mathrm{observed} - \mathrm{expected})^2 / \mathrm{expected}$
  • a new value chi-squared 2 (chisq2) can be calculated as follows. For each group of codon pairs encoding the same amino acid pair (i.e., 400 groups), the sums of the expected and observed values are tallied; any non-randomness in amino acid pairs is reflected in the difference between these two values.
  • each of the expected values within the group is multiplied by the factor [sum observed/sum expected], so that the sums of the expected and observed values within the group are equal.
  • the new chi-squared, chisq2, is evaluated using these new expected values. Calculation methods for removing the contribution to chi-squared of non-randomness in amino acid pairs are known in the art, as exemplified in Gutman and Hatfield, Proc. Natl. Acad. Sci. USA, (1989) 86:3699-3703.
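  • The rescaling that removes amino acid pair bias can be illustrated with the following Python sketch. Codon pairs are grouped by the amino acid pair they encode, the expected counts in each group are multiplied by [sum observed / sum expected], and the chi-squared value is then computed from the rescaled expectations. The function name, the reduced codon-to-amino-acid map, and the counts are hypothetical and serve only to show the arithmetic.

```python
from collections import defaultdict


def chisq2(observed, expected, aa_of):
    """Chi-squared per codon pair after equalizing observed and expected sums per amino acid pair."""
    groups = defaultdict(list)
    for pair in expected:
        groups[(aa_of[pair[0]], aa_of[pair[1]])].append(pair)
    result = {}
    for pairs in groups.values():
        sum_obs = sum(observed.get(p, 0) for p in pairs)
        sum_exp = sum(expected[p] for p in pairs)
        if sum_exp == 0:
            continue  # nothing expected in this group; skip it
        factor = sum_obs / sum_exp
        for p in pairs:
            new_exp = expected[p] * factor
            obs = observed.get(p, 0)
            result[p] = (obs - new_exp) ** 2 / new_exp if new_exp > 0 else 0.0
    return result


# Hypothetical counts for the four codon pairs encoding Lys-Glu.
aa_of = {"AAA": "K", "AAG": "K", "GAA": "E", "GAG": "E"}
observed = {("AAA", "GAA"): 40, ("AAA", "GAG"): 20, ("AAG", "GAA"): 10, ("AAG", "GAG"): 10}
expected = {pair: 10.0 for pair in observed}
print(chisq2(observed, expected, aa_of))
```

  • Here the group is observed twice as often as expected overall, so each expectation is doubled to 20 before the chi-squared values are computed; only deviations among synonymous codon pairs remain.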
  • a new value chi-squared 3 (chisq3) can be calculated. Correction is made only for those dinucleotides formed between adjacent codon pairs; any bias of dinucleotides within codons (codon triplet positions I-II and II-III) will directly affect codon usage and is, therefore, automatically taken into account in the underlying calculations.
  • the sums of the expected and observed values are tallied; any non-randomness in dinucleotide pairs is reflected in the difference between these two values. Therefore, each of the expected values within the group is multiplied by the factor [sum observed/sum expected], so that the sums of the expected and observed values within the group are equal.
  • the new chi-squared, chisq3, is evaluated using these new expected values.
  • Dinucleotide bias represents a smaller effect in yeast, and only a very minor one in E. coli.
  • Although the predominant dinucleotide bias in human is the well-known CpG deficit, other dinucleotides are also very highly biased. For example, there is a deficit of TA, as well as an excess of TG, CA and CT. Overall, the deficit of CpG contributes only 35% of the total dinucleotide bias in the human database, and 17% in yeast.
  • the values of observed versus expected codon pair frequencies in a host organism herein can be normalized. Normalization permits different sets of values of observed versus expected codon pair frequencies to be compared by placing these values on the same numerical scale. For example, normalized codon pair frequency values can be compared between different organisms, or can be compared for different codon pair frequency value calculations within a particular organism (e.g., different calculations based on input sequence information or based on different calculations such as chisq1 or chisq2 or chisq3). Typically, normalization results in codon pair frequency values that are described in terms of their mean and standard deviation from the mean.
  • An exemplary method for normalizing codon pair frequency values is the calculation of z scores.
  • the z score for an item indicates how far and in what direction that item deviates from its distribution's mean, expressed in units of its distribution's standard deviation.
  • the mathematics of the z score transformation are such that if every item in a distribution is converted to its z score, the transformed scores will have a mean of zero and a standard deviation of one.
  • the z score transformation can be especially useful when seeking to compare the relative standings of items from distributions with different means and/or different standard deviations. Z scores are especially informative when the distribution to which they refer is normal. In a normal distribution, the distance between the mean and a given z score cuts off a fixed proportion of the total area under the curve.
  • An exemplary method for determining z scores for codon pair chi-squared values is as follows: First, a list of all 3721 possible non-terminating codon pairs is generated. Second, for the $i$th codon pair, the $i$th chi-squared value is calculated, where the $i$th chi-squared value is denoted $c_i$. The chi-squared value, $c_i$, is given the sign of (observed - expected), so that over-represented codon pairs are assigned a positive $c_i$ and under-represented codon pairs are assigned a negative $c_i$:
  • $c_i = \operatorname{sgn}(\mathrm{obs}_i - \mathrm{exp}_i) \cdot (\mathrm{obs}_i - \mathrm{exp}_i)^2 / \mathrm{exp}_i$
  • the mean of the chi-squared values is calculated, where the mean is denoted $m$: $m = \left(\sum_i c_i\right) / 3721$, where $\sum_i$ means sum over $i$.
  • the standard deviation of the chi-squared values is calculated, where the standard deviation is denoted $s$.
  • a z score is calculated by subtracting the mean then dividing by the standard deviation, wherein the $i$th z score is denoted $z_i$.
  • the formula for the z score is: $z_i = (c_i - m) / s$
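  • The z score procedure can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the standard genetic code (stop codons TAA, TAG, TGA) is assumed, and the standard deviation is computed as a population standard deviation (dividing by the number of values), which is an assumption because the text does not state the divisor.

```python
import math
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def non_terminating_codon_pairs():
    """List all pairs of sense codons: 61 * 61 = 3721 pairs."""
    codons = ["".join(bases) for bases in product("ACGT", repeat=3)]
    sense = [c for c in codons if c not in STOP_CODONS]
    return [first + second for first, second in product(sense, repeat=2)]


def signed_chi_squared(obs, exp):
    """Chi-squared value carrying the sign of (observed - expected)."""
    diff = obs - exp
    sign = (diff > 0) - (diff < 0)
    return sign * diff ** 2 / exp


def z_scores(values):
    """Convert values to z scores: subtract the mean, divide by the SD."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation; the divisor is an assumption here.
    sd = math.sqrt(sum((c - mean) ** 2 for c in values) / n)
    return [(c - mean) / sd for c in values]
```

  • In use, `signed_chi_squared` would be evaluated once per codon pair returned by `non_terminating_codon_pairs`, and the resulting list passed to `z_scores`, so that over-represented codon pairs receive positive scores and under-represented codon pairs receive negative scores.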
  • provided herein are methods of refining the predictive capability of a translational kinetics value of a codon pair in a host organism by providing an initial translational kinetics value based on the value of observed codon pair frequency versus expected codon pair frequency for a codon pair in a host organism, providing additional translational kinetics data for the codon pair in the host organism, and modifying the initial translational kinetics value according to the additional codon pair translational kinetics data to generate a refined translational kinetics value for the codon pair in the host organism.
  • the translational kinetics data that can be used to refine translational kinetics values and methods of modifying translational kinetics values according to such additional translational kinetics data to generate a refined translational kinetics value for a codon pair in a host organism are provided below.
  • translational kinetics data that can be used to refine translational kinetics values are based on recurrence of a codon pair and/or recurrence of a predicted translational kinetics value associated with a codon pair.
  • Recurrence-based refinement of translational kinetics values is based on the investigation of multiple polypeptide-encoding nucleotide sequences to determine whether or not there are multiple occurrences of either codon pairs or predicted translational kinetics values in those sequences.
  • Recurrence-based refinement of translational kinetics can be performed using any of a variety of known sequence comparison methods consistent with the examples provided herein. For purposes of exemplification, and not for limitation, the following example of recurrence-based refinement of translational kinetics is provided.
  • the predicted translational kinetics value for a codon pair can be refined according to the degree to which observed versus expected codon pair frequency values are conserved in related proteins across two or more species.
  • related proteins are proteins having homologous amino acid sequences and/or similar three dimensional structures.
  • Related proteins having homologous amino acid sequences will typically have at least about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, or 95% sequence identity.
  • Related proteins having similar three dimensional structures will typically share similar secondary structure topology and similar relative positioning of secondary structural elements; exemplary related proteins having similar three dimensional structures are members of the same SCOP-classified Family (see, e.g., Murzin A. G., Brenner S. E., Hubbard T., Chothia C. (1995). SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247, 536-540).
  • the observed versus expected codon pair frequency values for any given codon pair can vary from species to species. However, as provided herein, evolutionarily related proteins in different species will typically conserve some or all translational pause or slowing sites. Based on this, an observed conservation of one or more predicted translational pause or slowing sites in evolutionarily related proteins of different species can confirm or increase the likelihood that a translational pause or slowing site is a functional translational kinetics signal.
  • the codon pair located at the position on a protein that is confirmed as, or considered to have an increased likelihood of, containing an actual translational pause or slowing can itself be confirmed as being, or considered to have an increased likelihood of being, a functional translational kinetics signal.
  • a codon pair located at a position on a protein that is confirmed as not containing, or considered to have a decreased likelihood of containing, an actual translational pause or slowing, can itself be confirmed as not acting, or considered to have a decreased likelihood of acting, as a functional translational kinetics signal.
  • initially predicted translational kinetics data e.g., data based on values of observed codon pair frequency versus expected codon pair frequency
  • the predicted translational kinetics value for a codon pair can be refined according to the presence of the codon pair at a location predicted by methods other than codon pair frequency methods to contain a translational pause or slowing site.
  • a predicted location is a boundary location between autonomous folding units of a protein.
  • translational pauses are present in wild type genes in order to slow translation of a nascent polypeptide subsequent to translation of a secondary structural element of a protein and/or a protein domain, thus providing time for acquisition of secondary and at least partial tertiary structure by the nascent protein prior to further downstream translation, and thereby allowing each domain to partially organize and commit to a particular, independent fold.
  • codon pairs can be associated with translational pauses between autonomous folding units of a protein, where autonomous folding units can be secondary structural elements such as an alpha helix, or can be tertiary structural elements such as a protein domain.
  • the presence of a codon pair at a boundary location between autonomous folding units of a protein can confirm or increase the likelihood that the codon pair acts to pause or slow translation.
  • predicted translational kinetics data (e.g., data based on values of observed codon pair frequency versus expected codon pair frequency) can be modified according to the presence of the codon pair at a boundary location between autonomous folding units of a protein, which can increase the likelihood that the codon pair acts to pause or slow translation.
  • an over-represented codon pair that is present at a boundary location between autonomous folding units of a protein can be confirmed as acting as a translational pause or slowing codon pair.
  • a single observation of the codon pair at a boundary location between autonomous folding units of a protein can confirm or increase the likely translational pause or slowing properties of a codon pair.
  • typically a plurality of observations will be used to more accurately estimate the translational pause or slowing properties of a codon pair.
  • methods of using, for example, predicted boundary locations can be combined with methods that are based on recurrence of a codon pair and/or recurrence of a predicted translational kinetics value associated with a codon pair in methods of refining a predicted translational kinetics value for a codon pair.
  • a protein present in two or more species can have conserved boundary locations between autonomous folding units of the protein, and recurrent presence of an over-represented codon pair at the boundary locations can confirm the likelihood of an actual translational pause at that boundary location, leading to confirmation, or increased likelihood, that the corresponding codon pair for the respective species acts as a translational pause or slowing codon pair.
  • two or more proteins of the same species can have boundary locations between autonomous folding units, and recurrent presence of an over-represented codon pair at the boundary locations can confirm or indicate the likelihood of an actual translational pause at that boundary location, leading to confirmation or indication of increased likelihood that the corresponding codon pair acts as a translational pause or slowing codon pair.
  • Such recurrence-based methods also can be used to confirm or indicate increased likelihood that a non-over-represented codon pair (e.g., an under-represented codon pair or a represented-as-expected codon pair) acts as a translational pause or slowing codon pair.
  • two or more proteins of the same species can have boundary locations between autonomous folding units, and recurrent presence of a non-over-represented codon pair at the boundary locations, particularly if no over-represented codon pair is present, can confirm or indicate the likelihood of an actual translational pause at that boundary location, leading to confirmation or indication of increased likelihood that the corresponding codon pair acts as a translational pause or slowing codon pair.
  • Such recurrence-based methods also can be used to confirm or indicate the likelihood that a codon pair, such as an over-represented codon pair, does not act as a translational pause or slowing codon pair.
  • two or more proteins of the same species can have boundary locations between autonomous folding units, and consistent absence of a non-over-represented codon pair at the boundary locations can confirm or indicate increased likelihood that the codon pair does not act as a translational pause or slowing codon pair.
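  • The recurrence-based comparison described above can be sketched as a simple tally. The following Python sketch is illustrative only: the function names are hypothetical, each boundary location is assumed to be supplied as the zero-based index of the first codon of the codon pair spanning the boundary, and the comparison of the two returned fractions is one possible way of judging above-random or below-random presence.

```python
def codon_pairs(cds):
    """Yield (index, pair) for each adjacent codon pair of an in-frame sequence."""
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    for i in range(len(codons) - 1):
        yield i, codons[i] + codons[i + 1]


def boundary_enrichment(sequences, boundaries, target_pair):
    """Compare how often target_pair sits at a boundary with the background.

    sequences: list of coding sequences (strings).
    boundaries: list of sets of codon pair indices, one set per sequence
                (hypothetical representation of folding-unit boundaries).
    Returns (fraction of target_pair occurrences at boundaries,
             fraction of all codon pair positions that are boundaries),
    or None if target_pair never occurs.
    """
    at_boundary = total_target = 0
    boundary_positions = total_positions = 0
    for cds, boundary_set in zip(sequences, boundaries):
        for index, pair in codon_pairs(cds):
            total_positions += 1
            is_boundary = index in boundary_set
            boundary_positions += is_boundary
            if pair == target_pair:
                total_target += 1
                at_boundary += is_boundary
    if total_target == 0:
        return None
    return at_boundary / total_target, boundary_positions / total_positions
```

  • A first fraction well above the second suggests recurrent presence of the codon pair at boundary locations; a first fraction well below the second suggests consistent absence.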
  • the predicted translational kinetics value for a codon pair can be refined according to empirical measurement of translational kinetics for a codon pair.
  • the influence of a codon pair on translational kinetics can be experimentally measured, and these experimental measurements can be used to refine or replace the predicted translational kinetics values for a codon pair.
  • Several methods of experimentally measuring the translational kinetics of a codon pair are known in the art, and can be used herein, as exemplified in Irwin et al., J. Biol. Chem., (1995) 270:22801.
  • One such exemplary assay is based on the observation that a ribosome pausing at a site near the beginning of an mRNA coding sequence can inhibit translation initiation by physically interfering with the attachment of a new ribosome to the message, and, thus, the codon pair to be assayed can be placed at the beginning of a polypeptide-encoding nucleotide sequence and the effect of the codon pair on translational initiation can be measured as an indication of the ability of the codon pair to cause a translational pause.
  • Another such exemplary assay is based on the fact that the transit time of a ribosome through the leader polypeptide coding region of the leader RNA of the trp operon sets the basal level of transcription through the trp attenuator, and, thus, the codon pair to be assayed can be placed into a trpLep leader polypeptide codon region, and level of expression can be inversely indicative of the translational pause properties of the codon pair, due to a faster translation causing formation of a stem-loop attenuator in the leader RNA, which results in transcriptional attenuation.
  • the methods provided herein for calculation of translational kinetics values can be applied to the native organism of the polypeptide of SEQ ID NOS: 2, 26, 50, 74, 98, 122, 146, 170, 182 or 194, and also can be applied to a selected organism in which the polypeptide of SEQ ID NO: 2, 26, 50, 74, 98, 122, 146, 170, 182 or 194, or a modification thereof, is to be heterologously expressed.
  • the nucleotide sequence information of an organism can be used to calculate chi-squared values in accordance with the methods provided herein, and the translational kinetics values can be based on these chi-squared values as well as on additional translational kinetics information provided herein, including, but not limited to, codon pairs conserved in domain boundaries and empirically measured translational kinetics for a codon pair.
  • the translational kinetics data described herein can be combined in such a manner as to provide a refined translational kinetics value for a codon pair in a host organism.
  • Methods of combining predictive data to arrive at a refined predictive value are known in the art and can be used herein.
  • an hypothesis H is that a given sequence feature, e.g., a given codon pair, has utility for translational kinetics engineering, e.g., creates a translational pause site.
  • H) P(D1 & D2 & D3 & D4
  • H) P(D1 & D2 & D3 & D4
  • H) P(D1 & D2 & D3 & D4
  • P(Di is correct) and P(Di is not correct) can be estimated a priori by the correlation of Di with previous experimental measurements.
  • H) are obtained by observing whether or not hypothesis H is consistent with observed data item Di. More complex and powerful Bayesian approaches are also well known to the art. The fully general approach rewrites P(D
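  • One simple way of combining several data items in a Bayesian manner can be sketched as follows. This Python sketch is an illustration under stated assumptions and is not the formula of the present text: it assumes the data items are conditionally independent given the hypothesis (a naive Bayes simplification), and the function name and the numerical values in the usage note are hypothetical.

```python
def posterior(prior, likelihoods_given_h, likelihoods_given_not_h):
    """Posterior probability of hypothesis H after several data items.

    prior: P(H) before any data are considered.
    likelihoods_given_h: P(Di | H) for each data item Di.
    likelihoods_given_not_h: P(Di | not H) for each data item Di.
    Assumes the Di are conditionally independent (naive Bayes assumption).
    """
    weight_h = prior
    weight_not_h = 1.0 - prior
    for p_h, p_not_h in zip(likelihoods_given_h, likelihoods_given_not_h):
        weight_h *= p_h
        weight_not_h *= p_not_h
    # Bayes' rule: normalize over the two competing hypotheses.
    return weight_h / (weight_h + weight_not_h)
```

  • For example, `posterior(0.5, [0.8, 0.6], [0.2, 0.4])` combines two hypothetical data items that each favor the hypothesis that a codon pair creates a translational pause site, and yields a posterior probability higher than the prior of 0.5.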
  • the translational kinetics values for a codon pair can be refined by consideration of, for example, chi-squared value of observed versus expected codon pair frequency and the degree to which codon pairs are conserved at predicted pause sites across different proteins in the same species, for example, at protein structure domain boundaries.
  • An over-represented codon pair which is present with above-random frequency at boundary locations between autonomous folding units of proteins in the same species can have a translational kinetics value reflecting higher predicted translational pause properties of the codon pair.
  • an over-represented codon pair which is present with below-random frequency at boundary locations between autonomous folding units of proteins in the same species can have a translational kinetics value reflecting lower predicted translational pause properties of the codon pair.
  • the translational kinetics values for a codon pair can be refined by consideration of, for example, experimentally measured translation step times in one species and the degree to which codon pairs that correspond to measured pause sites in the first species are conserved across homologous proteins in other species, for example, in a multiple sequence alignment.
  • when an over-represented codon pair in another species is aligned with above-random frequency to a codon pair that corresponds to a measured translation pause site in the first species, it can have a translational kinetics value reflecting higher predicted translational pause properties of that codon pair in the other species.
  • when an over-represented codon pair in another species is aligned with below-random frequency to a codon pair that corresponds to a measured translation pause site in the first species, it can have a translational kinetics value reflecting lower predicted translational pause properties of that codon pair in the other species.
  • translational kinetics values for codon pairs can be determined.
  • the translational kinetic values can be organized according to the likelihood of causing a translational pause or slowing based on any method known in the art.
  • the translational kinetic values for two or more codon pairs, up to all codon pairs, in an organism are determined, and the mean translational kinetics value and associated standard deviation are calculated. Based on this, the translational kinetics value for a particular codon pair can be described in terms of the multiple of standard deviations the translational kinetics value for the particular codon pair differs from the mean translational kinetics value.
  • Such a graphical display provides a visual display of the predicted translational influence, including translational pause or slowing for numerous or all codon pairs of a polypeptide-encoding nucleotide sequence.
  • This visual display can be used in methods of modifying polypeptide-encoding nucleotide sequences in order to thereby modify the predicted translational kinetics of the mRNA into polypeptide in methods such as those provided herein.
  • the graphical displays can be used to identify one or more codon pairs to be modified in a polypeptide-encoding nucleotide sequence.
  • the graphical displays can be used in analyzing a polypeptide-encoding nucleotide sequence prior to modifying the polypeptide-encoding nucleotide sequence, or can be used in analyzing a modified polypeptide-encoding nucleotide sequence to determine, for example, whether or not further modifications are desired.
  • Methods for creating and using graphical displays can be performed according to any method known in the art, as exemplified in U.S. Patent Publication No. 2007/0298503, published on December 27, 2007, and U.S. Patent Publication No. 2007/0275399, published on November 29, 2007, which are incorporated by reference herein in their entireties.
  • graphical displays as described therein can be created to illustrate the translational kinetics of an original or redesigned polypeptide-encoding nucleotide sequence in the native or a heterologous organism, or to illustrate differences and/or similarities of translation kinetics of a polypeptide-encoding nucleotide sequence in which one or more codon pairs have been modified.
  • numerous normalized graphical displays can be created to illustrate differences and/or similarities of translation kinetics of a polypeptide-encoding nucleotide sequence when expressed in two or more different organisms.
  • the graphical displays can be created using translational kinetics values based on any of the methods for determining translational kinetics values provided herein or otherwise known in the art. For example, the displays can be based on chi-squared as a function of codon pair position, chi-squared 2 as a function of codon position, or chi-squared 3 as a function of codon pair position, translational kinetics values thereof, empirical measurement of translational pause of codon pairs in a host organism, estimated translational pause capability based on observed presence and/or recurrence of a codon pair at a predicted pause site, and variations and combinations thereof as provided herein.
  • the exact format of the graphical displays can take any of a variety of forms, and the specific form is typically selected for ease of analysis and comparison between plots.
  • the abscissa typically lists the position along the nucleotide sequence or polypeptide sequence, and can be represented by nucleotide position, codon position, codon pair position, amino acid position, or amino acid pair position.
  • the ordinate typically lists the translational kinetics value of the codon pair, such as, but not limited to, a translational kinetics value of codon pair frequency, including, but not limited to, the z score of chisq1, the z score of chisq2, the z score of chisq3, the empirically measured value, and the refined translational kinetics value.
  • the sequence position can be plotted along the ordinate and the translational kinetics value can be plotted along the abscissa.
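  • The data underlying such a display can be sketched as follows. This Python sketch is illustrative only: the function names, the lookup table `z_table` of per-codon-pair translational kinetics values, the default value of 0.0 for codon pairs missing from the table, and the plain-text rendering are all assumptions made for the example. Codon pair position is placed on the abscissa (one row per position) and the translational kinetics value on the ordinate (bar length).

```python
def kinetics_profile(cds, z_table, default=0.0):
    """Return (positions, values) for each codon pair along a coding sequence.

    cds: in-frame coding sequence.
    z_table: dict mapping a six-base codon pair to a translational
             kinetics value (hypothetical input, e.g. z scores).
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    positions, values = [], []
    for i in range(len(codons) - 1):
        positions.append(i + 1)  # 1-based codon pair position
        values.append(z_table.get(codons[i] + codons[i + 1], default))
    return positions, values


def text_plot(positions, values, width=20, scale=5.0):
    """Render one line per codon pair; bar length tracks the magnitude."""
    lines = []
    for pos, val in zip(positions, values):
        length = min(width, int(round(abs(val) / scale * width)))
        sign = "+" if val >= 0 else "-"
        lines.append(f"{pos:4d} {sign}{abs(val):5.2f} {'#' * length}")
    return "\n".join(lines)
```

  • The same `(positions, values)` series could equally be passed to any plotting tool; the text rendering is used here only to keep the sketch self-contained.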
  • a set of graphical displays including at least a first graphical display and a second graphical display, are prepared. These sets of displays can be compared in order to determine the difference in predicted translational efficiency or translational kinetics of the two plots.
  • the plots can differ according to any of a variety of criteria. For example, each plot can represent a different polypeptide-encoding nucleotide sequence, each plot can represent a different host organism, each plot can represent differently determined translational kinetics values, or any combination thereof.
  • any number of different graphical displays can be compared in accordance with the methods provided herein, for example, 2, 3, 4, 5, 6, 7, 8 or more different graphical displays can be compared.
  • two plots will represent different polypeptide-encoding nucleotide sequences, the same sequence in different host organisms, or different sequences in different host organisms.
  • Comparison of different graphical displays can be used to analyze the predicted change in translational kinetics as a result of the difference represented by the graphical displays. For example, comparison of the same polypeptide-encoding nucleotide sequence in different host organisms can be used to analyze any predicted translational pauses that can be removed. Accordingly, provided herein are methods of analyzing translational kinetics of an mRNA into polypeptide in a host organism by comparing two graphical displays to understand or predict the differences in translational kinetics of the mRNA into polypeptide, where the differences in the graphical displays can be as a result of, for example, a difference in the polypeptide-encoding nucleotide sequence or a difference in the host organism.
  • a graphical display of the translational kinetics values of codon pairs for the original polypeptide-encoding nucleotide sequence in the heterologous host can be compared to a graphical display of the translational kinetics values of codon pairs for a modified polypeptide-encoding nucleotide sequence in the heterologous host, and it can be determined whether or not the modification to the polypeptide-encoding nucleotide sequence resulted in improved translational kinetics.
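  • A comparison of an original and a modified sequence can also be sketched numerically. The following Python sketch is illustrative only: the function name is hypothetical, both profiles are assumed to list one translational kinetics value per codon pair position for sequences encoding the same polypeptide, and the threshold of 3.0 for calling a predicted pause site is an arbitrary value chosen for the example.

```python
def compare_profiles(original, modified, threshold=3.0):
    """Compare two per-position translational kinetics profiles.

    original, modified: lists of values, one per codon pair position.
    threshold: value at or above which a position is treated as a
               predicted pause site (hypothetical cutoff).
    """
    if len(original) != len(modified):
        raise ValueError("profiles must cover the same codon pair positions")
    differences = [m - o for o, m in zip(original, modified)]
    before = {i + 1 for i, v in enumerate(original) if v >= threshold}
    after = {i + 1 for i, v in enumerate(modified) if v >= threshold}
    return {
        "differences": differences,
        "removed": sorted(before - after),     # predicted pauses eliminated
        "introduced": sorted(after - before),  # predicted pauses newly created
    }
```

  • A non-empty "introduced" list would indicate that the modification created new predicted pause sites, which can prompt further modification of the sequence.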
  • the nucleic acid sequences provided herein can be present in a polynucleotide (e.g., DNA or RNA molecule).
  • the polynucleotides can be inserted into a replicable vector for cloning (e.g., amplification of the DNA) or for expression.
  • Various vectors are publicly available and are known in the art.
  • the vector can, for example, be in the form of a plasmid, cosmid, viral particle, or phage.
  • the appropriate nucleic acid sequence can be inserted into the vector by any of a variety of procedures known in the art.
  • Vector components can generally include, but are not limited to, one or more of a signal sequence, an origin of replication, one or more marker genes, an enhancer element, a promoter, and a transcription termination sequence. Construction of suitable vectors containing one or more of these components employs standard ligation techniques which are known to the skilled artisan.
  • the encoded polypeptide can be produced recombinantly not only directly, but also as a fusion polypeptide with a heterologous polypeptide, which can be, e.g., a signal sequence or other polypeptide having a specific cleavage site at the N-terminus of the mature protein or polypeptide.
  • the signal sequence can be a component of the vector, or it can be a part of the polynucleotide that is inserted into the vector.
  • the signal sequence can be a prokaryotic signal sequence selected, for example, from the group of the alkaline phosphatase, penicillinase, lpp, or heat-stable enterotoxin II leaders.
  • the signal sequence can be, e.g., the yeast invertase leader, alpha factor leader (including Saccharomyces and Kluyveromyces α-factor leaders, the latter described in U.S. Patent No. 5,010,182), or acid phosphatase leader, the C. albicans glucoamylase leader (EP 362,179 published 4 April 1990), or the signal described in WO 90/13646 published 15 November 1990.
  • mammalian signal sequences can be used to direct secretion of the protein, such as signal sequences from secreted polypeptides of the same or related species, as well as viral secretory leaders.
  • Both expression and cloning vectors contain a polynucleotide that permits the vector to replicate in one or more selected host cells. Such sequences are well known for a variety of bacteria, yeast, and viruses.
  • the origin of replication from the plasmid pBR322 is suitable for most Gram-negative bacteria, the 2μ plasmid origin is suitable for yeast, and various viral origins (SV40, polyoma, adenovirus, VSV or BPV) are useful for cloning vectors in mammalian cells.
  • Expression and cloning vectors will typically contain a selection gene, also termed a selectable marker.
  • Typical selection genes encode proteins that (a) confer resistance to antibiotics or other toxins, e.g., ampicillin, neomycin, methotrexate, or tetracycline, (b) complement auxotrophic deficiencies, or (c) supply critical nutrients not available from complex media, e.g., the gene encoding D-alanine racemase for Bacilli.
  • Suitable selectable markers for mammalian cells are those that enable the identification of cells competent to take up the polynucleotide-containing vector, such as DHFR or thymidine kinase.
  • An appropriate host cell when wild-type DHFR is employed is the CHO cell line deficient in DHFR activity, prepared and propagated as described by Urlaub et al., Proc. Natl. Acad. Sci. USA, 77:4216 (1980).
  • a suitable selection gene for use in yeast is the trp1 gene present in the yeast plasmid YRp7 [Stinchcomb et al., Nature, 282:39 (1979); Kingsman et al., Gene, 7:141 (1979); Tschemper et al., Gene, 10:157 (1980)].
  • the trp1 gene provides a selection marker for a mutant strain of yeast lacking the ability to grow in tryptophan, for example, ATCC No. 44076 or PEP4-1 [Jones, Genetics, 85:12 (1977)].
  • Expression and cloning vectors usually contain a promoter operably linked to the polynucleotide provided herein to direct mRNA synthesis. Promoters recognized by a variety of potential host cells are well known. Promoters suitable for use with prokaryotic hosts include the β-lactamase and lactose promoter systems [Chang et al., Nature, 275:615 (1978); Goeddel et al., Nature, 281:544 (1979)], alkaline phosphatase, a tryptophan (trp) promoter system [Goeddel, Nucleic Acids Res., 8:4057 (1980); EP 36,776], and hybrid promoters such as the tac promoter [deBoer et al., Proc. Natl. Acad. Sci. USA, 80:21-25 (1983)]. Promoters for use in bacterial systems also will contain a Shine-Dalgarno (S.D.) sequence operably linked to the poly
  • Suitable promoting sequences for use with yeast hosts include the promoters for 3-phosphoglycerate kinase [Hitzeman et al., J. Biol. Chem., 255:2073 (1980)] or other glycolytic enzymes [Hess et al., J. Adv.
  • yeast promoters, which are inducible promoters having the additional advantage of transcription controlled by growth conditions, are the promoter regions for alcohol dehydrogenase 2, isocytochrome C, acid phosphatase, degradative enzymes associated with nitrogen metabolism, metallothionein, glyceraldehyde-3-phosphate dehydrogenase, and enzymes responsible for maltose and galactose utilization. Suitable vectors and promoters for use in yeast expression are further described in EP 73,657.
  • Transcription from vectors in mammalian host cells is controlled, for example, by promoters obtained from the genomes of viruses such as polyoma virus, fowlpox virus (UK 2,211,504 published 5 July 1989), adenovirus (such as Adenovirus 2), bovine papilloma virus, avian sarcoma virus, cytomegalovirus, a retrovirus, hepatitis-B virus and Simian Virus 40 (SV40), from heterologous mammalian promoters, e.g., the actin promoter or an immunoglobulin promoter, and from heat-shock promoters, provided such promoters are compatible with the host cell systems.
  • Enhancers are cis-acting elements of DNA, usually about from 10 to 300 bp, that act on a promoter to increase its transcription.
  • Many enhancer sequences are now known from mammalian genes (globin, elastase, albumin, α-fetoprotein, and insulin).
  • An enhancer from a eukaryotic cell virus can also be used. Examples include the SV40 enhancer on the late side of the replication origin (bp 100-270), the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side of the replication origin, and adenovirus enhancers.
  • the enhancer can be spliced into the vector at a position 5' or 3' to the polynucleotide provided herein, but is preferably located at a site 5' from the promoter.
  • Expression vectors used in eukaryotic host cells will also contain sequences necessary for the termination of transcription and for stabilizing the mRNA. Such sequences are commonly available from the 5' and, occasionally 3', untranslated regions of eukaryotic or viral DNAs or cDNAs. These regions contain nucleotide segments transcribed as polyadenylated fragments in the untranslated portion of the mRNA transcribed from the polynucleotide provided herein.
  • Host cells are transfected or transformed with expression or cloning vectors described herein for polypeptide production and cultured in conventional nutrient media modified as appropriate for inducing promoters, selecting transformants, or amplifying the genes encoding the desired sequences.
  • the culture conditions such as media, temperature, pH and the like, can be selected by the skilled artisan without undue experimentation. In general, principles, protocols, and practical techniques for maximizing the productivity of cell cultures can be found in Mammalian Cell Biotechnology: a Practical Approach, M. Butler, ed. (IRL Press, 1991) and Sambrook et al., supra.
  • Methods of eukaryotic cell transfection and prokaryotic cell transformation are known to the ordinarily skilled artisan, for example, CaCl2 treatment, CaPO4 precipitation, liposome-mediated transfection, and electroporation. Depending on the host cell used, transformation is performed using standard techniques appropriate to such cells.
  • the calcium treatment employing calcium chloride, as described in Sambrook et al., supra, or electroporation is generally used for prokaryotes.
  • Infection with Agrobacterium tumefaciens is used for transformation of certain plant cells, as described by Shaw et al., Gene, 23:315 (1983) and WO 89/05859 published 29 June 1989.
  • Suitable host cells for cloning or expressing the DNA in the vectors herein include prokaryote, yeast, or higher eukaryote cells.
  • Suitable prokaryotes include but are not limited to eubacteria, such as Gram-negative or Gram-positive organisms, for example, Enterobacteriaceae such as E. coli.
  • Various E. coli strains are publicly available, such as E. coli K12 strain MM294 (ATCC 31,446); E. coli X1776 (ATCC 31,537); E. coli strain W3110 (ATCC 27,325) and K5 772 (ATCC 53,635).
  • suitable prokaryotic host cells include Enterobacteriaceae such as Escherichia, e.g., E. coli, Enterobacter, Erwinia, Klebsiella, Proteus, Salmonella, e.g., Salmonella typhimurium, Serratia, e.g., Serratia marcescens, and Shigella, as well as Bacilli such as B. subtilis and B. licheniformis (e.g., B. licheniformis 41P disclosed in DD 266,710 published 12 April 1989), Pseudomonas such as P. aeruginosa, and Streptomyces. These examples are illustrative rather than limiting.
  • Strain W3110 is one particularly preferred host or parent host because it is a common host strain for recombinant DNA product fermentations. Preferably, the host cell secretes minimal amounts of proteolytic enzymes.
  • strain W3110 can be modified to effect a genetic mutation in the genes encoding proteins endogenous to the host, with examples of such hosts including E. coli W3110 strain 1A2, which has the complete genotype tonA; E. coli W3110 strain 9E4, which has the complete genotype tonA ptr3; E. coli W3110 strain 27C7 (ATCC 55,244), which has the complete genotype tonA ptr3 phoA E15 (argF-lac)169 degP ompT kanr; E. coli W3110 strain 37D6, which has the complete genotype tonA ptr3 phoA E15 (argF-lac)169 degP ompT rbs7 ilvG kanr; E. coli W3110 strain 40B4, which is strain 37D6 with a non-kanamycin resistant degP deletion mutation; and an E. coli strain having mutant periplasmic protease disclosed in U.S. Patent No. 4,946,783 issued 7 August 1990.
  • in vitro methods of cloning e.g., PCR or other nucleic acid polymerase reactions, are suitable.
  • eukaryotic microbes such as filamentous fungi or yeast are suitable cloning or expression hosts for polynucleotide-containing vectors.
  • Saccharomyces cerevisiae is a commonly used lower eukaryotic host microorganism.
  • Others include Schizosaccharomyces pombe (Beach and Nurse, Nature, 290: 140 [1981]; EP 139,383 published 2 May 1985); Kluyveromyces hosts (U.S. Patent No. 4,943,529; Fleer et al., Bio/Technology, 9:968-975 (1991)) such as, e.g., K. lactis (MW98-8C, CBS683, CBS4574; Louvencourt et al., J. Bacteriol., 154(2):737-742 [1983]), K. fragilis (ATCC 12,424), K. bulgaricus (ATCC 16,045), K. wickerhamii (ATCC 24,178), K. waltii (ATCC 56,500), K. drosophilarum (ATCC 36,906; Van den Berg et al., Bio/Technology, 8:135 (1990)), K. thermotolerans, and K. marxianus; Yarrowia (EP 402,226); Pichia pastoris (EP 183,070; Sreekrishna et al., J.
  • Candida; Trichoderma reesei (EP 244,234); Neurospora crassa (Case et al., Proc. Natl. Acad. Sci. USA, 76:5259-5263 [1979]); Schwanniomyces such as Schwanniomyces occidentalis (EP 394,538 published 31 October 1990); and filamentous fungi such as, e.g., Neurospora, Penicillium, Tolypocladium (WO 91/00357 published 10 January 1991), and Aspergillus hosts such as A. nidulans (Ballance et al., Biochem. Biophys. Res.
  • Methylotrophic yeasts are suitable herein and include, but are not limited to, yeast capable of growth on methanol selected from the genera consisting of Hansenula, Candida, Kloeckera, Pichia, Saccharomyces, Torulopsis, and Rhodotorula. A list of specific species that are exemplary of this class of yeasts can be found in C. Anthony, The Biochemistry of Methylotrophs, 269 (1982).
  • Suitable host cells for the expression of glycosylated polypeptides are derived from multicellular organisms.
  • invertebrate cells include insect cells such as Drosophila S2 and Spodoptera Sf9, as well as plant cells.
  • useful mammalian host cell lines include Chinese hamster ovary (CHO) and COS cells. More specific examples include monkey kidney CV1 line transformed by SV40 (COS-7, ATCC CRL 1651); human embryonic kidney line (293 or 293 cells subcloned for growth in suspension culture, Graham et al., J. Gen Virol., 36:59 (1977)); Chinese hamster ovary cells/-DHFR (CHO, Urlaub and Chasin, Proc. Natl. Acad. Sci.
  • mice Sertoli cells TM4, Mather, Biol. Reprod., 23:243-251 (1980)
  • human lung cells WI-38, ATCC CCL 75
  • human liver cells Hep G2, HB 8065
  • mouse mammary tumor MMT 060562, ATCC CCL51. The selection of the appropriate host cell is deemed to be within the skill in the art.
  • Gene amplification and/or expression can be measured in a sample directly, for example, by conventional Southern blotting, Northern blotting to quantitate the transcription of mRNA [Thomas, Proc. Natl. Acad. Sci. USA, 77:5201-5205 (1980)], dot blotting (DNA analysis), or in situ hybridization, using an appropriately labeled probe, based on the sequences provided herein.
  • antibodies can be employed that can recognize specific duplexes, including DNA duplexes, RNA duplexes, and DNA-RNA hybrid duplexes or DNA-protein duplexes. The antibodies in turn can be labeled and the assay can be carried out where the duplex is bound to a surface, so that upon the formation of duplex on the surface, the presence of antibody bound to the duplex can be detected.
  • Gene expression can be measured by immunological methods, such as immunohistochemical staining of cells or tissue sections and assay of cell culture or body fluids, to quantitate directly the expression of gene product.
  • Antibodies useful for immunohistochemical staining and/or assay of sample fluids can be either monoclonal or polyclonal, and can be prepared in any mammal. Conveniently, the antibodies can be prepared against any polypeptide provided herein or against a synthetic peptide based on the sequences provided herein or against exogenous sequence fused to the polypeptide or fragment thereof and encoding a specific antibody epitope.
  • Polypeptides can be recovered from culture medium or from host cell lysates. If membrane-bound, the polypeptide can be released from the membrane using a suitable detergent solution (e.g. Triton-X 100) or by enzymatic cleavage. Cells employed in expression of polypeptides can be disrupted by various physical or chemical means, such as freeze-thaw cycling, sonication, mechanical disruption, or cell lysing agents, as is known in the art. It may be desired to purify polypeptides.
  • the following procedures are exemplary of suitable purification procedures: by fractionation on an ion-exchange column; ethanol precipitation; reverse phase HPLC; chromatography on silica or on an anion-exchange resin such as DEAE; chromatofocusing; SDS-PAGE; ammonium sulfate precipitation; gel filtration using, for example, Sephadex G-75; protein A Sepharose columns to remove contaminants such as IgG; and metal chelating columns to bind epitope-tagged forms of the polypeptide.
  • Various additional known methods of protein purification can be employed; exemplary methods are described in Deutscher, Methods in Enzymology, 182 (1990); Scopes, Protein Purification: Principles and Practice, Springer-Verlag, New York (1982).
  • the purification step(s) selected will depend, for example, on the nature of the production process used and the particular polypeptide produced.
  • an expression system comprising an expression vector in a host organism, wherein the expression vector includes a DNA sequence of the embodiments provided herein operably linked to an expression control sequence.
  • an expression vector is a DNA or RNA vector that is capable of transforming a host cell and of effecting expression of a specified nucleic acid molecule.
  • the expression vector is also capable of replicating within the host cell.
  • Expression vectors can be either prokaryotic or eukaryotic, and are typically viruses or plasmids.
  • operably linked refers to functional linkage between a nucleic acid expression control sequence (such as a promoter, or array of transcription factor binding sites) and a second nucleic acid sequence, wherein the expression control sequence directs transcription of the nucleic acid corresponding to the second sequence.
  • An operably linked expression vector can also include secretion signals and other modifying sequences, and can encode chaperones and proteins for a variety of organisms and systems.
  • Methods of expressing polypeptides from polypeptide-encoding nucleotide sequences are known in the art, as exemplified, for example, by the techniques described in Maniatis et al., 1989, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory, N.Y. and Ausubel et al., 2008, Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley Interscience, N.Y.
  • the methods include inserting a polypeptide-encoding nucleotide sequence designed by the methods provided herein into a cell, and expressing the polypeptide-encoding nucleotide sequence under conditions suitable for gene expression. Additionally provided expression methods include cell-free expression systems as known in the art, where such methods include providing a polypeptide-encoding nucleotide sequence designed by the methods provided herein and contacting the polypeptide-encoding nucleotide sequence with a cell-free expression system under conditions suitable for protein translation.
  • the expression levels of one or more enzymes in a metabolic pathway are individually manipulated. Differential metabolic expression levels can be manipulated using methods known in the art. For example, by selecting a specific promoter with a desired transcriptional level, one can vary the expression level of the gene that is operably linked to the promoter. Similarly, one may select an expression vector that produces the desired levels of expression.
  • Endogenous sequences include genomic sequences of a cell. Such genomic sequences can include sequences previously modified by the constructs, methods and systems provided herein. Modifications of endogenous sequences can include insertions, deletions and mutations. In some embodiments, a modification can include the insertion of a heterologous sequence. Heterologous sequences include exogenous nucleic acid sequences and can include sequences with homology to endogenous sequences.
  • Integrable polynucleotides for modifying endogenous nucleotide sequences in a cell are provided.
  • Such integrable polynucleotides can contain sequences with homology to endogenous sequences and a removable selectable marker cassette.
  • the removable selectable marker cassette can include a selectable marker flanked by a 5' site-specific recombinase recognition sequence and a 3' site-specific recombinase recognition sequence.
  • integrable polynucleotides can also contain heterologous sequences.
  • the heterologous sequences and removable selectable marker cassette can be flanked by a 5' nucleic acid sequence with homology to an endogenous sequence and a 3' nucleic acid sequence with homology to an endogenous sequence.
  • integrable polynucleotides can include episomal nucleic acids, such as plasmids and YACs.
  • integrable polynucleotides can include autonomous replication sequences such as ColE1, Ori, oriT, 2μm, CEN/ARS.
  • integrable polynucleotides can include linearized episomal nucleic acids, for example, plasmids cut with a restriction enzyme.
  • integrable polynucleotides can include PCR products.
  • a removable selectable cassette can contain a selectable marker flanked by a 5' site-specific recombinase recognition sequence and a 3' site-specific recombinase recognition sequence.
  • Removable selectable marker cassettes can be used to select for integration of an integrable polynucleotide into the genome of a cell. Subsequent to integration of the integrable polynucleotide, the removable selectable marker cassette can be excised, if desired, from the genome of the cell. Because the number of known selectable markers is limited, one advantage of excising a selectable marker from the genome of a cell is that the selectable marker can be used repeatedly.
  • the same selectable marker can be used in a second integrable polynucleotide to modify the genome of a cell previously modified by the first integrable polynucleotide.
  • the selectable marker can allow selection for a cell in which the selectable marker has integrated into the cell's genome.
  • Selectable markers can be antibiotic resistance genes against compounds, for example, kanamycin, ampicillin, tetracycline, chloramphenicol, spectinomycin, gentamycin, zeomycin, or streptomycin. More selectable markers can be genes capable of complementing strains of yeast having well characterized metabolic deficiencies, for example, tryptophan or histidine deficient mutants.
  • a selectable marker can be used to select against cells that retain the selectable marker. In such embodiments, cells which do not express the selectable marker will be selected for.
  • a selectable marker can be selected for and against.
  • Examples of such selectable markers include, but are not limited to, URA3 (Boeke, J. D., LaCroute, F., and Fink, G. R. (1984).
  • a counterselection for the tryptophan pathway in yeast 5-fluoroanthranilic acid resistance.
  • Yeast 16, 553-560 CAN1 (Whelan, W. L., Gocke, E., and Manney, T. R. (1979).
  • the CAN1 locus of Saccharomyces cerevisiae fine-structure analysis and forward mutation rates. Genetics 35-51), KlURA3, CYH2, LYS2 and MET15 (Singh, A. and Sherman, F. (1975). Genetic and physiological characterization of met15 mutants of Saccharomyces cerevisiae: a selective system for forward and reverse mutations. Genetics 75-97).
  • Such examples can typically be used in conjunction with specific strains of Saccharomyces cerevisiae which are non-functional for specific genes.
  • a first selection of the selectable marker can be made to select for incorporation of the selectable marker and a second selection of the selectable marker can be made to select against maintaining the selectable marker.
  • Such embodiments can find particular application when the same selectable marker is utilized iteratively, namely, two or more times, for the separate incorporation of two or more heterologous polynucleotides into the host organism.
  • the selectable marker can be flanked by site-specific recombinase recognition sequences.
  • site-specific recombinase recognition sequences allow a site-specific recombinase to excise the selectable marker from an integrable polynucleotide integrated into the genome of a cell.
  • sequence-specific recombinase target sites include, but are not limited to, loxP sites, frt sites, att sites and dif sites.
  • the site-specific recombinase recognition sequences can be loxP sites recognized by the CRE recombinase.
  • the CRE recombinase can be a CRE recombinase optimized for expression in a particular organism, for example, S. cerevisiae, using methods known in the art.
  • the site-specific recombinase recognition sequence can be frt sites recognized by the FLP recombinase.
  • flanking loxP sites or flanking frt sites should be in the same orientation, that is, the sites should be in tandem orientation.
  • CRE recombinase or FLP recombinase expressed in a cell can excise the sequence between loxP sites or frt sites, respectively.
  • the site-specific recombinase can be expressed from a plasmid. In other embodiments, the site-specific recombinase can be expressed from an inducible endogenous gene.
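  • The excision of a sequence lying between two recognition sites in tandem orientation can be illustrated with a short string-level sketch. This is only a toy model under stated assumptions: the genome is treated as a single-strand text string, the 34-bp sequence used for loxP is the commonly cited form of that site, and the function name, the flanking arms and the marker sequence are hypothetical placeholders that do not come from the text.

```python
# Commonly cited 34-bp loxP site: two 13-bp inverted repeats around an 8-bp spacer.
# (Assumption for illustration; any recognition site string can be passed instead.)
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"


def excise_between_tandem_sites(genome: str, site: str = LOXP) -> str:
    """Model recombinase excision between the first two same-orientation sites.

    The sequence between the two sites is removed together with one copy of
    the site, so a single site remains at the locus. The input is returned
    unchanged when fewer than two same-orientation sites are present.
    """
    first = genome.find(site)
    if first == -1:
        return genome
    second = genome.find(site, first + len(site))
    if second == -1:
        return genome
    # Keep everything up to the first site, then resume at the second site.
    return genome[:first] + genome[second:]


# Hypothetical integrated locus: arm, site, marker, site, arm.
left_arm, marker, right_arm = "GATTACA", "ATGGCCTAA", "TTGCAGG"
integrated = left_arm + LOXP + marker + LOXP + right_arm
excised = excise_between_tandem_sites(integrated)
```

  • In this model one recognition site remains at the locus after excision. Sites in opposite orientation are not matched by the plain string search, so such an input is returned unchanged.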
  • integration of an integrable polynucleotide into the genome of a cell can be mediated by a variety of processes.
  • Such processes can include, but are not limited to, random integration, homologous recombination, or site-specific recombination.
  • integrable polynucleotides can contain sequences with homology to endogenous sequences. Such sequences with homology to endogenous sequences can direct integration of integrable polynucleotides to certain locations in a cell's genome, specifically, the location of the endogenous sequence.
  • One advantage of directing integration of integrable polynucleotides to particular locations of the genome is that the integrable polynucleotides can be directed to locations of the genome that, for example, can contain enhancer elements, locus control regions, or can be more permissive for expression of a heterologous sequence contained within an integrable polynucleotide.
  • sequences with homology to endogenous sequences can be more than about 5 nucleotides, more than about 10 nucleotides, more than about 15 nucleotides, more than about 20 nucleotides, more than about 25 nucleotides, more than about 30 nucleotides, more than about 35 nucleotides, more than about 40 nucleotides, more than about 45 nucleotides, more than about 50 nucleotides, more than about 100 nucleotides, more than 500 nucleotides, more than about 1 kilobases, more than about 2 kilobases, more than about 3 kilobases, more than about 4 kilobases, or more than about 5 kilobases in length.
  • Sequences with homology to endogenous sequences can be 100% identical or can have at least 99 %, 98 %, 97 %, 96 %, 95 %, 94 %, 93 %, 92 %, 91 %, 90 %, 85 %, 80 %, or 70 % identity to the endogenous sequence.
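  • A percent identity figure of this kind can be computed as in the following minimal sketch. It assumes the two sequences have already been aligned to equal length without gaps; a real comparison would first require a sequence alignment step, which is not shown. The function names and example sequences are hypothetical.

```python
def percent_identity(candidate: str, reference: str) -> float:
    """Percentage of positions at which two pre-aligned sequences match."""
    if len(candidate) != len(reference):
        raise ValueError("sequences must be pre-aligned to equal length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(candidate.upper(), reference.upper()))
    return 100.0 * matches / len(reference)


def meets_identity_threshold(candidate: str, reference: str, minimum: float) -> bool:
    """True when the candidate has at least `minimum` percent identity."""
    return percent_identity(candidate, reference) >= minimum


# Hypothetical 10-nt homology arm with a single mismatch at the last position.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
```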
  • sequences with homology to endogenous sequences can contain sequences with homology to genomic repetitive elements, such as long interspersed repeats (LINEs), short interspersed repeats (SINEs), or retrotransposon DNA, such as long terminal repeats (LTR).
  • genomic repetitive elements can be Ty1 or Ty3 elements.
  • integrable polynucleotides containing sequences with homology to genomic repetitive elements may integrate at more than one site in the genome of a cell.
  • sequences with homology to endogenous sequences can contain δ sequences. δ sequences are a component of the LTR of the Ty1 retrotransposon and are distributed throughout the S. cerevisiae genome.
  • Vectors containing δ sequences for integration into S. cerevisiae are known in the art, as exemplified in Lee F.W. and Da Silva N.A., Sequential delta-integration for the regulated insertion of cloned genes in Saccharomyces cerevisiae. Biotechnol Prog. (1997) 13(4): 368-373.
  • the 5' nucleic acid sequence with homology to an endogenous sequence and the 3' nucleic acid sequence with homology to an endogenous sequence can contain δ sequences.
  • Vectors containing heterologous sequences flanked by δ sequences are known in the art to have an increased stability for expression of heterologous sequences contained therein (Lee F.W.
  • an integrable polynucleotide can contain heterologous sequences.
  • Such heterologous sequences can include sequences encoding polypeptides.
  • the heterologous sequences can encode genes important in sugar metabolism, cellulose metabolism, arabinose metabolism, and xylose metabolism.
  • heterologous sequences can contain regulatory elements operatively linked to a sequence encoding a polypeptide.
  • regulatory elements can include, for example, promoters, enhancers, and terminator sequences. Promoters may be constitutive or inducible. Suitable promoters for use in prokaryotic hosts include, but are not limited to, the trp, lac and phage promoters, tRNA promoters and glycolytic enzyme promoters.
  • Useful yeast promoters include, but are not limited to, the promoter regions for metallothionein, 3-phosphoglycerate kinase or other glycolytic enzymes such as enolase or glyceraldehyde-3-phosphate dehydrogenase and the enzymes responsible for maltose and galactose utilization.
  • Appropriate mammalian promoters include, but are not limited to, the early and late promoters from SV40 and promoters derived from murine Moloney leukemia virus (MLV), mouse mammary tumor virus (MMTV), avian sarcoma viruses, adenovirus II, bovine papilloma virus and polyomas.
  • a heterologous sequence can contain the PGK1 promoter, the TEF1 promoter, the CYC1 terminator, and combinations thereof.
  • heterologous sequences encode and express the gene of interest in a cell in which the heterologous sequence has integrated.
  • a cell can contain any of the integrable polynucleotides described herein.
  • a cell can be a prokaryotic cell or a eukaryotic cell.
  • prokaryotic cells include Escherichia coli, and Clostridium species.
  • eukaryotic cells include, but are not limited to, fungi and yeast cells, such as, Saccharomyces cerevisiae, Pichia pastoris, Zymomonas mobilis, Kluyveromyces lactis, Kluyveromyces marxianus, Trichoderma species, and Aspergillus species; mammalian cells, such as Chinese hamster cells; avian cells; and insect cells.
  • the cell can contain an integrable polynucleotide integrated into the genome of a cell.
  • a cell can contain a heterologous nucleic acid integrated into the genome of the cell in which the removable selectable marker is juxtaposed to said heterologous nucleic acid.
  • a removable selectable marker can be juxtaposed to a heterologous nucleic acid where the removable selectable marker and the heterologous nucleic acid are adjacent to one another on a sequence, for example, the removable selectable marker and the heterologous nucleic acid can be immediately adjacent to one another, or separated by less than 1 nucleotide, less than about 5 nucleotides, less than about 10 nucleotides, less than about 20 nucleotides, less than about 30 nucleotides, less than about 40 nucleotides, less than about 50 nucleotides, less than about 60 nucleotides, less than about 70 nucleotides, less than about 80 nucleotides, less than about 90 nucleotides, less than about 100 nucleotides, less than about 200 nucleotides, less than about 300 nucleotides, less than about 400 nucleotides, less than about 0.5 kilobases, less than about 1 kilobases, less than about 2 kilobases, less
  • a cell can contain an integrable polynucleotide integrated into the genome of the cell where the removable selectable cassette has been excised from the integrated polynucleotide.
  • a cell can contain a heterologous nucleic acid integrated into the genome of the cell in which a site-specific recombinase recognition site is juxtaposed to the heterologous nucleic acid.
  • a site-specific recombinase recognition site can be juxtaposed to a heterologous nucleic acid where the site-specific recombinase recognition site and the heterologous nucleic acid are adjacent to one another on a sequence, for example, the site-specific recombinase recognition site and the heterologous nucleic acid can be immediately adjacent to one another, or separated by less than 1 nucleotide, less than about 5 nucleotides, less than about 10 nucleotides, less than about 20 nucleotides, less than about 30 nucleotides, less than about 40 nucleotides, less than about 50 nucleotides, less than about 60 nucleotides, less than about 70 nucleotides, less than about 80 nucleotides, less than about 90 nucleotides, less than about 100 nucleotides, less than about 200 nucleotides, less than about 300 nucleotides, less than about 400 nucleotides, less than about 0.5 kilob
  • a cell can contain a plurality of integrable polynucleotides.
  • a cell can contain a plurality of different integrable polynucleotides containing different selectable markers.
  • a cell contains no more than about 1, no more than about 2, no more than about 3, no more than about 4, no more than about 5, no more than about 6, no more than about 7, no more than about 8, no more than about 9, or no more than about 10 different selectable markers.
  • the number of selectable markers a cell can contain can include the number of different selectable markers compatible with the methods and compositions described herein.
  • a cell can contain a plurality of different integrable polynucleotides that have integrated into the genome of the cell.
  • a cell can contain 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more, 15 or more, 20 or more, 25 or more, 30 or more, 40 or more, 45 or more, or 50 or more different integrable polynucleotides that have integrated into the genome of the cell.
  • a cell can contain a plurality of different integrable polynucleotides that have integrated into the genome of the cell where some integrable polynucleotides contain selectable markers, and some integrable polynucleotides have no selectable marker. In even more embodiments, a cell can contain a plurality of different integrable polynucleotides where some or all of the selectable markers have been excised.
  • methods to modify an endogenous sequence in a cell can include providing a cell with any integrable polynucleotide described herein, and selecting for at least one cell containing the integrable polynucleotide integrated into the genome of the cell.
  • a plurality of different integrable polynucleotides can be provided to a cell.
  • the plurality of different integrable polynucleotides can include 2 or more, 3 or more, 4 or more, 5 or more, 6 or more, 7 or more, 8 or more, 9 or more, 10 or more different integrable polynucleotides.
  • the plurality of integrable polynucleotides can include integrable polynucleotides with different selectable markers.
  • One advantage of providing a cell with a plurality of polynucleotides with different selectable markers includes the ability to make more than one modification to endogenous sequences in a cell simultaneously.
  • the plurality of integrable polynucleotides can include integrable polynucleotides with different heterologous sequences.
  • the plurality of integrable polynucleotides can include integrable polynucleotides with different flanking sequences with homology to endogenous sequences.
  • at least one selectable marker can be used iteratively.
  • a cell can be produced from a first round of modification(s) using the methods described herein.
  • a cell can be provided with a first integrable polynucleotide containing a selectable marker, a cell can be selected for containing the integrable polynucleotide integrated into the cell's genome, the selection cassette can be excised from a cell containing an integrated integrable polynucleotide, and a cell can be selected for having the selection cassette excised. Subsequent to the first round of modifications, a cell containing the modifications of the first round, can undergo at least a second round of modifications using a second integrable polynucleotide containing the same selectable marker as the first integrable polynucleotide. As such, a selectable marker can be reused and is used iteratively.
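  • The iterative reuse of a single selectable marker over successive rounds can be sketched as simple bookkeeping. The model below is an illustrative assumption rather than a description of any laboratory protocol: the `Cell` class, the function names and the gene labels are hypothetical, and selection for integration and against marker retention is represented only by adding the marker to, and removing it from, a set.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Toy record of what has integrated into a cell's genome."""
    integrated: list = field(default_factory=list)  # heterologous sequences
    markers: set = field(default_factory=set)       # markers still present


def integrate(cell: Cell, heterologous: str, marker: str) -> None:
    """Integrate a polynucleotide; positive selection needs a marker-free cell."""
    if marker in cell.markers:
        raise ValueError(f"{marker} is still present; excise it before reuse")
    cell.integrated.append(heterologous)
    cell.markers.add(marker)


def excise_marker(cell: Cell, marker: str) -> None:
    """Excise the marker cassette; the heterologous sequence stays integrated."""
    cell.markers.remove(marker)


def iterative_modification(cell: Cell, sequences: list, marker: str) -> Cell:
    """Run one integrate-then-excise round per sequence with the same marker."""
    for heterologous in sequences:
        integrate(cell, heterologous, marker)
        excise_marker(cell, marker)
    return cell


# Three rounds of modification that all reuse the same (hypothetical) marker.
modified = iterative_modification(Cell(), ["geneA", "geneB", "geneC"], "URA3")
```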
  • a cell can be provided with a plurality of integrable polynucleotides containing a set of different selectable markers in a first round of modifications.
  • a cell containing the modifications of the first round of modifications can be provided with a plurality of integrable polynucleotides containing the same set of different selectable markers as the first round of modifications.
  • the integrable polynucleotide can be provided to a cell as a linearized plasmid.
  • the integrable polynucleotide can be provided to a cell as a PCR product.
  • Methods of PCR are well known in the art.
  • the template for the PCR can comprise a sequence for an integrable polynucleotide, for example, a vector containing the integrable polynucleotide sequence.
  • the initial template for PCR may not contain the entire sequence for an integrable polynucleotide.
  • One advantage of using PCR to generate the integrable polynucleotide includes the ability to incorporate additional sequences to the ends of the initial PCR template.
  • PCR primers with tails can be designed and used to amplify the initial PCR template and incorporate the additional sequences in the tails into the amplified product.
  • Such additional tail sequences can be 2 nucleotides, 3 nucleotides, 4 nucleotides, 5 nucleotides, 6 nucleotides, 7 nucleotides, 8 nucleotides, 9 nucleotides, 10 nucleotides, 11 nucleotides, 12 nucleotides, 13 nucleotides, 14 nucleotides, 15 nucleotides, 16 nucleotides, 17 nucleotides, 18 nucleotides, 19 nucleotides, 20 nucleotides, 21 nucleotides, 22 nucleotides, 24 nucleotides, 25 nucleotides, 26 nucleotides, 27 nucleotides, 28 nucleotides, 29 nucleotides, 30 nucleotides, 31 nucleotides, 32 nucleotides, 33 nucleotides, 34 nucleotides, 35 nucleotides, 36 nucleotides, 37 nucleotides, 38
  • primers for the PCR can be designed to add sequences with homology to endogenous sequences to the initial PCR template.
  • an integrable polynucleotide with flanking sequences with homology to endogenous sequences can be generated.
  • additional tail sequences can include Ty1 sequences.
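  • The use of tailed primers to add flanking homology sequences to an initial PCR template can be sketched as follows. This is a minimal illustration under stated assumptions: the 18-nucleotide annealing length, the function names, the template and the homology sequences are all hypothetical, and primer properties such as melting temperature or secondary structure are not considered.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tailed_primers(template: str, five_prime_tail: str, three_prime_tail: str,
                   anneal_len: int = 18) -> tuple:
    """Return (forward, reverse) primers whose 5' tails carry added sequence.

    Each primer is written 5'->3'. The forward primer is the 5' tail followed
    by the start of the template; the reverse primer is the reverse complement
    of the end of the template followed by the 3' tail.
    """
    if len(template) < anneal_len:
        raise ValueError("template is shorter than the annealing region")
    forward = five_prime_tail + template[:anneal_len]
    reverse = reverse_complement(template[-anneal_len:] + three_prime_tail)
    return forward, reverse


def expected_amplicon(template: str, five_prime_tail: str, three_prime_tail: str) -> str:
    """Top strand of the product: template flanked by both tail sequences."""
    return five_prime_tail + template + three_prime_tail


# Hypothetical template and homology tails.
template = "ATGGCTAGCAAGGGTGAAGAACTGTTTACCGGTGTTGTTTAA"
up_homology, down_homology = "GGATCCTTGACA", "CAGTTGAAGCTT"
fwd, rev = tailed_primers(template, up_homology, down_homology)
product = expected_amplicon(template, up_homology, down_homology)
```

  • Because the tails are not present in the initial template, they are incorporated only into the amplified product, which then carries the flanking sequences at both ends.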
  • methods to modify an endogenous sequence in a cell can also include excising the selectable marker from the integrable polynucleotide integrated into the genome of the cell.
  • One advantage of excising a selectable marker integrated into the genome of a cell is that the selectable marker can be re-used to select for another modification in a subsequent round of modifications.
  • a selectable marker can be excised from an integrated site by site-specific recombination using a site-specific recombinase expressed in the cell.
  • Site-specific recombinases can include CRE recombinase to excise sequences between tandem loxP sites, and FLP recombinase to excise sequences between tandem frt sites.
  • the site-specific recombinase can be expressed from a plasmid transformed into the cell.
  • the site-specific recombinase can be expressed from an inducible endogenous gene. It is contemplated that in instances where more than one type of selectable marker has integrated into the cell's genome, all the different selectable markers can be excised simultaneously by the expression of at least one type of site-specific recombinase.
  • the selectable markers of an integrable polynucleotide containing the URA3 marker flanked by loxP sites, and an integrable polynucleotide containing the TRP1 marker flanked by loxP sites can both be excised from sites where the integrable polynucleotides have integrated into the cell by expression in the cell of CRE recombinase.
  • a cell can be provided with a plurality of integrable polynucleotides which contain different recombinase recognition sequences.
  • the plurality of integrable polynucleotides can include some integrable polynucleotides that contain one type of recombinase recognition sequences, such as loxP sites, and some integrable polynucleotides can contain another type of recombinase recognition sequences, such as frt sites.
  • a cell in which a selectable marker has been excised can be identified by selecting against cells that retain the marker. Methods for such negative selection are well known in the art.
  • one or more, or all of the enzymes are heterologous to the one or more host organisms.
  • the translational kinetics of each of the DNA sequences encoding the enzymes has been increased by silent permutation or conservative amino acid substitution of at least 1, 2, or 3 codon pairs present in the original sequence for each enzyme.
  • a silent permutation is a change to one or more nucleotides of a codon such that the encoded amino acid does not change.
  • the at least 1, 2 or 3 substituted codon pairs are predicted to cause a translational pause or slowing in the host organism, and the substituting codon pair is typically a codon pair not predicted to cause a translational pause or slowing in the host organism.
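  • The replacement of a codon pair by a synonymous codon pair can be sketched in a few lines. The example is a hedged illustration only: it uses the standard genetic code, handles silent changes but not conservative amino acid substitutions, and takes the set of pause-associated codon pairs as an input. The set used here is a hypothetical placeholder, since the way such pairs are predicted for a given host organism is not reproduced in this sketch.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def synonymous_pairs(pair: str) -> list:
    """All six-nucleotide codon pairs encoding the same two amino acids."""
    first, second = pair[:3], pair[3:]
    return [a + b
            for a, aa1 in CODON_TABLE.items() if aa1 == CODON_TABLE[first]
            for b, aa2 in CODON_TABLE.items() if aa2 == CODON_TABLE[second]]


def replace_pause_pairs(dna: str, pause_pairs: set) -> str:
    """Swap each listed codon pair for a synonymous pair that is not listed."""
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    for i in range(len(codons) - 1):
        if codons[i] + codons[i + 1] not in pause_pairs:
            continue
        for alt in synonymous_pairs(codons[i] + codons[i + 1]):
            trial = codons[:i] + [alt[:3], alt[3:]] + codons[i + 2:]
            # Check the new pair and the pairs it forms with both neighbours.
            window = trial[max(i - 1, 0):i + 3]
            pairs = [window[j] + window[j + 1] for j in range(len(window) - 1)]
            if not any(p in pause_pairs for p in pairs):
                codons = trial
                break
    return "".join(codons)


original = "ATGCTGGAAAAATAA"      # hypothetical coding sequence: M L E K stop
hypothetical_pauses = {"CTGGAA"}  # placeholder set, not a real prediction
refined = replace_pause_pairs(original, hypothetical_pauses)
```

  • In this sketch the encoded amino acid sequence is unchanged by construction, because only codons for the same amino acids are considered as replacements.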
  • the one or more host organisms are selected from the group consisting of: Saccharomyces cerevisiae, Pichia pastoris, Escherichia coli, Bombyx mori, Spodoptera frugiperda, Drosophila melanogaster, Kluyveromyces lactis, Zymomonas mobilis and Schizosaccharomyces pombe.
  • each encoded enzyme in the system has at least a 50%, 60%, 70%, 80%, and more typically at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99% amino acid sequence identity with the original sequence of the enzyme.
  • one or more of the endo-1,4-β-glucanase, exo-1,4-β-D-glucanase, and β-D-glucosidase enzymes in the system retains at least 75% of the enzymatic activity of the enzyme encoded by the original sequence under conditions suitable for degradation of cellulose.
  • Methods for measuring the activity of the enzymes in the system are known in the art.
  • the incorporated materials of U.S. Patent No. 6,566,113 provide methods for measuring the activity of cellobiohydrolases that have been recombinantly expressed.
  • Also provided are methods of hydrolyzing a carbohydrate comprising providing a carbohydrate comprising at least one glycosidic bond, providing a polypeptide encoded by any of the polynucleotides provided herein, and contacting said carbohydrate with said polypeptide under conditions that permit said polypeptide to hydrolyze at least one glycosidic bond of said carbohydrate, whereby at least one glycosidic bond of said carbohydrate is hydrolyzed.
  • the carbohydrate is cellulose.
  • the carbohydrate comprises two or more β-1,4-linked glucose units.
  • Such methods can be performed using the cells and systems provided herein. Such methods can be performed in order to provide smaller polysaccharides and/or monosaccharides which can be used by a cell or processed extracellularly according to any one of a variety of known methods in the art.
  • An exemplary system for lignin metabolism is a cassette of enzymes that can include laccase (LCC), Mn-dependent peroxidase (MnP), and lignin peroxidase (LiP).
  • one or more, or all of the enzymes are heterologous to the one or more host organisms.
  • the translational kinetics of each of the DNA sequences encoding the enzymes has been increased by silent permutation or conservative amino acid substitution of at least 1, 2, 3, 4, 5 or 6 or more codon pairs present in the original sequence for each enzyme.
  • a silent permutation is a change to one or more nucleotides of a codon such that the encoded amino acid does not change.
  • the at least 1, 2, 3, 4, 5 or 6 or more substituted codon pairs are predicted to cause a translational pause or slowing in the host organism, and the substituting codon pair is typically a codon pair not predicted to cause a translational pause or slowing in the host organism.
  • a codon pair in the modified polynucleotide can be selected to preserve or insert a predicted pause.
  • the one or more host organisms are selected from the group consisting of: Saccharomyces cerevisiae, Pichia pastoris, Escherichia coli, Bombyx mori, Spodoptera frugiperda, Drosophila melanogaster and Schizosaccharomyces pombe.
  • each encoded enzyme in the system has at least 50%, 60%, 70%, or 80%, and more typically at least 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% or 99%, amino acid sequence identity to the original sequence of the enzyme.
  • one or more of the enzymes in the system retains at least 75% of the enzymatic activity of the enzyme encoded by the original sequence under conditions suitable for metabolism of lignin. Methods for measuring the activity of the enzymes in the system are known in the art.
  • Also provided are methods of hydrolyzing a carbohydrate comprising providing a carbohydrate comprising at least one glycosidic bond, providing a polypeptide encoded by any of the polynucleotides provided herein, and contacting said carbohydrate with said polypeptide under conditions that permit said polypeptide to hydrolyze at least one glycosidic bond of said carbohydrate, whereby at least one glycosidic bond of said carbohydrate is hydrolyzed.
  • the carbohydrate is cellulose.
  • the carbohydrate comprises two or more β-1,4-linked glucose units.
  • Such methods can be performed using the cells and systems provided herein. Such methods can be performed in order to provide smaller polysaccharides and/or monosaccharides which can be used by a cell or processed extracellularly according to any one of a variety of known methods in the art.
  • a polynucleotide containing an improved-expression nucleotide sequence calculated in accordance with the teachings herein can be prepared by known methods, such as, for example, assembly of overlapping oligonucleotides which can be solid phase synthesized, as is described in U.S. Patent Number 7,262,031, and U.S. Patent Publication Numbers 2005/0106590 and 2007/0009928.
  • the prepared polynucleotide can then be amplified by PCR methodologies or by insertion into a vector, transformation into cells, and subsequent harvesting of the vector from the cells. Examples of such methods for amplification of a polynucleotide are provided in Ausubel et al., 2008, Current Protocols in Molecular Biology, Greene Publishing Associates and Wiley Interscience, N.Y.
  • the polynucleotide itself or amplicon thereof can be inserted into an expression vector configured to produce the polypeptide encoded by the inserted polynucleotide.
  • the expression vector is then inserted into cells, and according to the expression vector used, the cells are treated under conditions suitable for polypeptide expression.
  • the expressed polypeptide can be analyzed and manipulated as desired.
  • the expressed polypeptide can be analyzed by Western blot analysis using a known antibody to the expressed polypeptide or using an anti-polypeptide antibody generated by known methods.
  • the expressed polypeptide also can be subjected to one or more purification steps to increase the purity of the expressed polypeptide.
  • Various analytical and purification methods, as well as antibody-generation methods, are known in the art, as exemplified in Ausubel, supra.
  • This example describes optimization of a DNA sequence encoding TrCBH-II for expression in yeast.
  • the chi-squared value "chisq1" was generated from the expected and observed values determined.
  • the chisq1 value was recalculated to remove any influence of non-randomness in amino acid pair frequencies, yielding "chisq2."
  • the chisq2 value was recalculated to remove any influence of non-randomness in dinucleotide frequencies, yielding "chisq3."
  • z scores of chisq3 were calculated by determining the mean chisq3 value and corresponding standard deviation for all codon pairs, and normalizing each chisq3 value to be reported in terms of number of standard deviations from the mean chisq3 values.
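  • The z score normalization described above can be sketched as follows. The function name `z_scores` and the example values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def z_scores(chisq3: dict[str, float]) -> dict[str, float]:
    """Express each chisq3 value as standard deviations from the mean chisq3."""
    mu = mean(chisq3.values())
    sigma = pstdev(chisq3.values())  # population standard deviation (assumed)
    if sigma == 0:
        raise ValueError("all chisq3 values are identical; z scores undefined")
    return {pair: (value - mu) / sigma for pair, value in chisq3.items()}


if __name__ == "__main__":
    # Hypothetical chisq3 values for three codon pairs.
    toy = {"GAAAAA": 1.0, "GAGAAG": 2.0, "ATGGAA": 3.0}
    for pair, z in z_scores(toy).items():
        print(pair, round(z, 3))
```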
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to optimize codon usage for S. cerevisiae.
  • the DNA sequence encoding TrCBH-II (SEQ ID NO: 1) was derived from GenBank accession number M16190 by removing untranslated sequence (5' untranslated region and introns).
  • a graphical display for the native gene (SEQ ID NO: 1) encoding the TrCBH-II protein (SEQ ID NO: 2) in T. reesei was prepared by plotting z scores of translational kinetics values for codon pair utilization in T. reesei as a function of codon pair position.
  • the graphical display is provided in Figure 1.
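  • The data behind such a graphical display can be sketched as a list of (codon pair position, z score) points. The function name `codon_pair_profile` is hypothetical; the sketch assumes that each codon is paired with its immediate downstream neighbor and that pairs absent from the lookup table receive a z score of 0.

```python
def codon_pair_profile(cds: str, z_by_pair: dict[str, float]) -> list[tuple[int, float]]:
    """Return (1-based codon pair position, z score) for each adjacent codon pair."""
    usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [(i + 1, z_by_pair.get(codons[i] + codons[i + 1], 0.0))
            for i in range(len(codons) - 1)]


if __name__ == "__main__":
    # Hypothetical z score table with a single over-represented pair.
    table = {"GAAAAA": 4.2}
    for position, z in codon_pair_profile("ATGGAAAAATAA", table):
        print(position, z)
```

  • The resulting points can be passed to any plotting tool, with codon pair position on the horizontal axis and z score on the vertical axis.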
  • a graphical display for the native gene (SEQ ID NO: 1) encoding the TrCBH-II protein (SEQ ID NO: 2) in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position.
  • the graphical display is provided in Figure 2A.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to no longer contain codon pairs having z scores in S. cerevisiae greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 3) was found to encode a protein (SEQ ID NO: 4) with 100% amino acid sequence identity to wild-type TrCBH-II (SEQ ID NO: 2).
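  • The modification step and the identity check described above can be sketched as follows. This is a simple greedy illustration under stated assumptions, not the procedure actually used to produce SEQ ID NO: 3: the function name `remove_high_z_pairs` and the z score table are hypothetical, only the downstream codon of an offending pair is varied, and the standard genetic code is assumed.

```python
from itertools import product

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence with the standard genetic code."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def remove_high_z_pairs(cds: str, z_by_pair: dict[str, float],
                        threshold: float = 3.0) -> str:
    """Greedily replace codons so that no adjacent pair has z above threshold."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]

    def z(a: str, b: str) -> float:
        return z_by_pair.get(a + b, 0.0)  # unknown pairs treated as z = 0

    for i in range(len(codons) - 1):
        if z(codons[i], codons[i + 1]) <= threshold:
            continue
        target_aa = CODON_TABLE[codons[i + 1]]
        for alt in (c for c in CODON_TABLE if CODON_TABLE[c] == target_aa):
            left_ok = z(codons[i], alt) <= threshold
            right_ok = i + 2 >= len(codons) or z(alt, codons[i + 2]) <= threshold
            if left_ok and right_ok:
                codons[i + 1] = alt  # silent change: same amino acid
                break
    return "".join(codons)


if __name__ == "__main__":
    native = "ATGGAAAAATAA"
    modified = remove_high_z_pairs(native, {"GAAAAA": 4.2})
    print(modified, translate(native) == translate(modified))
```

  • Because only synonymous codons are substituted, the translated protein is unchanged, which corresponds to the 100% amino acid sequence identity reported above. A pair with no acceptable synonymous alternative is left in place by this sketch.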
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 3) encoding the TrCBH-II protein (SEQ ID NO: 4) expressed in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position. The graphical display is provided in Figure 2B.
  • This example describes optimization of a DNA sequence encoding TrCBH-II for expression in bacteria.
  • Chi-squared values for E. coli were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for E. coli were obtained from the GenBank sequence database (75,096 codon pairs in 237 sequences for E. coli) to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
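  • The observed and expected codon pair counts described above can be sketched as follows. The function name `codon_pair_chisq1` is hypothetical, and the sketch assumes that "used randomly" means the expected count of a pair is the total number of pairs multiplied by the frequencies of its two codons. Only a first-pass chi-squared term is shown; the corrections that yield chisq2 and chisq3 are not reproduced here.

```python
from collections import Counter


def codon_pair_chisq1(coding_regions: list[str]) -> dict[str, float]:
    """Chi-squared term (observed - expected)^2 / expected for each observed codon pair."""
    codon_counts: Counter[str] = Counter()
    pair_counts: Counter[str] = Counter()
    for cds in coding_regions:
        usable = len(cds) - len(cds) % 3  # ignore any trailing partial codon
        codons = [cds[i:i + 3] for i in range(0, usable, 3)]
        codon_counts.update(codons)
        pair_counts.update(a + b for a, b in zip(codons, codons[1:]))

    total_codons = sum(codon_counts.values())
    total_pairs = sum(pair_counts.values())
    chisq = {}
    for pair, observed in pair_counts.items():
        freq_first = codon_counts[pair[:3]] / total_codons
        freq_second = codon_counts[pair[3:]] / total_codons
        expected = total_pairs * freq_first * freq_second  # random-usage assumption
        chisq[pair] = (observed - expected) ** 2 / expected
    return chisq


if __name__ == "__main__":
    # Hypothetical toy input; a real analysis would use the non-redundant coding regions.
    for pair, value in codon_pair_chisq1(["ATGGAAGAA"]).items():
        print(pair, round(value, 4))
```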
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to optimize codon usage for E. coli.
  • a graphical display for the native gene (SEQ ID NO: 1) encoding the TrCBH-II protein (SEQ ID NO: 2) in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 3A.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to no longer contain codon pairs having z scores in E. coli greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 9) was found to encode a protein (SEQ ID NO: 10) with 100% amino acid sequence identity to wild-type TrCBH-II (SEQ ID NO: 2).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 9) encoding the TrCBH-II protein (SEQ ID NO: 10) expressed in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 3B.
  • This example describes optimization of a DNA sequence encoding TrCBH-II for expression in P. pastoris.
  • Chi-squared values for P. pastoris were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for P. pastoris were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to optimize codon usage for P. pastoris.
  • a graphical display for the native gene (SEQ ID NO: 1) encoding the TrCBH-II protein (SEQ ID NO: 2) in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 4A.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to no longer contain codon pairs having z scores in P. pastoris greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 15) was found to encode a protein (SEQ ID NO: 16) with 100% amino acid sequence identity to wild-type TrCBH-II (SEQ ID NO: 2).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 15) encoding the TrCBH-II protein (SEQ ID NO: 16) expressed in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 4B.
  • This example describes optimization of a DNA sequence encoding TrCBH-II for expression in K. lactis.
  • Chi-squared values for K. lactis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for K. lactis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to optimize codon usage for K. lactis.
  • a graphical display for the native gene (SEQ ID NO: 1) encoding the TrCBH-II protein (SEQ ID NO: 2) in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 5A.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to no longer contain codon pairs having z scores in K. lactis greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 21) was found to encode a protein (SEQ ID NO: 22) with 100% amino acid sequence identity to wild-type TrCBH-II (SEQ ID NO: 2).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 21) encoding the TrCBH-II protein (SEQ ID NO: 22) expressed in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 5B.
  • This example describes optimization of a DNA sequence encoding TrCBH-II for expression in Z. mobilis.
  • Chi-squared values for Z. mobilis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for Z. mobilis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to optimize codon usage for Z. mobilis.
  • a graphical display for the native gene (SEQ ID NO: 1) encoding the TrCBH-II protein (SEQ ID NO: 2) in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position. The graphical display is provided in Figure 6A.
  • the nucleotide sequence for the gene encoding the TrCBH-II protein was modified to no longer contain codon pairs having z scores in Z. mobilis greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 23) was found to encode a protein (SEQ ID NO: 24) with 100% amino acid sequence identity to wild-type TrCBH-II (SEQ ID NO: 2).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 23) encoding the TrCBH-II protein (SEQ ID NO: 24) expressed in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position. The graphical display is provided in Figure 6B.
  • Proteins are transferred to Immobilon-P (Millipore, Bedford, MA) and are incubated with rabbit polyclonal anti-CBH-II antibody diluted 1:20,000. Rabbit IgG is visualized using a HRP-conjugated secondary antibody and ECL + Plus (Amersham, Buckinghamshire, UK) according to manufacturer's instructions.
  • This example describes optimization of a DNA sequence encoding LCC for expression in yeast.
  • the chi-squared value "chisq1" was generated from the expected and observed values determined.
  • the chisq1 value was recalculated to remove any influence of non-randomness in amino acid pair frequencies, yielding "chisq2."
  • the chisq2 value was recalculated to remove any influence of non-randomness in dinucleotide frequencies, yielding "chisq3."
  • z scores of chisq3 were calculated by determining the mean chisq3 value and corresponding standard deviation for all codon pairs, and normalizing each chisq3 value to be reported in terms of number of standard deviations from the mean chisq3 values.
  • the nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for S. cerevisiae.
  • a graphical display for the native gene (SEQ ID NO: 25) encoding the LCC protein (SEQ ID NO: 26) in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position.
  • the graphical display is provided in Figure 7A.
  • the nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in S. cerevisiae greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 27) was found to encode a protein (SEQ ID NO: 28) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 26).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 27) encoding the LCC protein (SEQ ID NO: 28) expressed in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position. The graphical display is provided in Figure 7B.
  • This example describes optimization of a DNA sequence encoding LCC for expression in bacteria.
  • Chi-squared values for E. coli were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for E. coli were obtained from the GenBank sequence database (75,096 codon pairs in 237 sequences for E. coli) to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • the nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for E. coli.
  • a graphical display for the native gene (SEQ ID NO: 25) encoding the LCC protein (SEQ ID NO: 26) in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 8A.
  • the nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in E. coli greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 33) was found to encode a protein (SEQ ID NO: 34) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 26).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 33) encoding the LCC protein (SEQ ID NO: 34) expressed in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 8B.
  • This example describes optimization of a DNA sequence encoding LCC for expression in P. pastoris.
  • Chi-squared values for P. pastoris were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for P. pastoris were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • the nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for P. pastoris.
  • a graphical display for the native gene (SEQ ID NO: 25) encoding the LCC protein (SEQ ID NO: 26) in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 9A.
  • the nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in P. pastoris greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 39) was found to encode a protein (SEQ ID NO: 40) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 26).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 39) encoding the LCC protein (SEQ ID NO: 40) expressed in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 9B.
  • This example describes optimization of a DNA sequence encoding LCC for expression in K. lactis.
  • Chi-squared values for K. lactis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for K. lactis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • the nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for K. lactis.
  • a graphical display for the native gene (SEQ ID NO: 25) encoding the LCC protein (SEQ ID NO: 26) in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 10A.
  • the nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in K. lactis greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 45) was found to encode a protein (SEQ ID NO: 46) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 26).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 45) encoding the LCC protein (SEQ ID NO: 46) expressed in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 10B.
  • This example describes optimization of a DNA sequence encoding LCC for expression in Z. mobilis.
  • Chi-squared values for Z. mobilis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for Z. mobilis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • the nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for Z. mobilis.
  • a graphical display for the native gene (SEQ ID NO: 25) encoding the LCC protein (SEQ ID NO: 26) in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position. The graphical display is provided in Figure 11A.
  • the nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in Z. mobilis greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 47) was found to encode a protein (SEQ ID NO: 48) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 26).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 47) encoding the LCC protein (SEQ ID NO: 48) expressed in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position. The graphical display is provided in Figure 11B.
  • Expression in E. coli of the codon-optimized, codon pair utilization-based modification (Hot-Rod) from Example 8 and of the native LCC protein is examined by Western blot analysis.
  • Each vector is transformed into E. coli strain TOP10 (F- mcrA Δ(mrr-hsdRMS-mcrBC) φ80lacZΔM15 ΔlacX74 deoR recA1 araD139 Δ(ara-leu)7697 galU galK rpsL (StrR) endA1 nupG).
  • An overnight culture is inoculated at 1:100 into 5 ml of LB medium plus 100 μg/ml ampicillin and grown at 37°C to an OD600 of 0.5.
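  • The volumes implied by this inoculation step can be worked out with a short sketch. The function names are hypothetical, the 1:100 dilution is read as one part overnight culture per 100 parts final volume, and the 100 mg/ml ampicillin stock concentration is an assumption chosen only for illustration.

```python
def inoculum_volume_ul(culture_ml: float, dilution_factor: float) -> float:
    """Volume of overnight culture (microliters) for a 1:dilution_factor inoculation."""
    return culture_ml * 1000.0 / dilution_factor


def stock_volume_ul(culture_ml: float, final_ug_per_ml: float,
                    stock_mg_per_ml: float) -> float:
    """Volume of antibiotic stock (microliters); note 1 mg/ml equals 1 microgram per microliter."""
    total_ug = culture_ml * final_ug_per_ml
    return total_ug / stock_mg_per_ml


if __name__ == "__main__":
    print(inoculum_volume_ul(5, 100))    # overnight culture for 5 ml at 1:100
    print(stock_volume_ul(5, 100, 100))  # assumed 100 mg/ml ampicillin stock
```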
  • Protein expression is induced by addition of 0.002 or 0.02% L-arabinose and grown for 3 hrs at 37°C. Cells are harvested by centrifugation and the cell pellets are resuspended in phosphate buffered saline. Cells are disrupted by sonication and supernatant and pellet fractions are resolved in a 4-20% SDS-polyacrylamide gel (Pierce). Proteins are transferred to Immobilon-P (Millipore, Bedford, MA) and are incubated with rabbit polyclonal anti-LCC antibody diluted 1:20,000. Rabbit IgG is visualized using a HRP-conjugated secondary antibody and ECL + Plus (Amersham, Buckinghamshire, UK) according to manufacturer's instructions.
  • This example describes optimization of a DNA sequence encoding LIP for expression in yeast.
  • the chi-squared value "chisq1" was generated from the expected and observed values determined.
  • the chisq1 value was recalculated to remove any influence of non-randomness in amino acid pair frequencies, yielding "chisq2."
  • the chisq2 value was recalculated to remove any influence of non-randomness in dinucleotide frequencies, yielding "chisq3."
  • z scores of chisq3 were calculated by determining the mean chisq3 value and corresponding standard deviation for all codon pairs, and normalizing each chisq3 value to be reported in terms of number of standard deviations from the mean chisq3 values.
  • the nucleotide sequence for the gene encoding the LIP protein was modified to optimize codon usage for S. cerevisiae.
  • a graphical display for the native gene (SEQ ID NO: 49) encoding the LIP protein (SEQ ID NO: 50) in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position.
  • the graphical display is provided in Figure 12A.
  • the nucleotide sequence for the gene encoding the LIP protein was modified to no longer contain codon pairs having z scores in S. cerevisiae greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 51) was found to encode a protein (SEQ ID NO: 52) with 100% amino acid sequence identity to wild-type LIP (SEQ ID NO: 50).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 51) encoding the LIP protein (SEQ ID NO: 52) expressed in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position. The graphical display is provided in Figure 12B.
  • This example describes optimization of a DNA sequence encoding LIP for expression in bacteria.
  • Chi-squared values for E. coli were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for E. coli were obtained from the GenBank sequence database (75,096 codon pairs in 237 sequences for E. coli) to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • the nucleotide sequence for the gene encoding the LIP protein was modified to optimize codon usage for E. coli.
  • a graphical display for the native gene (SEQ ID NO: 49) encoding the LIP protein (SEQ ID NO: 50) in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 13A.
  • the nucleotide sequence for the gene encoding the LIP protein was modified to no longer contain codon pairs having z scores in E. coli greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 57) was found to encode a protein (SEQ ID NO: 58) with 100% amino acid sequence identity to wild-type LIP (SEQ ID NO: 50).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 57) encoding the LIP protein (SEQ ID NO: 58) expressed in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 13B.
  • This example describes optimization of a DNA sequence encoding LIP for expression in P. pastoris.
  • Chi-squared values for P. pastoris were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for P. pastoris were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • the nucleotide sequence for the gene encoding the LIP protein was modified to optimize codon usage for P. pastoris.
  • a graphical display for the native gene (SEQ ID NO: 49) encoding the LIP protein (SEQ ID NO: 50) in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 14A.
  • the nucleotide sequence for the gene encoding the LIP protein was modified to no longer contain codon pairs having z scores in P. pastoris greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 63) was found to encode a protein (SEQ ID NO: 64) with 100% amino acid sequence identity to wild-type LIP (SEQ ID NO: 50).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 63) encoding the LIP protein (SEQ ID NO: 64) expressed in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 14B.
  • This example describes optimization of a DNA sequence encoding LIP for expression in K. lactis.
  • Chi-squared values for K. lactis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for K. lactis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • the nucleotide sequence for the gene encoding the LIP protein was modified to optimize codon usage for K. lactis.
  • a graphical display for the native gene (SEQ ID NO: 49) encoding the LIP protein (SEQ ID NO: 50) in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 15A.
  • the nucleotide sequence for the gene encoding the LIP protein was modified to no longer contain codon pairs having z scores in K. lactis greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 69) was found to encode a protein (SEQ ID NO: 70) with 100% amino acid sequence identity to wild-type LIP (SEQ ID NO: 50).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 69) encoding the LIP protein (SEQ ID NO: 70) expressed in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 15B.
  • This example describes optimization of a DNA sequence encoding LIP for expression in Z. mobilis.
  • Chi-squared values for Z. mobilis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for Z. mobilis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • the nucleotide sequence for the gene encoding the LIP protein was modified to optimize codon usage for Z. mobilis.
  • a graphical display for the native gene (SEQ ID NO: 49) encoding the LIP protein (SEQ ID NO: 50) in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position.
  • the graphical display is provided in Figure 16A.
  • the nucleotide sequence for the gene encoding the LIP protein was modified to no longer contain codon pairs having z scores in Z mobilis greater than 3.
  • the resulting nucleotide sequence (SEQ ID NO: 71) was found to encode a protein (SEQ ID NO: 72) with 100% amino acid sequence identity to wild-type LIP (SEQ ID NO: 50).
  • a graphical display for the codon pair utilization-modified gene (SEQ ID NO: 71) encoding the LIP protein (SEQ ID NO: 72) expressed in Z mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position. The graphical display is provided in Figure 16B.
  • Protein expression is induced by addition of 0.002 or 0.02% L-arabinose, and the cultures are grown for 3 hr at 37°C. Cells are harvested by centrifugation and the cell pellets are resuspended in phosphate buffered saline. Cells are disrupted by sonication, and supernatant and pellet fractions are resolved in a 4-20% SDS-polyacrylamide gel (Pierce). Proteins are transferred to Immobilon-P (Millipore, Bedford, MA) and are incubated with rabbit polyclonal anti-LIP antibody diluted 1:20,000. Rabbit IgG is visualized using an HRP-conjugated secondary antibody and ECL + Plus (Amersham, Buckinghamshire, UK) according to the manufacturer's instructions.
  • This example describes optimization of a DNA sequence encoding MnP for expression in yeast.
  • The chi-squared value "chisq1" was generated from the expected and observed values determined.
  • The chisq1 was recalculated to remove any influence of non-randomness in amino acid pair frequencies, yielding "chisq2."
  • The chisq2 was recalculated to remove any influence of non-randomness in dinucleotide frequencies, yielding "chisq3."
  • Z scores of chisq3 were calculated by determining the mean chisq3 value and corresponding standard deviation for all codon pairs, and normalizing each chisq3 value to be reported in terms of number of standard deviations from the mean chisq3 value.
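  • The normalization of chisq3 values to z scores described above can be sketched as follows. The function name and the use of the population standard deviation are assumptions; the text does not state whether a population or sample standard deviation was used.

```python
from statistics import mean, pstdev


def z_scores(chisq3_by_pair):
    """Express each chisq3 value as a number of standard deviations from the mean."""
    values = list(chisq3_by_pair.values())
    mu = mean(values)
    # Assumption: population standard deviation over all codon pairs.
    sigma = pstdev(values, mu)
    if sigma == 0:
        raise ValueError("all chisq3 values are identical; z scores are undefined")
    return {pair: (value - mu) / sigma for pair, value in chisq3_by_pair.items()}
```

  • Under this normalization a codon pair with a z score greater than 3 lies more than three standard deviations above the mean chisq3 value of all codon pairs for the organism.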
  • A graphical display for the native gene (SEQ ID NO: 73) encoding the MnP protein (SEQ ID NO: 74) in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position.
  • The graphical display is provided in Figure 17A.
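  • A display of z scores as a function of codon pair position can be approximated with the following sketch, which builds the per-position series for one gene and renders it as plain text. The function names, the default z score of 0.0 for pairs missing from the lookup table, and the text rendering are assumptions for illustration; they do not reproduce the figures.

```python
def z_score_profile(coding_sequence, z_by_pair, default=0.0):
    """Return [(codon pair position, z score)] along one coding sequence."""
    usable = len(coding_sequence) - len(coding_sequence) % 3
    codons = [coding_sequence[i:i + 3] for i in range(0, usable, 3)]
    # Position 1 is the pair formed by the first and second codons.
    return [
        (position, z_by_pair.get(pair, default))
        for position, pair in enumerate(zip(codons, codons[1:]), start=1)
    ]


def text_plot(profile, threshold=3.0):
    """Render the profile as one line per codon pair, flagging pairs above threshold."""
    lines = []
    for position, z in profile:
        bar = "#" * max(0, round(z))
        flag = " <-- above threshold" if z > threshold else ""
        lines.append(f"{position:4d} {z:6.2f} {bar}{flag}")
    return "\n".join(lines)
```

  • The same series can be passed to any plotting library; the text rendering is used here only to keep the sketch free of external dependencies.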
  • The nucleotide sequence for the gene encoding the MnP protein was modified to no longer contain codon pairs having z scores in S. cerevisiae greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 75) was found to encode a protein (SEQ ID NO: 76) with 100% amino acid sequence identity to wild-type MnP (SEQ ID NO: 74).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 75) encoding the MnP protein (SEQ ID NO: 76) expressed in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position. The graphical display is provided in Figure 17B.
  • EXAMPLE 20
  • This example describes optimization of a DNA sequence encoding MnP for expression in bacteria.
  • Chi-squared values for E. coli were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for E. coli were obtained from the GenBank sequence database (75,096 codon pairs in 237 sequences for E. coli) to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the MnP protein was modified to optimize codon usage for E. coli.
  • A graphical display for the native gene (SEQ ID NO: 73) encoding the MnP protein (SEQ ID NO: 74) in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 18A.
  • The nucleotide sequence for the gene encoding the MnP protein was modified to no longer contain codon pairs having z scores in E. coli greater than 3.
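  • One way to remove codon pairs with z scores above a threshold while preserving the encoded protein is a greedy synonymous substitution, sketched below. This is an illustrative approach under stated assumptions, not necessarily the procedure used to obtain the sequences reported here: the function name, the default z score of 0.0 for unlisted pairs, and the left-to-right greedy strategy are all hypothetical. The sketch assumes the standard genetic code and uppercase, unambiguous DNA.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
SYNONYMS = {}
for _codon, _aa in CODON_TABLE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def remove_high_z_pairs(coding_sequence, z_by_pair, threshold=3.0):
    """Greedily replace codons with synonyms so that no adjacent pair exceeds threshold.

    A pair is left unchanged if no synonymous choice resolves it without
    creating a new above-threshold pair with a neighbouring codon.
    """
    usable = len(coding_sequence) - len(coding_sequence) % 3
    codons = [coding_sequence[i:i + 3] for i in range(0, usable, 3)]

    def z(first, second):
        return z_by_pair.get((first, second), 0.0)

    def neighbours_ok(i):
        left = i == 0 or z(codons[i - 1], codons[i]) <= threshold
        right = i == len(codons) - 1 or z(codons[i], codons[i + 1]) <= threshold
        return left and right

    for i in range(1, len(codons)):
        if z(codons[i - 1], codons[i]) <= threshold:
            continue
        # Try changing the second codon of the pair first, then the first codon.
        for j in (i, i - 1):
            original = codons[j]
            for candidate in SYNONYMS[CODON_TABLE[original]]:
                codons[j] = candidate
                if neighbours_ok(j):
                    break
            else:
                codons[j] = original  # no synonym worked; restore and try the other codon
                continue
            break
    return "".join(codons)
```

  • Because every substitution is drawn from the synonyms of the original codon, the translated protein is unchanged by construction; whether every above-threshold pair can be removed depends on the z score table for the host organism.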
  • The resulting nucleotide sequence (SEQ ID NO: 81) was found to encode a protein (SEQ ID NO: 82) with 100% amino acid sequence identity to wild-type MnP (SEQ ID NO: 74).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 81) encoding the MnP protein (SEQ ID NO: 82) expressed in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 18B.
  • This example describes optimization of a DNA sequence encoding MnP for expression in P. pastoris.
  • Chi-squared values for P. pastoris were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for P. pastoris were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the MnP protein was modified to optimize codon usage for P. pastoris.
  • A graphical display for the native gene (SEQ ID NO: 73) encoding the MnP protein (SEQ ID NO: 74) in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position.
  • The graphical display is provided in Figure 19A.
  • The nucleotide sequence for the gene encoding the MnP protein was modified to no longer contain codon pairs having z scores in P. pastoris greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 87) was found to encode a protein (SEQ ID NO: 88) with 100% amino acid sequence identity to wild-type MnP (SEQ ID NO: 74).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 87) encoding the MnP protein (SEQ ID NO: 88) expressed in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 19B.
  • This example describes optimization of a DNA sequence encoding MnP for expression in K. lactis.
  • Chi-squared values for K. lactis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for K. lactis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the MnP protein was modified to optimize codon usage for K. lactis.
  • A graphical display for the native gene (SEQ ID NO: 73) encoding the MnP protein (SEQ ID NO: 74) in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 20A.
  • The nucleotide sequence for the gene encoding the MnP protein was modified to no longer contain codon pairs having z scores in K. lactis greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 93) was found to encode a protein (SEQ ID NO: 94) with 100% amino acid sequence identity to wild-type MnP (SEQ ID NO: 74).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 93) encoding the MnP protein (SEQ ID NO: 94) expressed in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 20B.
  • This example describes optimization of a DNA sequence encoding MnP for expression in Z. mobilis.
  • Chi-squared values for Z. mobilis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for Z. mobilis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the MnP protein was modified to optimize codon usage for Z. mobilis.
  • A graphical display for the native gene (SEQ ID NO: 73) encoding the MnP protein (SEQ ID NO: 74) in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position. The graphical display is provided in Figure 21A.
  • The nucleotide sequence for the gene encoding the MnP protein was modified to no longer contain codon pairs having z scores in Z. mobilis greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 95) was found to encode a protein (SEQ ID NO: 96) with 100% amino acid sequence identity to wild-type MnP (SEQ ID NO: 74).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 95) encoding the MnP protein (SEQ ID NO: 96) expressed in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position. The graphical display is provided in Figure 21B.
  • Expression in E. coli of the codon-optimized, codon pair utilization-based modification (Hot-Rod) from Example 20 and of the native MnP protein is examined by Western blot analysis.
  • Each vector is transformed into E. coli strain TOP10 (F- mcrA Δ(mrr-hsdRMS-mcrBC) Φ80lacZΔM15 ΔlacX74 deoR recA1 araD139 Δ(ara-leu)7697 galU galK rpsL (StrR) endA1 nupG).
  • An overnight culture is inoculated at 1:100 into 5 ml of LB medium plus 100 µg/ml ampicillin and grown at 37°C to an OD600 of 0.5.
  • Protein expression is induced by addition of 0.002 or 0.02% L-arabinose, and the cultures are grown for 3 hr at 37°C. Cells are harvested by centrifugation and the cell pellets are resuspended in phosphate buffered saline. Cells are disrupted by sonication, and supernatant and pellet fractions are resolved in a 4-20% SDS-polyacrylamide gel (Pierce). Proteins are transferred to Immobilon-P (Millipore, Bedford, MA) and are incubated with rabbit polyclonal anti-MnP antibody diluted 1:20,000. Rabbit IgG is visualized using an HRP-conjugated secondary antibody and ECL + Plus (Amersham, Buckinghamshire, UK) according to the manufacturer's instructions.
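  • The dilutions and concentrations in the protocol above reduce to simple arithmetic, sketched below. The function names are hypothetical, a ratio of 1:N is treated as one part stock in N parts final volume, the L-arabinose percentages are assumed to be weight per volume, and the 10 ml antibody incubation volume in the usage note is an assumed value that the text does not specify.

```python
def dilution_volume_ul(final_volume_ml, dilution_factor):
    """Volume of stock (in microlitres) for a 1:dilution_factor dilution.

    Assumption: 1:N means one part stock in N parts of the final volume.
    """
    return final_volume_ml * 1000.0 / dilution_factor


def mass_for_percent_wv_mg(final_volume_ml, percent_wv):
    """Mass of solute (in mg) for a % (w/v) solution; 1% (w/v) = 10 mg/ml."""
    return final_volume_ml * percent_wv * 10.0


def mass_for_concentration_ug(final_volume_ml, ug_per_ml):
    """Mass of solute (in micrograms) for a concentration given in micrograms per ml."""
    return final_volume_ml * ug_per_ml
```

  • For the 5 ml culture, dilution_volume_ul(5, 100) gives 50 µl of overnight culture, mass_for_concentration_ug(5, 100) gives 500 µg of ampicillin, and mass_for_percent_wv_mg(5, 0.002) and mass_for_percent_wv_mg(5, 0.02) give approximately 0.1 mg and 1 mg of L-arabinose. For an assumed 10 ml of antibody solution, dilution_volume_ul(10, 20000) gives 0.5 µl of antibody.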
  • This example describes optimization of a DNA sequence encoding LCC for expression in yeast.
  • The chi-squared value "chisq1" was generated from the expected and observed values determined.
  • The chisq1 was recalculated to remove any influence of non-randomness in amino acid pair frequencies, yielding "chisq2."
  • The chisq2 was recalculated to remove any influence of non-randomness in dinucleotide frequencies, yielding "chisq3."
  • Z scores of chisq3 were calculated by determining the mean chisq3 value and corresponding standard deviation for all codon pairs, and normalizing each chisq3 value to be reported in terms of number of standard deviations from the mean chisq3 value.
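  • One common way to remove the influence of non-random amino acid pair frequencies is to compute each expected codon pair count conditional on the observed count of the corresponding amino acid pair. The sketch below illustrates that idea only; it is an assumption and not necessarily the recalculation used in Example 1, and the dinucleotide correction that yields chisq3 is not shown. The function name is hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def expected_given_amino_acid_pairs(coding_sequences):
    """Return (observed, expected) codon pair counts, with expected counts
    conditioned on how often each amino acid pair actually occurs."""
    codon_counts = Counter()
    observed = Counter()
    for seq in coding_sequences:
        codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
        codon_counts.update(codons)
        observed.update(zip(codons, codons[1:]))

    amino_acid_counts = Counter()
    for codon, n in codon_counts.items():
        amino_acid_counts[CODON_TABLE[codon]] += n
    amino_acid_pair_counts = Counter()
    for (c1, c2), n in observed.items():
        amino_acid_pair_counts[(CODON_TABLE[c1], CODON_TABLE[c2])] += n

    expected = {}
    for c1, c2 in observed:
        a1, a2 = CODON_TABLE[c1], CODON_TABLE[c2]
        # E = N(amino acid pair) * P(codon 1 | amino acid 1) * P(codon 2 | amino acid 2)
        expected[(c1, c2)] = (
            amino_acid_pair_counts[(a1, a2)]
            * (codon_counts[c1] / amino_acid_counts[a1])
            * (codon_counts[c2] / amino_acid_counts[a2])
        )
    return observed, expected
```

  • With this form, a codon pair is scored only against the synonymous alternatives for the same amino acid pair, so a frequent or rare dipeptide does not by itself produce a large deviation.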
  • A graphical display for the native gene (SEQ ID NO: 97) encoding the LCC protein (SEQ ID NO: 98) in N. crassa was prepared by plotting z scores of translational kinetics values for codon pair utilization in N. crassa as a function of codon pair position. The graphical display is provided in Figure 22.
  • A graphical display for the native gene (SEQ ID NO: 97) encoding the LCC protein (SEQ ID NO: 98) in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position.
  • The graphical display is provided in Figure 23A.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in S. cerevisiae greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 99) was found to encode a protein (SEQ ID NO: 100) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 98).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 99) encoding the LCC protein (SEQ ID NO: 100) expressed in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position. The graphical display is provided in Figure 23B.
  • This example describes optimization of a DNA sequence encoding LCC for expression in bacteria.
  • Chi-squared values for E. coli were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for E. coli were obtained from the GenBank sequence database (75,096 codon pairs in 237 sequences for E. coli) to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for E. coli.
  • A graphical display for the native gene (SEQ ID NO: 97) encoding the LCC protein (SEQ ID NO: 98) in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 24A.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in E. coli greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 105) was found to encode a protein (SEQ ID NO: 106) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 98).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 105) encoding the LCC protein (SEQ ID NO: 106) expressed in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 24B.
  • This example describes optimization of a DNA sequence encoding LCC for expression in P. pastoris.
  • Chi-squared values for P. pastoris were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for P. pastoris were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for P. pastoris.
  • A graphical display for the native gene (SEQ ID NO: 97) encoding the LCC protein (SEQ ID NO: 98) in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 25A.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in P. pastoris greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 111) was found to encode a protein (SEQ ID NO: 112) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 98).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 111) encoding the LCC protein (SEQ ID NO: 112) expressed in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 25B.
  • EXAMPLE 28
  • This example describes optimization of a DNA sequence encoding LCC for expression in K. lactis.
  • Chi-squared values for K. lactis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for K. lactis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for K. lactis.
  • A graphical display for the native gene (SEQ ID NO: 97) encoding the LCC protein (SEQ ID NO: 98) in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 26A.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in K. lactis greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 117) was found to encode a protein (SEQ ID NO: 118) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 98).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 117) encoding the LCC protein (SEQ ID NO: 118) expressed in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 26B.
  • This example describes optimization of a DNA sequence encoding LCC for expression in Z. mobilis.
  • Chi-squared values for Z. mobilis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for Z. mobilis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for Z. mobilis.
  • A graphical display for the native gene (SEQ ID NO: 97) encoding the LCC protein (SEQ ID NO: 98) in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position.
  • The graphical display is provided in Figure 27A.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in Z. mobilis greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 119) was found to encode a protein (SEQ ID NO: 120) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 98).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 119) encoding the LCC protein (SEQ ID NO: 120) expressed in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position. The graphical display is provided in Figure 27B.
  • Protein expression is induced by addition of 0.002 or 0.02% L-arabinose, and the cultures are grown for 3 hr at 37°C. Cells are harvested by centrifugation and the cell pellets are resuspended in phosphate buffered saline. Cells are disrupted by sonication, and supernatant and pellet fractions are resolved in a 4-20% SDS-polyacrylamide gel (Pierce). Proteins are transferred to Immobilon-P (Millipore, Bedford, MA) and are incubated with rabbit polyclonal anti-LCC antibody diluted 1:20,000. Rabbit IgG is visualized using an HRP-conjugated secondary antibody and ECL + Plus (Amersham, Buckinghamshire, UK) according to the manufacturer's instructions.
  • This example describes optimization of a DNA sequence encoding LCC for expression in yeast.
  • The chi-squared value "chisq1" was generated from the expected and observed values determined.
  • The chisq1 was recalculated to remove any influence of non-randomness in amino acid pair frequencies, yielding "chisq2."
  • The chisq2 was recalculated to remove any influence of non-randomness in dinucleotide frequencies, yielding "chisq3."
  • Z scores of chisq3 were calculated by determining the mean chisq3 value and corresponding standard deviation for all codon pairs, and normalizing each chisq3 value to be reported in terms of number of standard deviations from the mean chisq3 value.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for S. cerevisiae.
  • A graphical display for the native gene (SEQ ID NO: 121) encoding the LCC protein (SEQ ID NO: 122) in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position.
  • The graphical display is provided in Figure 28A.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in S. cerevisiae greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 123) was found to encode a protein (SEQ ID NO: 124) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 122).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 123) encoding the LCC protein (SEQ ID NO: 124) expressed in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position. The graphical display is provided in Figure 28B.
  • This example describes optimization of a DNA sequence encoding LCC for expression in bacteria.
  • Chi-squared values for E. coli were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for E. coli were obtained from the GenBank sequence database (75,096 codon pairs in 237 sequences for E. coli) to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for E. coli.
  • A graphical display for the native gene (SEQ ID NO: 121) encoding the LCC protein (SEQ ID NO: 122) in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 29A.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in E. coli greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 129) was found to encode a protein (SEQ ID NO: 130) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 122).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 129) encoding the LCC protein (SEQ ID NO: 130) expressed in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 29B.
  • This example describes optimization of a DNA sequence encoding LCC for expression in P. pastoris.
  • Chi-squared values for P. pastoris were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for P. pastoris were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for P. pastoris.
  • A graphical display for the native gene (SEQ ID NO: 121) encoding the LCC protein (SEQ ID NO: 122) in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position.
  • The graphical display is provided in Figure 30A.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in P. pastoris greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 135) was found to encode a protein (SEQ ID NO: 136) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 122).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 135) encoding the LCC protein (SEQ ID NO: 136) expressed in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 30B.
  • This example describes optimization of a DNA sequence encoding LCC for expression in K. lactis.
  • Chi-squared values for K. lactis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for K. lactis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for K. lactis.
  • A graphical display for the native gene (SEQ ID NO: 121) encoding the LCC protein (SEQ ID NO: 122) in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 31A.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in K. lactis greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 141) was found to encode a protein (SEQ ID NO: 142) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 122).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 141) encoding the LCC protein (SEQ ID NO: 142) expressed in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 31B.
  • This example describes optimization of a DNA sequence encoding LCC for expression in Z. mobilis.
  • Chi-squared values for Z. mobilis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for Z. mobilis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for Z. mobilis.
  • A graphical display for the native gene (SEQ ID NO: 121) encoding the LCC protein (SEQ ID NO: 122) in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position. The graphical display is provided in Figure 32A.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in Z. mobilis greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 143) was found to encode a protein (SEQ ID NO: 144) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 122).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 143) encoding the LCC protein (SEQ ID NO: 144) expressed in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position. The graphical display is provided in Figure 32B.
  • Expression in E. coli of the codon-optimized, codon pair utilization-based modification (Hot-Rod) from Example 32 and of the native LCC protein is examined by Western blot analysis.
  • Each vector is transformed into E. coli strain TOP10 (F- mcrA Δ(mrr-hsdRMS-mcrBC) Φ80lacZΔM15 ΔlacX74 deoR recA1 araD139 Δ(ara-leu)7697 galU galK rpsL (StrR) endA1 nupG).
  • An overnight culture is inoculated at 1:100 into 5 ml of LB medium plus 100 µg/ml ampicillin and grown at 37°C to an OD600 of 0.5.
  • Protein expression is induced by addition of 0.002 or 0.02% L-arabinose, and the cultures are grown for 3 hr at 37°C. Cells are harvested by centrifugation and the cell pellets are resuspended in phosphate buffered saline. Cells are disrupted by sonication, and supernatant and pellet fractions are resolved in a 4-20% SDS-polyacrylamide gel (Pierce). Proteins are transferred to Immobilon-P (Millipore, Bedford, MA) and are incubated with rabbit polyclonal anti-LCC antibody diluted 1:20,000. Rabbit IgG is visualized using an HRP-conjugated secondary antibody and ECL + Plus (Amersham, Buckinghamshire, UK) according to the manufacturer's instructions.
  • This example describes optimization of a DNA sequence encoding LCC for expression in yeast.
  • the chi-squared value "chisql” was generated by the expected and observed values determined.
  • the chsql was recalculated to remove any influence of non-randomness in amino acid pair frequencies, yielding "chisq2.”
  • the chsq2 was re-calculated to remove any influence of non- randomness in dinucleotide frequencies, yielding "chisq3.”
  • z scores of chisq3 were calculated by determining the mean chisq3 value and corresponding standard deviation for all codon pairs, and normalizing each chisq3 value to be reported in terms of number of standard deviations from the mean chisq3 values.
  • A graphical display for the native gene (SEQ ID NO: 145) encoding the LCC protein (SEQ ID NO: 146) in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position.
  • The graphical display is provided in Figure 33A.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in S. cerevisiae greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 147) was found to encode a protein (SEQ ID NO: 148) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 146).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 147) encoding the LCC protein (SEQ ID NO: 148) expressed in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position. The graphical display is provided in Figure 33B.
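  • The z score normalization described in this example (mean and standard deviation of chisq3 over all codon pairs) can be illustrated with a minimal Python sketch. The function name, the use of the population standard deviation, and the four chisq3 values are assumptions made for illustration; they are not taken from the application.

```python
from statistics import mean, pstdev


def z_scores(chisq3):
    """Express each chisq3 value as standard deviations from the mean chisq3."""
    values = list(chisq3.values())
    mu = mean(values)
    sigma = pstdev(values)  # population SD over all codon pairs (an assumption)
    if sigma == 0:
        raise ValueError("all chisq3 values are identical; z scores are undefined")
    return {pair: (value - mu) / sigma for pair, value in chisq3.items()}


# Hypothetical chisq3 values for four codon pairs, for illustration only.
example_chisq3 = {"GCTGAA": 2.0, "GAAAAA": 4.0, "AAACTG": 4.0, "CTGGGT": 6.0}
example_z = z_scores(example_chisq3)
```

In this toy table the mean is 4.0, so the two pairs at 4.0 receive a z score of zero and the other two pairs sit symmetrically above and below it.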
  • This example describes optimization of a DNA sequence encoding LCC for expression in bacteria.
  • Chi-squared values for E. coli were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for E. coli were obtained from the GenBank sequence database (75,096 codon pairs in 237 sequences for E. coli) to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for E. coli.
  • A graphical display for the native gene (SEQ ID NO: 145) encoding the LCC protein (SEQ ID NO: 146) in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 34A.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in E. coli greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 153) was found to encode a protein (SEQ ID NO: 154) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 146).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 153) encoding the LCC protein (SEQ ID NO: 154) expressed in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 34B.
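  • The observed and expected codon pair counts described in this example can be sketched in Python as follows. This is a simplified illustration that assumes random pairing means the expected count equals the total number of pairs multiplied by the two individual codon frequencies, and that chisq1 is the unsigned (observed - expected)^2 / expected term; the exact procedure of Example 1, including the chisq2 and chisq3 corrections, is not reproduced here. The function names and the toy sequence are hypothetical.

```python
from collections import Counter


def split_codons(cds):
    """Split a coding sequence into complete codons, ignoring a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def chisq1_table(coding_sequences):
    """Return {codon pair: (observed, expected, chisq1)} for every observed pair."""
    codon_counts = Counter()
    pair_counts = Counter()
    for cds in coding_sequences:
        codons = split_codons(cds)
        codon_counts.update(codons)
        # Adjacent codons within one sequence form a codon pair (a hexamer).
        pair_counts.update(a + b for a, b in zip(codons, codons[1:]))
    total_codons = sum(codon_counts.values())
    total_pairs = sum(pair_counts.values())
    table = {}
    for pair, observed in pair_counts.items():
        p_first = codon_counts[pair[:3]] / total_codons
        p_second = codon_counts[pair[3:]] / total_codons
        expected = total_pairs * p_first * p_second  # random-pairing assumption
        table[pair] = (observed, expected, (observed - expected) ** 2 / expected)
    return table


# Toy input, not real E. coli data: codons ATG, AAA, AAA, ATG.
toy_table = chisq1_table(["ATGAAAAAAATG"])
```

Pairs that never occur in the input are omitted from the table in this sketch; a fuller treatment would also tabulate them with an observed count of zero.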
  • This example describes optimization of a DNA sequence encoding LCC for expression in P. pastoris.
  • Chi-squared values for P. pastoris were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for P. pastoris were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for P. pastoris.
  • A graphical display for the native gene (SEQ ID NO: 145) encoding the LCC protein (SEQ ID NO: 146) in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 35A.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in P. pastoris greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 159) was found to encode a protein (SEQ ID NO: 160) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 146).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 159) encoding the LCC protein (SEQ ID NO: 160) expressed in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 35B.
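  • The statement that a modified sequence encodes a protein with 100% amino acid sequence identity to the wild type can be checked by translating both nucleotide sequences and comparing the results. The Python sketch below assumes the standard genetic code and equal-length proteins; the two short sequences are invented for illustration and are not SEQ ID NOs from the application.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a DNA (or RNA) coding sequence; '*' marks a stop codon."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def percent_identity(protein_a, protein_b):
    """Percent of identical positions between two equal-length proteins."""
    if len(protein_a) != len(protein_b):
        raise ValueError("this sketch compares equal-length sequences only")
    matches = sum(x == y for x, y in zip(protein_a, protein_b))
    return 100.0 * matches / len(protein_a)


# Hypothetical native and synonymously recoded sequences.
native = "ATGGCTGAAAAACTGTAA"
modified = "ATGGCCGAGAAATTGTAA"
identity = percent_identity(translate(native), translate(modified))
```

Because every substitution in the toy pair is synonymous, both sequences translate to the same peptide and the identity is 100%.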
  • EXAMPLE 40
  • This example describes optimization of a DNA sequence encoding LCC for expression in K. lactis.
  • Chi-squared values for K. lactis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for K. lactis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for K. lactis.
  • A graphical display for the native gene (SEQ ID NO: 145) encoding the LCC protein (SEQ ID NO: 146) in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 36A.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in K. lactis greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 165) was found to encode a protein (SEQ ID NO: 166) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 146).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 165) encoding the LCC protein (SEQ ID NO: 166) expressed in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 36B.
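  • The data behind a graphical display of z scores as a function of codon pair position can be assembled as in the following Python sketch. It assumes a precomputed lookup table of host-specific z scores keyed by codon pair, and it treats pairs missing from the table as having a z score of 0.0; both the table contents and the default are hypothetical. The resulting list of (position, pair, z) tuples could be passed to any plotting tool.

```python
def z_profile(cds, z_table, default=0.0):
    """Return (position, codon pair, z score) for each codon pair along a gene.

    Positions are 1-based: position 1 is the pair formed by codons 1 and 2.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    profile = []
    for index in range(len(codons) - 1):
        pair = codons[index] + codons[index + 1]
        profile.append((index + 1, pair, z_table.get(pair, default)))
    return profile


def pairs_above(profile, threshold=3.0):
    """List the entries whose z score exceeds the threshold (3 in the examples)."""
    return [entry for entry in profile if entry[2] > threshold]


# Hypothetical z scores for two codon pairs, for illustration only.
toy_z = {"GCTGAA": 4.2, "AAACTG": -1.0}
profile = z_profile("ATGGCTGAAAAACTG", toy_z)
flagged = pairs_above(profile)
```

With these toy values, only the pair at position 2 exceeds the threshold and would be a candidate for modification.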
  • This example describes optimization of a DNA sequence encoding LCC for expression in Z. mobilis.
  • Chi-squared values for Z. mobilis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for Z. mobilis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to optimize codon usage for Z. mobilis.
  • A graphical display for the native gene (SEQ ID NO: 145) encoding the LCC protein (SEQ ID NO: 146) in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position.
  • The graphical display is provided in Figure 37A.
  • The nucleotide sequence for the gene encoding the LCC protein was modified to no longer contain codon pairs having z scores in Z. mobilis greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 167) was found to encode a protein (SEQ ID NO: 168) with 100% amino acid sequence identity to wild-type LCC (SEQ ID NO: 146).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 167) encoding the LCC protein (SEQ ID NO: 168) expressed in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position. The graphical display is provided in Figure 37B.
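  • One way to remove codon pairs with z scores greater than 3 while leaving the encoded protein unchanged is a greedy search over synonymous codons. The Python sketch below is an illustrative heuristic, not the procedure used in the application: for each offending pair it tries synonymous alternatives for the two codons and accepts the first choice for which every pair touching the changed codons is at or below the threshold. The z score table, the default of 0.0 for untabulated pairs, and the function names are assumptions.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = {}
for codon, amino_acid in CODON_TABLE.items():
    SYNONYMS.setdefault(amino_acid, []).append(codon)


def pair_z(codons, i, z_table):
    """z score of the pair formed by codons i and i+1 (0.0 if not tabulated)."""
    return z_table.get(codons[i] + codons[i + 1], 0.0)


def remove_high_z_pairs(cds, z_table, threshold=3.0, max_passes=10):
    """Greedily recode a CDS (uppercase DNA, length a multiple of 3)."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    for _ in range(max_passes):
        changed = False
        for i in range(len(codons) - 1):
            if pair_z(codons, i, z_table) <= threshold:
                continue
            # Try every synonymous choice for both codons of the offending pair.
            options = product(SYNONYMS[CODON_TABLE[codons[i]]],
                              SYNONYMS[CODON_TABLE[codons[i + 1]]])
            for first, second in options:
                trial = codons[:i] + [first, second] + codons[i + 2:]
                lo, hi = max(i - 1, 0), min(i + 1, len(trial) - 2)
                # Accept only if no pair touching the changed codons is too high.
                if all(pair_z(trial, j, z_table) <= threshold
                       for j in range(lo, hi + 1)):
                    codons = trial
                    changed = True
                    break
        if not changed:
            break
    return "".join(codons)


# Hypothetical input: one tabulated pair with a z score above the threshold.
toy_z = {"GCTGAA": 4.2}
recoded = remove_high_z_pairs("ATGGCTGAAAAACTG", toy_z)
```

A pair for which no acceptable synonymous alternative exists is left unchanged by this sketch, so the output should be re-scored before use.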
  • Expression in E. coli of the codon-optimized, codon pair utilization-modified (Hot-Rod) LCC sequence from Example 38 and of native LCC protein is examined by Western blot analysis.
  • Each vector is transformed into E. coli strain TOP10 (F- mcrA Δ(mrr-hsdRMS-mcrBC) φ80lacZΔM15 ΔlacX74 deoR recA1 araD139 Δ(ara-leu)7697 galU galK rpsL (StrR) endA1 nupG).
  • An overnight culture is inoculated at 1:100 into 5 ml of LB medium plus 100 µg/ml ampicillin and grown at 37°C to an OD600 of 0.5.
  • Protein expression is induced by addition of 0.002% or 0.02% L-arabinose, and the culture is grown for 3 hr at 37°C. Cells are harvested by centrifugation and the cell pellets are resuspended in phosphate buffered saline. Cells are disrupted by sonication, and supernatant and pellet fractions are resolved in a 4-20% SDS-polyacrylamide gel (Pierce). Proteins are transferred to Immobilon-P (Millipore, Bedford, MA) and are incubated with rabbit polyclonal anti-LCC antibody diluted 1:20,000. Rabbit IgG is visualized using an HRP-conjugated secondary antibody and ECL + Plus (Amersham, Buckinghamshire, UK) according to the manufacturer's instructions.
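  • The quantities implied by this protocol can be worked out with a few lines of Python. The sketch assumes that the L-arabinose percentages are weight per volume and uses a hypothetical 10 ml of antibody incubation buffer, neither of which is stated in the text; the function names are illustrative.

```python
def inoculum_volume_ul(culture_ml, dilution):
    """Volume of overnight culture (µl) for an approximately 1:dilution inoculation."""
    return culture_ml * 1000.0 / dilution


def mass_for_concentration_ug(culture_ml, ug_per_ml):
    """Mass of additive (µg) needed to reach a concentration given in µg/ml."""
    return culture_ml * ug_per_ml


def mass_for_percent_wv_mg(culture_ml, percent_wv):
    """Mass (mg) for a percent (w/v) concentration; 1% (w/v) is 10 mg per ml."""
    return culture_ml * percent_wv * 10.0


def antibody_volume_ul(buffer_ml, dilution):
    """Volume of antibody stock (µl) for a 1:dilution working solution."""
    return buffer_ml * 1000.0 / dilution


inoculum = inoculum_volume_ul(5, 100)             # 5 ml culture, 1:100
ampicillin = mass_for_concentration_ug(5, 100)    # 100 µg/ml in 5 ml
arabinose_low = mass_for_percent_wv_mg(5, 0.002)  # assumes % means w/v
arabinose_high = mass_for_percent_wv_mg(5, 0.02)  # assumes % means w/v
antibody = antibody_volume_ul(10, 20000)          # 10 ml buffer is hypothetical
```

Under these assumptions a 5 ml culture takes about 50 µl of overnight culture, 500 µg of ampicillin, and 0.1 mg or 1 mg of L-arabinose, and 10 ml of buffer takes 0.5 µl of antibody.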
  • This example describes optimization of a DNA sequence encoding T. reesei cellobiohydrolase-I (TrCBH-I) for expression in yeast.
  • The chi-squared value "chisq1" was generated from the expected and observed values thus determined.
  • The chisq1 value was recalculated to remove any influence of non-randomness in amino acid pair frequencies, yielding "chisq2."
  • The chisq2 value was recalculated to remove any influence of non-randomness in dinucleotide frequencies, yielding "chisq3."
  • The z scores of chisq3 were calculated by determining the mean chisq3 value and the corresponding standard deviation for all codon pairs, and normalizing each chisq3 value so that it is reported as the number of standard deviations from the mean chisq3 value.
  • The nucleotide sequence for the gene encoding the TrCBH-I protein was modified to optimize codon usage for S. cerevisiae.
  • The DNA sequence encoding TrCBH-I (SEQ ID NO: 169) was derived from GenBank accession number M16190 by removing untranslated sequence (5' untranslated region and introns).
  • A graphical display for the native gene (SEQ ID NO: 169) encoding the protein (SEQ ID NO: 170) in T. reesei was prepared by plotting z scores of translational kinetics values for codon pair utilization in T. reesei as a function of codon pair position.
  • The graphical display is provided in Figure 38.
  • A graphical display for the native gene (SEQ ID NO: 169) encoding the TrCBH-I protein (SEQ ID NO: 170) in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position.
  • The graphical display is provided in Figure 39A.
  • The nucleotide sequence for the gene encoding the TrCBH-I protein was modified to no longer contain codon pairs having z scores in S. cerevisiae greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 171) was found to encode a protein (SEQ ID NO: 172) with 100% amino acid sequence identity to wild-type TrCBH-I (SEQ ID NO: 170).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 171) encoding the TrCBH-I protein (SEQ ID NO: 172) expressed in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position. The graphical display is provided in Figure 39B.
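  • Deriving a coding sequence from a genomic record by removing the 5' untranslated region and introns amounts to joining the exon intervals in order. The Python sketch below shows this with an invented toy sequence and invented coordinates; they are not the coordinates of GenBank accession M16190, and the function name is hypothetical.

```python
def coding_sequence(genomic, exons):
    """Join 1-based, inclusive exon intervals of a genomic sequence into one CDS."""
    parts = []
    previous_end = 0
    for start, end in exons:
        # Exons must be in order, non-overlapping, and inside the sequence.
        if start <= previous_end or end < start or end > len(genomic):
            raise ValueError(f"invalid exon interval: ({start}, {end})")
        parts.append(genomic[start - 1:end])
        previous_end = end
    return "".join(parts).upper()


# Toy record: 4 nt of 5' UTR, exon 1, a 6 nt intron, exon 2 (all hypothetical).
toy_genomic = "ccgg" + "ATGGCT" + "gtaagt" + "GAAAAACTGTAA"
toy_exons = [(5, 10), (17, 28)]
toy_cds = coding_sequence(toy_genomic, toy_exons)
```

The joined sequence starts at the ATG and ends at the stop codon, with the lowercase untranslated and intron bases removed.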
  • This example describes optimization of a DNA sequence encoding TrCBH-I for expression in bacteria.
  • Chi-squared values for E. coli were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for E. coli were obtained from the GenBank sequence database (75,096 codon pairs in 237 sequences for E. coli) to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the TrCBH-I protein was modified to optimize codon usage for E. coli.
  • A graphical display for the native gene (SEQ ID NO: 169) encoding the TrCBH-I protein (SEQ ID NO: 170) in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 40A.
  • The nucleotide sequence for the gene encoding the TrCBH-I protein was modified to no longer contain codon pairs having z scores in E. coli greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 173) was found to encode a protein (SEQ ID NO: 174) with 100% amino acid sequence identity to wild-type TrCBH-I (SEQ ID NO: 170).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 173) encoding the TrCBH-I protein (SEQ ID NO: 174) expressed in E. coli was prepared by plotting z scores of translational kinetics values for codon pair utilization in E. coli as a function of codon pair position. The graphical display is provided in Figure 40B.
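  • Codon usage optimization for a host is often approximated by replacing each codon with the synonymous codon used most frequently in that host's coding sequences. The text does not say which optimization scheme was applied, so the Python sketch below shows this common heuristic only as an assumption; the reference sequence is a toy stand-in for real host data, and the function names are hypothetical.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def preferred_codons(reference_cds_list):
    """Pick, for each amino acid, the synonymous codon seen most often."""
    counts = Counter()
    for cds in reference_cds_list:
        usable = len(cds) - len(cds) % 3
        counts.update(cds[i:i + 3] for i in range(0, usable, 3))
    best = {}
    for codon, amino_acid in CODON_TABLE.items():
        # Ties keep the codon that comes first in the table ordering.
        if amino_acid not in best or counts[codon] > counts[best[amino_acid]]:
            best[amino_acid] = codon
    return best


def optimize_codon_usage(cds, best):
    """Recode a CDS (uppercase DNA, length a multiple of 3) with preferred codons."""
    return "".join(best[CODON_TABLE[cds[i:i + 3]]] for i in range(0, len(cds), 3))


# Toy reference set standing in for host coding sequences (hypothetical).
toy_reference = ["ATGGCGGCGGCTGAAGAAGAGCTGCTGTAA"]
toy_best = preferred_codons(toy_reference)
toy_optimized = optimize_codon_usage("ATGGCTGAGTTGTAA", toy_best)
```

A sequence recoded this way can still contain codon pairs with high z scores, which is why the examples apply the codon pair modification as a separate step.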
  • EXAMPLE 45
  • This example describes optimization of a DNA sequence encoding TrCBH-I for expression in P. pastoris.
  • Chi-squared values for P. pastoris were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for P. pastoris were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the TrCBH-I protein was modified to optimize codon usage for P. pastoris.
  • A graphical display for the native gene (SEQ ID NO: 169) encoding the TrCBH-I protein (SEQ ID NO: 170) in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 41A.
  • The nucleotide sequence for the gene encoding the TrCBH-I protein was modified to no longer contain codon pairs having z scores in P. pastoris greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 175) was found to encode a protein (SEQ ID NO: 176) with 100% amino acid sequence identity to wild-type TrCBH-I (SEQ ID NO: 170).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 175) encoding the TrCBH-I protein (SEQ ID NO: 176) expressed in P. pastoris was prepared by plotting z scores of translational kinetics values for codon pair utilization in P. pastoris as a function of codon pair position. The graphical display is provided in Figure 41B.
  • This example describes optimization of a DNA sequence encoding TrCBH-I for expression in K. lactis.
  • Chi-squared values for K. lactis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for K. lactis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the TrCBH-I protein was modified to optimize codon usage for K. lactis.
  • A graphical display for the native gene (SEQ ID NO: 169) encoding the TrCBH-I protein (SEQ ID NO: 170) in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 42A.
  • The nucleotide sequence for the gene encoding the TrCBH-I protein was modified to no longer contain codon pairs having z scores in K. lactis greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 177) was found to encode a protein (SEQ ID NO: 178) with 100% amino acid sequence identity to wild-type TrCBH-I (SEQ ID NO: 170).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 177) encoding the TrCBH-I protein (SEQ ID NO: 178) expressed in K. lactis was prepared by plotting z scores of translational kinetics values for codon pair utilization in K. lactis as a function of codon pair position. The graphical display is provided in Figure 42B.
  • This example describes optimization of a DNA sequence encoding TrCBH-I for expression in Z. mobilis.
  • Chi-squared values for Z. mobilis were determined as described in Example 1, with the following differences. Briefly, non-redundant protein coding regions for Z. mobilis were obtained from GenBank sequences to determine an observed number of occurrences for each codon pair. The expected number of occurrences of each codon pair was calculated under the assumption that the codon pairs are used randomly. Chi-squared values chisq1, chisq2, chisq3 and z scores of chisq3 were calculated as described in Example 1.
  • The nucleotide sequence for the gene encoding the TrCBH-I protein was modified to optimize codon usage for Z. mobilis.
  • A graphical display for the native gene (SEQ ID NO: 169) encoding the TrCBH-I protein (SEQ ID NO: 170) in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position. The graphical display is provided in Figure 43A.
  • The nucleotide sequence for the gene encoding the TrCBH-I protein was modified to no longer contain codon pairs having z scores in Z. mobilis greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 179) was found to encode a protein (SEQ ID NO: 180) with 100% amino acid sequence identity to wild-type TrCBH-I (SEQ ID NO: 170).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 179) encoding the TrCBH-I protein (SEQ ID NO: 180) expressed in Z. mobilis was prepared by plotting z scores of translational kinetics values for codon pair utilization in Z. mobilis as a function of codon pair position. The graphical display is provided in Figure 43B.
  • Protein expression is induced by addition of 0.002% or 0.02% L-arabinose, and the culture is grown for 3 hr at 37°C. Cells are harvested by centrifugation and the cell pellets are resuspended in phosphate buffered saline. Cells are disrupted by sonication, and supernatant and pellet fractions are resolved in a 4-20% SDS-polyacrylamide gel (Pierce). Proteins are transferred to Immobilon-P (Millipore, Bedford, MA) and are incubated with rabbit polyclonal anti-CBH-II antibody diluted 1:20,000. Rabbit IgG is visualized using an HRP-conjugated secondary antibody and ECL + Plus (Amersham, Buckinghamshire, UK) according to the manufacturer's instructions.
  • This example describes optimization of a DNA sequence encoding T. aurantiacus endoglucanase (EG1) for expression in yeast.
  • The chi-squared value "chisq1" was generated from the expected and observed values thus determined.
  • The chisq1 value was recalculated to remove any influence of non-randomness in amino acid pair frequencies, yielding "chisq2."
  • The chisq2 value was recalculated to remove any influence of non-randomness in dinucleotide frequencies, yielding "chisq3."
  • The z scores of chisq3 were calculated by determining the mean chisq3 value and the corresponding standard deviation for all codon pairs, and normalizing each chisq3 value so that it is reported as the number of standard deviations from the mean chisq3 value.
  • The nucleotide sequence for the gene encoding the EG1 protein was modified to optimize codon usage for S. cerevisiae.
  • The DNA sequence encoding EG1 (SEQ ID NO: 181) was derived from GenBank accession number M16190 by removing untranslated sequence (5' untranslated region and introns).
  • A graphical display for the native gene (SEQ ID NO: 181) encoding the EG1 protein (SEQ ID NO: 182) in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position.
  • The graphical display is provided in Figure 44A.
  • The nucleotide sequence for the gene encoding the EG1 protein was modified to no longer contain codon pairs having z scores in S. cerevisiae greater than 3.
  • The resulting nucleotide sequence (SEQ ID NO: 183) was found to encode a protein (SEQ ID NO: 184) with 100% amino acid sequence identity to wild-type EG1 (SEQ ID NO: 182).
  • A graphical display for the codon pair utilization-modified gene (SEQ ID NO: 183) encoding the EG1 protein (SEQ ID NO: 184) expressed in S. cerevisiae was prepared by plotting z scores of translational kinetics values for codon pair utilization in S. cerevisiae as a function of codon pair position. The graphical display is provided in Figure 44B.

Abstract

The present invention relates to polynucleotide and synthetic gene sequences that encode cellulose- and hemicellulose-degrading enzymes, can be expressed in a host organism, and have improved and/or refined translational kinetics, as well as to methods of making them. The resulting nucleotide encoding the cellulose- and hemicellulose-degrading enzyme is expected to be translated rapidly along its entire length. Expression of the resulting nucleotide encoding the cellulose- and hemicellulose-degrading enzyme is expected to yield improved protein expression levels in cases where inappropriate or excessive translational pauses reduce protein expression. In addition, expression of the resulting nucleotide encoding the cellulose- and hemicellulose-degrading enzyme is expected to yield improved levels of active and/or natively folded, functional polypeptides in cases where inappropriate or excessive translational pauses lead to expression of an inactive, insoluble, aggregated, dysfunctional, or poorly active cellulose- and hemicellulose-degrading enzyme.
PCT/US2008/006379 2007-06-29 2008-05-14 Cellulose- and hemicellulose-degrading enzyme-encoding nucleotide sequences with refined translational kinetics, and methods of making same WO2009005564A2 (fr)

Applications Claiming Priority (12)

Application Number Priority Date Filing Date Title
US94727707P 2007-06-29 2007-06-29
US94708607P 2007-06-29 2007-06-29
US94732907P 2007-06-29 2007-06-29
US60/947,277 2007-06-29
US60/947,086 2007-06-29
US60/947,329 2007-06-29
US94749607P 2007-07-02 2007-07-02
US94761707P 2007-07-02 2007-07-02
US60/947,617 2007-07-02
US60/947,496 2007-07-02
US94778407P 2007-07-03 2007-07-03
US60/947,784 2007-07-03

Publications (2)

Publication Number Publication Date
WO2009005564A2 true WO2009005564A2 (fr) 2009-01-08
WO2009005564A3 WO2009005564A3 (fr) 2009-03-05

Family

ID=39734194

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/006379 WO2009005564A2 (fr) 2007-06-29 2008-05-14 Cellulose- and hemicellulose-degrading enzyme-encoding nucleotide sequences with refined translational kinetics, and methods of making same

Country Status (1)

Country Link
WO (1) WO2009005564A2 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120149065A1 (en) * 2010-11-15 2012-06-14 Edeniq, Inc. Use of manganese peroxidase for enzymatic hydrolysis of lignocellulosic material
WO2014028773A3 (fr) * 2012-08-16 2014-04-17 Bangladesh Jute Research Institute Lignin-degrading enzymes from Macrophomina phaseolina and uses thereof
CN104357414A (zh) * 2014-11-28 2015-02-18 Shanghai Academy of Agricultural Sciences Laccase gene derived from Laccaria bicolor and application thereof
CN104388441A (zh) * 2014-10-30 2015-03-04 Shanghai Academy of Agricultural Sciences Laccase gene derived from the cucurbit anthracnose fungus and application thereof

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5082767A (en) * 1989-02-27 1992-01-21 Hatfield G Wesley Codon pair utilization
WO2003070957A2 (fr) * 2002-02-20 2003-08-28 Novozymes A/S Plant polypeptide production
WO2007130650A2 (fr) * 2006-05-04 2007-11-15 The Regents Of The University Of California Methods for calculating codon pair-based translational kinetics values, and methods for generating polypeptide-encoding nucleotide sequences from such values
WO2008000632A1 (fr) * 2006-06-29 2008-01-03 Dsm Ip Assets B.V. A method for achieving improved polypeptide expression

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5082767A (en) * 1989-02-27 1992-01-21 Hatfield G Wesley Codon pair utilization
WO2003070957A2 (fr) * 2002-02-20 2003-08-28 Novozymes A/S Plant polypeptide production
WO2007130650A2 (fr) * 2006-05-04 2007-11-15 The Regents Of The University Of California Methods for calculating codon pair-based translational kinetics values, and methods for generating polypeptide-encoding nucleotide sequences from such values
WO2007130606A2 (fr) * 2006-05-04 2007-11-15 The Regents Of The University Of California Translational kinetics analysis using graphical displays of translational kinetics values of codon pairs
WO2008000632A1 (fr) * 2006-06-29 2008-01-03 Dsm Ip Assets B.V. A method for achieving improved polypeptide expression

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
DATABASE EMBL [Online] 25 November 2003 (2003-11-25), "Pycnoporus sanguineus laccase mRNA, complete cds." XP002496141 retrieved from EBI accession no. EMBL:AY458017 Database accession no. AY458017 *
GONZALEZ ET AL: "Identification of a new laccase gene and confirmation of genomic predictions by cDNA sequences of Trametes sp. I-62 laccase family" MYCOLOGICAL RESEARCH, ELSEVIER, GB, vol. 107, no. 6, 1 June 2003 (2003-06-01), pages 727-735, XP022443249 ISSN: 0953-7562 *
GUSTAFSSON C ET AL: "Codon bias and heterologous protein expression" TRENDS IN BIOTECHNOLOGY, ELSEVIER PUBLICATIONS, CAMBRIDGE, GB, vol. 22, no. 7, 1 July 2004 (2004-07-01), pages 346-353, XP004520507 ISSN: 0167-7799 *
HATFIELD G WESLEY ET AL: "Optimizing scaleup yield for protein production: Computationally Optimized DNA Assembly (CODA) and Translation Engineering(TM)" BIOTECHNOLOGY ANNUAL REVIEW, XX, XX, vol. 13, 1 January 2007 (2007-01-01), pages 27-42, XP009092735 *
HATFIELD G.W. ET AL.: "Codon pair utilization bias in bacteria, yeast and mammals" [Online] 1993, CRC PRESS , BOCA RATON, LA , XP002495824 Retrieved from the Internet: URL:http://www.codagenomics.com/technology/pubs/Synthesis_Book%20Chapter%207.pdf> [retrieved on 2008-09-15] cited in the application the whole document *
IRWIN B ET AL: "codon pair utilization biases influence translational elongation step times" JOURNAL OF BIOLOGICAL CHEMISTRY, AMERICAN SOCIETY OF BIOLOCHEMICAL BIOLOGISTS, BIRMINGHAM,; US, vol. 270, no. 39, 29 September 1995 (1995-09-29), pages 22801-22806, XP002406003 ISSN: 0021-9258 *
KITTLE J.D., JR: "Radical Changes in the Engineering of Synthetic Genes for Protein Expression" BIOPHARM INTERNATIONAL, [Online] February 2006 (2006-02), XP002495822 Retrieved from the Internet: URL:http://www.codagenomics.com/technology/pubs/Biopharm_pdf_BP3-61-06e.pdf> [retrieved on 2008-09-15] *
TRINH RYAN ET AL: "Optimization of codon pair use within the (GGGGS)3 linker sequence results in enhanced protein expression." MOLECULAR IMMUNOLOGY, vol. 40, no. 10, January 2004 (2004-01), pages 717-722, XP002495823 ISSN: 0161-5890 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120149065A1 (en) * 2010-11-15 2012-06-14 Edeniq, Inc. Use of manganese peroxidase for enzymatic hydrolysis of lignocellulosic material
US8686123B2 (en) 2010-11-15 2014-04-01 Edeniq, Inc. Use of manganese peroxidase for enzymatic hydrolysis of lignocellulosic material
WO2014028773A3 (fr) * 2012-08-16 2014-04-17 Bangladesh Jute Research Institute Lignin-degrading enzymes from Macrophomina phaseolina and uses thereof
CN104388441A (zh) * 2014-10-30 2015-03-04 Shanghai Academy of Agricultural Sciences Laccase gene derived from the cucurbit anthracnose fungus and application thereof
CN104357414A (zh) * 2014-11-28 2015-02-18 Shanghai Academy of Agricultural Sciences Laccase gene derived from Laccaria bicolor and application thereof

Also Published As

Publication number Publication date
WO2009005564A3 (fr) 2009-03-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08767804

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08767804

Country of ref document: EP

Kind code of ref document: A2