WO2010091441A2 - Stable, functional chimeric cellobiohydrolases - Google Patents

Stable, functional chimeric cellobiohydrolases

Publication number
WO2010091441A2
Authority
WO
WIPO (PCT)
Prior art keywords
seq
segment
amino acid
residue
sequence
Prior art date
Application number
PCT/US2010/027248
Other languages
English (en)
Other versions
WO2010091441A9 (fr
WO2010091441A3 (fr
Inventor
Frances H. Arnold
Pete Heinzelman
Jeremy Minshull
Sridhar Govindarajan
Alan Villalobos
Original Assignee
California Institute Of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by California Institute Of Technology filed Critical California Institute Of Technology
Publication of WO2010091441A2 publication Critical patent/WO2010091441A2/fr
Publication of WO2010091441A9 publication Critical patent/WO2010091441A9/fr
Publication of WO2010091441A3 publication Critical patent/WO2010091441A3/fr

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P19/00 Preparation of compounds containing saccharide radicals
    • C12P19/02 Monosaccharides
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/10 Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034 Isolating an individual clone by screening libraries
    • C12N15/1089 Design, preparation, screening or analysis of libraries using computer algorithms
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/62 DNA sequences coding for fusion proteins
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/24 Hydrolases (3) acting on glycosyl compounds (3.2)
    • C12N9/2402 Hydrolases (3) acting on glycosyl compounds (3.2) hydrolysing O- and S-glycosyl compounds (3.2.1)
    • C12N9/2405 Glucanases
    • C12N9/2434 Glucanases acting on beta-1,4-glucosidic bonds
    • C12N9/2437 Cellulases (3.2.1.4; 3.2.1.74; 3.2.1.91; 3.2.1.150)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y302/00 Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2)
    • C12Y302/01 Glycosidases, i.e. enzymes hydrolysing O- and S-glycosyl compounds (3.2.1)
    • C12Y302/01091 Cellulose 1,4-beta-cellobiosidase (3.2.1.91)

Definitions

  • the present disclosure relates to biomolecular engineering and design, and engineered proteins and nucleic acids.
  • the disclosure provides a chimeric polypeptide comprising at least two domains from two different parental cellobiohydrolase II (CBH II) polypeptides, wherein the domains comprise from N- to C-terminus: (segment 1)-(segment 2)-(segment 3)-(segment 4)-(segment 5)-(segment 6)-(segment 7)-(segment 8); wherein: segment 1 comprises a sequence that is at least 50-100% identical to amino acid residue from about 1 to about x1 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3"); segment 2 comprises a sequence that is at least 50-100% identical to amino acid residue x1 to about x2 of SEQ ID NO:2
  • segment 3 comprises a sequence that is at least 50-100% identical to amino acid residue x2 to about x3 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6
  • segment 4 comprises a sequence that is at least 50-100% identical to amino acid residue x3 to about x4 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3");
  • segment 5 comprises a sequence that is at least 50-100% identical to about amino acid residue x4 to about x5 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3");
  • segment 6 comprises a sequence that is at least 50-100% identical to amino acid residue x5 to about x6 of SEQ ID NO:2
  • segment 7 comprises a sequence that is at least 50-100% identical to amino acid residue x6 to about x7 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6
  • segment 8 comprises a sequence that is at least 50-100% identical to amino acid residue x7 to about x8 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3"); wherein x1 is residue 43, 44, 45, 46, or 47 of SEQ ID NO:2, or residue 42, 43, 44, 45, or 46 of SEQ ID NO:4 or SEQ ID NO:6; x2 is residue 70, 71, 72, 73, or 74 of SEQ ID NO:2, or residue 68, 69, 70, 71, 72, 73, or 74 of SEQ ID NO:4 or SEQ ID NO:6; x3 is residue 113, 114, 115, 116, 117 or 118 of SEQ ID NO:2, or residue 110, 111, 112, 113, 114, 115, or 116 of SEQ ID NO:4 or SEQ ID NO:6; x4 is residue 113,
  • segment 2 is from about amino acid residue x1 to about x2 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3") and having about 1-10 conservative amino acid substitutions
  • segment 3 is from about amino acid residue x2 to about x3 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6
  • segment 4 is from about amino acid residue x3 to about x4 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3") and having about 1-10 conservative amino acid substitutions
  • segment 5 is from about amino acid residue x4 to about x5 of SEQ ID NO:2 ("1"), SEQ ID NO:4
  • segment 6 is from about amino acid residue x5 to about x6 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6
  • segment 7 is from about amino acid residue x6 to about x7 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3") and having about 1-10 conservative amino acid substitutions
  • segment 8 is from about amino acid residue x7 to about x8 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3") and having about 1-10 conservative amino acid substitutions.
  • the chimeric polypeptide has at least one segment selected from the following: segment 1 from SEQ ID NO:2; segment 6 from SEQ ID NO:6, segment 7 from SEQ ID NO:6 and segment 8 from SEQ ID NO:4.
  • the chimeric polypeptide can be described as having segments 1X2X3X4X5332, wherein X2 comprises a sequence that is at least 50-100% identical to amino acid residue x1 to about x2 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3"); X3 comprises a sequence that is at least 50-100% identical to amino acid residue x2 to about x3 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6
  • X4 comprises a sequence that is at least 50-100% identical to amino acid residue x3 to about x4 of SEQ ID NO:2 ("1"), SEQ ID NO:4
  • X5 comprises a sequence that is at least 50-100% identical to about amino acid residue x4 to about x5 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3").
  • the chimeric polypeptide comprises a segment structure selected from the group consisting of 11113132, 21333331, 21311131, 22232132, 33133132, 33213332, 13333232, 12133333, 13231111, 11313121, 11332333, 12213111, 23311333, 13111313, 31311112, 23231222, 33123313, 22212231, 21223122, 21131311, 23233133, 31212111, 12222332 and 32333113.
  • the chimeric polypeptide comprises a segment structure selected from the group set forth in Table 1.
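The eight-digit segment structures listed above can be read position by position: the digit at position n names the parent that supplies segment n. The following is a minimal sketch of that reading, assuming the parent labels given in the text ("1" = SEQ ID NO:2, "2" = SEQ ID NO:4, "3" = SEQ ID NO:6); the function name `decode_segment_structure` is hypothetical and not part of the disclosure.

```python
# Parent labels as used in the text: "1", "2" and "3" map to the three parent sequences.
PARENT_SEQ_ID = {"1": "SEQ ID NO:2", "2": "SEQ ID NO:4", "3": "SEQ ID NO:6"}


def decode_segment_structure(code):
    """Return (segment number, parent SEQ ID) pairs for an 8-digit structure.

    Hypothetical helper for illustration only.
    """
    if len(code) != 8 or any(ch not in PARENT_SEQ_ID for ch in code):
        raise ValueError("expected eight digits, each one of 1, 2 or 3")
    # Segment numbers are 1-based, N- to C-terminus.
    return [(i + 1, PARENT_SEQ_ID[ch]) for i, ch in enumerate(code)]


if __name__ == "__main__":
    for segment, parent in decode_segment_structure("11113132"):
        print(f"segment {segment}: {parent}")
```

For the structure 11113132, segments 1 to 4 and 6 come from SEQ ID NO:2, segments 5 and 7 from SEQ ID NO:6, and segment 8 from SEQ ID NO:4.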
  • the disclosure also provides a polynucleotide encoding a polypeptide as described above.
  • One of skill can readily determine the exact sequence desired using the degeneracy of the genetic code, by reference to the amino acid sequences herein and by reference to the polynucleotide sequences herein.
  • the disclosure also provides a vector comprising a polynucleotide of the disclosure as well as host cells comprising a polynucleotide or vector of the disclosure.
  • the disclosure provides an enzymatic preparation comprising a polypeptide described above.
  • the disclosure also provides a method of treating a biomass comprising cellulose, the method comprising contacting the biomass with a chimeric polypeptide as described above.
  • The disclosure provides a method of treating a biomass comprising cellulose, the method comprising contacting the biomass with a host cell comprising and expressing a polynucleotide and chimeric polypeptide of the disclosure, respectively.
  • the disclosure also provides a method of generating a thermostable chimeric cellobiohydrolase polypeptide, comprising recombining segments from at least 2 parental cellobiohydrolase polypeptides wherein the chimeric polypeptide comprises from N- to C-terminus 8 segments wherein: segment 1 comprises a sequence that is at least 50-100% identical to amino acid residue from about 1 to about x1 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3"); segment 2 comprises a sequence that is at least 50-100% identical to amino acid residue x1 to about x2 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3"); segment 3 comprises a sequence that is at least 50-100% identical to amino acid residue x2 to about x3 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3"); segment 4 comprises
  • segment 7 comprises a sequence that is at least 50-100% identical to amino acid residue x6 to about x7 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6
  • segment 8 comprises a sequence that is at least 50-100% identical to amino acid residue x7 to about x8 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3"); wherein x1 is residue 43, 44, 45, 46, or 47 of SEQ ID NO:2, or residue 42, 43, 44, 45, or 46 of SEQ ID NO:4 or SEQ ID NO:6; x2 is residue 70, 71, 72, 73, or 74 of SEQ ID NO:2, or residue 68, 69, 70, 71, 72, 73, or 74 of SEQ ID NO:4 or SEQ ID NO:6; x3 is residue 113, 114, 115, 116, 117 or 118 of SEQ ID NO:2, or residue 110, 111, 112, 113, 114, 115, or 116 of SEQ ID NO:2 or SEQ ID NO:3; x4 is residue
  • Figure 1 shows SDS-PAGE gel of candidate CBH II parent gene yeast expression culture supernatants.
  • Gel Lanes (Left-to-Right): 1-H. jecorina, 2-Empty vector, 3-H. insolens, 4-C. thermophilum, 5-H. jecorina (duplicate), 6-P. chrysosporium, 7-T. emersonii, 8-Empty vector (duplicate), 9-H. jecorina (triplicate).
  • Numbers at bottom of gel represent concentration of reducing sugar
  • FIG. 2A-C shows illustrations of CBH II chimera library block boundaries.
  • B Linear representation of H. insolens catalytic domain showing secondary structure elements, disulfide bonds and block divisions denoted by black arrows.
  • C Sidechain contact map denoting contacts (side chain heavy atoms within 4.5 Å) that can be broken upon recombination. The majority of broken contacts occur between consecutive blocks.
  • Figure 3 shows a number of broken contacts (E) and number of mutations from closest parent (m) for 23 secreted/active and 25 not secreted/not active sample set chimeras.
  • Figure 4 shows specific activity, normalized to pH 5.0, as a function of pH for parent CBH II enzymes and three thermostable chimeras. Data presented are averages for two replicates, where error bars for HJPlus and H. jecorina denote values for two independent trials. 16-hr reaction, 300 μg enzyme/g PASC, 50°C, 12.5 mM sodium citrate/12.5 mM sodium phosphate buffer at pH as shown.
  • Figure 5 shows long-time cellulose hydrolysis assay results (μg glucose reducing sugar equivalent/μg CBH II enzyme) for parents and thermostable chimeras across a range of temperatures. Error bars indicate standard errors for three replicates of HJPlus and H. insolens CBH II enzymes. 40-hr reaction, 100 μg enzyme/g PASC, 50 mM sodium acetate, pH 4.8.
  • Figure 6 shows normalized residual activities for validation set chimeras after a 12-h incubation at 63 °C. Residual activities for CBH II enzymes in concentrated culture supernatants determined in 2-hr assay with PASC as substrate, 50 °C, 25 mM sodium acetate buffer, pH 4.8.
  • FIG. 7 shows a map for parent and chimera CBH II enzyme expression vector Yep352/PGK91-1-ss.
  • Vector pictured contains wild type H. jecorina cel6a (CBH II enzyme) gene.
  • CBD/linker amino acid sequence following the ss Lys-Arg Kex2 site is: ASCSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPGAASSSSSTRAASTTSRVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYS (SEQ ID NO: 8).
  • thermostable CBH II enzymes fewer than 10 natural thermostable gene sequences are annotated in the CAZy database.
  • the majority of biomass conversion processes use mixtures of fungal cellulases (primarily CBH II, cellobiohydrolase class I (CBH I), endoglucanases and β-glucosidase) to achieve high levels of cellulose hydrolysis.
  • Generating a diverse group of thermostable CBH II enzyme chimeras is the first step in building an inventory of stable, highly active cellulases from which enzyme mixtures can be formulated and optimized for specific applications and feedstocks.
  • SCHEMA has been used previously to create families of hundreds of active β-lactamase and cytochrome P450 enzyme chimeras.
  • SCHEMA uses protein structure data to define boundaries of contiguous amino acid "blocks" which minimize ⟨E⟩, the library average number of amino acid sidechain contacts that are broken when the blocks are swapped among different parents. It has been shown that the probability that a β-lactamase chimera was folded and active was inversely related to the value of E for that sequence.
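The quantity E described above can be illustrated with a short sketch. It assumes aligned parent sequences of equal length, a list of contacting position pairs taken from a structure, and the commonly used SCHEMA convention that a contact counts as broken only when the residue pair found in the chimera occurs in none of the parents at those two positions. That convention, the function name `schema_broken_contacts`, and the toy sequences are assumptions for illustration; the text itself only states that E counts broken sidechain contacts.

```python
def schema_broken_contacts(chimera_parents, parents, contacts):
    """Count contacts broken in one chimera (the per-sequence E).

    chimera_parents: parent index chosen at each aligned position
    parents: aligned parent sequences, all the same length
    contacts: (i, j) pairs of 0-based positions in contact in the structure
    """
    broken = 0
    for i, j in contacts:
        # Residue pair actually present in the chimera at positions i and j.
        pair = (parents[chimera_parents[i]][i], parents[chimera_parents[j]][j])
        # Assumed convention: the contact is kept if any parent has the same pair.
        if not any((p[i], p[j]) == pair for p in parents):
            broken += 1
    return broken


if __name__ == "__main__":
    toy_parents = ["ACDE", "AGHE"]      # toy aligned sequences, not real CBH II data
    toy_contacts = [(0, 3), (1, 2)]     # toy contact map
    print(schema_broken_contacts([0, 0, 1, 1], toy_parents, toy_contacts))
```

Averaging this count over every chimera in a library would give the library average ⟨E⟩ that the block boundaries are chosen to minimize.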
  • the RASPP (Recombination as Shortest Path Problem)
  • More than 20% of the ~500 unique chimeras characterized from a β-lactamase collection comprised of 8 blocks from 3 parents (3^8 = 6,561 possible sequences) were catalytically active.
  • thermostabilities of SCHEMA chimeras can be predicted based on sequence-stability data from a small sample of the sequences.
  • Linear regression modeling of thermal inactivation data for 184 cytochrome P450 chimeras showed that SCHEMA blocks made additive contributions to thermostability. More than 300 chimeras were predicted to be thermostable by this model, and all 44 that were tested were more stable than the most stable parent. It was estimated that as few as 35 thermostability measurements could be used to predict the most thermostable chimeras.
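The additive block model described above can be written out as a worked equation. This is a sketch under stated assumptions: the symbols are introduced here for illustration and are not taken from the disclosure, and one parent is treated as the reference at each block position so that the coefficients are identifiable.

```latex
% Assumed notation (not from the source):
%   S(c)      predicted stability measure of chimera c
%   a_0       stability of the reference sequence
%   a_{ip}    contribution of block i taken from parent p
%   x_{ip}(c) 1 if block i of chimera c comes from parent p, else 0
\[
  \hat{S}(c) \;=\; a_0 \;+\; \sum_{i=1}^{8} \; \sum_{p \neq p_{\mathrm{ref}}} a_{ip}\, x_{ip}(c)
\]
```

Under this form each block contributes independently of its neighbors, so the coefficients can be estimated by ordinary least squares from a measured subset of chimeras and then used to rank the unmeasured ones.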
  • the thermostable P450 chimeras displayed unique activity and specificity profiles, demonstrating that chimeragenesis can lead to additional useful enzyme properties.
  • the disclosure demonstrates that SCHEMA recombination of CBH II enzymes can generate chimeric cellulases that are active on phosphoric acid swollen cellulose (PASC) at high temperatures, over extended periods of time, and broad ranges of pH.
  • a diverse family of novel CBH II enzymes was constructed by swapping blocks of sequence from three fungal CBH II enzymes. Twenty-three of 48 chimeric sequences sampled from this set were secreted in active form by S. cerevisiae, and five have half-lives at 63°C that were greater than the most stable parent. Given that this 48-member sample set represents less than 1% of the total possible 6,561 sequences, the disclosure provides hundreds of active chimeras, a number that extends well beyond the approximately twenty fungal CBH II enzymes in the CAZy database.
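The library size quoted above follows from choosing one of three parents independently at each of eight block positions. A minimal sketch of that count, and of the fraction covered by a 48-member sample, is shown below; the names `enumerate_library` and `sample_fraction` are hypothetical.

```python
from itertools import product

PARENTS = "123"   # parent labels as used in the segment-structure notation
N_BLOCKS = 8      # blocks per chimera


def enumerate_library(parents=PARENTS, n_blocks=N_BLOCKS):
    """List every block-structure string, one parent digit per block."""
    return ["".join(choice) for choice in product(parents, repeat=n_blocks)]


library = enumerate_library()
# Fraction of the full library represented by a 48-member sample set.
sample_fraction = 48 / len(library)

if __name__ == "__main__":
    print(len(library))
    print(f"{sample_fraction:.2%}")
```

The enumeration includes the three unrecombined parents (11111111, 22222222 and 33333333) alongside the true chimeras.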
  • thermostability data was validated by finding that all 10 catalytically active chimeras in the second CBH II validation set were more thermostable than the most stable parent, a naturally-thermostable CBH II from the thermophilic fungus, H. insolens.
  • This disclosure demonstrates a sample of 33 new CBH II enzymes that are expressed in catalytically active form in S. cerevisiae, 15 of which are more thermostable than the most stable parent from which they were constructed. These 15 thermostable enzymes are diverse in sequence, differing from each other and their closest natural homologs at as many as 94 and 58 amino acid positions, respectively.
  • thermostabilities of CBH II chimeras in the combined sample and validation sets indicates that the four thermostabilizing blocks identified, block 1 (i.e., domain 1), parent 1 (B1P1); block 6 (i.e., domain 6), parent 3 (B6P3); B7P3 and B8P2, make cumulative contributions to thermal stability when present in the same chimera.
  • H. insolens enzyme contain at least two stabilizing blocks, with five of the six most thermostable chimeras in this group containing either three or four stabilizing blocks.
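Counting how many of the four stabilizing blocks (B1P1, B6P3, B7P3 and B8P2) a given chimera carries is a simple lookup on its eight-digit segment structure. The sketch below assumes that notation; the function name `count_stabilizing_blocks` is hypothetical.

```python
# Block position -> parent digit for the four stabilizing blocks named in the text:
# B1P1, B6P3, B7P3 and B8P2.
STABILIZING_BLOCKS = {1: "1", 6: "3", 7: "3", 8: "2"}


def count_stabilizing_blocks(code):
    """Return how many stabilizing blocks an 8-digit segment structure contains."""
    if len(code) != 8:
        raise ValueError("expected an 8-digit segment structure")
    # Block numbers are 1-based, so block b sits at string index b - 1.
    return sum(
        1 for block, parent in STABILIZING_BLOCKS.items() if code[block - 1] == parent
    )


if __name__ == "__main__":
    for structure in ("11113132", "12222332", "21311131"):
        print(structure, count_stabilizing_blocks(structure))
```

A count like this is only a screening heuristic; the cumulative effect on stability is what the measurements described in the text establish.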
  • Minimizing the number of broken contacts upon recombination allows the blocks to be approximated as decoupled units that make independent contributions to the stability of the entire protein, thus leading to cumulative or even additive contributions to chimera thermostability.
  • SCHEMA was effective in minimizing such broken contacts: whereas there are 303 total interblock contacts defined in the H. insolens parent CBH II crystal structure, the CBH II SCHEMA library design results in only 33 potential broken contacts.
  • H. jecorina CBH II declines sharply as pH increases above the optimum value of 5, HJPlus, created by substituting stabilizing blocks onto the most industrially relevant H. jecorina CBH II enzyme, retains significantly more activity at these higher pHs (Figure 4).
  • the narrow pH/activity profile of H. jecorina CBH II has been attributed to the deprotonation of several carboxyl-carboxylate pairs, which destabilizes the protein above a pH of about 6.
  • HJPlus exhibits both relatively high specific activity and high thermostability.
  • Figure 5 shows that these properties lead to good performance in long-time hydrolysis experiments: HJPlus hydrolyzed cellulose at temperatures 7-15 °C higher than the parent CBH II enzymes and also had a significantly increased long-time activity relative to all the parents at their temperature optima, bettering H. jecorina CBH II by a factor of 1.7.
  • the specific activity of the HJPlus chimera is less than that of the H. jecorina CBH II parent, this increased long-time activity can be attributed to the ability of the thermostable HJPlus to retain activity at optimal hydrolysis temperatures over longer reaction times.
  • thermostable chimeras shared HJPlus's broad temperature operating range. This observation supports a positive correlation between t1/2 at elevated temperature and maximum operating temperature, and suggests that many of the thermostable chimeras among the 6,561 CBH II chimera sequences will also be capable of degrading cellulose at elevated temperatures. While this ability to hydrolyze the amorphous PASC substrate at elevated temperatures bodes well for the potential utility of thermostable fungal CBH II chimeras, studies with more challenging crystalline substrates and substrates containing lignin will provide a more complete assessment of this novel CBH II enzyme family's relevance to biomass degradation applications.
  • amino acid is a molecule having the structure wherein a central carbon atom is linked to a hydrogen atom, a carboxylic acid group (the carbon atom of which is referred to herein as a “carboxyl carbon atom”), an amino group (the nitrogen atom of which is referred to herein as an "amino nitrogen atom"), and a side chain group, R.
  • Protein refers to any polymer of two or more individual amino acids (whether or not naturally occurring) linked via a peptide bond.
  • the term “protein” is understood to include the terms “polypeptide” and “peptide” (which, at times may be used interchangeably herein) within its meaning.
  • proteins comprising multiple polypeptide subunits e.g., DNA polymerase III, RNA polymerase II
  • other components for example, an RNA molecule, as occurs in telomerase
  • fragments of proteins and polypeptides are also within the scope of the disclosure and may be referred to herein as "proteins.”
  • a stabilized protein comprises a chimera of two or more parental peptide segments.
  • "Peptide segment" or "peptide domain" refers to a portion or fragment of a larger polypeptide or protein.
  • a peptide segment or domain need not on its own have functional activity, although in some instances, a peptide segment or domain may correspond to a segment or domain of a polypeptide wherein the segment or domain has its own biological activity.
  • a stability-associated peptide segment or domain is a peptide segment or domain found in a polypeptide that promotes stability, function, or folding compared to a related polypeptide lacking the peptide segment.
  • a destabilizing-associated peptide segment is a peptide segment that is identified as causing a loss of stability, function or folding when present in a polypeptide.
  • B1P1, B6P3, B7P3 and B8P2 are segments/domains that promote thermostability in a chimeric polypeptide of the disclosure.
  • a chimera has at least 1, 2, 3, or 4 thermostabilizing segments.
  • the disclosure provides chimeras that comprise at least 8 domains (i.e., B1-B2-B3-B4-B5-B6-B7-B8) comprising 1, 2, 3 or 4 domains comprising sequences that are at least 80-100% identical to a sequence selected from the group consisting of amino acid residue from about 1 to about x1 of SEQ ID NO:2; from about amino acid residue x5 to about x6 of SEQ ID NO:6; about amino acid residue x6 to about x7 of SEQ ID NO:6; and about amino acid residue x7 to about x8 of SEQ ID NO:4; wherein: x1 is residue 43, 44, 45, 46, or 47 of SEQ ID NO:2, x5 is residue 216, 217, 218, 219, 220, 221, 222 or 223 of SEQ ID NO:6; x6 is residue 253, 254, 255, 256, 257, 258, 259 or 260 of SEQ ID NO:6; x7 is residue
  • a particular amino acid sequence of a given protein (i.e., the polypeptide's "primary structure," when written from the amino-terminus to carboxy-terminus) is determined by the nucleotide sequence of the coding portion of a mRNA, which is in turn specified by genetic information, typically genomic DNA (including organelle DNA, e.g., mitochondrial or chloroplast DNA).
  • “Fused,” “operably linked,” and “operably associated” are used interchangeably herein to broadly refer to a chemical or physical coupling of two otherwise distinct domains or peptide segments, wherein each domain or peptide segment when operably linked can provide a functional polypeptide having a desired activity. Domains or peptide segments can be directly linked or connected through peptide linkers such that they are functional or can be fused through other intermediates or chemical bonds. For example, two domains can be part of the same coding sequence, wherein the polynucleotides are in frame such that the polynucleotide when transcribed encodes a single mRNA that when translated comprises both domains as a single polypeptide.
  • both domains can be separately expressed as individual polypeptides and fused to one another using chemical methods.
  • the coding domains will be linked "in-frame" either directly or separated by a peptide linker and encoded by a single polynucleotide.
  • Various coding sequences for peptide linkers and peptide are known in the art.
  • "Polynucleotide" or "nucleic acid sequence" refers to a polymeric form of nucleotides. In some instances a polynucleotide refers to a sequence that is not immediately contiguous with either of the coding sequences with which it is immediately contiguous (one on the 5' end and one on the 3' end) in the naturally occurring genome of the organism from which it is derived.
  • the term therefore includes, for example, a recombinant DNA which is incorporated into a vector; into an autonomously replicating plasmid or virus; or into the genomic DNA of a prokaryote or eukaryote, or which exists as a separate molecule (e.g., a cDNA) independent of other sequences.
  • the nucleotides of the disclosure can be ribonucleotides, deoxyribonucleotides, or modified forms of either nucleotide.
  • a polynucleotide as used herein refers to, among others, single- and double-stranded DNA, DNA that is a mixture of single- and double-stranded regions, single- and double-stranded RNA, and RNA that is a mixture of single- and double-stranded regions, hybrid molecules comprising DNA and RNA that may be single-stranded or, more typically, double-stranded or a mixture of single- and double-stranded regions.
  • the term polynucleotide encompasses genomic DNA or RNA (depending upon the organism, i.e., RNA genome of viruses), as well as mRNA encoded by the genomic DNA, and cDNA.
  • Nucleic acid segment refers to a portion of a larger polynucleotide molecule.
  • the polynucleotide segment need not correspond to an encoded functional domain of a protein; however, in some instances the segment will encode a functional domain of a protein.
  • a polynucleotide segment can be about 6 nucleotides or more in length (e.g., 6-20, 20-50, 50-100, 100-200, 200-300, 300-400 or more nucleotides in length) .
  • a stability-associated peptide segment can be encoded by a stability-associated polynucleotide segment, wherein the peptide segment promotes stability, function, or folding compared to a polypeptide lacking the peptide segment.
  • "Chimera" refers to a combination of at least two segments or domains of at least two different parent proteins or polypeptides. As appreciated by one of skill in the art, the segments need not actually come from each of the parents, as it is the particular sequence that is relevant, and not the physical nucleic acids themselves. For example, a chimeric fungal class II cellobiohydrolase (CBH II cellulase) will have at least two segments from two different parent CBH II polypeptides.
  • a chimeric polypeptide can comprise more than two segments from two different parent proteins. For example, there may be 2, 3, 4, 5-10, 10-20, or more parents for each final chimera or library of chimeras.
  • the segment of each parent polypeptide can be very short or very long; the segments can range in length of contiguous amino acids from 1 to about 90%, 95%, 98%, or 99% of the entire length of the protein. In one embodiment, the minimum length is 10 amino acids. In one embodiment, a single crossover point is defined for two parents.
  • the crossover location defines where one parent's amino acid segment will stop and where the next parent's amino acid segment will start.
  • a simple chimera would only have one crossover location where the segment before that crossover location would belong to a first parent and the segment after that crossover location would belong to a second parent.
  • the chimera has more than one crossover location, for example, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11-30, or more crossover locations. How these crossover locations are named and defined is discussed below.
  • Contiguous is meant to denote that there is nothing of significance interrupting the segments. These contiguous segments are connected to form a contiguous amino acid sequence.
  • a CBH II chimera from Humicola insolens (hereinafter "1") and H. jecorina (hereinafter "2") with two crossovers at 100 and 150, could have the first 100 amino acids from 1, followed by the next 50 from 2, followed by the remainder of the amino acids from 1, all connected in one contiguous amino acid chain.
  • the CBH II chimera could have the first 100 amino acids from 2, the next 50 from 1 and the remainder from 2.
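The two-crossover example above can be sketched as slicing aligned parent sequences at the crossover positions and joining the pieces into one contiguous chain. The sketch assumes the parents are already aligned to the same length and that crossover positions count residues from the N-terminus; the function name `build_chimera` and the placeholder sequences are hypothetical.

```python
def build_chimera(parents, crossovers, order):
    """Assemble one contiguous chimera from aligned parents.

    parents: dict mapping a parent label to its aligned sequence
    crossovers: increasing positions where one parent stops and the next starts
    order: parent label for each consecutive segment (len(crossovers) + 1 labels)
    """
    length = len(next(iter(parents.values())))
    if any(len(seq) != length for seq in parents.values()):
        raise ValueError("parents must be aligned to the same length")
    bounds = [0, *crossovers, length]
    if len(order) != len(bounds) - 1:
        raise ValueError("need exactly one parent label per segment")
    # Take each segment from its assigned parent and join the segments in order.
    return "".join(
        parents[label][start:end]
        for label, start, end in zip(order, bounds, bounds[1:])
    )


if __name__ == "__main__":
    # Placeholder sequences: 200 residues each, not real CBH II sequences.
    toy = {"1": "A" * 200, "2": "B" * 200}
    chimera = build_chimera(toy, [100, 150], ["1", "2", "1"])
    print(chimera[:100] == "A" * 100, chimera[100:150] == "B" * 50)
```

Swapping the order to ["2", "1", "2"] gives the alternative arrangement described in the text.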
  • variants of chimeras exist as well as the exact sequences.
  • Conservative amino acid substitution refers to the interchangeability of residues having similar side chains, and thus typically involves substitution of the amino acid in the polypeptide with amino acids within the same or similar defined class of amino acids.
  • an amino acid with an aliphatic side chain may be substituted with another aliphatic amino acid, e.g., alanine, valine, leucine, isoleucine, and methionine;
  • an amino acid with hydroxyl side chain is substituted with another amino acid with a hydroxyl side chain, e.g., serine and threonine;
  • an amino acid having an aromatic side chain is substituted with another amino acid having an aromatic side chain, e.g., phenylalanine, tyrosine, tryptophan, and histidine;
  • an amino acid with a basic side chain is substituted with another amino acid with a basic side chain, e.g., lysine, arginine, and histidine;
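The side-chain classes listed above can be turned into a simple membership check. The sketch below encodes only the four classes that appear in this passage (aliphatic, hydroxyl, aromatic, basic), using standard one-letter amino acid codes; any further classes are not covered. The names `SIDE_CHAIN_CLASSES` and `is_conservative` are hypothetical.

```python
# One-letter codes for the classes named in the text. Histidine appears in both
# the aromatic and the basic class, as it does in the passage.
SIDE_CHAIN_CLASSES = {
    "aliphatic": set("AVLIM"),   # alanine, valine, leucine, isoleucine, methionine
    "hydroxyl": set("ST"),       # serine, threonine
    "aromatic": set("FYWH"),     # phenylalanine, tyrosine, tryptophan, histidine
    "basic": set("KRH"),         # lysine, arginine, histidine
}


def is_conservative(original, replacement):
    """True if both residues share at least one of the listed side-chain classes."""
    if original == replacement:
        return False  # identical residues are not a substitution
    return any(
        original in members and replacement in members
        for members in SIDE_CHAIN_CLASSES.values()
    )


if __name__ == "__main__":
    print(is_conservative("L", "I"), is_conservative("K", "L"))
```

A residue outside these four classes always returns False here, which reflects the limits of the sketch rather than a statement about that residue.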
  • Non-conservative substitution refers to substitution of an amino acid in the polypeptide with an amino acid with significantly differing side chain properties. Non-conservative substitutions may use amino acids between, rather than within, the defined groups and affect (a) the structure of the peptide backbone in the area of the substitution (e.g., proline for glycine), (b) the charge or hydrophobicity, or (c) the bulk of the side chain.
  • an exemplary non-conservative substitution can be an acidic amino acid substituted with a basic or aliphatic amino acid; an aromatic amino acid substituted with a small amino acid; and a hydrophilic amino acid substituted with a hydrophobic amino acid.
  • isolated polypeptide refers to a polypeptide which is separated from other contaminants that naturally accompany it, e.g., protein, lipids, and polynucleotides.
  • the term embraces polypeptides which have been removed or purified from their naturally-occurring environment or expression system (e.g., host cell or in vitro synthesis).
  • substantially pure polypeptide refers to a composition in which the polypeptide species is the predominant species present (i.e., on a molar or weight basis it is more abundant than any other individual macromolecular species in the composition), and is generally a substantially purified composition when the object species comprises at least about 50 percent of the macromolecular species present by mole or % weight.
  • a substantially pure polypeptide composition will comprise about 60% or more, about 70% or more, about 80% or more, about 90% or more, about 95% or more, and about 98% or more of all macromolecular species by mole or % weight present in the composition.
  • the object species is purified to essential homogeneity (i.e., contaminant species cannot be detected in the composition by conventional detection methods) wherein the composition consists essentially of a single macromolecular species.
  • Solvent species, small molecules (<500 Daltons), and elemental ion species are not considered macromolecular species.
  • Reference sequence refers to a defined sequence used as a basis for a sequence comparison.
  • a reference sequence may be a subset of a larger sequence, for example, a segment of a full-length gene or polypeptide sequence.
  • a reference sequence can be at least 20 nucleotides or amino acid residues in length, at least 25 nucleotides or residues in length, at least 50 nucleotides or residues in length, or the full length of the nucleic acid or polypeptide.
  • sequence comparisons between two (or more) polynucleotides or polypeptides are typically performed by comparing sequences of the two polynucleotides or polypeptides over a "comparison window" to identify and compare local regions of sequence similarity.
  • sequence identity means that two amino acid sequences are substantially identical (e.g., on an amino acid-by-amino acid basis) over a window of comparison.
  • sequence similarity refers to similar amino acids that share the same biophysical characteristics.
  • percentage of sequence identity or “percentage of sequence similarity” is calculated by comparing two optimally aligned sequences over the window of comparison, determining the number of positions at which the identical residues (or similar residues) occur in both polypeptide sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the window of comparison (i.e., the window size), and multiplying the result by 100 to yield the percentage of sequence identity (or percentage of sequence similarity) .
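The calculation described above can be sketched in a few lines. This is a minimal illustration, assuming the two sequences have already been optimally aligned (gaps written as `-`) and that the comparison window is the whole alignment. The function name `percent_match` and the `SIMILAR_GROUPS` table are hypothetical; the table holds only the aromatic and basic side-chain groups quoted in this section, so it is not a complete similarity scheme.

```python
from typing import List, Optional, Set

# Assumption: only the two side-chain groups quoted in this section are listed.
SIMILAR_GROUPS: List[Set[str]] = [set("FYWH"), set("KRH")]


def percent_match(aligned_a: str, aligned_b: str,
                  groups: Optional[List[Set[str]]] = None) -> float:
    """Percent identity (groups=None) or percent similarity (groups given)
    over a comparison window of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    matched = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # a gap position never counts as a match
        same_group = groups is not None and any(a in g and b in g for g in groups)
        if a == b or same_group:
            matched += 1
    # Divide by the window size (total positions), then scale to a percentage.
    return 100.0 * matched / len(aligned_a)


identity = percent_match("MKFW", "MRYW")                      # identical residues only
similarity = percent_match("MKFW", "MRYW", SIMILAR_GROUPS)    # K/R and F/Y also count
```

Gap positions stay in the denominator here because the text divides by the total number of positions in the window; other tools make a different choice.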
  • sequence identity and sequence similarity have comparable meaning as described for protein sequences, with the term “percentage of sequence identity” indicating that two polynucleotide sequences are identical (on a nucleotide-by-nucleotide basis) over a window of comparison.
  • a percentage of polynucleotide sequence identity or percentage of polynucleotide sequence similarity, e.g., for silent substitutions or other substitutions, based upon the analysis algorithm
  • Maximum correspondence can be determined by using one of the sequence algorithms described herein (or other algorithms available to those of ordinary skill in the art) or by visual inspection.
  • the term substantial identity or substantial similarity means that two peptide sequences, when optimally aligned, such as by the programs BLAST, GAP or BESTFIT using default gap weights or by visual inspection, share sequence identity or sequence similarity.
  • substantial identity or substantial similarity means that the two nucleic acid sequences, when optimally aligned, such as by the programs BLAST, GAP or BESTFIT using default gap weights (described elsewhere herein) or by visual inspection, share sequence identity or sequence similarity.
  • One example of an algorithm that is suitable for determining percent sequence identity or sequence similarity is the FASTA algorithm, which is described in Pearson, W. R.
  • PILEUP creates a multiple sequence alignment from a group of related sequences using progressive, pairwise alignments to show relationship and percent sequence identity or percent sequence similarity. It also plots a tree or dendrogram showing the clustering relationships used to create the alignment. PILEUP uses a simplification of the progressive alignment method of Feng & Doolittle, (1987) J. Mol. Evol. 35:351-360. The method used is similar to the method described by Higgins & Sharp, CABIOS 5:151-153, 1989. The program can align up to 300 sequences, each of a maximum length of 5,000 nucleotides or amino acids.
  • the multiple alignment procedure begins with the pairwise alignment of the two most similar sequences, producing a cluster of two aligned sequences. This cluster is then aligned to the next most related sequence or cluster of aligned sequences. Two clusters of sequences are aligned by a simple extension of the pairwise alignment of two individual sequences. The final alignment is achieved by a series of progressive, pairwise alignments.
  • the program is run by designating specific sequences and their amino acid or nucleotide coordinates for regions of sequence comparison and by designating the program parameters.
  • using PILEUP, a reference sequence is compared to other test sequences to determine the percent sequence identity (or percent sequence similarity) relationship using the following parameters: default gap weight (3.00), default gap length weight (0.10), and weighted end gaps.
  • PILEUP can be obtained from the GCG sequence analysis software package, e.g., version 7.0 (Devereaux et al., (1984) Nuc. Acids Res. 12:387-395).
  • Another example of an algorithm that is suitable for multiple DNA and amino acid sequence alignments is the CLUSTALW program (Thompson, J. D. et al., (1994) Nuc. Acids Res. 22:4673-4680).
  • CLUSTALW performs multiple pairwise comparisons between groups of sequences and assembles them into a multiple alignment based on sequence identity. Gap open and Gap extension penalties were 10 and 0.05 respectively.
  • the BLOSUM algorithm can be used as a protein weight matrix (Henikoff and Henikoff, (1992) Proc. Natl. Acad. Sci. USA 89:10915-10919).
  • “Functional” refers to a polypeptide which possesses either the native biological activity of the naturally-produced proteins of its type, or any specific desired activity, for example as judged by its ability to bind to ligand molecules or carry out an enzymatic reaction.
  • the disclosure describes a directed SCHEMA recombination library to generate cellobiohydrolase enzymes based on particular members of this enzyme family, and more particularly cellobiohydrolase II enzymes (e.g., H. insolens is parent "1" (SEQ ID NO:2), H. jecorina is parent "2" (SEQ ID NO:4) and C. thermophilum is parent "3" (SEQ ID NO:6)).
  • SCHEMA is a computationally based method for predicting which fragments of related proteins can be recombined without affecting the structural integrity of the protein (see, e.g., Meyer et al., (2003) Protein Sci., 12:1686-1693).
  • the disclosure provides CBH II polypeptides comprising a chimera of parental domains.
  • the polypeptide comprises a chimera having a plurality of domains from N- to C-terminus from different parental CBH II proteins: (segment 1)-(segment 2)-(segment 3)-(segment 4)-(segment 5)-(segment 6)-(segment 7)-(segment 8); wherein segment 1 comprises amino acid residue from about 1 to about x1 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6
  • segment 2 is from about amino acid residue x1 to about x2 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3"); segment 3 is from about amino acid residue x2 to about x3 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3"); segment 4 is from about amino acid residue x3 to about x4 of SEQ ID NO:2 ("1"), SEQ ID NO:4
  • segment 5 is from about amino acid residue x4 to about x5 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3"); segment 6 is from about amino acid residue x5 to about x6 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6
  • each digit refers to a domain of a chimeric CBH II polypeptide.
  • the number denotes the parental strand of the domain.
  • a chimeric CBH II polypeptide having the sequence 12111131 indicates that the polypeptide comprises a sequence from the N-terminus to the C-terminus of: amino acids from about 1 to x1 of SEQ ID NO:2 ("1") linked to amino acids from about x1 to x2 of SEQ ID NO:4 ("2") linked to amino acids from about x2 to about x3 of SEQ ID NO:2 linked to amino acids from about x3 to about x4 of SEQ ID NO:2 linked to amino acids from about x4 to about x5 of SEQ ID NO:2 linked to amino acids from about x5 to about x6 of SEQ ID NO:2 linked to amino acids from about x6 to x7 of SEQ ID NO:6 ("3") linked to amino acids from about x7 to x8 (
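The block notation can be illustrated with a short sketch that assembles a chimera from an eight-digit code. Everything concrete here is a placeholder: the parent strings are toy sequences rather than SEQ ID NO:2, 4 or 6, the cut positions are hypothetical, and treating each boundary as a half-open interval (segment k runs from the previous cut up to, but not including, the next) is an assumption, since the text states the boundaries only approximately.

```python
from typing import Dict, List


def assemble_chimera(code: str, parents: Dict[str, str],
                     cuts: Dict[str, List[int]]) -> str:
    """Join one segment per digit of `code`, taking segment k from the parent
    named by the k-th digit. `cuts[p]` lists the end position of each segment
    in parent p (its x1..x8), so parents of different length are allowed."""
    pieces = []
    for block_index, parent_id in enumerate(code):
        bounds = [0] + cuts[parent_id]
        if len(bounds) != len(code) + 1:
            raise ValueError("need one cut position per segment")
        start, end = bounds[block_index], bounds[block_index + 1]
        pieces.append(parents[parent_id][start:end])
    return "".join(pieces)


# Toy parents: 16 residues each, eight two-residue blocks (placeholders only).
toy_parents = {"1": "A" * 16, "2": "C" * 16, "3": "D" * 16}
toy_cuts = {p: [2, 4, 6, 8, 10, 12, 14, 16] for p in toy_parents}

chimera = assemble_chimera("12111131", toy_parents, toy_cuts)
```

With these toy inputs, block 2 comes from parent "2" and block 7 from parent "3", mirroring the 12111131 example in the text.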
  • the CBH II polypeptide has a chimeric segment structure selected from the group consisting of:
  • the polypeptide has improved thermostability compared to a wild-type polypeptide of SEQ ID NO: 2,
  • the activity of the polypeptide can be measured with any one or combination of substrates as described in the examples. As will be apparent to the skilled artisan, other compounds within the class of compounds exemplified by those discussed in the examples can be tested and used.
  • the polypeptide can comprise various changes to the amino acid sequence with respect to a reference sequence.
  • the changes can be a substitution, deletion, or insertion of one or more amino acids.
  • the change can be a conservative or a non-conservative substitution.
  • a chimera may comprise a combination of conservative and non-conservative substitutions.
  • polypeptides can comprise a general structure from N-terminus to C-terminus:
  • segment 1 comprises amino acid residue from about 1 to about x1 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3") and having 1-10 conservative amino acid substitutions
  • segment 2 is from about amino acid residue x1 to about x2 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3") and having about 1-10 conservative amino acid substitutions
  • segment 3 is from about amino acid residue x2 to about x3 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3") and having about 1-10 conservative amino acid substitutions
  • segment 4 is from about amino acid residue x3 to about x4 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3") and having about 1-10 conservative amino acid substitutions
  • segment 5 comprises amino acid residue from about 1 to about x1 of SEQ ID NO
  • segment 7 is from about amino acid residue x6 to about x7 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6
  • segment 8 is from about amino acid residue x7 to about x8 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3") and having about 1-10 conservative amino acid substitutions;
  • x1 is residue 43, 44, 45, 46, or 47 of SEQ ID NO:2, or residue 42, 43, 44, 45, or 46 of SEQ ID NO:4 or SEQ ID NO:6;
  • x2 is residue 70, 71, 72, 73, or 74 of SEQ ID NO:2, or residue 68, 69, 70, 71, 72, 73, or 74 of SEQ ID NO:4 or SEQ ID NO:6;
  • x3 is residue 113, 114, 115, 116, 117 or 118 of SEQ ID NO:2, or residue 110, 111, 112, 113, 114, 115, or 116 of SEQ ID NO:4 or SEQ ID NO
  • the number of substitutions can be 2, 3, 4, 5, 6, 8, 9, or 10, or more amino acid substitutions (e.g., 10-20, 21-30, 31-40 and the like amino acid substitutions) .
  • the functional chimera polypeptides can have cellobiohydrolase activity along with increased thermostability, such as for a defined substrate discussed in the Examples, and also have a level of amino acid sequence identity to a reference cellobiohydrolase, or segments thereof.
  • the reference enzyme or segment can be that of a wild-type (e.g., naturally occurring) or an engineered enzyme.
  • the polypeptides of the disclosure can comprise a general structure from N-terminus to C-terminus: wherein segment 1 comprises a sequence that is at least 50-100% identity to amino acid residue from about 1 to about x1 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3"); segment 2 comprises a sequence that is at least 50-100% identity to amino acid residue x1 to about x2 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3"); segment 3 comprises a sequence that is at least 50-100% identity to amino acid residue x2 to about x3 of SEQ ID NO:2
  • segment 4 comprises a sequence that is at least 50-100% identity to amino acid residue x3 to about x4 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6
  • segment 5 comprises a sequence that is at least 50-100% identity to about amino acid residue x4 to about x5 of SEQ ID NO:2
  • segment 6 comprises a sequence that is at least 50-100% identity to amino acid residue x5 to about x6 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6
  • segment 7 comprises a sequence that is at least 50-100% identity to amino acid residue x6 to about x7 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6 ("3"); and segment 8 comprises a sequence that is at least 50-100% identity to amino acid residue x7 to about x8 of SEQ ID NO:2 ("1"), SEQ ID NO:4 ("2") or SEQ ID NO:6
  • x1 is residue 43, 44, 45, 46, or 47 of SEQ ID NO:2, or residue 42, 43, 44, 45, or 46 of SEQ ID NO:4 or SEQ ID NO:6
  • x2 is residue 70, 71, 72, 73, or 74 of SEQ ID NO:2, or residue 68, 69, 70, 71, 72, 73, or 74 of SEQ ID NO:4 or SEQ ID NO:6
  • x3 is residue 113, 114, 115, 116, 117 or 118 of SEQ ID NO:2, or residue 110, 111, 112, 113, 114, 115, or 116 of SEQ ID NO:4 or SEQ ID NO:6
  • x4 is residue 153, 154, 155, 156, or 157 of SEQ ID NO:2, or residue 149, 150, 151, 152, 153, 154, 155 or 156 of SEQ ID NO:4 or SEQ ID NO:
  • each segment of the chimeric polypeptide can have at least 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98%, or 99% or more sequence identity as compared to the reference segment indicated for each of the (segment 1), (segment 2), (segment 3), (segment 4), (segment 5), (segment 6), (segment 7), and (segment 8) of SEQ ID NO:2, SEQ ID NO:4, or SEQ ID NO:6.
  • the polypeptide variants can have improved thermostability compared to the wild-type polypeptide of SEQ ID NO: 2, 4, or 6.
  • the chimeric enzymes described herein may be prepared in various forms, such as lysates, crude extracts, or isolated preparations.
  • the polypeptides can be dissolved in suitable solutions; formulated as powders, such as an acetone powder (with or without stabilizers); or be prepared as lyophilizates.
  • the polypeptide can be an isolated polypeptide.
  • the polypeptides can be in the form of arrays.
  • the enzymes may be in a soluble form, for example, as solutions in the wells of microtitre plates, or immobilized onto a substrate.
  • the substrate can be a solid substrate or a porous substrate (e.g., membrane), which can be composed of organic polymers such as polystyrene, polyethylene, polypropylene, polyfluoroethylene, polyethyleneoxy, and polyacrylamide, as well as co-polymers and grafts thereof.
  • a solid support can also be inorganic, such as glass, silica, controlled pore glass (CPG), reverse phase silica or metal, such as gold or platinum.
  • the configuration of a substrate can be in the form of beads, spheres, particles, granules, a gel, a membrane or a surface. Surfaces can be planar, substantially planar, or non-planar.
  • Solid supports can be porous or non-porous, and can have swelling or non-swelling characteristics.
  • a solid support can be configured in the form of a well, depression, or other container, vessel, feature, or location.
  • a plurality of supports can be configured on an array at various locations, addressable for robotic delivery of reagents, or by detection methods and/or instruments.
  • the disclosure also provides polynucleotides encoding the engineered CBH II polypeptides disclosed herein.
  • the polynucleotides may be operatively linked to one or more heterologous regulatory or control sequences that control gene expression to create a recombinant polynucleotide capable of expressing the polypeptide.
  • Expression constructs containing a heterologous polynucleotide encoding the CBH II chimera can be introduced into appropriate host cells to express the polypeptide.
  • the polynucleotide sequences will be apparent from the amino acid sequence of the engineered CBH II chimera enzymes to one of skill in the art and with reference to the polypeptide sequences and nucleic acid sequences described herein.
  • the knowledge of the codons corresponding to various amino acids coupled with the knowledge of the amino acid sequence of the polypeptides allows those skilled in the art to make different polynucleotides encoding the polypeptides of the disclosure.
  • the disclosure contemplates each and every possible variation of the polynucleotides that could be made by selecting combinations based on possible codon choices, and all such variations are to be considered specifically disclosed for any of the polypeptides described herein.
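To give a sense of how many polynucleotides "each and every possible variation" covers, the sketch below counts the distinct coding sequences for a peptide under the standard genetic code. It is an illustration only: the function name `count_encodings` is hypothetical, the stop codon is excluded, and the standard code is assumed rather than any host-specific table.

```python
import math

# Number of sense codons per amino acid in the standard genetic code (61 total).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA coding sequences (stop codon excluded) that
    translate to `peptide`: the product of per-residue codon choices."""
    return math.prod(CODON_COUNTS[residue] for residue in peptide)


# Toy tripeptide Met-Lys-Leu: 1 * 2 * 6 possible coding sequences.
n_variants = count_encodings("MKL")
```

Because the count is a product over residues, it grows exponentially with length, which is why codon-bias software is used in practice to pick one sequence for synthesis.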
  • the polynucleotides encode the polypeptides described herein but have about 80% or more sequence identity, about 85% or more sequence identity, about 90% or more sequence identity, about 91% or more sequence identity, about 92% or more sequence identity, about 93% or more sequence identity, about 94% or more sequence identity, about 95% or more sequence identity, about 96% or more sequence identity, about 97% or more sequence identity, about 98% or more sequence identity, or about 99% or more sequence identity at the nucleotide level to a reference polynucleotide encoding the CBH II chimera polypeptides.
  • the isolated polynucleotides encoding the polypeptides may be manipulated in a variety of ways to provide for expression of the polypeptide. Manipulation of the isolated polynucleotide prior to its insertion into a vector may be desirable or necessary depending on the expression vector.
  • the techniques for modifying polynucleotides and nucleic acid sequences utilizing recombinant DNA methods are well known in the art. Guidance is provided in Sambrook et al., 2001, Molecular Cloning: A Laboratory Manual, 3rd Ed., Cold Spring Harbor Laboratory Press; and Current Protocols in Molecular Biology, Ausubel, F. ed., Greene Pub. Associates, 1998, updates to 2007.
  • the polynucleotides are operatively linked to control sequences for the expression of the polynucleotides and/or polypeptides.
  • the control sequence may be an appropriate promoter sequence, which can be obtained from genes encoding extracellular or intracellular polypeptides, either homologous or heterologous to the host cell.
  • suitable promoters for directing transcription of the nucleic acid constructs of the present disclosure include the promoters obtained from the E. coli lac operon, Bacillus subtilis xylA and xylB genes, Bacillus megaterium xylose utilization genes (e.g., Rygus et al., (1991) Appl.
  • control sequence may also be a suitable transcription terminator sequence, a sequence recognized by a host cell to terminate transcription.
  • the terminator sequence is operably linked to the 3' terminus of the nucleic acid sequence encoding the polypeptide. Any terminator which is functional in the host cell of choice may be used.
  • control sequence may also be a suitable leader sequence, a nontranslated region of an mRNA that is important for translation by the host cell.
  • the leader sequence is operably linked to the 5' terminus of the nucleic acid sequence encoding the polypeptide. Any leader sequence that is functional in the host cell of choice may be used.
  • control sequence may also be a signal peptide coding region that codes for an amino acid sequence linked to the amino terminus of a polypeptide and directs the encoded polypeptide into the cell's secretory pathway.
  • the 5' end of the coding sequence of the nucleic acid sequence may inherently contain a signal peptide coding region naturally linked in translation reading frame with the segment of the coding region that encodes the secreted polypeptide.
  • the 5' end of the coding sequence may contain a signal peptide coding region that is foreign to the coding sequence. The foreign signal peptide coding region may be required where the coding sequence does not naturally contain a signal peptide coding region.
  • Effective signal peptide coding regions for bacterial host cells can be the signal peptide coding regions obtained from the genes for Bacillus NCIB 11837 maltogenic amylase, Bacillus stearothermophilus alpha-amylase, Bacillus licheniformis subtilisin, Bacillus licheniformis beta-lactamase, Bacillus stearothermophilus neutral proteases (nprT, nprS, nprM), and Bacillus subtilis prsA. Further signal peptides are described by Simonen and Palva, (1993) Microbiol Rev 57:109-137.
  • the disclosure is further directed to a recombinant expression vector comprising a polynucleotide encoding the engineered CBH II chimera polypeptides, and one or more expression regulating regions such as a promoter and a terminator, a replication origin, etc., depending on the type of hosts into which they are to be introduced.
  • the coding sequence is located in the vector so that the coding sequence is operably linked with the appropriate control sequences for expression .
  • the recombinant expression vector may be any vector (e.g., a plasmid or virus), which can be conveniently subjected to recombinant DNA procedures and can bring about the expression of the polynucleotide sequence.
  • the choice of the vector will typically depend on the compatibility of the vector with the host cell into which the vector is to be introduced.
  • the vectors may be linear or closed circular plasmids .
  • the expression vector may be an autonomously replicating vector, i.e., a vector that exists as an extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g., a plasmid, an extrachromosomal element, a minichromosome, or an artificial chromosome.
  • the vector may contain any means for assuring self-replication.
  • the vector may be one which, when introduced into the host cell, is integrated into the genome and replicated together with the chromosome(s) into which it has been integrated.
  • a single vector or plasmid or two or more vectors or plasmids which together contain the total DNA to be introduced into the genome of the host cell, or a transposon may be used.
  • the expression vector of the disclosure contains one or more selectable markers, which permit easy selection of transformed cells.
  • a selectable marker is a gene the product of which provides for biocide or viral resistance, resistance to heavy metals, prototrophy to auxotrophs, and the like.
  • Examples of bacterial selectable markers are the dal genes from Bacillus subtilis or Bacillus licheniformis, or markers, which confer antibiotic resistance such as ampicillin, kanamycin, chloramphenicol or tetracycline resistance. Other useful markers will be apparent to the skilled artisan.
  • the disclosure provides a host cell comprising a polynucleotide encoding the CBH II chimera polypeptide, the polynucleotide being operatively linked to one or more control sequences for expression of the polypeptide in the host cell.
  • Host cells for use in expressing the polypeptides encoded by the expression vectors of the disclosure are well known in the art and include, but are not limited to, bacterial cells, such as E. coli and Bacillus megaterium; eukaryotic cells, such as yeast cells, CHO cells and the like; insect cells such as Drosophila S2 and Spodoptera Sf9 cells; animal cells such as CHO, COS, BHK, 293, and Bowes melanoma cells; and plant cells.
  • the CBH II chimera polypeptides of the disclosure can be made by using methods well known in the art.
  • Polynucleotides can be synthesized by recombinant techniques, such as that provided in Sambrook et al., 2001, Molecular Cloning: A Laboratory Manual, 3rd Ed., Cold Spring Harbor Laboratory Press; and Current Protocols in Molecular Biology, Ausubel, F. ed., Greene Pub. Associates, 1998, updates to 2007.
  • Polynucleotides encoding the enzymes, or the primers for amplification, can also be prepared by standard solid-phase methods, according to known synthetic methods, for example using the phosphoramidite method described by Beaucage et al.
  • nucleic acid can be obtained from any of a variety of commercial sources, such as The Midland Certified Reagent Company, Midland, TX, The Great American Gene Company, Ramona, CA, ExpressGen Inc. Chicago, IL, Operon Technologies Inc., Alameda, CA, and many others.
  • Engineered enzymes expressed in a host cell can be recovered from the cells and/or the culture medium using any one or more of the well known techniques for protein purification, including, among others, lysozyme treatment, sonication, filtration, salting-out, ultra-centrifugation, chromatography, and affinity separation (e.g., substrate bound antibodies).
  • Suitable solutions for lysing and the high efficiency extraction of proteins from bacteria, such as E. coli, are commercially available under the trade name CelLytic B™ from Sigma-Aldrich of St. Louis, MO.
  • Chromatographic techniques for isolation of the polypeptides include, among others, reverse phase chromatography, high performance liquid chromatography, ion exchange chromatography, gel electrophoresis, and affinity chromatography. Conditions for purifying a particular enzyme will depend, in part, on factors such as net charge, hydrophobicity, hydrophilicity, molecular weight, molecular shape, etc., and will be apparent to those having skill in the art.
  • Descriptions of SCHEMA directed recombination and synthesis of chimeric polypeptides are provided in the examples herein, as well as in Otey et al., (2006), PLoS Biol. 4(5):e112; Meyer et al., (2003) Protein Sci.
  • polypeptide can be used in a variety of applications, such as, among others, biofuel generation, cellulose breakdown and the like.
  • a method for processing cellulose includes culturing a recombinant microorganism as provided herein that expresses a chimeric polypeptide of the disclosure in the presence of a suitable cellulose substrate and under conditions suitable for the catalysis by the chimeric polypeptide of the cellulose.
  • a substantially purified chimeric polypeptide of the disclosure is contacted with a cellulose substrate under conditions that allow the chimeric polypeptide to degrade the cellulose.
  • the conditions include temperatures from about 35 °C to about 65 °C.
  • RNA polymerase mediated techniques e.g., NASBA
  • Appropriate culture conditions are conditions of culture medium pH, ionic strength, nutritive content, etc.; temperature; oxygen/CO2/nitrogen content; humidity; and other culture conditions that permit production of the compound by the host microorganism, i.e., by the metabolic action of the microorganism.
  • Appropriate culture conditions are well known for microorganisms that can serve as host cells.
  • CBH II expression plasmid construction. Parent and chimeric genes encoding CBH II enzymes were cloned into yeast expression vector YEp352/PGK91-1-αss (Figure 6).
  • DNA sequences encoding parent and chimeric CBH II catalytic domains were designed with S. cerevisiae codon bias using GeneDesigner software (DNA2.0) and synthesized by DNA2.0.
  • the CBH II catalytic domain genes were digested with XhoI and KpnI, ligated into the vector between the XhoI and KpnI sites and transformed into E. coli XL-1 Blue (Stratagene).
  • CBH II genes were sequenced using primers CBH2L (5'-GCTGAACGTGTCATCGGTTAC-3' (SEQ ID NO:9)) and RSQ3080 (5'-GCAACACCTGGCAATTCCTTACC-3' (SEQ ID NO:10)).
  • C-terminal His6 parent and chimera CBH II constructs were made by amplifying the CBH II gene with forward primer CBH2LPCR (5'-GCTGAACGTGTCATCGTTACTTAG-3' (SEQ ID NO:11)) and reverse primers complementary to the appropriate CBH II gene with His6 overhangs and stop codons.
  • PCR products were ligated, transformed and sequenced as above.
  • S. cerevisiae strain YDR483W BY4742 (Matα his3Δ1 leu2Δ0 lys2Δ0 ura3Δ0 ΔKRE2, ATCC No. 4014317) was made competent using the EZ Yeast II Transformation Kit (Zymo Research), transformed with plasmid DNA and plated on synthetic dropout -uracil agar.
  • SDCAA dextrose casamino acids
  • PASC Phosphoric acid swollen cellulose
  • CBH II enzyme half-lives (t1/2) were measured by adding concentrated CBH II expression culture supernatant to 50 mM sodium acetate, pH 4.8, at a concentration giving an A520 of 0.5 as measured in the Nelson-Somogyi reducing sugar assay after incubation with treated PASC as described below. 37.5 uL CBH II enzyme/buffer mixtures were inactivated in a water bath at 63°C. After inactivation, 37.5 uL endoglucanase-treated PASC was added and hydrolysis was carried out for 2 hr at 50°C. Reaction supernatants were filtered through Multiscreen HTS plates (Millipore). Nelson-Somogyi assay log(A520) values, obtained using a SpectraMax microplate reader (Molecular Devices) and corrected for background absorbance, were plotted versus time, and CBH II enzyme half-lives were obtained from linear regression using Microsoft Excel.
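The half-life fit described above can be reproduced outside a spreadsheet. The sketch below assumes first-order thermal inactivation, so that log10 of the background-corrected A520 falls linearly with inactivation time and the half-life equals -log10(2) divided by the fitted slope. The base of the logarithm is an assumption (the text says only "log"), the function name `half_life` is hypothetical, and the data points are synthetic values generated for a 25-minute half-life, not measurements from the disclosure.

```python
import math
from typing import Sequence


def half_life(times_min: Sequence[float], a520: Sequence[float]) -> float:
    """Least-squares slope of log10(A520) versus time, converted to a half-life
    in the same time units. Assumes first-order (exponential) inactivation."""
    if len(times_min) != len(a520) or len(times_min) < 2:
        raise ValueError("need at least two paired (time, absorbance) points")
    logs = [math.log10(a) for a in a520]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("activity does not decay; half-life is undefined")
    return -math.log10(2) / slope


# Synthetic, noise-free decay starting at A520 = 0.5 with a 25-minute half-life.
times = [0.0, 10.0, 20.0, 40.0]
readings = [0.5 * 2 ** (-t / 25.0) for t in times]
t_half = half_life(times, readings)
```

The choice of logarithm base does not change the result as long as the same base is used for the log(2) term.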
  • purified CBH II enzyme was added to PASC to give a final reaction volume of 75 uL of 25 mM sodium acetate, pH 4.8, with 5 g/L PASC and a CBH II enzyme concentration of 3 mg enzyme/g PASC. Incubation proceeded for 2 hr in a 50°C water bath and the reducing sugar concentration was determined.
  • purified CBH II enzyme was added at a concentration of 300 ug/g PASC in a 75 uL reaction volume. Reactions were buffered with 12.5 mM sodium citrate/12.5 mM sodium phosphate, run for 16 hr at 50°C and reducing sugar was determined.
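As a worked check of the loading stated above, the enzyme mass per reaction follows from the reaction volume, the substrate concentration and the enzyme-to-substrate ratio. The helper name `enzyme_mass_ug` is hypothetical; the three input values are the ones given in the text.

```python
def enzyme_mass_ug(volume_ul: float, pasc_g_per_l: float,
                   loading_mg_per_g: float) -> float:
    """Enzyme mass (micrograms) in one reaction, from the reaction volume (uL),
    PASC concentration (g/L) and enzyme loading (mg enzyme per g PASC)."""
    pasc_g = volume_ul * 1e-6 * pasc_g_per_l   # uL -> L, then L * g/L = g PASC
    enzyme_mg = pasc_g * loading_mg_per_g      # g PASC * mg/g = mg enzyme
    return enzyme_mg * 1000.0                  # mg -> ug


# 75 uL reaction, 5 g/L PASC, 3 mg enzyme per g PASC:
# 0.375 mg PASC per reaction, hence about 1.125 ug enzyme.
mass = enzyme_mass_ug(75.0, 5.0, 3.0)
```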
  • Five candidate parent genes encoding CBH II enzymes were synthesized with S. cerevisiae codon bias. All five contained identical N-terminal coding sequences, where residues 1-89 correspond to the cellulose binding module (CBM), flexible linker region and the five N-terminal residues of the H. jecorina catalytic domain.
  • the recombinant CBH II genes were expressed in a glycosylation-deficient ΔKRE2 S. cerevisiae strain. This strain is expected to attach smaller mannose oligomers to both N-linked and O-linked glycosylation sites than wild type strains, which more closely resembles the glycosylation of natively produced H. jecorina CBH II enzyme.
  • Table 2: ClustalW multiple sequence alignment for parent CBH II enzyme catalytic domains. Blocks 2, 4, 6 and 8 are denoted by boxes and grey shading. Blocks 1, 3, 5 and 7 are not shaded. (H. inso: SEQ ID NO:2; H. jeco: SEQ ID NO:4; and C. ther: SEQ ID NO:6).
  • the H. insolens CBH II catalytic domain has an α/β barrel structure in which the eight helices define the barrel perimeter and seven parallel β-sheets form the active site (Figure 2A).
  • Two extended loops form a roof over the active site, creating a tunnel through which the substrate cellulose chains pass during hydrolysis.
  • Five of the seven block boundaries fall between elements of secondary structure, while block 4 begins and ends in the middle of consecutive α-helices (Figures 2A, 2B).
  • the majority of interblock sidechain contacts occur between blocks that are adjacent in the primary structure (Figure 2C).
  • a sample set of 48 chimera genes was designed as three sets of 16 chimeras having five blocks from one parent and three blocks from either one or both of the remaining two parents (Table 3); the sequences were selected to equalize the representation of each parent at each block position. The corresponding genes were synthesized and expressed. Table 3: Sequences of sample set CBH II enzyme chimeras.
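The sample-set design rule can be checked mechanically. The sketch below computes the size of the full combinatorial space for three parents and eight blocks, and tests whether a block code has exactly five blocks from a single parent. The function names are hypothetical and the example codes are illustrative strings, not entries taken from Table 3.

```python
from collections import Counter
from typing import Optional

N_BLOCKS = 8
PARENTS = "123"


def library_size() -> int:
    """Every block position can come from any parent: 3**8 combinations."""
    return len(PARENTS) ** N_BLOCKS


def dominant_parent(code: str) -> Optional[str]:
    """Return the parent that supplies exactly five of the eight blocks,
    or None if the code does not follow the five-plus-three pattern."""
    if len(code) != N_BLOCKS or any(c not in PARENTS for c in code):
        raise ValueError("code must be eight digits drawn from 1, 2 and 3")
    for parent, n_blocks in Counter(code).items():
        if n_blocks == 5:
            return parent
    return None


total = library_size()
fits_rule = dominant_parent("11121133")      # five blocks from parent 1
breaks_rule = dominant_parent("12111131")    # six blocks from parent 1
```

At most one parent can supply exactly five of eight blocks, so the first hit is the only possible answer.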
  • Half-lives of thermal inactivation were measured at 63°C for concentrated culture supernatants of the parent and active chimeric CBH II enzymes.
  • the H. insolens, H. jecorina and C. thermophilum CBH II parent half-lives were 95, 2 and 25 minutes, respectively.
  • the active sample set chimeras exhibited a broad range of half-lives, from less than 1 minute to greater than 3,000 minutes. Five of the 23 active chimeras had half-lives greater than that of the most thermostable parent, H. insolens CBH II.
  • Table 4 Cross validation values for application of 5 linear regression algorithms to CBH II enzyme chimera block stability scores. Algorithm abbreviations: ridge regression (RR), partial least square regression (PLSR) , support vector machine regression (SVMR) , linear programming support vector machine regression (LPSVMR) and linear programming boosting regression (LPBoostR) .
  • Block 1/parent 1 (B1P1), B6P3, B7P3 and B8P2 were identified as having the greatest stabilizing effects, while B1P3, B2P1, B3P2, B6P2, B7P1, B7P2 and B8P3 were found to be the most strongly destabilizing blocks.
  • Table 5: Qualitative block classification results generated by five linear regression algorithms for sample set CBH II enzyme chimeras. A score of +1 denotes a block with thermostability weight more than one standard deviation above neutral (stabilizing), a score of 0 denotes a block with weight within one standard deviation of neutral, and a score of -1 denotes a block with weight more than one standard deviation below neutral (destabilizing).
  • a chimera that has a sum score, from the contributions of each block/domain, of greater than 0 using a qualitative block classification, wherein the qualitatively classified blocks are defined as stabilizing, destabilizing or neutral, and wherein each block's impact on chimera thermostability is characterized using a scoring system that accounts for the thermostability contribution determined by a plurality of regression algorithms. For each algorithm, blocks with a thermostability weight value more than 1 SD above neutral were scored "+1", blocks within 1 SD of neutral were assigned zero and blocks 1 or more SD below neutral were scored "-1". A "stability score" for each block was obtained by summing the +1, 0, -1 stability scores from each of the five models.
  • a second set of genes encoding CBH II enzyme chimeras was synthesized in order to validate the predicted stabilizing blocks and identify cellulases more thermostable than the most stable parent.
  • the 24 chimeras included in this validation set (Table 6) were devoid of the seven blocks predicted to be most destabilizing and enriched in the four most stabilizing blocks, where representation was biased toward higher stability scores.
  • the "HJPlus" 12222332 chimera was constructed by substituting the predicted most stabilizing blocks into the H. jecorina CBH II enzyme (parent 2).
  • Table 6 Sequences of 24 validation set CBH II enzyme chimeras, nine of which were expressed in active form.
  • Table 7: Specific activity values (µg glucose reducing sugar equivalent/µg CBH II*hr) for three thermostable CBH II chimeras and parents. Error is given as standard error for between five and eight replicates per CBH II. Conditions: 2-hour reaction, 3 mg enzyme/g PASC, 50 °C, 25 mM sodium acetate, pH 4.
  • thermostable chimeras using purified enzymes was analyzed.
  • Table 3 the parent and chimera CBH II specific activities were within a factor of four of the most active parent CBH II enzyme, from H. jecorina.
  • The specific activity of HJPlus was greater than all other CBH II enzymes tested, except for H. jecorina CBH II.
  • the pH dependence of cellulase activity is also important, as a broad pH/activity profile would allow the use of a CBH II chimera under a wider range of potential cellulose hydrolysis conditions.
  • H. jecorina CBH II has been observed to have optimal activity in the pH range 4 to 6, with activity markedly reduced outside these values.
  • Figure 4 shows that the H. insolens and C. thermophilum CBH II enzymes and all three purified thermostable CBH II chimeras have pH/activity profiles that are considerably broader than that of H. jecorina CBH II.
  • thermostable CBH II chimeras in cellulose hydrolysis was tested across a range of temperatures over a 40-hour time interval. As shown in Figure 5, all three thermostable chimeras were active on PASC at higher temperatures than the parent CBH II enzymes. The chimeras retained activity at 70°C, whereas the H. jecorina CBH II did not hydrolyze PASC above 57°C and the stable H. insolens enzyme showed no hydrolysis above 63°C. The activity of HJPlus in long-time cellulose hydrolysis assays exceeded that of all the parents at their respective optimal temperatures.
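
The qualitative block classification described in the excerpts above (a +1, 0 or -1 call per regression model, summed over the models, then summed over the blocks of a chimera) can be sketched in Python. This is a minimal illustration under stated assumptions, not the applicants' implementation: the function names, the choice of zero as the "neutral" weight, the use of each model's own standard deviation as the SD, and every weight value in the demo are hypothetical. Chimeras are written in the eight-digit notation used for HJPlus (12222332), where digit i names the parent contributing block i.

```python
from statistics import pstdev


def classify_weights(weights, neutral=0.0):
    """Turn one model's block weights into +1 / 0 / -1 calls.

    weights: dict mapping block labels such as "B6P3" to a regression weight.
    Assumptions: 'neutral' is 0 and the SD is computed over this model's weights.
    """
    sd = pstdev(weights.values())
    calls = {}
    for block, w in weights.items():
        if w - neutral > sd:        # more than 1 SD above neutral
            calls[block] = 1
        elif neutral - w >= sd:     # 1 or more SD below neutral
            calls[block] = -1
        else:                       # within 1 SD of neutral
            calls[block] = 0
    return calls


def block_stability_scores(models):
    """Sum the +1 / 0 / -1 calls for each block over all models."""
    totals = {}
    for weights in models:
        for block, call in classify_weights(weights).items():
            totals[block] = totals.get(block, 0) + call
    return totals


def chimera_blocks(chimera):
    """Expand '12222332' into ['B1P1', 'B2P2', ..., 'B8P2']."""
    return [f"B{i}P{parent}" for i, parent in enumerate(chimera, start=1)]


def chimera_sum_score(chimera, scores):
    """Add up the stability scores of a chimera's blocks (unknown blocks count 0)."""
    return sum(scores.get(block, 0) for block in chimera_blocks(chimera))


if __name__ == "__main__":
    # Hypothetical weights for a single model, reused five times for brevity.
    toy_model = {"B1P1": 2.0, "B1P2": 0.0, "B1P3": -2.0, "B6P3": 0.5}
    scores = block_stability_scores([toy_model] * 5)
    print(scores)
    print(chimera_sum_score("12222332", scores))
```

Under this scheme a chimera qualifies when `chimera_sum_score` returns a value greater than 0. In practice each of the five models would supply its own weights rather than the repeated toy dictionary used here.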

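Half-lives of thermal inactivation such as those quoted in the excerpts above are commonly estimated by treating inactivation as first-order decay, A(t) = A0·exp(-k·t), so that t½ = ln 2 / k. The excerpts do not state how the fit was performed, so the following is only a sketch under that assumption; `half_life_minutes` is a hypothetical helper and the data in the demo are synthetic, not measurements from the application.

```python
import math


def half_life_minutes(times_min, residual_activity):
    """Estimate a half-life from residual activity versus incubation time.

    Fits ln(activity) = ln(A0) - k * t by ordinary least squares and
    returns ln(2) / k. Assumes first-order inactivation and positive activities.
    """
    if len(times_min) != len(residual_activity) or len(times_min) < 2:
        raise ValueError("need at least two (time, activity) pairs")
    logs = [math.log(a) for a in residual_activity]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    slope = sxy / sxx               # equals -k for a decaying sample
    if slope >= 0:
        raise ValueError("activity does not decay over the time course")
    return math.log(2) / -slope


if __name__ == "__main__":
    # Synthetic time course generated with a 30-minute half-life.
    times = [0, 10, 20, 40, 60]
    activity = [0.5 ** (t / 30) for t in times]
    print(round(half_life_minutes(times, activity), 2))
```

A log-linear fit is used here because it needs no external libraries; with noisy data a nonlinear fit of the exponential itself may weight the points differently.
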
Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Organic Chemistry (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Biotechnology (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Microbiology (AREA)
  • Molecular Biology (AREA)
  • Plant Pathology (AREA)
  • Biophysics (AREA)
  • Physics & Mathematics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Analytical Chemistry (AREA)
  • Medicinal Chemistry (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • General Chemical & Material Sciences (AREA)
  • Enzymes And Modification Thereof (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Peptides Or Proteins (AREA)
  • Preparation Of Compounds By Using Micro-Organisms (AREA)

Abstract

The present invention relates to chimeric CBH II fusion polypeptides, nucleic acids encoding the polypeptides, and host cells for producing the polypeptides.
PCT/US2010/027248 2009-01-16 2010-03-12 Stable, functional chimeric cellobiohydrolases WO2010091441A2 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US20528409P 2009-01-16 2009-01-16
US61/205,284 2009-01-16
US16700309P 2009-04-06 2009-04-06
US61/167,003 2009-04-06

Publications (3)

Publication Number Publication Date
WO2010091441A2 WO2010091441A2 (fr) 2010-08-12
WO2010091441A9 WO2010091441A9 (fr) 2010-09-30
WO2010091441A3 WO2010091441A3 (fr) 2011-01-13

Family

ID=42542692

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2010/027248 WO2010091441A2 (fr) 2009-01-16 2010-03-12 Stable, functional chimeric cellobiohydrolases

Country Status (1)

Country Link
WO (1) WO2010091441A2 (fr)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070256197A1 (en) * 2006-04-28 2007-11-01 C5-6 Technologies, Inc. Thermostable cellulase and methods of use

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
HEINZELMAN, P. ET AL.: 'A family of thermostable fungal cellulases created by structure-guided recombination.' PROCEEDINGS OF NATIONAL ACADEMY OF SCIENCES vol. 106, no. 14, 23 March 2009, pages 5610 - 5615 *
MEYER, M. ET AL.: 'Library analysis of SCHEMA-guided protein recombination.' PROTEIN SCIENCE vol. 12, no. 8, August 2003, pages 1686 - 1693 *
MEYER, M. ET AL.: 'Structure-guided SCHEMA recombination of distantly related beta-lactamases.' PROTEIN ENGINEERING, DESIGN, AND SELECTION vol. 19, no. 12, 06 November 2006, pages 563 - 570 *
SANDGREN, M. ET AL.: 'Structural and biochemical studies of GH family 12 cellulases: improved thermal stability, and ligand complexes.' PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY vol. 89, 29 December 2004, pages 246 - 291 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2576626A2 (fr) * 2010-06-01 2013-04-10 California Institute of Technology Stable, functional chimeric cellobiohydrolase class I enzymes
EP2576626A4 (fr) * 2010-06-01 2013-11-20 California Inst Of Techn Stable, functional chimeric cellobiohydrolase class I enzymes
US8962295B2 (en) 2010-06-01 2015-02-24 California Institute Of Technology Stable, functional chimeric cellobiohydrolase class I enzymes
US9284587B2 (en) 2010-06-01 2016-03-15 California Institute Of Technology Stable, functional chimeric cellobiohydrolase class I enzymes
US9708633B2 (en) 2010-06-01 2017-07-18 California Institute Of Technology Stable, functional chimeric cellobiohydrolase class I enzymes

Also Published As

Publication number Publication date
WO2010091441A9 (fr) 2010-09-30
WO2010091441A3 (fr) 2011-01-13

Similar Documents

Publication Publication Date Title
AU2010234521B2 (en) Polypeptides having cellulase activity
US9708633B2 (en) Stable, functional chimeric cellobiohydrolase class I enzymes
US9458443B2 (en) Optimized cellulase enzymes
US20100304464A1 (en) Stable, functional chimeric cellobiohydrolases
JP6340647B2 (ja) Hyperthermostable cellobiohydrolase
US8715996B2 (en) Beta-glucosidase variant enzymes and related polynucleotides
WO2010091441A2 (fr) Stable, functional chimeric cellobiohydrolases
CN114574453A (zh) Metagenome-derived thermostable and acid-tolerant laccase and gene encoding same
dos Santos Goncalves et al. Biotechnological potential of mangrove sediments: identification and functional attributes of thermostable and salinity-tolerant β-glucanase
CN107779443A (zh) Cellobiohydrolase mutants and uses thereof
US9334517B2 (en) Endoglucanase having enhanced thermostability and activity
US20160251642A1 (en) Stable fungal cel6 enzyme variants
EP3974526A1 (fr) Xylanase enzyme having extreme thermostability and alkaline stability
US20130084619A1 (en) Modified cellulases with enhanced thermostability

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 10739287

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 10739287

Country of ref document: EP

Kind code of ref document: A2