WO2017147690A1 - Engineered beta-glucosidases and methods of use thereof - Google Patents

Info

Publication number
WO2017147690A1
Authority
WO
WIPO (PCT)
Prior art keywords
seq
polypeptide
bgl
amino acid
acid residue
Prior art date
Application number
PCT/CA2017/050219
Other languages
French (fr)
Inventor
Vincent Martin
Kane LARUE
Original Assignee
Valorbec Société en Commandite
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Valorbec Société en Commandite
Publication of WO2017147690A1

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P19/00 Preparation of compounds containing saccharide radicals
    • C12P19/14 Preparation of compounds containing saccharide radicals produced by the action of a carbohydrase (EC 3.2.x), e.g. by alpha-amylase, e.g. by cellulase, hemicellulase
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/24 Hydrolases (3) acting on glycosyl compounds (3.2)
    • C12N9/2402 Hydrolases (3) acting on glycosyl compounds (3.2) hydrolysing O- and S- glycosyl compounds (3.2.1)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P19/00 Preparation of compounds containing saccharide radicals
    • C12P19/02 Monosaccharides
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P7/00 Preparation of oxygen-containing organic compounds
    • C12P7/02 Preparation of oxygen-containing organic compounds containing a hydroxy group
    • C12P7/04 Preparation of oxygen-containing organic compounds containing a hydroxy group acyclic
    • C12P7/06 Ethanol, i.e. non-beverage
    • C12P7/08 Ethanol, i.e. non-beverage produced as by-product or from waste or cellulosic material substrate
    • C12P7/10 Ethanol, i.e. non-beverage produced as by-product or from waste or cellulosic material substrate substrate containing cellulosic material
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E50/00 Technologies for the production of fuel of non-fossil origin
    • Y02E50/10 Biofuels, e.g. bio-diesel

Definitions

  • the present invention relates to engineered beta-glucosidases and methods of use thereof. More specifically, the present invention is concerned with engineered beta-glucosidases displaying increased resistance to substrate and/or product inhibition and methods of use thereof.
  • Cellulose is a polysaccharide composed of glucose monomers that is the main constituent of plant cell walls.
  • Cellulosic material e.g., cellulose, cellulosic hydrolysates, and soluble cellodextrins such as cellotriose, cellobiose, etc.
  • Cellulosic biofuel production consists of the hydrolysis of cellulose into sugars (e.g., fermentable sugars such as glucose). Glucose can then be processed to form ethanol through fermentation.
  • CBP Consolidated bioprocessing
  • Saccharolytic enzymes from cellulolytic microbes are well characterized candidates for industrial applications (5).
  • cellobiose and other soluble cellodextrins are released through the synergistic activities of endoglucanases (EGLs; EC 3.2.1.4) and cellobiohydrolases (CBHs; 3.2.1.91).
  • β-glucosidases (BGLs; 3.2.1.21) are a third and critical component of natural and engineered cellulase systems. Their role is twofold: (1) the hydrolysis of soluble cellodextrins by BGLs produces a fermentable sugar, glucose, while (2) removing hydrolysate intermediates that act as inhibitors towards EGLs and CBHs (6).
  • Km is the Michaelis constant, namely the substrate concentration at which the reaction rate is half of Vmax, Vmax representing the maximum velocity achieved by the system at maximum (saturating) substrate concentrations.
  • product i.e. glucose
  • Ki i.e. inhibition constant
  • Glucose-tolerant BGLs have been identified (13-16), but the mechanism of glucose tolerance is not understood. Inhibition of BGL hydrolytic activity by product and substrate contributes to limiting the efficiency of cellulosic bioconversion processes because the saccharification of biomass by EGLs and CBHs is dependent on the hydrolysis of soluble cellodextrins to glucose (29).
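
The kinetic terms defined above (Km, Vmax and the inhibition constant Ki) can be illustrated with a minimal sketch. The source only defines the constants; the rate laws below are standard textbook forms chosen for illustration (competitive product inhibition and a simple substrate-inhibition term), and the function and parameter names are assumptions, not the Applicants' kinetic model.

```python
def mm_rate(s, vmax, km):
    """Plain Michaelis-Menten rate: v = Vmax*[S] / (Km + [S])."""
    return vmax * s / (km + s)


def rate_with_product_inhibition(s, p, vmax, km, ki):
    """Assumed competitive product (glucose) inhibition:
    the apparent Km is scaled by (1 + [P]/Ki)."""
    return vmax * s / (km * (1.0 + p / ki) + s)


def rate_with_substrate_inhibition(s, vmax, km, ksi):
    """Assumed textbook substrate inhibition:
    an extra [S]^2/Ksi term in the denominator lowers the rate at high [S]."""
    return vmax * s / (km + s + s * s / ksi)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(mm_rate(2.0, vmax=10.0, km=2.0))
    print(rate_with_product_inhibition(2.0, 2.0, vmax=10.0, km=2.0, ki=2.0))
    print(rate_with_substrate_inhibition(2.0, vmax=10.0, km=2.0, ksi=2.0))
```

In this sketch a larger Ki (or Ksi) means weaker inhibition, which is the direction an inhibition-resistant variant would move under these assumed models.
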
  • the yeast Saccharomyces cerevisiae is well-suited as a CBP platform microorganism because it naturally ferments hexose sugars to ethanol at high yields.
  • S. cerevisiae is non-cellulolytic and a CBP strain designed to secrete cellulases is needed.
  • the development of a cellulolytic S. cerevisiae strain faces two challenges: First, a complete and efficient set (EGL, CBH and BGL) of saccharolytic enzymes must be optimized for the hydrolysis of pre-treated cellulosic biomass. Second, recombinant cellulases must be secreted at sufficient levels such that the production of sugars supports the consumption requirements for growth and fermentation.
  • BGLs have been shown to be limiting in natural cellulase systems (55, 56), providing evidence that expression levels in S. cerevisiae generate poor fermentation yields. Since the secretion of cellulases from S. cerevisiae is well below titers reported for other proteins (57), increasing the heterologous expression of saccharolytic enzymes is a critical component of CBP strain development.
  • BGL inhibition by product and/or high substrate concentration
  • low expression are compounding obstacles to the development of CBP using S. cerevisiae.
  • many studies have been based on the discovery and comparative characterization of enzymes from cellulolytic organisms (7, 9, 11-13, 15, 16, 18, 20, 24, 25, 28, 43, 46, 58-68). Attempts have also been made at improving natural BGLs by reducing inhibition through rational design (27), gene fusions (42, 69), increasing expression via codon optimization (43), and using alternative secretion or cell anchoring strategies (43, 70, 71).
  • the Applicants used a directed evolution strategy by expressing a library of mutated bgl1 genes in Saccharomyces cerevisiae and used a two-step functional screen to identify improved enzymes.
  • Several amino acid substitutions were identified that improved the activity or expression in the context of secretion from Saccharomyces cerevisiae allowing improved BGLs to be engineered for industrial applications.
  • cellulases can be optimized for high substrate loadings, and identified specific positions as critical components of engineered GH3 BGLs.
  • niger numbering reduced the inhibitory effect of glucose and could be combined with a substitution that reduces the inhibitory effect of high substrate concentration (e.g., 305) to produce a BGL with decreased sensitivity to both product and substrate.
  • high substrate concentration e.g. 305
  • the Applicants mapped a group of beneficial mutations to the α/β domain of the molecule and postulated, without being limited by this hypothesis, that this region modulates activity through subunit interactions.
  • Certain BGL variants were identified with mutations in the MFα pre sequence that was used to mediate secretion of the protein.
  • substitutions at Pro 21 or Val 22 of the MFα pre sequence could produce up to a 2-fold increase in supernatant BGL activity, providing evidence that expression and/or secretion was limiting the hydrolytic activity of culture supernatants.
  • Applicants showed that several beneficial mutations could be combined in a BGL with increased activity for both synthetic, namely pNPG, the standard for testing BGLs, and natural substrates.
  • a glycoside hydrolase family 3 (GH3) beta-glucosidase (BGL) polypeptide comprising:
  • a GH3 BGL triosephosphateisomerase domain comprising a sequence as set forth in KGY1Y2Y3Y4LGP (SEQ ID NO: 251), wherein: Y2 is N, H, D or T;
  • Y3 is I, A or V
  • Y4 is R, K, A, L, I, F, V or P; (b) a GH3 BGL coordinating loop domain comprising a sequence as set forth in: GLDMX1MPX2X3X4X5X6X7X8X9X10X11X12X13X14X15X16 (SEQ ID NO: 252), wherein at least one of X1 to X16 is smaller than a corresponding reference amino acid residue, wherein the reference amino acid residues are: X1ref is S, T or D;
  • X5ref is Y, T, S, V, L, N, or absent;
  • X6ref is D, F, G, Y, V, C, or absent;
  • X7ref is G, H, L, N, S, T, C, D, E, M or absent;
  • X8ref is S, W, F, Y or absent
  • X9ref is R, N, A, D or absent
  • X10ref is M, S, D, Q, G, T, or absent;
  • X11ref is F, N, G, S, T, E, A, or absent;
  • X12ref is G, D, S, N, T, K, R, L, F, or absent;
  • Xisref is Y, F, or W
  • a GH3 BGL α/β sandwich domain comprising a sequence as set forth in: Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253), wherein:
  • Z2 is L, E, Y, S, K, Q, W, F, P, A or T;
  • Z4 is L, F, R, G or T;
  • Z5 is D, K, S, E, R or N;
  • a GH3 BGL α/β sandwich domain comprising a sequence as set forth in: A1A2A3A4A5 (SEQ ID NO: 254) and a sequence as set forth in B1B2B3B4B5 (SEQ ID NO: 255) wherein:
  • A1 is R, V, L, T, A, I, M or S;
  • A2 is A, S, V or D;
  • A4 is S, D, G, M, Q, F, Y, or T;
  • A5 is S, A, T, Q, D, or P;
  • B2 is V, M, L, F or I;
  • B3 is D, E or absent
  • B4 is Q, P, S, R, G or E; and B5 is W or F,
  • GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
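
The KGY1Y2Y3Y4LGP motif (SEQ ID NO: 251) described above can be sketched as a pattern search. This is a minimal illustration, not a claim-construction tool: the allowed residues for Y2, Y3 and Y4 are taken from the text above, Y1 is left unconstrained because its definition does not appear in this excerpt, and the names `TIM_MOTIF` and `find_tim_motif` are hypothetical.

```python
import re

# K G Y1 Y2 Y3 Y4 L G P
# Y1: any residue (assumption: its definition is not given in this excerpt)
# Y2: N, H, D or T; Y3: I, A or V; Y4: R, K, A, L, I, F, V or P
TIM_MOTIF = re.compile(r"KG[A-Z][NHDT][IAV][RKALIFVP]LGP")


def find_tim_motif(seq):
    """Return (1-based start, matched text) for each motif hit in a protein sequence."""
    return [(m.start() + 1, m.group()) for m in TIM_MOTIF.finditer(seq.upper())]


if __name__ == "__main__":
    # Made-up peptide used only to exercise the pattern.
    print(find_tim_motif("MMKGSNILLGPAA"))
```
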
  • a glycoside hydrolase family 3 (GH3) beta-glucosidase (BGL) polypeptide comprising:
  • a GH3 BGL triosephosphateisomerase domain comprising a sequence as set forth in KGY1Y2Y3Y4LGP (SEQ ID NO: 251);
  • GLDMX1MPX2X3X4X5X6X7X8X9X10X11X12X13X14X15X16 (SEQ ID NO: 252), except that at least one of X1 to X16, when present, is smaller than a corresponding reference amino acid residue in a corresponding GH3 BGL reference coordinating loop domain comprising a sequence as set forth in
  • a GH3 BGL α/β sandwich domain comprising a sequence as set forth in: A1A2A3A4A5 (SEQ ID NO: 254) and a sequence as set forth in B1B2B3B4B5 (SEQ ID NO: 255),
  • the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
  • the GH3 BGL polypeptide does not comprise a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and does not comprise a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44
  • Item 2 The GH3 BGL polypeptide of item 1 or 1', wherein at least one of X6 to X8 is smaller than the corresponding reference amino acid residue.
  • Item 3 The GH3 BGL polypeptide of item 1 or 1', wherein at least one of X6 to X8 is independently C, V, A, G, S, P, T, D or N.
  • Item 4 The GH3 BGL polypeptide of item 1 or 1', wherein X8 is C, V, A, G, S, P, T, D or N.
  • Item 5 The GH3 BGL polypeptide of item 1 or 1', wherein X8 is C, V, A, or G.
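
The "smaller than a corresponding reference amino acid residue" condition in the items above can be sketched as a simple comparison. This excerpt does not state how residue size is measured, so the sketch assumes side-chain-inclusive residue volume as the criterion; the volumes are approximate values as commonly tabulated (in cubic ångströms) and should be checked before being relied on. The names `RESIDUE_VOLUME_A3` and `is_smaller` are hypothetical.

```python
# Approximate residue volumes in cubic angstroms (assumed size measure).
RESIDUE_VOLUME_A3 = {
    "G": 60.1, "A": 88.6, "S": 89.0, "C": 108.5, "D": 111.1,
    "P": 112.7, "N": 114.1, "T": 116.1, "E": 138.4, "V": 140.0,
    "Q": 143.8, "H": 153.2, "M": 162.9, "I": 166.7, "L": 166.7,
    "K": 168.6, "R": 173.4, "F": 189.9, "Y": 193.6, "W": 227.8,
}


def is_smaller(candidate, reference):
    """True if the candidate residue has a smaller tabulated volume than the reference."""
    return RESIDUE_VOLUME_A3[candidate] < RESIDUE_VOLUME_A3[reference]


if __name__ == "__main__":
    # Example: replacing a bulky aromatic reference residue with a small one.
    for cand in "CVAG":
        print(cand, is_smaller(cand, "W"))
```

Under this assumed measure, every residue listed in Item 3 (C, V, A, G, S, P, T, D, N) is smaller than the aromatic residues W, F and Y.
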
  • Item 6 The GH3 BGL polypeptide of any one of items 1 and 1' to 5, wherein
  • GLDMX1refMPX2refX3refX4refX5refX6refX7refX8refX9refX10refX11refX12refX13refX14refX15refX16ref (SEQ ID NO: 252) is:
  • Item 7 The GH3 BGL polypeptide of any one of items 1 to 6, which, except for residues defined in any one of items 1 to 6 and for the proviso defined in item 1, is as set forth in any of the sequences of FIGs. 13A-H and 15A-B or is a secreted form thereof (SEQ ID NOs: 22-167).
  • Ustilago maydis SEQ ID NO: 37
  • Ccinl Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30); Umay Ustilago maydis (SEQ ID NO: 29);
  • Rory5 Rhizopus oryzae (SEQ ID NO: 36);
  • Pbla1 Phycomyces blakesleeanus (SEQ ID NO: 28);
  • Pbla2 Phycomyces blakesleeanus (SEQ ID NO: 31);
  • Rory1 Rhizopus oryzae (SEQ ID NO: 32); Rory2: Rhizopus oryzae (SEQ ID NO: 33); Rory3: Rhizopus oryzae (SEQ ID NO: 34);
  • RoryA Rhizopus oryzae (SEQ ID NO:
  • Aspergillus oryzae SEQ ID NO: 41
  • CpelBglX0290 Wickerhamomyces anomalus (Pichia anomala) (SEQ ID NO: 42);
  • SfibBglM22475 Saccharomycopsis fibuligera M22475 (SEQ ID NO: 43);
  • SfibBglM22476 Saccharomycopsis fibuligera M22476 (SEQ ID NO: 44); or any of the predicted secreted forms defined in FIG. 13A-H.
  • Item 8' comprises a polypeptide as set forth in any one of SEQ ID NOs: 163-167, or comprises a secreted form of a polypeptide as set forth in any one of SEQ ID NOs: 163-167.
  • Y 3 is I or V
  • the GH3 BGL coordinating loop domain comprises a sequence as set forth in: GLDMX1'MPGX2'X3'X4'X5'X6'X7'X8'X9'X10'X11'X12'X13'X14'X15'X16' (SEQ ID NO: 279), except that at least one of X1' to X16', when present, is smaller than a corresponding reference amino acid residue in a corresponding GH3 BGL reference coordinating loop domain comprising a sequence as set forth in
  • Z1 is N, V, A, K, Q, T, S, E, F, or Y;
  • Z2 is L, E, Y, S, K, Q, W, P, A or T;
  • Z5 is D, K, R or N;
  • A4 is S, F, M, D, G, Q or T;
  • B2 is V, M, I or L
  • B4' is Q, P, S, R, E or absent
  • the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
  • Item 9 The GH3 BGL polypeptide of item 1, wherein:
  • Y2 is D or T
  • Y1 and Y4 are as defined in item 1;
  • the GH3 BGL coordinating loop domain comprises a sequence as set forth in: GLDMX1'MPGX2'X3'X4'X5'X6'X7'X8'X9'X10'X11'X12'X13'X14'X15'X16' (SEQ ID NO: 279), wherein at least one of X1' to X16' is smaller than a corresponding reference amino acid residue, wherein the reference amino acid residues are:
  • X1'ref is S, T or D
  • X4'ref is Y, L, or absent
  • X5'ref is Y, L, T, S, V, H, D, M, E, or absent;
  • X6'ref is D, F, C, G, Y, V, W, D, M, E, or absent;
  • X8'ref is M, S, D, or T
  • X9'ref is R or absent
  • X10'ref is Q or absent
  • X11'ref is F, N, E, G, S, T or A;
  • X12'ref is G, T, D, S, N, R, L, T, F;
  • Z2 is L, E, Y, S, K, Q, W, P, A or T;
  • Z5 is D, K, R or N;
  • the GH3 BGL α/β sandwich domain comprises a sequence as set forth in: A1A2A3A4A5 (SEQ ID NO: 254) and a sequence as set forth in B1B2B3B4B5 (SEQ ID NO: 255) wherein:
  • A2 is as defined in item 1;
  • A3 is as defined in item 1;
  • A4 is S, F, M, D, G, Q or T;
  • A5 is as defined in item 1; B2 is V, M, I or L;
  • B3 is D or E
  • B4' is Q, P, S, R, E or absent
  • B5 is as defined in item 1, with the proviso that the GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
  • Item 10 The GH3 BGL polypeptide of item 9 or 9', wherein at least one of X to X$ is smaller than the corresponding reference amino acid residue.
  • Item 11 The GH3 BGL polypeptide of item 9 or 9', wherein at least one of the amino acid residues X* to X$ is C, V, A, G, S, P, T, D or N.
  • Item 12 The GH3 BGL polypeptide of item 9 or 9', wherein X6' is C, V, A, G, S, P, T, D or N.
  • Item 13 The GH3 BGL polypeptide of item 9 or 9', wherein X6' is C, V, A or G.
  • Item 14 The GH3 BGL polypeptide of any one of items 9 and 9' to 13, wherein
  • GLDMX1'refMPGX2'refX3'refX4'refX5'refX6'refX7'refX8'refX9'refX10'refX11'refX12'refX13'refX14'refX15'refX16'ref (SEQ ID NO: 279) is:
  • Item 15 The GH3 BGL polypeptide of any one of items 9 and 9' to 14, which, except for residues defined in any one of items 9 to 14 and for the proviso defined in item 9 or 9', is as set forth in SEQ ID NO: 164 or is a secreted form thereof (FIGs. 15A-B).
  • the GH3 BGL polypeptide of any one of items 9 and 9' to 14, which, except for residues defined in any one of items 9 and 9' to 14 and for the proviso defined in item 9 or 9', comprises a polypeptide as set forth in SEQ ID NO: 164, or comprises a secreted form of the polypeptide as set forth in SEQ ID NO: 164.
  • the GH3 BGL polypeptide of any one of items 9, 9' to 15 and 15' which, except for residues defined in any one of items 9, 9' to 15 and 15' and for the proviso defined in item 9 or 9', comprises a polypeptide as set forth in any one of SEQ ID NO: 22-37 and 41, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NO: 22-37 and 41.
  • Y1 and Y4 are as defined in item 1 or 1';
  • Y2 and Y3 are as defined in item 9 or 9';
  • the GH3 BGL coordinating loop domain comprises a sequence as set forth in: GLDMX1"MPGDX2"X3"X4"X5"X6"X7"X8"SX9"WG (SEQ ID NO: 280), except that at least one of X1" to X9", when present, is smaller than a corresponding reference amino acid residue in a corresponding GH3 BGL reference coordinating loop domain comprising a sequence as set forth in GLDMX1"refMPGDX2"refX3"refX4"refX5"refX6"refX7"refX8"refSX9"refWG (SEQ ID NO: 280);
  • Z2 is A, T or S
  • Z3 is as defined in item 1 or 1';
  • Z5 is D, R or N;
  • A2 is A or V
  • A4 is Q or T
  • A5 is S, A, Q or P,
  • the GH3 BGL polypeptide is not a polypeptide as set forth in any one of the sequences of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
  • Y1 and Y4 are as defined in item 1;
  • Y2 and Y3 are as defined in item 9;
  • the GH3 BGL coordinating loop domain comprises a sequence as set forth in: GLDMX1"MPGDX2"X3"X4"X5"X6"X7"X8"SX9"WG (SEQ ID NO: 280), wherein at least one of X1" to X9" is smaller than a corresponding reference amino acid residue, wherein the reference amino acid residues are:
  • X1" is S or T
  • X2" is I, V or T
  • X3" is S, T, D, M or E;
  • X5" is D or N
  • X6" is D, S or T
  • X8" is L, T, R or F
  • X9" is F or Y
  • Z2 is A, T or S
  • Z3 is as defined in item 1;
  • Z5 is D, R or N;
  • A2 is A or V
  • A3 is as defined in item 1;
  • A5 is S, A, Q or P,
  • B2 is I, V or L
  • B3 is D or E
  • B4 is R or E; and B5 is W,
  • GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
  • Item 18 The GH3 BGL polypeptide of item 17 or 17', wherein at least one of X2" to X4" is smaller than the corresponding amino acid residue.
  • Item 19 The GH3 BGL polypeptide of item 17 or 17', wherein at least one of X2" to X4" is C, V, A, G, S, P, T, D or N.
  • Item 20 The GH3 BGL polypeptide of item 17 or 17', wherein X4" is C, V, A, G, S, P, T, D or N.
  • Item 21 The GH3 BGL polypeptide of item 17 or 17', wherein X4" is C, V, A, or G.
  • Item 22 The GH3 BGL polypeptide of any one of items 17 and 17' to 21, wherein
  • GLDMX1"refMPGDX2"refX3"refX4"refX5"refX6"refX7"refX8"refSX9"refW (SEQ ID NO: 280) is:
  • Item 25' The GH3 BGL of claim 17 or 17', wherein:
  • Z1-Z5 are as defined in item 17 or 17';
  • A2-A5 are as defined in item 17 or 17'; and B1-B5 are as defined in item 17 or 17',
  • the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
  • Item 23 The GH3 BGL polypeptide of any one of items 17 to 22, which, except for residues defined in any one of items 17 and 17' to 22 or for the proviso defined in item 17 or 17', is as set forth in SEQ ID NO: 165 or is a secreted form thereof (FIGs. 15A-B).
  • Item 24 The GH3 BGL polypeptide of any one of items 17 to 23, which, except for residues defined in any one of items 17 to 22 or for the proviso defined in item 17, is as set forth in Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Aory: Aspergillus oryzae (SEQ ID NO: 41); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); or Fgra: Fusarium graminearum (SEQ ID NO: 27); or any of their predicted secreted forms defined in FIGs. 13A-H.
  • Afum Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Aory: Aspergillus oryzae (SEQ ID NO: 41
  • B1 - B5 are as defined in item 17,
  • GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
  • GLDMX1"refMPGDX2"refX3"refX4"refX5"refX6"refX7"refX8"refSX9"refW (SEQ ID NO: 280) is:
  • the GH3 BGL polypeptide of any one of items 25, 25' and 26, which, except for residues as defined in any one of items 25, 25' and 26 or for the proviso defined in item 25 or 25', comprises a polypeptide as set forth in SEQ ID NO: 166, or comprises a secreted form of the polypeptide as set forth in SEQ ID NO: 166.
  • Item 28' comprises a polypeptide as set forth in SEQ ID NO: 166, or comprises a secreted form of the polypeptide as set forth in SEQ ID NO: 166.
  • the GH3 BGL polypeptide of any one of items 25, 25', 26-27 and 27', which except for residues defined in any one of items 25, 25' and 26, comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-23 and 25-27, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-23 and 25-27.
  • Y3 is as defined in item 9 or 9';
  • X1" is S
  • X2" is I or V
  • X3" is S, T or D;
  • X4" is F or Y
  • X6" is D or S
  • X7" is G or A
  • A5 is S or A; B2 is I or V;
  • the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
  • Item 27. The GH3 BGL polypeptide of item 25 or 26, which, except for residues defined in item 25 or 26 or for the proviso defined in item 25, is as set forth in SEQ ID NO: 166 or is a secreted form thereof (FIGs. 15A-B).
  • Item 28 The GH3 BGL polypeptide of any one of items 25 to 27, which, except for residues defined in any one of items 25 to 27, is as set forth in Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); or Fgra: Fusarium graminearum (SEQ ID NO: 27); or any of their predicted secreted forms defined in FIGs. 13A-H.
  • Afum Aspergillus fumigatus
  • Aacu Aspergillus aculeatus
  • SEQ ID NO: 22 Anig: Aspergillus niger
  • Ncra Neurospora crassa
  • Fgra Fusarium graminearum
  • X1" is S
  • X2" is I or V
  • X3" is S, T or D;
  • X4" is F or Y
  • X6" is D or S
  • X7" is G or A
  • A5 is S or A; B2 is I or V;
  • GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
  • Item 30 The GH3 BGL polypeptide of item 29 or 29', wherein at least one of X2" to X4" is smaller than the corresponding amino acid residue.
  • Item 31 The GH3 BGL polypeptide of item 29 or 29', wherein at least one of X2" to X4" is C, V, A, G, S, P, T, D or N.
  • Item 32 The GH3 BGL polypeptide of item 29 or 29', wherein X4" is C, V, A, G, S, P, T, D or N.
  • Item 33 The GH3 BGL polypeptide of item 29 or 29', wherein X4" is C, V, A, or G.
  • Item 34 The GH3 BGL polypeptide of any one of items 29 and 29' to 33 wherein
  • GLDMX1"refMPGDX2"refX3"refX4"refX5"refX6"refX7"refX8"refSX9"refW (SEQ ID NO: 280) is:
  • the GH3 BGL polypeptide of any one of items 29 and 29' to 34 which, except for residues defined in any one of items 29 and 29' to 34 or for the proviso defined in item 29 or 29', comprises a polypeptide as set forth in SEQ ID NO: 167, or comprises a secreted form of the polypeptide as set forth in SEQ ID NO: 167.
  • the GH3 BGL polypeptide of any one of items 29 and 29' to 34 which, except for residues defined in any one of items 29 and 29' to 34 or for the proviso defined in item 29 or 29', comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-23, 25 and 41, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-23, 25 and 41.
  • Item 35 The GH3 BGL polypeptide of any one of items 29 to 34, which, except for residues defined in any one of items 29 to 34 or for the proviso defined in item 29, is as set forth in SEQ ID NO: 167 or is a secreted form thereof (FIGs. 15A-B).
  • Item 36 The GH3 BGL polypeptide of any one of items 29 to 34, which, except for residues defined in any one of items 29 to 34 or for the proviso defined in item 29, is as set forth in Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aory: Aspergillus oryzae (SEQ ID NO: 41); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); or Anig: Aspergillus niger (SEQ ID NO: 23); or any of their predicted secreted forms defined in FIGs. 13A-H.
  • Item 37 The GH3 BGL polypeptide of any one of the preceding items, wherein Y4 is A, L, I or V.
  • Item 38 The GH3 BGL polypeptide of any one of the preceding items, wherein Y4 is L.
  • Item 39 The GH3 BGL polypeptide of any one of the preceding items, wherein Z3 is V.
  • Item 40 The GH3 BGL polypeptide of any one of the preceding items, wherein A3 is Q.
  • Item 41. The GH3 BGL polypeptide of any one of the preceding items, wherein B3 is D.
  • Item 42 The GH3 BGL polypeptide of item 1, as set forth in FIGs. 15A-B (SEQ ID NO: 163), wherein:
  • amino acid residue at position 340 of SEQ ID NO: 163 is R, K, A, L, I, F, V or P;
  • amino acid residue at position 515 of SEQ ID NO: 163 is C, V, A, G, S, P, T, D or N;
  • amino acid residue at position 734 of SEQ ID NO: 163 is V or L;
  • amino acid residue at position 748 of SEQ ID NO: 163 is Q or N and amino acid residue at position 813 of SEQ ID NO: 163 is D or E,
  • GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
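
The position constraints of Item 42 can be sketched as a lookup that checks a candidate sequence against the allowed residues at each listed position of SEQ ID NO: 163. The positions and residue sets come from the item above; the 1-based indexing into a full-length sequence, the handling of sequences that are too short, and the names `ALLOWED_SEQ163` and `check_positions` are assumptions made for illustration. The sketch does not test the proviso excluding SEQ ID NOs: 22 to 62.

```python
# Allowed residues at the positions listed in Item 42 (SEQ ID NO: 163 numbering).
ALLOWED_SEQ163 = {
    340: set("RKALIFVP"),
    515: set("CVAGSPTDN"),
    734: set("VL"),
    748: set("QN"),
    813: set("DE"),
}


def check_positions(seq, allowed=ALLOWED_SEQ163):
    """Return {position: residue} for every listed position that fails its constraint.

    Positions are 1-based. A position beyond the end of the sequence is
    reported with None. An empty dict means all listed positions comply.
    """
    violations = {}
    for pos, ok in allowed.items():
        if pos > len(seq):
            violations[pos] = None
        elif seq[pos - 1] not in ok:
            violations[pos] = seq[pos - 1]
    return violations
```

A narrower item (for example Item 43 or 44 style constraints) could be checked by passing a different `allowed` mapping to the same function.
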
  • GH3 BGL polypeptide of item 1 or 1' comprising a polypeptide as set forth in SEQ ID NO: 163, or a secreted form thereof, wherein:
  • amino acid residue at position 340 of SEQ ID NO: 163 is R, K, A, L, I, F, V or P;
  • amino acid residue at position 515 of SEQ ID NO: 163 is C, V, A, G, S, P, T, D or N;
  • amino acid residue at position 734 of SEQ ID NO: 163 is V or L;
  • amino acid residue at position 748 of SEQ ID NO: 163 is Q or N and amino acid residue at position 813 of SEQ ID NO: 163 is D or E,
  • the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
  • amino acid residue at position 340 of SEQ ID NO: 163 is A, L, I or V;
  • amino acid residue at position 515 of SEQ ID NO: 163 is C, G, A or V;
  • amino acid residue at position 734 of SEQ ID NO: 163 is V;
  • amino acid residue at position 748 of SEQ ID NO: 163 is Q and amino acid residue at position 813 of SEQ ID NO: 163 is D.
  • amino acid residue at position 340 of SEQ ID NO: 163 is L;
  • amino acid residue at position 515 of SEQ ID NO: 163 is C, G, A or V;
  • amino acid residue at position 734 of SEQ ID NO: 163 is V;
  • amino acid residue at position 748 of SEQ ID NO: 163 is Q and amino acid residue at position 813 of SEQ ID NO: 163 is D.
  • Item 45 The GH3 BGL polypeptide of item 9, as set forth in FIGs. 15A-B (SEQ ID NO: 164), wherein:
  • amino acid residue at position 336 of SEQ ID NO: 164 is R, K, A, L, I, F, V or P; amino acid residue at position 507 of SEQ ID NO: 164 is C, V, A, G, S, P, T, D or N;
  • amino acid residue at position 727 of SEQ ID NO: 164 is V or L;
  • amino acid residue at position 741 of SEQ ID NO: 164 is Q or N and amino acid residue at position 806 of SEQ ID NO: 164 is D or E,
  • GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
  • Item 45' The GH3 BGL polypeptide of item 9 or 9', comprising a polypeptide as set forth in SEQ ID NO: 164 or a secreted form thereof, wherein:
  • amino acid residue at position 336 of SEQ ID NO: 164 is R, K, A, L, I, F, V or P;
  • amino acid residue at position 507 of SEQ ID NO: 164 is C, V, A, G, S, P, T, D or N;
  • amino acid residue at position 727 of SEQ ID NO: 164 is V or L;
  • amino acid residue at position 741 of SEQ ID NO: 164 is Q or N and amino acid residue at position 806 of SEQ ID NO: 164 is D or E,
  • the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
  • amino acid residue at position 336 of SEQ ID NO: 164 is A, L, I or V;
  • amino acid residue at position 507 of SEQ ID NO: 164 is C, G, A or V;
  • amino acid residue at position 727 of SEQ ID NO: 164 is V;
  • amino acid residue at position 741 of SEQ ID NO: 164 is Q and amino acid residue at position 806 of SEQ ID NO: 164 is D.
  • amino acid residue at position 507 of SEQ ID NO: 164 is C, G, A or V;
  • amino acid residue at position 727 of SEQ ID NO: 164 is V;
  • amino acid residue at position 741 of SEQ ID NO: 164 is Q and amino acid residue at position 806 of SEQ ID NO: 164 is D.
  • Item 48 The GH3 BGL polypeptide of item 17 or 25, as set forth in FIGs. 15A-B (SEQ ID NOs: 165 or 166), wherein:
  • amino acid residue at position 157 of SEQ ID NO: 165 or 166 is R, K, A, L, I, F, V or P;
  • amino acid residue at position 322 of SEQ ID NO: 165 or 166 is C, V, A, G, S, P, T, D or N;
  • amino acid residue at position 498 of SEQ ID NO: 165 or 166 is V or L;
  • amino acid residue at position 512 of SEQ ID NO: 165 or 166 is Q or N and amino acid residue at position 576 of SEQ ID NO: 165 or 166 is D or E,
  • GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
  • Item 48' The GH3 BGL polypeptide of any one of items 17, 17', 25 and 25', comprising a polypeptide as set forth in SEQ ID NOs: 165 or 166 or a secreted form thereof, wherein:
  • amino acid residue at position 157 of SEQ ID NO: 165 or 166 is R, K, A, L, I, F, V or P;
  • amino acid residue at position 322 of SEQ ID NO: 165 or 166 is C, V, A, G, S, P, T, D or N;
  • amino acid residue at position 498 of SEQ ID NO: 165 or 166 is V or L;
  • amino acid residue at position 512 of SEQ ID NO: 165 or 166 is Q or N and amino acid residue at position 576 of SEQ ID NO: 165 or 166 is D or E,
  • the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
  • amino acid residue at position 157 of SEQ ID NO: 165 or 166 is A, L, I or V;
  • amino acid residue at position 322 of SEQ ID NO: 165 or 166 is C, G, A or V;
  • amino acid residue at position 512 of SEQ ID NO: 165 or 166 is Q and amino acid residue at position 576 of SEQ ID NO: 165 or 166 is D.
  • Item 50 The GH3 BGL polypeptide of item 48 or 48', wherein:
  • amino acid residue at position 157 of SEQ ID NO: 165 or 166 is L;
  • amino acid residue at position 322 of SEQ ID NO: 165 or 166 is C, G, A or V;
  • amino acid residue at position 498 of SEQ ID NO: 165 or 166 is V;
  • amino acid residue at position 512 of SEQ ID NO: 165 or 166 is Q and amino acid residue at position 576 of SEQ ID NO: 165 or 166 is D.
  • amino acid residue at position 151 of SEQ ID NO: 167 is R, K, A, L, I, F, V or P;
  • amino acid residue at position 316 of SEQ ID NO: 167 is C, V, A, G, S, P, T, D or N;
  • amino acid residue at position 491 of SEQ ID NO: 167 is V or L;
  • amino acid residue at position 505 of SEQ ID NO: 167 is Q or N and amino acid residue at position 568 of SEQ ID NO: 167 is D or E,
  • GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
  • the GH3 BGL polypeptide of item 29 or 29' comprising a polypeptide as set forth in SEQ ID NO: 167 or a secreted form thereof, wherein:
  • amino acid residue at position 151 of SEQ ID NO: 167 is R, K, A, L, I, F, V or P;
  • amino acid residue at position 316 of SEQ ID NO: 167 is C, V, A, G, S, P, T, D or N;
  • amino acid residue at position 491 of SEQ ID NO: 167 is V or L;
  • amino acid residue at position 505 of SEQ ID NO: 167 is Q or N and amino acid residue at position 568 of SEQ ID NO: 167 is D or E, with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
  • Item 52 The GH3 BGL polypeptide of item 51 or 51', wherein:
  • amino acid residue at position 151 of SEQ ID NO: 167 is A, L, I or V;
  • amino acid residue at position 316 of SEQ ID NO: 167 is C, G, A or V;
  • amino acid residue at position 491 of SEQ ID NO: 167 is V;
  • amino acid residue at position 505 of SEQ ID NO: 167 is Q and amino acid residue at position 568 of SEQ ID NO: 167 is D.
  • Item 53 The GH3 BGL polypeptide of item 51 or 51', wherein:
  • amino acid residue at position 151 of SEQ ID NO: 167 is L;
  • amino acid residue at position 316 of SEQ ID NO: 167 is C, G, A or V;
  • amino acid residue at position 491 of SEQ ID NO: 167 is V;
  • amino acid residue at position 505 of SEQ ID NO: 167 is Q and amino acid residue at position 568 of SEQ ID NO: 167 is D.
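  • The residue conditions listed above for SEQ ID NO: 167 can be checked mechanically. The following is a minimal sketch, assuming the candidate sequence is already numbered like SEQ ID NO: 167 (1-based positions). The names `ALLOWED_167` and `check_positions` and the toy sequence are hypothetical illustrations and not part of the disclosure. The sketch only reports which listed conditions hold; it does not decide how the conditions are to be combined.

```python
# Allowed residues at each 1-based position of SEQ ID NO: 167, taken from the
# broadest set of conditions listed above. All names here are illustrative.
ALLOWED_167 = {
    151: set("RKALIFVP"),
    316: set("CVAGSPTDN"),
    491: set("VL"),
    505: set("QN"),
    568: set("DE"),
}

def check_positions(seq, allowed):
    """Return {position: bool} for each listed 1-based position.

    A position beyond the end of the sequence is reported as False.
    """
    result = {}
    for pos, residues in allowed.items():
        in_range = 1 <= pos <= len(seq)
        result[pos] = in_range and seq[pos - 1].upper() in residues
    return result

# Toy sequence (hypothetical, NOT SEQ ID NO: 167): 600 glycines with a few
# residues placed at the positions of interest.
toy = ["G"] * 600
for pos, aa in {151: "L", 316: "C", 491: "V", 505: "Q", 568: "D"}.items():
    toy[pos - 1] = aa
toy = "".join(toy)

report = check_positions(toy, ALLOWED_167)
```

In practice the candidate would first be aligned against the consensus so that position numbers correspond; that alignment step is outside this sketch.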
  • Ustilago maydis SEQ ID NO: 37
  • Ccin1 Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30);
  • Umay1 Ustilago maydis (SEQ ID NO: 29);
  • Rory5 Rhizopus oryzae (SEQ ID NO: 36);
  • Pbla1 Phycomyces blaskesleeanus (SEQ ID NO: 28);
  • Pbla2 Phycomyces blaskesleeanus (SEQ ID NO: 31);
  • Rory1 Rhizopus oryzae (SEQ ID NO: 32); Rory2: Rhizopus oryzae (SEQ ID NO: 33);
  • Rory3 Rhizopus oryzae (SEQ ID NO: 34);
  • Rory4 Rhizopus or
  • Aspergillus oryzae SEQ ID NO: 41
  • CpelBglX0290 Wickerhamomyces anomalus (Pichia anomal) (SEQ ID NO: 42);
  • SfibBglM22475 Saccharomycopsis fibuligera M22475 (SEQ ID NO: 43);
  • SfibBglM22476 Saccharomycopsis fibuligera M22476 (SEQ ID NO: 44), or any of their predicted secreted forms defined in FIGs. 13A-H.
  • Ustilago maydis SEQ ID NO: 37
  • Ccin1 Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30);
  • Umay1 Ustilago maydis (SEQ ID NO: 29);
  • Rory5 Rhizopus oryzae (SEQ ID NO: 36);
  • Pbla1 Phycomyces blaskesleeanus SEQ ID NO: 28
  • Pbla2 Phycomyces blaskesleeanus
  • Rory1 Rhizopus oryzae
  • Rory2 Rhizopus oryzae
  • Rory3 Rhizopus oryzae (SEQ ID NO: 34)
  • Rory4 Rhizopus oryzae
  • Item 56 The GH3 BGL polypeptide of any one of items 48 to 50, which, except for residues defined in any one of items 48 to 50 and for the proviso defined in item 48, is as set forth in any one of Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Aory: Aspergillus oryzae (SEQ ID NO: 41); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); or Fgra: Fusarium graminearum (SEQ ID NO: 27); or any of their predicted secreted forms defined in FIGs. 13A-H.
  • Afum Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Aory: Aspergillus oryzae (
  • Item 57 The GH3 BGL polypeptide of any one of items 51 to 53, which, except for residues defined in any one of items 51 to 53 and for the proviso defined in item 51, is as set forth in any one of Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); or Fgra: Fusarium graminearum (SEQ ID NO: 27); or any of their predicted secreted forms defined in FIGs. 13A-H.
  • Afum Aspergillus fumigatus
  • Aacu Aspergillus aculeatus
  • SEQ ID NO: 22 Anig: Aspergillus niger
  • Ncra Neurospora crassa
  • Fgra Fusarium graminearum
  • Item 58 The GH3 BGL polypeptide of item 1, as set forth in any one of the sequences of FIGs. 14A-U (SEQ ID NOs: 63 to 162).
  • the GH3 BGL polypeptide of any one of items 42 and 42' to 44 which, except for residues defined in any one of items 42 and 42' to 44 and for the proviso defined in item 42 or 42', comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-44, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-44.
  • the GH3 BGL polypeptide of any one of items 45 and 45' to 47 which, except for residues defined in any one of items 45 and 45' to 47 and for the proviso defined in item 45 or 45', comprises a polypeptide as set forth in any one of SEQ ID NO: 22-37 and 41, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NO: 22-37 and 41.
  • the GH3 BGL polypeptide of any one of items 48 and 48' to 50 which, except for residues defined in any one of items 48 and 48' to 50 and for the proviso defined in item 48 or 48', comprises a polypeptide as set forth in any one of SEQ ID NO: 22-23, 25-27 and 41, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NO: 22-23, 25-27 and 41.
  • the GH3 BGL polypeptide of any one of items 51 and 51' to 53 which, except for residues defined in any one of items 51 and 51' to 53 and for the proviso defined in item 51 or 51', comprises a polypeptide as set forth in any one of SEQ ID NO: 22-23, 25 and 41, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NO: 22-23, 25 and 41.
  • Item 58' The GH3 BGL polypeptide of item 1 or 1', comprising a polypeptide as set forth in any one of SEQ ID NOs: 63 to 144, or comprising a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 63 to 144.
  • Item 59' The GH3 BGL polypeptide of any one of the preceding items, comprising a signal peptide.
  • Item 60' The GH3 BGL polypeptide of any one of the preceding items, comprising a signal peptide including an MFα pre sequence which, except for a substitution of the alanine residue at position 9 and/or a substitution of the proline residue at position 21 and/or a substitution of the valine residue at position 22, is as set forth in SEQ ID NO: 9.
  • Item 61' The GH3 BGL polypeptide of item 59 or 59', wherein the signal peptide is an MFα pre sequence which, except for a substitution of the alanine residue at position 9 for a threonine residue and/or a substitution of the proline residue at position 21 for a threonine or a serine residue and/or a substitution of the valine residue at position 22 for an alanine or an aspartate residue, is as set forth in SEQ ID NO: 9.
  • Item 62' The GH3 BGL polypeptide of any one of the preceding items, comprising a signal peptide including an MFα pre sequence as set forth in any one of SEQ ID NOs: 9 and 168-172.
  • Item 63' The GH3 BGL polypeptide of any one of the preceding items, which is a secreted polypeptide form.
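  • The substitutions at positions 9, 21 and 22 of the pre sequence of SEQ ID NO: 9 can be illustrated with a short sketch. It applies point substitutions to the native 24-residue sequence and rebuilds the variants that this document lists as SEQ ID NOs: 168-172. The helper `substitute` and the `VARIANTS` table are hypothetical names; the substitutions in the table were inferred here by comparing each listed variant to SEQ ID NO: 9 and are an assumption of this sketch.

```python
# Native pre sequence as given for SEQ ID NO: 9 in this document.
NATIVE = "Mrfpsiftavlfaassalaapvnt"

def substitute(seq, changes):
    """Apply 1-based point substitutions {position: (expected_old, new)}.

    Raises ValueError if the residue found at a position is not the
    expected one, which guards against off-by-one numbering errors.
    """
    residues = list(seq)
    for pos, (old, new) in changes.items():
        found = residues[pos - 1]
        if found.lower() != old.lower():
            raise ValueError(f"position {pos} is {found!r}, not {old!r}")
        residues[pos - 1] = new.lower()
    return "".join(residues)

# Substitutions inferred by comparison with SEQ ID NO: 9 (assumption).
VARIANTS = {
    168: {22: ("v", "a")},
    169: {22: ("v", "d")},
    170: {9: ("a", "t"), 21: ("p", "t")},
    171: {21: ("p", "s")},
    172: {21: ("p", "t")},
}

built = {seq_id: substitute(NATIVE, ch) for seq_id, ch in VARIANTS.items()}
```

Checking the expected residue before replacing it confirms that position 9 is alanine, position 21 is proline and position 22 is valine in the native sequence.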
  • Item 59 A secreted form of the GH3 BGL polypeptide defined in any one of the preceding items.
  • the GH3 BGL polypeptide of the present invention comprises at least two of the elements (a) to (d) defined in any one of the preceding items (i.e. (a) the triosephosphateisomerase domain; (b) the coordinating loop domain; (c) part Z1-Z6 of the α/β sandwich domain; and (d) parts A1-A5 and B1-B5 of the α/β sandwich domain).
  • it contains at least 3 of such elements.
  • it contains all 4 of such elements.
  • the GH3 BGL polypeptide of the present invention does not comprise a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and does not comprise a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
  • Item 60 A vector comprising a nucleic acid encoding the GH3 BGL polypeptide defined herein.
  • Item 61 The vector of item 60, further comprising a nucleic acid encoding an endoglucanase (EGLs; EC 3.2.1.4).
  • Item 62 The vector of item 60 or 61, further comprising a nucleic acid encoding a cellobiohydrolase.
  • Item 63 The vector of any one of items 60 to 62, further comprising a terminator and/or a promoter.
  • Item 64 A host cell expressing (a) the GH3 BGL polypeptide defined in any one of the preceding items; or (b) the vector defined in any one of items 60 to 63 and 60' to 63'.
  • Item 65 A composition comprising (a) (i) the GH3 BGL polypeptide defined in any one of the preceding items; (ii) the vector defined in any one of items 60 to 63 and 60' to 63'; (iii) the host cell defined in item 64 or 64'; or (iv) a cell lysate or a culture medium of (iii); and (b) (i) a carrier; and/or (ii) at least one other cellulase.
  • Item 66 A composition comprising (a) (i) the GH3 BGL polypeptide defined in any one of the preceding items; (ii) the vector defined in any one of items 60 to 63 and 60' to 63'; (iii) the host cell defined in item 64 or 64'; or (iv) a cell lysate or a culture medium of (iii); and (b) (i) a carrier; and/or (ii) at least one other cellulase.
  • a method of converting a cellulosic substrate into a fermentable sugar comprising contacting (a) the GH3 BGL polypeptide defined in any one of the preceding items; or (b) the composition defined in item 65 or 65', with the cellulosic substrate, whereby a fermentable sugar is generated.
  • Item 67 The method of item 66, wherein the cellulosic substrate is soluble cellodextrin.
  • Item 68 The method of item 67, wherein the soluble cellodextrin is cellobiose.
  • Item 69 The method of any one of items 66 to 68, wherein the sugar is glucose.
  • FIGs. 1A-B Directed evolution of a-BGL1.
  • A Schematic of the assembly and expression cassettes from pKL022, pKL024 and pKL029 (SEQ ID NO: 1);
  • B Activities of a wild type population and selection pool from 96 well plate activity assays.
  • C Activity assays for the improved BGLs. Error bars represent mean ± 95% confidence interval of triplicate experiments.
  • FIGs. 2A-B Representative V0 (i.e. initial rate for the production of product during the linear phase of the reaction progress curve) versus substrate concentration plots for the production of pNP (product of hydrolysis) at a range of pNPG concentrations in the presence of inhibitor for WT anBGL.1 (also designated a-BGL-1), Y305C, V22D, Q140L, A480V, K494Q and N557D.
  • Curve legends refer to glucose concentration (mM).
  • Ordinate axis is V0 (pmole pNP L⁻¹ min⁻¹); and abscissa axis is mM pNPG.
  • FIGs. 3A-D Structure/function analysis of beneficial mutations in GH3 BGLs
  • FIG. 3A Molecular mapping of mutations using the A. aculeatus BGL1 crystal structure (PDB 4IIB). Chain A, pale grey; Chain B, dark grey. Substitutions identified by mutagenesis and functional selection are shown and labelled on Chain A.
  • FIG. 3B Residues contributing to the substrate binding pocket of A. aculeatus BGL1.
  • FIG. 3C Gly 294 - Gly 313 residues coordinate Phe 305 in the +1 subsite.
  • FIG. 3D Alignments of native GH3 residues. Beneficial residues identified by directed evolution by the present invention are underlined. Asterisks (*) indicate beneficial substitutions found in nature.
  • A. acu: Aspergillus aculeatus BGL1 (SEQ ID NO: 22); A. nig: Aspergillus niger BGL1 (SEQ ID NO: 23); A. nid: Aspergillus nidulans (SEQ ID NO: 24); A. fum: Aspergillus fumigatus (SEQ ID NO: 25); N. cra: Neurospora crassa (SEQ ID NO: 26); F. gra: Fusarium graminearum (SEQ ID NO: 27); P. bla1: Phycomyces blaskesleeanus (SEQ ID NO: 28); U. may1: Ustilago maydis (SEQ ID NO: 29); C. cin: Coprinopsis cinerea (SEQ ID NO: 30); P. bla2: Phycomyces blaskesleeanus (SEQ ID NO: 31); R. ory1: Rhizopus oryzae (SEQ ID NO: 32); R. ory2: Rhizopus oryzae (SEQ ID NO: 33); R. ory3: Rhizopus oryzae (SEQ ID NO: 34); R. ory4: Rhizopus oryzae (SEQ ID NO: 35); R. ory5: Rhizopus oryzae (SEQ ID NO: 36); U. may2: Ustilago maydis (SEQ ID NO: 37). Positions highlighted in various shades of grey are part of the substrate binding pocket as shown in FIG.
  • FIG. 4A-B Position of residue 305 in substrate binding pocket (using numbering of A. niger BGL1) of various GH3 BGL1 members.
  • FIG. 4B Alignments of GH3 BGL1 of different molds including coordinating loop 294-313 (using numbering of A. niger BGL1) (A. niger GH3 BGL1 fragment (SEQ ID NO: 2), A. fumigatus GH3 BGL1 fragment (SEQ ID NO: 3), A. oryzae GH3 BGL1 fragment (SEQ ID NO: 4), Penicillium brasilianum GH3 BGL1 fragment (SEQ ID NO: 5), Magnaporthe grisea GH3 BGL1 fragment (SEQ ID NO: 6), and Neurospora crassa GH3 BGL1 fragment (SEQ ID NO: 7)).
  • FIGs. 5A-B Analysis of β-glucosidase reaction rates for the production of glucose at a range of cellobiose concentrations.
  • FIG. 5A a-BGL1;
  • FIG. 5B engineered variants.
  • FIGs. 6A-B Thin layer chromatography of BGL reactions using (FIG. 6A) 40 mM pNPG; and (FIG. 6B) 50 mM cellobiose.
  • Standards (1 µL) were 40 mM pNPG, 50 mM cellobiose (C), 50 mM glucose (G), and 25 mM gentiobiose (Ge).
  • FIG. 7 Amino acid sequence of Mfalpha-AnBGL.1 (SEQ ID NO: 8) consisting of underlined Mfalpha fragment sequence (amino acids 1 to 24) (SEQ ID NO: 9), AnBGL1 22-860 (SEQ ID NO: 10) and bolded C-terminal fragment of 25 amino acids (SEQ ID NO: 11) including polyhistidine.
  • FIGs. 8A-B Nucleic acid sequence of Mfalpha-AnBGL.1 (SEQ ID NO: 12) consisting of underlined Mfalpha fragment sequence (SEQ ID NO: 13), AnBGL1 22-860 (SEQ ID NO: 14) and bolded C-terminal fragment (SEQ ID NO: 15) including polyhistidine. Specific mutations identified in examples presented herein below are listed as miscellaneous features in FIGs. 8A-B.
  • FIGs. 9A-C Nucleic acid sequence of pKL022 (SEQ ID NO: 16).
  • FIGs. 10A-C Nucleic acid sequence of pKL029 (SEQ ID NO: 17).
  • FIGs. 11A-C Nucleic acid sequence of pKL024 (SEQ ID NO: 18).
  • FIGs. 12A-E Nucleotide sequence of plasmid pGREG503 (SEQ ID NO: 19); of TDH3 promoter (SEQ ID NO: 20); and of CYC1 terminator (SEQ ID NO: 21).
  • FIGs. 13A-H Amino acid sequences of various GH3 BGLs including those disclosed in FIG. 3D, namely those from Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Anid: Aspergillus nidulans (SEQ ID NO: 24); Afum: Aspergillus fumigatus (SEQ ID NO: 25); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); Pbla1: Phycomyces blaskesleeanus (SEQ ID NO: 28); Umay1: Ustilago maydis (SEQ ID NO: 29); Ccin1: Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30); Pbla2: Phycomyces blaskesleeanus (SEQ ID NO: 31
  • Aspergillus oryzae SEQ ID NO: 41
  • CpelBglX0290 Wickerhamomyces anomalus (Pichia anomal)
  • SfibBglM22475 Saccharomycopsis fibuligera M22475
  • SfibBglM22476 Saccharomycopsis fibuligera M22476
  • predicted secreted forms thereof SEQ ID NOs: 45-62. Sequences highlighted in grey are signal peptides as predicted by SignalP (http://www.cbs.dtu.dk/services/SignalP/).
  • FIGs. 14A-U I- Mfalpha Anig mutants with heterologous C-terminal sequences (SEQ ID NOs: 63-85); II- Mfalpha Anig mutants without heterologous C-terminal sequences (SEQ ID NOs: 86-108); III- Anig mutants with predicted native signal peptide and with heterologous C-terminal sequences (SEQ ID NOs: 109-126); IV- Anig mutants with predicted native signal peptide and without heterologous C-terminal sequences (SEQ ID NOs: 127-144); and V- Anig mutants without native signal peptide and without heterologous C-terminal sequences (SEQ ID NOs: 145-162). Sequences highlighted in grey are signal peptides as predicted by SignalP (http://www.cbs.dtu.dk/services/SignalP/).
  • FIGs. 15A-B Consensus derived from alignment of BGL orthologues presented in FIGs. 16A-H (SEQ ID NO: 163); consensus derived from alignment of BGL orthologues presented in FIGs. 17A-G (SEQ ID NO: 164); consensus derived from alignment of BGL orthologues presented in FIGs. 18A-C (SEQ ID NO: 165); consensus derived from alignment of BGL orthologues presented in FIG. 19A-C (SEQ ID NO: 166); and consensus derived from alignment of BGL orthologues presented in FIGs. 20A-B (SEQ ID NO: 167). Sequences highlighted in grey are signal peptides as predicted by SignalP (http://www.cbs.dtu.dk/services/SignalP/).
  • FIGs. 16A-H present an alignment of the BGL1 amino acid sequences of Umay2: Ustilago maydis (SEQ ID NO: 37); Ccin1: Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30); Umay1: Ustilago maydis (SEQ ID NO: 29); Rory5: Rhizopus oryzae (SEQ ID NO: 36); Pbla1: Phycomyces blaskesleeanus (SEQ ID NO: 28); Pbla2: Phycomyces blaskesleeanus (SEQ ID NO: 31); Rory1: Rhizopus oryzae (SEQ ID NO: 32); Rory2: Rhizopus oryzae (SEQ ID NO: 33); Rory3: Rhizopus oryzae (SEQ ID NO: 34); Rory4: Rhizopus oryzae (SEQ ID NO: 35); Anid: Aspergillus nidulans (SEQ ID NO: 24); A
  • Aspergillus oryzae SEQ ID NO: 41
  • CpelBglX0290 Wickerhamomyces anomalus (Pichia anomal) (SEQ ID NO: 42);
  • SfibBglM22475 Saccharomycopsis fibuligera M22475 (SEQ ID NO: 43);
  • SfibBglM22476 Saccharomycopsis fibuligera M22476 (SEQ ID NO: 44); and consensus derived therefrom (SEQ ID NO: 163).
  • "*" denotes that the residues in that column are identical in all sequences of the alignment, ":” denotes that conserved substitutions have been observed, and ".” denotes that semi-conserved substitutions have been observed.
  • Consensus sequences derived from these alignments are also presented wherein X is any amino acid. Sequences highlighted in grey are signal peptides as predicted by SignalP (http://www.cbs.dtu.dk/services/SignalP/). Boxes define BGL domains, namely, in the order of their appearance, the triosephosphateisomerase domain; the coordinating loop domain; and three portions of the α/β sandwich domain.
  • FIGs. 17A-G present an alignment of the BGL1 amino acid sequences of Umay2: Ustilago maydis (SEQ ID NO: 37); Ccin1: Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30); Umay1: Ustilago maydis (SEQ ID NO: 29); Rory5: Rhizopus oryzae (SEQ ID NO: 36); Pbla1: Phycomyces blaskesleeanus (SEQ ID NO: 28); Pbla2: Phycomyces blaskesleeanus (SEQ ID NO: 31); Rory1: Rhizopus oryzae (SEQ ID NO: 32); Rory2: Rhizopus oryzae (SEQ ID NO: 33); Rory3: Rhizopus oryzae (SEQ ID NO: 34); Rory4: Rhizopus oryzae (SEQ ID NO: 35); Anid: Aspergillus nidulans (SEQ ID NO: 24); Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); and consensus derived therefrom (SEQ ID NO: 164).
  • FIGs. 18A-C present an alignment of the amino acid sequences of BGL1 from Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aory: Aspergillus oryzae (SEQ ID NO: 41); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); and consensus derived therefrom (SEQ ID NO: 165).
  • FIGs. 19A-C present an alignment of the amino acid sequences of BGL1 from Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); and consensus derived therefrom (SEQ ID NO: 166).
  • FIGs. 20A-B present an alignment of the amino acid sequences of BGL1 from Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aory: Aspergillus oryzae (SEQ ID NO: 41); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); and Anig: Aspergillus niger (SEQ ID NO: 23); and consensus derived therefrom (SEQ ID NO: 167).
  • “*” denotes that the residues in that column are identical in all sequences of the alignment
  • ":” denotes that conserved substitutions have been observed
  • ".” denotes that semi-conserved substitutions have been observed.
  • FIG. 21 presents a phylogenetic tree of the GH3 BGLs of FIGs. 13A-H.
  • FIGs. 22A-B presents the percent identities of the GH3 BGLs of FIGs. 13A-H.
  • the words “comprising” (and any form of comprising, such as “comprise” and “comprises”), “having” (and any form of having, such as “have” and “has”), "including” (and any form of including, such as “includes” and “include”) or “containing” (and any form of containing, such as “contains” and “contain”) are inclusive or open-ended and do not exclude additional, un-recited elements or method steps.
  • the term “consists of” or “consisting of” means including only the elements, steps, or ingredients specifically recited in the particular claimed embodiment or claim.
  • the present invention relates to novel (mutated/recombinant) cellulase enzymes (e.g., β-glucosidase) that can be used e.g., for more efficiently transforming cellulosic material (e.g., cellobiose) into sugar/degrading cellulosic material (e.g., cellobiose).
  • the enzymes may be encoded by plasmids or chromosomes in a host cell, the host cell being contacted with the substrate; or can be used in vitro directly on the substrate.
  • enzymes encompassed by the present invention include: native or synthetic β-glucosidases (BGLs; EC 3.2.1.21), and optionally native or synthetic endoglucanases (EGLs; EC 3.2.1.4) and/or native or synthetic cellobiohydrolases (CBHs; EC 3.2.1.91).
  • Useful enzymes for the present invention may be derived from Umay2: Ustilago maydis (SEQ ID NO: 37); Ccin1: Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30); Umay1: Ustilago maydis (SEQ ID NO: 29); Rory5: Rhizopus oryzae (SEQ ID NO: 36); Pbla1: Phycomyces blaskesleeanus (SEQ ID NO: 28); Pbla2: Phycomyces blaskesleeanus (SEQ ID NO: 31); Rory1: Rhizopus oryzae (SEQ ID NO: 32); Rory2: Rhizopus oryzae (SEQ ID NO: 33); Rory3: Rhizopus oryzae (SEQ ID NO: 34); Rory4: Rhizopus oryzae (SEQ ID NO: 35); Anid: Aspergillus nidulans (SEQ ID NO: 24); Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); Mgrisea: Magnaporthe grisea (
  • Aspergillus oryzae SEQ ID NO: 41
  • CpelBglX0290 Wickerhamomyces anomalus (Pichia anomal) (SEQ ID NO: 42);
  • SfibBglM22475 Saccharomycopsis fibuligera M22475 (SEQ ID NO: 43);
  • SfibBglM22476 Saccharomycopsis fibuligera M22476 (SEQ ID NO: 44); etc.
  • the present invention encompasses secreted forms of any of the foregoing. Predicted secreted (e.g., devoid of signal peptide) forms and full (translated) forms of illustrative examples of these enzymes are presented in FIGs. herein (e.g., FIGs.
  • the invention encompasses enzymes of any one of SEQ ID NO: 22-44 comprising a substitution of their native signal peptide with a mutated MF-α signal peptide as described herein (e.g., SEQ ID NO: 9 comprising one or more mutations at positions 9 and/or 21 and/or 22, such as those depicted in SEQ ID NO: 168-172).
  • Useful GH3 BGLs of the present invention have one or more beneficial mutations in that these mutations e.g., (i) increase their resistance to substrate inhibition; and/or (ii) increase their resistance to product inhibition.
  • mutation is meant to refer to a naturally occurring or artificially induced alteration in a nucleotide sequence resulting in a change in the encoded amino acid sequence. It also refers herein to the change in the amino acid sequence per se. It may result in a substitution of (at least one) amino acid residue by another; a removal of (at least one) amino acid residue; or an addition of (at least one) amino acid residue as compared to the original sequence (e.g., native GH3 BGL sequence).
  • Useful GH3 BGLs of the present invention may have one or more mutations in each of (i) the position 305 coordinating loop; (ii) the triosephosphateisomerase domain; and/or (iii) the BGL α/β sandwich domain.
  • useful GH3 BGLs of the present invention may have one or more mutations in the position 305 coordinating loop (i.e. spanning residues 294 to 313 using the numbering of the Aspergillus aculeatus BGL1 polypeptide sequence) that result in a more open substrate binding pocket (i.e. with less steric hindrance) thereby reducing substrate (e.g., cellobiose) affinity.
  • the present invention encompasses mutation replacing at least one original (e.g., native) amino acid residues from the original (e.g., native) GH3 BGL coordinating loop with at least one less bulky/smaller amino acid residue thereby resulting in a more open substrate binding pocket.
  • the smaller amino acid residue is in the variable region of the coordinating loop. In a more specific embodiment, it is located in the substrate binding pocket (e.g., amino acid at position 305 in A. niger or aculeatus - see FIGs. 3B and 4A).
  • Small amino acid residues in accordance with the present invention include C, V, A, G, S, P, T, D or N.
  • Useful GH3 BGLs of the present invention may also have one or more mutations in the triosephosphateisomerase domain (spanning Leu 19 - Ser 356 in AaBGL1).
  • the amino acid residue at position 140 (using the A. niger or aculeatus BGL numbering) is replaced with an amino acid residue that is less likely to form a hydrogen bond.
  • Certain amino acid residues are more likely to form a hydrogen bond, such as glutamine, asparagine, histidine, serine, threonine, tyrosine, cysteine, methionine, tryptophan, aspartate, glutamate and glycine.
  • Amino acids other than the above are preferred for position 140. More preferred amino acid residues are R, K, A, L, I, F, V or P.
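  • The relationship between the two residue lists given for position 140 can be verified with a short sketch. It removes the residues listed above as more likely to form a hydrogen bond from the 20 standard amino acids and compares what remains with the more preferred residues. The variable and function names are hypothetical and chosen for illustration only.

```python
# The 20 standard amino acids, one-letter codes.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

# Residues listed above as more likely to form a hydrogen bond:
# Gln, Asn, His, Ser, Thr, Tyr, Cys, Met, Trp, Asp, Glu, Gly.
H_BOND_PRONE = set("QNHSTYCMWDEG")

# Residues named above as more preferred at position 140.
PREFERRED_140 = set("RKALIFVP")

# Standard residues that are not in the hydrogen-bond-prone list.
remaining = STANDARD - H_BOND_PRONE

def is_preferred_at_140(residue):
    """True if the one-letter residue is in the more preferred set."""
    return residue.upper() in PREFERRED_140
```

Under this reading, the more preferred residues are exactly the standard amino acids left once the hydrogen-bond-prone residues are excluded. For example, the Q140L substitution named in the legend of FIGs. 2A-B replaces a residue from the first list with one from the second.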
  • Useful GH3 BGLs of the present invention may also have one or more mutations in the BGL α/β sandwich domain (spanning Gln 385 - Gly 588 in AaBGL.1) such as those disclosed in FIGs. 14A-U.
  • each X in the consensus sequences is defined as being any amino acid, or absent when this position is absent in one or more of the orthologues presented in the alignment.
  • each X in the consensus sequences is defined as being any amino acid that constitutes a conserved or semi-conserved substitution of any of the amino acid in the corresponding position in the orthologues presented in the alignment, or absent when this position is absent in one or more of the orthologues presented in the alignment.
  • each X refers to any amino acid belonging to the same class as any of the amino acid residues in the corresponding position in the orthologues presented in the alignment, or absent when this position is absent in one or more of the orthologues presented in the alignment. In another embodiment, each X refers to any amino acid in the corresponding position of the orthologues presented in the alignment, or absent when this position is absent in one or more of the orthologues presented in the alignment.
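  • The broadest definition of X above (any amino acid, or absent) can be sketched as a pattern match. The following is a minimal illustration, assuming a toy consensus string. The lowercase `x` marker for a position that may be absent is a convention invented for this sketch, as are the function names. The narrower embodiments, in which X is restricted to conserved substitutions or to residues actually observed in the orthologues, would need per-position residue sets instead.

```python
import re

AMINO = "ACDEFGHIKLMNPQRSTVWY"

def consensus_to_regex(consensus):
    """Translate a toy consensus string into a compiled regular expression.

    'X'   -> exactly one of the 20 standard amino acids
    'x'   -> one amino acid or nothing (position absent in some orthologues);
             this lowercase marker is invented for this sketch
    other -> that residue, matched literally
    """
    parts = []
    for symbol in consensus:
        if symbol == "X":
            parts.append(f"[{AMINO}]")
        elif symbol == "x":
            parts.append(f"[{AMINO}]?")
        else:
            parts.append(re.escape(symbol))
    return re.compile("".join(parts))

def satisfies(sequence, consensus):
    """True if the whole sequence matches the consensus pattern."""
    return consensus_to_regex(consensus).fullmatch(sequence) is not None

# Toy consensus (hypothetical, not taken from any SEQ ID NO).
TOY_CONSENSUS = "MXGxD"
```

With this toy consensus, `satisfies("MAGD", TOY_CONSENSUS)` and `satisfies("MAGKD", TOY_CONSENSUS)` both hold, whereas `satisfies("MGD", TOY_CONSENSUS)` does not because the mandatory X position is missing.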
  • enzymes in accordance with the present invention include enzymes having the specific nucleotide or amino acid sequences described in FIGs. 7-8 and 13-20, or an amino acid sequence that satisfies any of the consensuses as defined above (e.g., FIGs. 15-20), including translated and secreted forms thereof.
  • a secreted form of an enzyme is devoid of signal peptide. Predicted secreted forms are sometimes shaded or delineated with a vertical line in the attached FIGs. 7-8 and 13-20, wherein the one or more Xs are defined as above.
  • secreted forms of the enzymes of the present invention include predicted secreted forms as illustrated herein, namely polypeptides as set forth in SEQ ID NOs: 45-62, 145-162, amino acid residues at positions 199-1156 of SEQ ID NO: 163, amino acid residues at positions 198-1147 of SEQ ID NO: 164, amino acid residues at positions 37-902 of SEQ ID NO: 165, amino acid residues at positions 37-901 of SEQ ID NO: 166 and amino acid residues at positions 32-874 of SEQ ID NO: 167.
  • Enzymes in accordance with the present invention may also include amino acid sequences that satisfy consensus sequences of catalytic domains of these enzymes.
  • Enzyme sequences in accordance with the present invention also include the specific sequences described in FIGs. 7-8 and 13-20 with up to 10 amino acids (9, 8, 7, 6, 5, 4, 3, 2 or 1) truncated at the N- and/or C-terminal thereof.
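  • The terminal truncations described above can be enumerated with a short sketch. It removes between 0 and 10 residues from the N-terminus and between 0 and 10 from the C-terminus of a sequence, excluding the untruncated sequence itself. The function name `terminal_truncations` and the toy sequence are hypothetical; whether a given truncated form retains GH3 BGL activity is not addressed by this sketch.

```python
def terminal_truncations(seq, max_trim=10):
    """Return the set of sequences obtained by removing 0..max_trim residues
    from the N-terminus and 0..max_trim residues from the C-terminus.

    The untruncated sequence is excluded, and combinations that would
    remove the whole sequence are skipped.
    """
    variants = set()
    for n_trim in range(max_trim + 1):
        for c_trim in range(max_trim + 1):
            if n_trim == 0 and c_trim == 0:
                continue  # skip the untruncated sequence
            if n_trim + c_trim >= len(seq):
                continue  # nothing would be left
            variants.add(seq[n_trim:len(seq) - c_trim])
    return variants

# Toy 30-residue sequence (hypothetical, not a sequence of the disclosure).
toy = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
variants = terminal_truncations(toy)
```

Returning a set means that truncations which happen to produce the same string are only counted once.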
  • the GH3 BGL1 of the present invention is derived from Umay2: Ustilago maydis (SEQ ID NO: 37); Ccin1: Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30); Umay1: Ustilago maydis (SEQ ID NO: 29); Rory5: Rhizopus oryzae (SEQ ID NO: 36); Pbla1: Phycomyces blaskesleeanus (SEQ ID NO: 28); Pbla2: Phycomyces blaskesleeanus (SEQ ID NO: 31); Rory1: Rhizopus oryzae (SEQ ID NO: 32); Rory2: Rhizopus oryzae (SEQ ID NO: 33); Rory3: Rhizopus oryzae (SEQ ID NO: 34); Rory4: Rhizopus oryzae (SEQ ID NO: 35); Anid: Aspergillus nidulans (SEQ ID NO: 24); Af
  • the BGL1 is derived from Aspergillus niger and is as disclosed in FIGs. 14A-U or is an enzymatically active (i.e. has GH3 BGL activity) variant thereof.
  • the "reference amino acid residue” and the "corresponding reference amino acid residue” in the contexts of domains (e.g., coordinating loop domain, triosephosphateisomerase domain, α/β sandwich domain and subdomains) of GH3 BGL polypeptides of the present invention refer to the amino acid residue present at the specified position of any such domain in any native fungus (e.g., filamentous fungus) GH3 BGL, including but not limited to the native fungi listed herein.
  • the reference coordinating loop domain, the reference triosephosphateisomerase domain, the reference α/β sandwich domain and subdomains correspond to any of the sequences as set forth in the boxes as shown in FIGs. 16-20.
  • the enzymes could also be modified for better expression/stability/yield in the host cell, e.g., by replacing the native N-terminal membrane-spanning domain by the N-terminal membrane-spanning domain from another plant or yeast gene (e.g., Lactuca sativa (lettuce) germacrene A oxidase) or from a yeast ER bound protein (e.g., erg1 or erg8); or by using a heterologous signal peptide promoting increased expression (e.g., MFα pre sequence and mutants thereof and native sequences of orthologues).
  • useful signal peptides for the present invention include Mrfpsiftavlfaassalaapvnt (MFα native) (SEQ ID NO: 9); Mrfpsiftavlfaassalaapant (MFα mutant) (SEQ ID NO: 168); Mrfpsiftavlfaassalaapdnt (MFα mutant) (SEQ ID NO: 169); Mrfpsifttvlfaassalaatvnt (MFα mutant) (SEQ ID NO: 170); Mrfpsiftavlfaassalaasvnt (MFα mutant) (SEQ ID NO: 171); Mrfpsiftavlfaassalaatvnt (MFα mutant) (SEQ ID NO: 172); Mrftlieavaltavslasade (Anig) (SEQ ID NO: 173); Mrfgw
  • signalP http://www.cbs.dtu.dk/services/SignalP/. (See also refs 96 and 99); codon optimization for expression in the heterologous host; using a heterologous C-terminal sequence that may comprise or not a poly-histidine fragment (e.g., gsaagsgefmskgeel (SEQ ID NO: 190) or gsaagsgefmskgeelhhhhhh (SEQ ID NO: 11)); use of different combinations of promoter/terminators for optimal coexpression of multiple enzymes; spatial colocalization of sequential enzymes using a linker system or organelle-specific membrane domain.
  • useful enzymes are as shown in FIGs. 7-8 and 13-20.
  • a substantially identical sequence may comprise one or more conservative amino acid mutations. It is known in the art that one or more conservative amino acid mutations to a reference sequence may yield a mutant peptide with no substantial change in physiological, chemical, or functional properties compared to the reference sequence; in such a case, the reference and mutant sequences would be considered "substantially identical" polypeptides.
  • Conservative amino acid mutation may include addition, deletion, or substitution of an amino acid; a conservative amino acid substitution is defined herein as the substitution of an amino acid residue for another amino acid residue with similar chemical properties (e.g., size, charge, or polarity).
  • a conservative mutation may be an amino acid substitution.
  • Such a conservative amino acid substitution may be a basic, neutral, hydrophobic, or acidic amino acid for another of the same group (See e.g., Table I below).
  • By "basic amino acid" it is meant hydrophilic amino acids having a side chain pK value of greater than 7, which are typically positively charged at physiological pH.
  • Basic amino acids include histidine (His or H), arginine (Arg or R), and lysine (Lys or K).
  • By "neutral amino acid" (also "polar amino acid") it is meant hydrophilic amino acids having a side chain that is uncharged at physiological pH, but which has at least one bond in which the pair of electrons shared in common by two atoms is held more closely by one of the atoms.
  • Polar amino acids include serine (Ser or S), threonine (Thr or T), cysteine (Cys or C), tyrosine (Tyr or Y), asparagine (Asn or N), and glutamine (Gln or Q).
  • The term "hydrophobic amino acid" (also "non-polar amino acid") is meant to include amino acids exhibiting a hydrophobicity of greater than zero according to the normalized consensus hydrophobicity scale of Eisenberg (1984). Hydrophobic amino acids include proline (Pro or P), isoleucine (Ile or I), phenylalanine (Phe or F), valine (Val or V), leucine (Leu or L), tryptophan (Trp or W), methionine (Met or M), alanine (Ala or A), and glycine (Gly or G).
  • “Acidic amino acid” refers to hydrophilic amino acids having a side chain pK value of less than 7, which are typically negatively charged at physiological pH.
  • Acidic amino acids include glutamate (Glu or E) and aspartate (Asp or D). Certain amino acid residues are more likely to form a hydrogen bond, such as glutamine, asparagine, histidine, serine, threonine, tyrosine, cysteine, methionine, tryptophan, aspartate, glutamate, and glycine.
  • A semi-conservative amino acid substitution replaces one residue with another that has a similar steric conformation but does not share its chemical properties.
  • Examples of semi-conservative substitutions would include substituting cysteine for alanine or leucine; substituting serine for asparagine; substituting valine for threonine; or substituting proline for alanine.
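  • The grouping described above can be illustrated with a minimal Python sketch. The four groups are copied from the residue lists in the text; the function names (`group_of`, `is_conservative`) are hypothetical, and the rule "same group means conservative" is a simplification of the definition given above, not a statement of how the inventors classified substitutions.

```python
# Amino acid groups as listed in the text (one-letter codes).
GROUPS = {
    "basic": set("HRK"),
    "neutral": set("STCYNQ"),
    "hydrophobic": set("PIFVLWMAG"),
    "acidic": set("ED"),
}


def group_of(residue):
    """Return the group name for a one-letter residue code."""
    code = residue.upper()
    for name, members in GROUPS.items():
        if code in members:
            return name
    raise ValueError(f"unknown residue: {residue!r}")


def is_conservative(original, replacement):
    """True when the two residues differ but belong to the same group."""
    a, b = original.upper(), replacement.upper()
    return a != b and group_of(a) == group_of(b)


# Two substitutions that appear in the primer table of this document.
print(is_conservative("V", "A"))  # V303A: both hydrophobic -> True
print(is_conservative("Q", "L"))  # Q140L: neutral to hydrophobic -> False
```

  • Semi-conservative substitutions, which depend on steric similarity rather than on group membership, are deliberately not modelled here.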
  • Sequence identity is used to evaluate the similarity of two sequences; it is determined by calculating the percent of residues that are the same when the two sequences are aligned for maximum correspondence between residue positions. Any known method may be used to calculate sequence identity; for example, computer software is available to calculate sequence identity. Without wishing to be limiting, sequence identity can be calculated by software such as NCBI BLAST2, BLAST-P, BLAST-N, COBALT or FASTA-N, CLUSTAL OMEGA or any other appropriate software/tool that is known in the art (Johnson M, et al. (2008) Nucleic Acids Res. 36: W5-W9; Papadopoulos JS and Agarwala R (2007) Bioinformatics 23: 1073-79).
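  • As a rough illustration of the percent identity calculation, the following Python sketch scores two sequences that have already been aligned. It assumes that the denominator is the alignment length and that gap columns count as mismatches; the tools named above may use other conventions, so their reported values can differ. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both sequences.

    Assumption: the denominator is the full alignment length, and a column
    containing a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Native and V22A mutant signal peptides (SEQ ID NOs: 9 and 168 in the text).
native = "Mrfpsiftavlfaassalaapvnt"
mutant = "Mrfpsiftavlfaassalaapant"
print(round(percent_identity(native, mutant), 2))  # 23 of 24 positions match
```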
  • enzyme sequences in accordance with the present invention include enzymes with amino acid sequences having high percent identities (e.g., at least 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% and 99% identity
  • the present invention also relates to nucleic acids comprising nucleotide sequences encoding the above-mentioned enzymes.
  • the nucleic acid may be codon-optimized.
  • the nucleic acid can be a DNA or an RNA.
  • the nucleic acid sequence can be deduced by the skilled artisan on the basis of the disclosed amino acid sequences.
  • the nucleic acid encodes one of the amino acid sequences as presented in any one of FIGs. 7-8 and 13-20 (orthologues and/or consensuses).
  • the nucleic acid for one or more enzymes is as shown in FIG.
  • the present invention also encompasses vectors (plasmids) comprising the above-mentioned nucleic acids.
  • the vectors can be of any type suitable, e.g., for expression of said polypeptides or propagation of genes encoding said polypeptides in a particular organism.
  • the organism may be of eukaryotic or prokaryotic origin (e.g., yeast).
  • the specific choice of vector depends on the host organism and is known to a person skilled in the art.
  • the vector comprises transcriptional regulatory sequences or a promoter operably-linked to a nucleic acid comprising a sequence encoding an enzyme involved in the saccharolytic pathway of the invention.
  • a first nucleic acid sequence is "operably-linked" with a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence.
  • a promoter is operably-linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence.
  • operably-linked DNA sequences are contiguous and, where necessary to join two protein coding regions, in reading frame.
  • However, because enhancers generally function when separated from the promoters by several kilobases, and intronic sequences may be of variable lengths, some polynucleotide elements may be operably-linked but not contiguous.
  • Transcriptional regulatory sequences or “transcriptional regulatory elements” are generic terms that refer to DNA sequences, such as initiation and termination signals (terminators), enhancers, and promoters, splicing signals, polyadenylation signals, etc., which induce or control transcription of protein coding sequences with which they are operably-linked.
  • Plasmids useful to express the enzymes of the present invention include the modified centromeric plasmids pGREG503 (FIGs. 12A-C; SEQ ID NO: 19), pGREG504, pGREG505 and pGREG506 from the pGREG series (55), the 2μ plasmids pYES2 (Invitrogen), the pESC-leu2 derivative pESC-leu2d (Erhart E. and Hollenberg CP., J. Bacteriol. 1983, p. 625), pGC550, pGC552, pGC1322, pBOT-TRP, pBOT-URA, pBOT-HIS and pBOT-LEU.
  • Yeast artificial chromosomes, which are able to clone fragments of 100-1000 kbp, could also be used to express multiple enzymes (e.g., 10).
  • Many other useful yeast expression vectors, either autonomously replicating low copy-number vectors (YCp or centromeric) or autonomously replicating high copy-number vectors (YEp or 2μ), are commercially available, e.g., from Invitrogen (www.lifetechnologies.com), the American Type Culture Collection (ATCC; www.atcc.org) or the Euroscarf collection (http://web.uni-frankfurt.de/fb15/mikro/euroscarf/).
  • Plasmids in accordance with the present invention may also include nucleic acid molecule(s) encoding one or more of the polypeptides as shown in FIGs. 7-8 and 13-20 (orthologues or consensuses).
  • Promoters useful to express the enzymes of the present invention include the constitutive promoters from the following S. cerevisiae CEN.PK2-1D genes: glyceraldehyde-3-phosphate dehydrogenase 3 (PTDH3) (FIG. 12D, SEQ ID NO: 20), fructose 1,6-bisphosphate aldolase (PFBA1), pyruvate decarboxylase 1 (PPDC1) and plasma membrane H+-ATPase 1 (PPMA1) 5).
  • the inducible promoters from galactokinase (PGAL1) and UDP-glucose-4-epimerase (PGAL10) from pESC-leu2d are also useful for the present invention.
  • the present invention also encompasses using other available promoters (e.g., yeast promoters), with different strengths and different expression profiles.
  • Examples are the PTEF1 and PTEF2 promoters from the translational elongation factor EF-1 alpha paralogs TEF1 and TEF2; promoters of genes coding for enzymes involved in glycolysis such as 3-phosphoglycerate kinase (PPGK1), pyruvate kinase (PPYK1), triose-phosphate isomerase (PTPI1), glyceraldehyde-3-phosphate dehydrogenase (PTDH2), enolase II (PENO2) or hexose transporter 9 (PHXT9).
  • Other useful promoters in accordance with the present invention encompass those found through the promoter database of S. cerevisiae (http://rulai.cshl.edu/cgi-bin/SCPD/getgenelist).
  • Terminators useful for the present invention include terminators from the following S. cerevisiae CEN.PK2-1D genes: cytochrome C1 (TCYC1) (FIG. 12E, SEQ ID NO: 21), alcohol dehydrogenase 1 (TADH1), and phosphoglucoisomerase 1/glucose-6-phosphate isomerase (TPGI1).
  • the present invention also encompasses using other suitable yeast terminators, e.g., terminators from genes encoding enzymes involved in glycolysis and gluconeogenesis such as alcohol dehydrogenase 2 (TADH2), enolase II (TENO2), fructose 1,6-bisphosphate aldolase (TFBA1), glyceraldehyde-3-phosphate dehydrogenase (TTDH2) and triose-phosphate isomerase (TTPI1).
  • Other useful terminators in accordance with the present invention encompass those found from genes indicated in the promoter database of S. cerevisiae (http://rulai.cshl.edu/cgi-bin/SCPD/getgenelist).
  • The term "heterologous coding sequence" refers herein to a nucleic acid molecule that is not normally produced by the host cell in nature.
  • a recombinant expression vector (plasmid) comprising a nucleic acid sequence of the present invention may be introduced into a cell, e.g., a host cell, which may include a living cell capable of expressing the protein coding region from the defined recombinant expression vector.
  • beta-glucosidase expression in the cell is under the control of a heterologous promoter.
  • the present invention also relates to cells (host cells) comprising the nucleic acid and/or vector as described above.
  • the suitable host cell may be any cell of eukaryotic (e.g., fungal (e.g., yeast or mold), algal, plant) or prokaryotic (bacterial) origin that is suitable, e.g., for expression of the enzymes or propagation of genes/nucleic acids encoding said enzyme.
  • the specific choice of cell line is routinely determined by a person skilled in the art.
  • the terms "host cell” and “recombinant host cell” are used interchangeably herein. Such terms refer not only to the particular subject cell, but also to the progeny or potential progeny of such a cell.
  • Vectors can be introduced into cells via conventional transformation or transfection techniques.
  • The terms "transformation" and "transfection" refer to techniques for introducing foreign nucleic acid into a host cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, electroporation, microinjection and viral-mediated transfection. Suitable methods for transforming or transfecting host cells can, for example, be found in Sambrook et al., Sambrook and Russell, and other laboratory manuals. Methods for introducing nucleic acids into mammalian cells in vivo are also known, and may be used to deliver the vector DNA of the invention to a subject for gene therapy.
  • Suitable fungal host cells include, but are not limited to, Ascomycota, Basidiomycota, Deuteromycota, Zygomycota, Fungi imperfecti.
  • Particularly preferred fungal host cells are yeast cells and filamentous fungal cells.
  • the filamentous fungal host cells of the present invention include all filamentous forms of the subdivision Eumycotina and Oomycota (see, for example, Hawksworth et al., In Ainsworth and Bisby's Dictionary of the Fungi, 8th edition, 1995, CAB International, University Press, Cambridge, UK, which is incorporated herein by reference).
  • Filamentous fungi are characterized by a vegetative mycelium with a cell wall composed of chitin, cellulose and other complex polysaccharides.
  • the host cell is a cell of a Myceliophthora species, such as Myceliophthora thermophila.
  • the filamentous fungal host cell may be a cell of a species of, but not limited to, Achlya, Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Cephalosporium, Chrysosporium, Cochliobolus, Corynascus, Cryphonectria, Cryptococcus, Coprinus, Coriolus, Diplodia, Endothia, Fusarium, Gibberella, Gliocladium, Humicola, Hypocrea, Myceliophthora, Mucor, Neurospora, Penicillium, Podospora, Phlebia, Piromyces, Pyricularia, Rhizomucor, Rhizopus, Schizophyllum, Scytalidium, Sporotrichum, Talaromyces, Thermoascus, Thielavia, Trametes, Tolypocladium, Trichoderma, Verticillium, Volvariella, or teleo
  • the host cells can be a yeast, a mold or a bacterium (E. coli).
  • it can be an Aspergillus, such as Aspergillus niger, or any other mold as listed in FIGs. 13-15, or, for example, a member of the Saccharomycetaceae such as a Saccharomyces, Pichia or Zygosaccharomyces.
  • it can be a Saccharomyces.
  • it can be a Saccharomyces cerevisiae (S. cerevisiae).
  • the present invention encompasses the use of yeast strains that are haploid and contain auxotrophies for selection that facilitate plasmid manipulation.
  • Yeast strains that can be used in the invention include, but are not limited to, CEN.PK, S288C, W303, A363A and YPH499, strains derived from S288C (FY4, DBY12020, DBY12021, XJ24-249) and strains isogenic to S288C (FY1679, AB972, DC5).
  • the identity and genotype of additional examples of yeast strains can be found at EUROSCARF, available through the World Wide Web at web.uni-frankfurt.de/fb15/mikro/euroscarf/col_index.html or through the Saccharomyces Genome Database (www.yeastgenome.org).
  • the yeast strain is CEN.PK110-10C (MATa his3Δ1 MAL2-8C SUC2) or any other strain used herein, or any of their single, double or triple auxotrophic derivatives.
  • the particular strain of yeast cell is S288C (MAT alpha SUC2 mal mel gal2 CUP1 flo1 flo8-1 hap1), which is commercially available.
  • the particular strain of yeast cell is W303.alpha (MAT.alpha; his3-11,15 trp1-1 leu2-3 ura3-1 ade2-1), which is commercially available.
  • the above-mentioned nucleic acid or vector may be delivered to cells in vivo (to induce the expression of the enzymes and generate ethanol in accordance with the present invention) using methods well known in the art such as direct injection of DNA, receptor-mediated DNA uptake, viral-mediated transfection, or non-viral and lipid-based transfection.
  • Direct injection has been used to introduce naked DNA into cells in vivo.
  • a delivery apparatus e.g., a "gene gun" for injecting DNA into cells in vivo may be used.
  • Such an apparatus may be commercially available (e.g., from BioRad).
  • Naked DNA may also be introduced into cells by complexing the DNA to a cation, such as polylysine, which is coupled to a ligand for a cell-surface receptor. Binding of the DNA-ligand complex to the receptor may facilitate uptake of the DNA by receptor-mediated endocytosis.
  • a DNA-ligand complex linked to adenovirus capsids which disrupt endosomes, thereby releasing material into the cytoplasm may be used to avoid degradation of the complex by intracellular lysosomes.
  • the present invention encompasses compositions comprising at least one GH3 BGL enzyme of the present invention (in any of its various forms) with one or more optional ingredients (e.g., excipient carriers).
  • the optional ingredients may be, without being so limited, a buffer, a surfactant, and/or a scouring agent.
  • a buffer may be used with a GH3 BGL polypeptide of the present invention (optionally combined with other cellulases, including one or more other BGLs) to maintain a desired pH within the solution in which the GH3 BGL is employed. The exact concentration of buffer employed will depend on several factors which the skilled artisan can determine. Suitable buffers are well known in the art.
  • a surfactant may further be used in combination with the GH3 BGL of the present invention. Suitable surfactants include any surfactant compatible with the GH3 BGL and optional other enzymes (e.g., cellulases) being utilized. Exemplary surfactants include anionic, non-ionic, and ampholytic surfactants.
  • Suitable anionic surfactants include, but are not limited to, linear or branched alkylbenzenesulfonates; alkyl or alkenyl ether sulfates having linear or branched alkyl groups or alkenyl groups; alkyl or alkenyl sulfates; olefinsulfonates; alkanesulfonates, etc.
  • Suitable counter ions for anionic surfactants include, for example, alkali metal ions, such as sodium and potassium; alkaline earth metal ions, such as calcium and magnesium; ammonium ion; and alkanolamines having from 1 to 3 alkanol groups of carbon number 2 or 3.
  • Ampholytic surfactants suitable for use in the practice of the present invention include, for example, quaternary ammonium salt sulfonates, betaine-type ampholytic surfactants, etc.
  • Suitable non-ionic surfactants generally include polyoxyalkylene ethers, as well as higher fatty acid alkanolamides or alkylene oxide adducts thereof, fatty acid glycerine monoesters, etc. Mixtures of surfactants can also be employed as is known in the art.
  • the GH3 beta-glucosidase enzyme compositions of the present invention may be in the form of an aqueous solution or of a solid (e.g., powder). When aqueous solutions are employed, the beta-glucosidase solution can easily be diluted to allow accurate concentrations.
  • a concentrate can be in any form recognized in the art including, for example, liquids, emulsions, suspensions, gel, pastes, granules, powders, an agglomerate, a solid disk, as well as other forms that are well known in the art.
  • Other materials can also be used with or included in the beta-glucosidase composition of the present invention as desired, including stones, pumice, fillers, solvents, enzyme activators, and anti-redeposition agents depending on the intended use of the composition.
  • compositions of the present invention can also be combined with other enzymes (e.g., useful to degrade cellulosic material and/or to produce sugar (e.g., glucose)).
  • compositions of the present invention may further include enzyme cocktails such as Novozymes Celluclast® 1.5L, Cellulase from Trichoderma reesei ATCC 2692; Novozymes Carezyme 1000L®, Cellulase from Aspergillus sp.; or Novozymes Viscozyme® L cellulolytic enzyme mixture.
  • compositions may include, in addition to at least one GH3 BGL of the present invention, at least two different enzyme types, namely (1) endoglucanase, which cleaves internal beta-1,4 linkages resulting in shorter glucooligosaccharides, and (2) cellobiohydrolase, which acts in an "exo" manner, processively releasing cellobiose units (beta-1,4 glucose-glucose disaccharide).
  • Enzymes (e.g., GH3 BGL polypeptides) of the present invention have GH3 BGL activity.
  • GH3 BGL activity is meant to refer to the ability to catalyze the hydrolysis of the 1,4-beta-D-glycosidic linkages in cellulose. In specific embodiments, it also refers to at least one of a reduced transglycosidation reaction, increased resistance to product inhibition and increased resistance to substrate inhibition.
  • GH3 BGL enzymes of the present invention may be used in the production of monosaccharides, disaccharides, or oligomers of a mono- or di-saccharide as chemical or fermentation feedstock from biomass.
  • biomass refers to biological material that contains a polysaccharide substrate, such as, for example, cellulose, starch, etc.
  • the present invention hence provides a method of converting a biomass substrate into a fermentable sugar, the method comprising contacting (i) a GH3 BGL polypeptide according to the invention; (ii) a cell expressing the GH3 BGL polypeptide; (iii) a culture medium containing the GH3 BGL polypeptide; (iv) a cell lysate containing the GH3 BGL polypeptide; or (v) a composition comprising any one of (i) to (iv), with the biomass substrate (e.g., cellobiose) under conditions suitable for the production of the fermentable sugar.
  • the present invention further provides a method of converting a biomass substrate to a fermentable sugar by (a) pretreating a cellulosic material to increase its susceptibility to hydrolysis; (b) contacting the pretreated cellulosic material of step (a) with a cell, culture medium or cell lysate or composition containing/expressing at least one GH3 BGL polypeptide of the present invention (and optionally at least one other enzyme such as cellulases) under conditions suitable for the production of the fermentable sugar.
  • the biomass includes cellulosic substrates including but not limited to, wood, wood pulp, paper pulp, paper and pulp processing waste, corn stover, corn fiber, rice, rice hulls, woody or herbaceous plants, fruit or vegetable pulp, distillers grain, wheat straw, cotton, hemp, flax, sisal, corn cobs, sugar cane bagasse, grasses, switch grass and mixtures thereof.
  • the biomass may optionally be pretreated to increase the susceptibility of cellulose to hydrolysis using methods known in the art such as chemical, physical and biological pretreatments (e.g., steam explosion, pulping, grinding, acid hydrolysis, solvent exposure, etc., as well as combinations thereof).
  • the biomass comprises transgenic plants that express ligninase and/or cellulase enzymes which degrade lignin and cellulose.
  • the biomass may include cellobiose and/or may be treated enzymatically to generate cellobiose for conversion to a soluble sugar (e.g., glucose).
  • substrate refers to any substrate that the BGL of the present invention can hydrolyze directly or indirectly.
  • BGL substrate includes cellulosic (e.g., lignocellulosic) material (cellulose; hemicellulose; cellulose hydrolysate; or any soluble cellodextrin including cellobiose, cellotriose, cellotetraose, cellopentaose, cellohexaose, etc.) and synthetic substrates such as p-nitrophenyl-β-D-glucopyranoside (pNPG).
  • the GH3 BGL polypeptide/enzyme and GH3 BGL polypeptide-containing compositions, cell culture media, and cell lysates may be reacted with the biomass or pretreated biomass at a temperature in the range of about 25°C to about 100°C, about 30°C to about 90°C, about 30°C to about 80°C, about 40°C to about 80°C or about 35°C to about 75°C.
  • the biomass may be reacted with the GH3 BGL polypeptides and GH3 BGL polypeptide-containing compositions, cell culture media, and cell lysates at a temperature of about 25°C, about 30°C, about 35°C, about 40°C, about 45°C, about 50°C, about 55°C, about 60°C, about 65°C, about 70°C, about 75°C, about 80°C, about 85°C, about 90°C, about 95°C or about 100°C.
  • reaction times for converting a particular biomass substrate to a fermentable sugar may vary but the optimal reaction time can be readily determined.
  • Exemplary reaction times may be in the range of from about 1.0 to about 240 hours, from about 5.0 to about 180 hours and from about 10.0 to about 150 hours.
  • the incubation time may be at least 1 hour, at least 5 hours, at least 10 hours, at least 15 hours, at least 25 hours, at least 50 hours, at least 100 hours, and at least 180 hours.
  • the soluble sugars produced by the methods of the present invention may be used to produce an alcohol (such as, for example, ethanol, butanol, etc.).
  • the present invention therefore provides a method of producing an alcohol, wherein the method comprises (a) providing a fermentable sugar produced using a GH3 BGL polypeptide of the present invention in one of the methods described supra; (b) contacting the fermentable sugar with a fermenting microorganism to produce the alcohol or other metabolic product; and (c) recovering the alcohol or other metabolic product.
  • the GH3 BGL polypeptide of the present invention or composition, cell culture medium, or cell lysate containing the GH3 BGL polypeptide may be used to catalyze the hydrolysis of a biomass substrate to a fermentable sugar in the presence of a fermenting microorganism (e.g., yeast, mold, etc.), to produce an end-product such as ethanol.
  • the fermentable sugars e.g., glucose and/or xylose
  • the soluble sugars produced by the use of a GH3 BGL polypeptide of the present invention may also be used in the production of other end-products such as, for example, acetone, an amino acid (e.g., glycine, lysine, etc.), an organic acid (e.g., lactic acid, etc.), glycerol, a diol (e.g., 1,3-propanediol, butanediol, etc.) and animal feeds.
  • yeast cultures were grown at 30°C in YPD or yeast nitrogen base (YNB) supplemented as required to maintain the auxotrophic selection marker.
  • Plasmids were constructed by in vivo homologous DNA recombination using a pGREG503 derivative with a unique KpnI site and the HIS3 auxotrophic marker (87). The plasmids were assembled in S. cerevisiae by co-transforming linearized plasmid and DNA fragments using the lithium acetate/carrier DNA method.
  • DNA parts were designed with at least 50-bp regions of homology to mediate recombination. Transformants were cultured on solid media for selection (YNB+2% glucose containing 1.5% agar and supplemented with synthetic dropout media without histidine). After transformation, assembled plasmids were extracted from S. cerevisiae and propagated in E. coli for sequencing. Verified constructs were transformed back into S. cerevisiae for subsequent experiments. DNA fragments used in assembling plasmids were amplified by PCR using Phusion™ High-Fidelity DNA polymerase (Thermo Scientific) and primers listed in Table II below.
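  • The 50-bp homology requirement can be checked in silico before ordering primers. The Python sketch below is only an illustration under simplifying assumptions: it compares the 3' end of each part with the 5' end of the next on one strand, ignores reverse complements and plasmid circularity, and uses randomly generated placeholder sequences rather than any sequence from this document. The names `shared_overlap` and `can_recombine` are hypothetical.

```python
import random


def shared_overlap(upstream, downstream):
    """Length of the longest suffix of `upstream` that is a prefix of `downstream`."""
    limit = min(len(upstream), len(downstream))
    for n in range(limit, 0, -1):
        if upstream[-n:] == downstream[:n]:
            return n
    return 0


def can_recombine(parts, min_homology=50):
    """True when every consecutive pair of parts overlaps by at least min_homology bp."""
    return all(
        shared_overlap(a.upper(), b.upper()) >= min_homology
        for a, b in zip(parts, parts[1:])
    )


# Placeholder data: random sequences stand in for the vector arms and insert.
rng = random.Random(0)


def random_dna(n):
    return "".join(rng.choice("ACGT") for _ in range(n))


vector_left = random_dna(120)
insert = vector_left[-50:] + random_dna(200)   # 50 bp shared with the left arm
vector_right = insert[-50:] + random_dna(120)  # 50 bp shared with the insert
print(can_recombine([vector_left, insert, vector_right]))
```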
  • full-length genes were constructed by PCR overlap extension using DNA parts amplified with primers containing the desired nucleotide substitutions.
  • the DNA parts used for in vivo recombination were purified by gel extraction with the GeneJET™ Gel Extraction Kit (Thermo Scientific). Plasmids were purified from E. coli and S. cerevisiae using the GeneJET™ plasmid mini-prep kit (Thermo Scientific). Yeast cells were treated with lytic enzyme (MP Biomedicals) for one hour prior to the lysis step for plasmid extractions.
  • CTCCAGTCAACACTTTGGCCTACTCCCCGCCGTATTAC (SEQ ID NO: 197) encoding MFα pre sequence to A. niger bgl1 gene using pKL012 as a
  • GGCCTACTCC (SEQ ID NO: 199) mutation. Encodes P21 T.
  • GGCCTACTCC (SEQ ID NO: 200) mutation. Encodes P21 S.
  • CTACTCCCCGCCG (SEQ ID NO: 201) mutation. Encodes V22A.
  • CTACTCCCCGCCG (SEQ ID NO: 202) mutation. Encodes V22D.
  • KL112 GTGACAAGGGTGCTGATATCCtATTGGGTCCAGCTG Forward primer to generate
  • TGGCACGTC (SEQ ID NO: 205) 917T>C mutation. Encodes V303A.
  • TGGAGCAGAC (SEQ ID NO: 207) 1180T>C mutation. Silent.
  • KL118 GATCAGATTGAGGCGCTTGCTcAGACCGCCAGTGTC Forward primer to generate
  • ACGCCGACTC (SEQ ID NO: 210) 1506T>A mutation. Silent.
  • CGCAGGAACC (SEQ ID NO: 211) 1557A>T mutation. Silent.
  • KL122 CTCTGTCGGCCCAGTCTTGGTTgACGAGTGGTACGACAACCCCAATG (SEQ ID NO: 212) Forward primer to generate 1678A>G mutation.
  • AAGATTAC (SEQ ID NO: 214) 1818T>C mutation. Silent.
  • ATCACCAAGTTC (SEQ ID NO: 217) 2103T>A mutation. Silent.
  • GTTTCTCTTGG (SEQ ID NO: 218) 2349T>A mutation. Silent.
  • CCCTTGTCAC (SEQ ID NO: 223) 428A>T mutation. Encodes Q140L.
  • GACATGTCC (SEQ ID NO: 225) 917T>C mutation. Encodes V303A.
  • KL141 CCCAGTAAGACGTGCCACTGTCGcAATCGACGTCTCCCGGCATAGAC (SEQ ID NO: 226) Reverse primer to generate 923A>G mutation.
  • GTCTTAGCAAG (SEQ ID NO: 230) 1506T>A mutation. Silent.
  • TATAACCCTC (SEQ ID NO: 231) 1557A>T mutation. Silent.
  • AAATCCGCGG (SEQ ID NO: 236) 1934A>T mutation.
  • GTCTTACTGGG (SEQ ID NO: 240) Y305F.
  • CCGGCATAGAC (SEQ ID NO: 241) Y305F.
  • GTCTTACTGGG (SEQ ID NO: 242) Y305W.
  • GTCTTACTGGG (SEQ ID NO: 244) Y305G.
  • CCGGCATAGAC (SEQ ID NO: 245) Y305G.
  • GTCTTACTGGG (SEQ ID NO: 246) Y305V.
  • CCGGCATAGAC (SEQ ID NO: 247) Y305V.
  • GTCTTACTGGG (SEQ ID NO: 248) Y305A.
  • CCGGCATAGAC (SEQ ID NO: 249) Y305A.
  • Three plasmids were constructed to express BGL1 and assemble mutagenized gene libraries in S. cerevisiae (FIG. 1A).
  • the assembly plasmid, pKL022 (FIGs. 9A-C), was constructed by co-transforming AscI/KpnI-linearized pGREG503 (FIGs. 12A-C), the ScPTDH3 promoter (FIG. 12D), and the ScTCYC1 terminator (FIG. 12E) into S. cerevisiae.
  • the promoter and terminator were amplified by PCR using pGREG503-based constructs as templates such that the 5' end of the ScPTDH3 promoter and the 3' end of the ScTCYC1 terminator had at least 50 bp of homology with the linearized plasmid.
  • the region between the ScPTDH3 promoter and the ScTCYC1 terminator was designed with an 86-nucleotide sequence that included an XhoI restriction site, a linker, a His6 tag, and a stop codon.
  • a second plasmid, pKL024 (FIGs.
  • cerevisiae MFα pre peptide by replacing the first 21 codons from the wild type gene with the first 24 codons from the MFα gene to construct an α-bgl1 fusion gene (experimental wild type).
  • the primers used to amplify bgl1 were designed to replace the native signal sequence with the MFα sequence at the 5' end of the gene, and to add the linker region of the pKL022 and pKL024 plasmids at the 3' end (FIG. 1A).
  • PCR amplification of the α-bgl1 gene from pKL029 with the primer pair KL50 and KL51 produced a DNA fragment that would recombine with XhoI-linearized pKL022 when co-transformed in S. cerevisiae, and would express a functionally secreted BGL (see sequences in FIGs. 7-8).
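  • The mutation names in the primer table appear to combine nucleotide positions counted in the fusion gene with residue numbers counted in the wild type protein: for example, 917T>C falls in codon 306 of the fusion gene but is reported as V303A, and 428A>T falls in codon 143 but is reported as Q140L. The Python sketch below shows that a 3-codon offset (24 MFα codons replacing 21 native codons) reconciles the two. This offset is an inference from the text rather than something the document states, the mapping is assumed to apply only downstream of the signal peptide, and the function names are hypothetical.

```python
SIGNAL_CODONS_WILD_TYPE = 21  # native codons removed (from the text)
SIGNAL_CODONS_FUSION = 24     # MF-alpha codons added (from the text)
OFFSET = SIGNAL_CODONS_FUSION - SIGNAL_CODONS_WILD_TYPE  # 3 codons


def codon_index(nt_position):
    """1-based codon number that contains a 1-based nucleotide position."""
    if nt_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_position - 1) // 3 + 1


def wild_type_residue(nt_position):
    """Assumed mapping from a fusion-gene nucleotide position to the
    wild-type residue number, valid only after the signal peptide."""
    codon = codon_index(nt_position)
    if codon <= SIGNAL_CODONS_FUSION:
        raise ValueError("position lies within the MF-alpha pre sequence")
    return codon - OFFSET


for nt, label in [(917, "V303A"), (428, "Q140L")]:
    print(nt, codon_index(nt), wild_type_residue(nt), label)
```

  • Residues inside the MFα pre sequence (e.g., P21 and V22) seem to be numbered directly in the fusion peptide, so the offset is not applied to them.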
  • Error-prone PCR was used to generate sequence diversity of the α-bgl1 gene.
  • the region between the start codon and the linker region of pKL029 was mutagenized by PCR using Taq polymerase under high MgCl2 conditions (5 mM), with an unbalanced dNTP mixture (0.2 mM dATP, 0.2 mM dGTP, 1 mM dTTP, and 1 mM dCTP) and in the presence of MnCl2 (0.15 mM) using the KL50/KL51 primer pair.
  • the PCR products from several reactions were pooled, mixed with XhoI-linearized pKL022 and transformed into S.
  • Improved AnBGL1s were identified using a growth selection with cellobiose as the sole carbon source followed by an endpoint activity assay using p-nitrophenyl-β-D-glucopyranoside (pNPG) as a substrate.
  • the cell density for growth on solid media containing cellobiose was optimized to approximately 13 cells/cm². Cells were also grown on solid YNB+2% glucose for comparison.
  • the OD600 was measured for each culture and 50 µl of supernatant was transferred to 150 µl of a 66.6 mM citrate buffer, pH 5.0, containing 2.66 mM pNPG in 96 well plates. Reactions were incubated at room temperature for 30 minutes and quenched with 20 µl of 1 M NaOH. The amount of p-nitrophenol (pNP) released was determined by measuring the absorbance at 405 nm using the extinction coefficient 18 mM⁻¹ cm⁻¹ and normalized to cell density. Strains with BGL activities greater than WT+2σ were streaked on solid YNB+2% glucose media, and three colonies were re-tested using the same endpoint assay. Strains producing BGLs that exceeded a WT+2SD threshold of activity were chosen for further analysis.
  • pNP p-nitophenol
  • mutant BGLs were compared to wild type α-BGL1 using a time course assay measuring the release of pNP from pNPG.
  • fresh YNB+2% glucose media supplemented with synthetic dropout media lacking histidine was inoculated with 0.5 ml of overnight culture and grown at 30°C for 48 hours. Cells were removed by centrifugation and the supernatants were filtered using a 0.2 µm nylon membrane. Reactions were performed at 30°C in 2 ml microcentrifuge tubes by adding 200 µl of 16 mM pNPG to a mixture of 1 ml of 80 mM citrate buffer, pH 5.0, and 400 µl of culture supernatant.
  • Assays using cellobiose as the substrate were performed in 96 well plates at 30°C with a final reaction volume of 200 µl in 5 mM citrate buffer, pH 5.0. Cellobiose was tested at a range from 2-75 mM (final concentration). α-BGL1 was also tested at 0.5 and 1 mM cellobiose (final concentration).
  • Reactions were started by adding 50 µl of culture supernatant to 150 µl of the citrate/cellobiose solution and stopped after 5 minutes by transferring 100 µl of the mixture to 400 µl of glucose assay solution (62.5 mM Tris-HCl, pH 8.3, 1.25 mM ATP, 1.875 mM NAD, 12.5 mM MgCl2, 12.5 U/ml hexokinase (Sigma), and 12.5 U/ml glucose-6-phosphate dehydrogenase (Sigma)) and incubated for 30 minutes at room temperature. The amount of glucose released was determined by measuring the amount of NADH produced at 340 nm, using the extinction coefficient 6220 M⁻¹ cm⁻¹.
  • Reactions using 40 mM pNPG and 50 mM cellobiose as substrate were performed using the same conditions as for enzyme kinetics.
  • Ten µl samples were removed from 200 µl reactions at time intervals and stopped using 1 µl of 1 M NaOH.
  • One µl aliquots from each time interval were applied to silica TLC plates (Whatman), eluted with n-butanol, ethyl acetate, 2-propanol, acetic acid, and water (1:3:2:1:1) and developed as previously described (25).
  • Applicants targeted a β-glucosidase from A. niger using directed evolution to adapt it towards heterologous expression and secretion in S. cerevisiae.
  • the strategy utilized the native homologous DNA recombination machinery in S. cerevisiae to assemble a library of mutagenized bgl1 genes, followed by a two-step selection to identify improved mutants. Because wild type S. cerevisiae lacks β-glucosidase activity, growth on cellobiose was used as the primary selection method. An endpoint assay using pNPG as a substrate was then employed for quantitative measurement of β-glucosidase activity.
  • the α-bgl1 gene was mutagenized using error-prone PCR and transformants were cultured as a mutant pool in liquid media to maximize the library size. It was determined that the pooled library contained approximately 1.6-2 × 10⁷ recombinant mutants by growing small quantities (0.0001-0.1%) of the transformation mixture on solid media immediately after electroporation. Approximately 3 × 10⁵ variants, or 1.5-2% of the total library, were subsequently screened on solid media containing cellobiose as the sole carbon source. Ninety-five percent of the mutant AnBGL1 library clones did not grow on cellobiose, showing that most mutations were deleterious. Colonies from the mutant pool varied in size whereas those expressing α-BGL1 showed no observable variation. This allowed colony size to be used as a semiquantitative screen for BGL activity, as has been previously reported (90-92).
  • the BGL activity secreted from cells originating from the largest colonies was measured using pNPG as the substrate and compared to cultures producing the BGL1 protein.
  • the mean activity of the mutant pool decreased while increasing in variability (FIG. 1B), following the predicted trend for a library of mutagenized enzymes (93).
  • twelve mutants met the threshold cut-off for selection at wild type + 2SD (designated v3-v8, v10, v11, v16, v18, v19 and v20) (FIG. 1C).
  • mutant bgl1 genes each had 3-9 nucleotide substitutions (see Table III below), corresponding to a mutation rate of 1.6±0.4 bp per kb.
  • Certain mutations occurred in multiple variants (65T>A: v7 & v18; 65T>C: v4 & v19; 428A>T: v5 & v20 and 1707C>T: v4 & v6).
  • Six variants had mutations encoding amino acid substitutions immediately following the predicted MFα signal peptide cleavage site at Pro21 or Val22.
  • MFα was used as a signal peptide to standardize expression in the context of other tested BGLs.
  • the K494Q and N557D mutations did produce significant and reproducible improvements (117% and 119%, respectively), suggesting a cumulative effect that contributed to the 147% increase in activity observed for v16.
  • the A480V mutation (1448C>T) was characterized in the genetic context of v10 since the other mutations present in the gene (1180C>T and 1506T>A) were silent and did not produce any improvements when tested alone.
  • V22D/Y305C, Q140L/Y305C, Y305C/A480V double substitution mutants and a quadruple DLCV (V22D/Q140L/Y305C/A480V) mutant were engineered. Reaction velocities for all of the combinatorial mutants were modeled using the Michaelis-Menten equation and kinetic parameters were determined (see Table V below). As expected, the addition of V22D, Q140L and A480V to Y305C increased the appVmax for each enzyme.
  • the applicants mapped the positions of the mutations identified through directed evolution (FIG. 3A), using the available crystal structure of a GH3 BGL from Aspergillus aculeatus BGL1 (AaBGL1) (72).
  • Transglycosidation reactions are based on the affinity for acceptor in the +1 subsite.
  • AnBGL1 and other related fungal GH3 BGLs would adopt a three-dimensional structure similar to AaBGL1
  • AnBGL1 shares 82% sequence identity with AaBGL1 (83% for secreted form of AaBGL1), and residues forming the active site are well conserved (FIGs. 3B and 3D).
  • the substrate binding pocket of GH3 BGLs is formed by highly conserved residues in the -1 and +1 subsites, though the distal subsites are less well conserved.
  • the Q140L mutation maps to a region of the triosephosphateisomerase domain (AaBGL1 Leu19-Ser356) and is 9.1 Å from the closest substrate-binding residue (AaBGL1 Trp280) and is approximately 10 Å from the β/α sandwich domain.
  • Q140L increased hydrolysis activity by 156% compared to α-BGL1 at 2 mM pNPG (see e.g., V5 in FIG. 1C).
  • the double substituted Q140L/Y305C variant had slightly higher Km and Ki,glucose values compared to the Y305C background (see Table V), suggesting decreased affinity for substrate and product in the active site.
  • the Km,transglycosidation for the Q140L would be slightly higher than wild type, as an increase in Ki,pNPG was observed (α-BGL1: 2.98 ± 0.46 mM; Q140L: 3.41 ± 0.21 mM) (see Table IV). Even though the Q140L mutation is not directly involved in forming the substrate binding pocket, the proximity of the active site could explain the change in Km, Ki,glucose and Ki,pNPG values. In the AaBGL1 crystal structure and a homology model of AnBGL1, the side-chain nitrogen of Gln140 forms a hydrogen bond with backbone amide oxygen of Ser93.
  • the A480V, K494Q, and N557D mutations are located on the β/α sandwich domain (AaBGL1 Gln385-Gly588) that contributes two active site residues (AaBGL1 Glu509 and Tyr511) and also mediates a protein-protein interaction between subunits of the functional dimer.
  • the A480V (AaBGL1 Ile480) mutation is located on the surface of each subunit buried between the dimer interface.
  • Both the K494Q and the N557D mutations are located on the surface of the molecule. Lys494 is also proximal to the dimer interface, though it does not contact the opposite subunit directly.
  • the Y305C mutation maps to a short loop (AaBGL1 Gly294-Gly313) that inserts directly into the +1 subsite of AaBGL1 along with Trp68 and Tyr511.
  • the structure of AaBGL1 in complex with thiocellobiose shows the contribution of Phe305 in the substrate-binding pocket, where the ligand docks in a narrow cleft between the three +1 subsite residues. Since the Cys functional group is less bulky than Tyr, it is assumed that the +1 subsite of the A. niger Y305C mutant is more open and has a lower affinity for substrate.
  • EXAMPLE 5 Targeted mutagenesis
  • residue 305 controls substrate inhibition in BGL1 homologs
  • its relative location and effect on enzyme kinetics in other GH3 BGLs was investigated (FIG. 3D).
  • Phe is the most common residue at position 305 (9/16 sequences) and has an aromatic functional group similar to that of the Tyr residue from AnBGL1.
  • the position was variable in the remaining seven homologs and included a Cys residue in the BGL from Ustilago maydis, which would suggest that its active site is similar to the Y305C mutant. It was sought to further test the functional significance of this position by constructing Y305F, Y305W, Y305G, Y305V, and Y305A A. niger mutants.
  • the Phe residue had very similar kinetics to the wild type Tyr residue, demonstrating inhibition at high substrate concentrations (see Table VI below).
  • the Y305F had a lower appKm, i.e. higher substrate affinity, and higher appKi than the wild type enzyme (see Table IV above), demonstrating a slight change in affinity at the +1 subsite.
  • Substituting Tyr305 with Ala, Val or Gly produced kinetics similar to the Y305C mutant, demonstrating saturation at high substrate concentrations.
  • Sequence alignments identified a Trp (W) residue in the BGL from Aspergillus nidulans, but the Y305W mutant was non-functional, suggesting that the bulky functional group blocked the substrate binding pocket.
  • the loop coordinating position 305 residue (Gly294-Gly313) (FIG. 3C and FIG. 4B) contains a short variable region (residues 303-309) between two highly conserved sequences (Gly294-Asp302 and Ser310-Gly313).
  • the corresponding variable region from the A. nidulans BGL (438GLHWADG444 (SEQ ID NO: 250)) includes an additional residue. It is likely that the Trp residue does not occupy the +1 subsite in the A. nidulans homolog and either a His or Ala is present in the substrate binding pocket of the enzyme. Alternatively, the additional residue in the A. nidulans sequence could change the orientation of the Trp residue in the +1 subsite such that the substrate binding pocket is not blocked.
  • reaction kinetics was also investigated using the natural substrate for wild type and mutant enzymes.
  • the production of glucose was measured at different cellobiose concentrations for α-BGL1, Y305C, Y305G, and DLCV enzymes and kinetic parameters were determined (see Table VII below).
  • the results were consistent with experiments using synthetic substrate, where the wild type enzyme showed inhibition at high cellobiose concentrations (FIG. 5A).
  • Y305C, Y305G, and DLCV were modelled using Michaelis-Menten kinetics, showing saturation at high substrate concentrations (FIG. 5B).
  • reaction products confirmed that α-BGL1 and the Y305F variant were inhibited at 40 mM pNPG, and a pNPG transglycosidation product was found to accumulate through the duration of the experiment. Without being limited by this hypothesis, this kinetic profile is likely caused by a competing transglycosidation reaction in which substrate also acts as an acceptor in the +1 subsite (8, 26). In contrast, both the Y305C and Y305G substituted variants, which showed significantly fewer transglycosidation reactions, completely consumed the substrate within 3 hours. A transient transglycosidation species was detected, but glucose was the major product by 4 hours, which shows that the transglycosidation reaction was reduced but not completely eliminated.
  • Cellulosic biomass hydrolysates can contain varying amounts of oligosaccharides and glucose depending on how they were pre-treated and hydrolyzed.
  • Strains producing GH3 BGLs of the present invention (engineered BGL1 derivatives) are tested on a collection of hydrolysates to determine their performance compared to the corresponding native BGL1.
  • Enzyme cocktails contain varying supplemented amounts of BGL. Standard cocktails are tested with various amounts of GH3 BGLs of the present invention.
  • Beta-Glucosidase its role in cellulase synthesis and hydrolysis of cellulose. Int. J. Biochem. 14, 435-43
  • thermostable GH3 beta-glucosidase from Penicillium brasilianum. Appl. Microbiol. Biotechnol. 86, 143-54
  • Trp-49 of the family 3 beta-glucosidase from Aspergillus niger is important for its transglucosidic activity: creation of novel beta-glucosidases with low transglucosidic efficiencies. Arch. Biochem. Biophys. 455, 110-8
  • PEREZ-PONS J. A., CAYETANO, A., REBORDOSA, X., LLOBERAS, J., GUASCH, A., and QUEROL
  • bgl3 A beta-glucosidase gene (bgl3) from Streptomyces sp. strain QM-B814. Molecular cloning, nucleotide sequence, purification and characterization of the encoded enzyme, a new member of family 1 glycosyl hydrolases. Eur. J. Biochem. 223, 557-565
  • thermophilic beta-glucosidase for cellulosic bioethanol production. Appl. Biochem. Biotechnol. 161, 301-12
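
The absorbance-based quantification steps described above (pNP at 405 nm, NADH at 340 nm, and the wild type + 2SD selection threshold) can be sketched as follows. This is a minimal illustration and not the Applicants' analysis code: the function names, the 1 cm optical path length, and the use of a sample standard deviation are assumptions made for the example. Only the two extinction coefficients are taken from the text.

```python
from statistics import mean, stdev

EPS_PNP_PER_MM_CM = 18.0     # mM^-1 cm^-1 at 405 nm, as stated in the text
EPS_NADH_PER_M_CM = 6220.0   # M^-1 cm^-1 at 340 nm, as stated in the text


def pnp_mM(a405, path_cm=1.0):
    """Beer-Lambert: pNP concentration (mM) from absorbance at 405 nm.

    path_cm is an assumed optical path length; plate readers usually
    have a shorter, volume-dependent path.
    """
    return a405 / (EPS_PNP_PER_MM_CM * path_cm)


def nadh_mM(a340, path_cm=1.0):
    """Beer-Lambert: NADH concentration (mM) from absorbance at 340 nm."""
    return a340 / (EPS_NADH_PER_M_CM * path_cm) * 1000.0


def normalized_activity(a405, od600, path_cm=1.0):
    """pNP released per unit of cell density (mM per OD600 unit)."""
    if od600 <= 0:
        raise ValueError("od600 must be positive")
    return pnp_mM(a405, path_cm) / od600


def selection_threshold(wt_activities, n_sd=2.0):
    """Wild type mean plus n_sd sample standard deviations."""
    return mean(wt_activities) + n_sd * stdev(wt_activities)


def select_variants(variant_activities, wt_activities, n_sd=2.0):
    """Names of variants whose activity exceeds the WT + n_sd*SD cut-off."""
    cutoff = selection_threshold(wt_activities, n_sd)
    return [name for name, act in variant_activities.items() if act > cutoff]
```

For example, `select_variants({"v5": 1.6, "vX": 1.3}, [1.0, 1.2, 0.8])` keeps only the hypothetical entry `"v5"`, because the cut-off computed from the three illustrative wild type values is about 1.4.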

Abstract

A glycoside hydrolase family 3 (GH3) beta-glucosidase (BGL) polypeptide, comprising: (a) a GH3 BGL triosephosphateisomerase domain comprising a sequence as set forth in KGY1Y2Y3Y4LGP (SEQ ID NO: 251); (b) a GH3 BGL coordinating loop domain comprising a sequence as set forth in: GLDMX1MPX2X3X4X5X6X7X8X9X10X11X12X13X14X15X16 (SEQ ID NO: 252), except that at least one of X1 to X16, when present, is smaller than a corresponding reference amino acid residue in a corresponding GH3 BGL reference coordinating loop domain comprising a sequence as set forth in GLDMX1refMPX2refX3refX4refX5refX6refX7refX8refX9refX10refX11refX12refX13refX14refX15refX16ref (SEQ ID NO: 252); (c) a GH3 BGL β/α sandwich domain comprising a sequence as set forth in: Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253); and/or (d) a GH3 BGL β/α sandwich domain comprising a sequence as set forth in: A1A2A3A4A5 (SEQ ID NO: 254) and a sequence as set forth in B1B2B3B4B5 (SEQ ID NO: 255), with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of sequences of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44. The GH3 BGL has an improved resistance to substrate inhibition and/or product inhibition. It may be used in methods of converting biomass into a fermentable sugar.

Description

ENGINEERED BETA-GLUCOSIDASES AND METHODS OF USE THEREOF
CROSS REFERENCE TO RELATED APPLICATIONS
This application is a National Entry Application of PCT application Serial No PCT/CA2017/* filed on February 22, 2017 and published in English under PCT Article 21(2), which itself claims benefit of U.S. provisional application Serial No. 62/302,379, filed on March 2, 2016. All documents above are incorporated herein in their entirety by reference.
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
N.A.
FIELD OF THE INVENTION
The present invention relates to engineered beta-glucosidases and methods of use thereof. More specifically, the present invention is concerned with engineered beta-glucosidases displaying increased resistance to substrate and/or product inhibition and methods of use thereof.
REFERENCE TO SEQUENCE LISTING
Pursuant to 37 C.F.R. 1.821(c), a sequence listing is submitted herewith as an ASCII compliant text file named 765-13234-204 SEQUENCE LISTING_ST25, that was created on February 22, 2017 and has a size of 1218 kilobytes. The content of the aforementioned file named 765-13234-204 SEQUENCE LISTING_ST25 is hereby incorporated by reference in its entirety.
BACKGROUND OF THE INVENTION
Cellulose is a polysaccharide composed of glucose monomers that is the main constituent of plant cell walls. Cellulosic material (e.g., cellulose, cellulosic hydrolysates, and soluble cellodextrins such as cellotriose, cellobiose, etc.) can be used as a carbon source for the bioproduction of chemicals. For instance, cellulosic biofuel production consists of the hydrolysis of cellulose into sugars (e.g., fermentable sugars such as glucose). Glucose can then be processed to form ethanol through fermentation.
Cellulosic biorefineries offer the potential to produce fuels and other chemicals from renewable substrates (1). While advances have been made towards economically viable bioconversion processes, the efficiency and cost of saccharolytic enzymes are obstacles that challenge the feasibility of the large-scale production of fermentable sugars from cellulosic plant biomass. A promising strategy is to engineer an organism that expresses a functional cellulase system that simultaneously hydrolyses biomass while fermenting hydrolysate (2-4). Consolidated bioprocessing (CBP) would eliminate the costs of a dedicated saccharification process but depends on the development of a fermentation strain expressing a cellulase system capable of efficiently hydrolyzing cellulosic biomass at high solid loadings. It may also be advantageous for certain applications (e.g., production of animal feed) to degrade cellulosic material (e.g., lignocellulosic material) from plant biomass.
Saccharolytic enzymes from cellulolytic microbes are well characterized candidates for industrial applications (5). In the minimal model of cellulose hydrolysis, cellobiose and other soluble cellodextrins are released through the synergistic activities of endoglucanases (EGLs; EC 3.2.1.4) and cellobiohydrolases (CBHs; 3.2.1.91). β-glucosidases (BGLs; 3.2.1.21) are a third and critical component of natural and engineered cellulase systems. Their role is twofold: (1) the hydrolysis of soluble cellodextrins by BGLs produces a fermentable sugar, glucose, while (2) removing hydrolysate intermediates that act as inhibitors towards EGLs and CBHs (6).
Unfortunately, most characterized microbial BGLs perform best at low substrate concentrations, with Km values between 1-3 mM for cellobiose (7-9). Inhibition of hydrolysis activity is often observed at substrate concentrations above Km, where transglycosidation reactions occur (7-9, 12, 17-28). Km is the Michaelis constant, namely the substrate concentration at which the reaction rate is half of Vmax, where Vmax is the maximum velocity achieved by the system at saturating substrate concentrations. BGLs are also inhibited by product (i.e., glucose) (6, 10), with reported Ki (i.e., inhibition constant) values between 1-10 mM (11, 12). Glucose tolerant BGLs have been identified (13-16), but the mechanism of glucose tolerance is not understood. Inhibition of BGL hydrolytic activity by product and substrate contributes to limiting the efficiency of cellulosic bioconversion processes because the saccharification of biomass by EGLs and CBHs is dependent on the hydrolysis of soluble cellodextrins to glucose (29).
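
The difference between saturating kinetics and inhibition at high substrate concentration can be illustrated with a short sketch. It contrasts the Michaelis-Menten rate law with the common textbook substrate-inhibition form v = Vmax·S / (Km + S + S²/Ki). That second equation is an assumption chosen for illustration; the kinetic model actually fitted in this document may differ, and the function names and parameter values are hypothetical.

```python
import math


def michaelis_menten(s, vmax, km):
    """Rate for a saturating enzyme: v = Vmax * S / (Km + S)."""
    return vmax * s / (km + s)


def substrate_inhibition(s, vmax, km, ki):
    """Rate with substrate inhibition (assumed textbook form):
    v = Vmax * S / (Km + S + S^2 / Ki)."""
    return vmax * s / (km + s + s * s / ki)


def optimal_substrate(km, ki):
    """Substrate concentration giving the maximal rate under the
    assumed substrate-inhibition model: S_opt = sqrt(Km * Ki)."""
    return math.sqrt(km * ki)


if __name__ == "__main__":
    vmax, km, ki = 10.0, 2.0, 8.0   # illustrative values only
    for s in (0.5, 2.0, 4.0, 20.0, 75.0):
        print(s,
              round(michaelis_menten(s, vmax, km), 3),
              round(substrate_inhibition(s, vmax, km, ki), 3))
```

At S = Km the Michaelis-Menten rate is exactly Vmax/2, matching the definition of Km given above. Under the assumed inhibition model the rate peaks at sqrt(Km·Ki) and then declines, whereas the Michaelis-Menten rate keeps approaching Vmax.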
The yeast Saccharomyces cerevisiae is well-suited as a CBP platform microorganism because it naturally ferments hexose sugars to ethanol at high yields. Unfortunately, S. cerevisiae is non-cellulolytic and a CBP strain designed to secrete cellulases is needed. The development of a cellulolytic S. cerevisiae strain faces two challenges: First, a complete and efficient set (EGL, CBH and BGL) of saccharolytic enzymes must be optimized for the hydrolysis of pre-treated cellulosic biomass. Second, recombinant cellulases must be secreted at sufficient levels such that the production of sugars supports the consumption requirements for growth and fermentation. The heterologous expression of saccharolytic enzymes in S. cerevisiae and other hosts is well documented (30-51). BGLs secreted from S. cerevisiae enable growth on cellobiose (30, 31, 43, 52), and the co-expression of BGLs with other cellulases enables growth on pure cellulose or pre-treated cellulosic feedstocks (33, 34, 38, 39, 41, 44, 45, 53, 54). Despite the evidence of functional cellulase expression in S. cerevisiae, low ethanol yields have been reported from cellobiose (<20 g/L (33)) and amorphous cellulose (<7 g/L (39)). BGLs have been shown to be limiting in natural cellulase systems (55, 56), providing evidence that expression levels in S. cerevisiae generate poor fermentation yields. Since the secretion of cellulases from S. cerevisiae is well below titers reported for other proteins (57), increasing the heterologous expression of saccharolytic enzymes is a critical component of CBP strain development.
BGL inhibition (by product and/or high substrate concentration) and low expression are compounding obstacles to the development of CBP using S. cerevisiae. With the goal of identifying the best BGL for use in industrial processes, many studies have been based on the discovery and comparative characterization of enzymes from cellulolytic organisms (7, 9, 11-13, 15, 16, 18, 20, 24, 25, 28, 43, 46, 58-68). Attempts have also been made at improving natural BGLs by reducing inhibition through rational design (27), gene fusions (42, 69), increasing expression via codon optimization (43), and using alternative secretion or cell anchoring strategies (43, 70, 71). Structural data is available for several BGLs, with representative structures from glycoside hydrolase family 1 (GH1) and glycoside hydrolase family 3 (GH3) proteins (72-74). Despite the scope of information available, a BGL tailored to CBP has yet to be discovered or engineered (75). The present description refers to a number of documents, the content of which is herein incorporated by reference in their entirety.
SUMMARY OF THE INVENTION
With the goal of building a CBP strain, the Applicants used a directed evolution strategy by expressing a library of mutated bgl1 genes in Saccharomyces cerevisiae and used a two-step functional screen to identify improved enzymes. Several amino acid substitutions were identified that improved the activity or expression in the context of secretion from Saccharomyces cerevisiae allowing improved BGLs to be engineered for industrial applications.
The Applicants showed that cellulases can be optimized for high substrate loadings, and identified specific positions as critical components of engineered GH3 BGLs.
Numerous BGL variants that supported growth of S. cerevisiae on cellobiose and showed increased activity on the synthetic substrate p-nitrophenyl-β-D-glucopyranoside were identified and characterized. By performing kinetics experiments, Applicants found that the hydrolysis activity of certain substituted variants was not inhibited by high substrate concentrations, i.e., these variants showed a dramatic reduction in the transglycosidation activity that causes inhibition of the hydrolytic reaction at high substrate concentrations. Targeted mutagenesis demonstrated that certain positions (e.g., 305 using A. niger numbering) are critical in GH3 BGLs and likely determine the extent to which transglycosidation reactions occur. The Applicants also found that certain mutations (e.g., Gln140 using the A. niger numbering) reduced the inhibitory effect of glucose and could be combined with a substitution that reduces the inhibitory effect of high substrate concentration (e.g., 305) to produce a BGL with decreased sensitivity to both product and substrate. Using the crystal structure of a GH3 BGL from A. aculeatus, the Applicants mapped a group of beneficial mutations to the β/α domain of the molecule and postulated, without being limited by this hypothesis, that this region modulates activity through subunit interactions. Certain BGL variants were identified with mutations in the MFα pre sequence that was used to mediate secretion of the protein. For instance, substitutions at Pro21 or Val22 of the MFα pre sequence could produce up to a 2-fold increase in supernatant BGL activity, providing evidence that expression and/or secretion was limiting hydrolytic activity of culture supernatants. Finally, Applicants showed that several beneficial mutations could be combined in a BGL with increased activity for both synthetic substrates (namely pNPG, the standard for testing BGLs) and natural substrates.
More specifically, in accordance with the present invention, there are provided the following items.
Item 1. A glycoside hydrolase family 3 (GH3) beta-glucosidase (BGL) polypeptide, comprising:
(a) a GH3 BGL triosephosphateisomerase domain comprising a sequence as set forth in KGY1Y2Y3Y4LGP (SEQ ID NO: 251), wherein:
Y2 is N, H, D or T;
Y3 is I, A or V; and
Y4 is R, K, A, L, I, F, V or P;
(b) a GH3 BGL coordinating loop domain comprising a sequence as set forth in: GLDMX1MPX2X3X4X5X6X7X8X9X10X11X12X13X14X15X16 (SEQ ID NO: 252), wherein at least one of X1 to X16 is smaller than a corresponding reference amino acid residue, wherein the reference amino acid residues are:
X1ref is S, T or D;
Figure imgf000005_0001
X4ref is A, I, V, L, G, or T;
X5ref is Y, T, S, V, L, N, or absent;
X6ref is D, F, G, Y, V, C, or absent;
X7ref is G, H, L, N, S, T, C, D, E, M or absent;
X8ref is S, W, F, Y or absent;
X9ref is R, N, A, D or absent;
X10ref is M, S, D, Q, G, T, or absent;
X11ref is F, N, G, S, T, E, A, or absent;
X12ref is G, D, S, N, T, K, R, L, F, or absent;
Figure imgf000005_0002
Xisref is Y, F, or W; and
(c) a GH3 BGL β/α sandwich domain comprising a sequence as set forth in: Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253), wherein:
Z1 is N, V, A, K, Q, T, S, E, D, F, or Y;
Z2 is L, E, Y, S, K, Q, W, F, P, A or T;
Z4 is L, F, R, G or T;
Z5 is D, K, S, E, R or N; and
Z6 is N, D or S; and/or
(d) a GH3 BGL β/α sandwich domain comprising a sequence as set forth in: A1A2A3A4A5 (SEQ ID NO: 254) and a sequence as set forth in B1B2B3B4B5 (SEQ ID NO: 255) wherein:
A1 is R, V, L, T, A, I, M or S;
A2 is A, S, V or D;
A3 is Q or N;
A4 is S, D, G, M, Q, F, Y, or T;
A5 is S, A, T, Q, D, or P;
B2 is V, M, L, F or I;
B3 is D, E or absent;
B4 is Q, P, S, R, G or E; and
B5 is W or F,
with the proviso that the GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
Item 1'. A glycoside hydrolase family 3 (GH3) beta-glucosidase (BGL) polypeptide, comprising:
(a) a GH3 BGL triosephosphateisomerase domain comprising a sequence as set forth in KGY1Y2Y3Y4LGP (SEQ ID NO: 251);
(b) a GH3 BGL coordinating loop domain comprising a sequence as set forth in:
GLDMX1MPX2X3X4X5X6X7X8X9X10X11X12X13X14X15X16 (SEQ ID NO: 252), except that at least one of X1 to X16, when present, is smaller than a corresponding reference amino acid residue in a corresponding GH3 BGL reference coordinating loop domain comprising a sequence as set forth in
GLDMX1refMPX2refX3refX4refX5refX6refX7refX8refX9refX10refX11refX12refX13refX14refX15refX16ref (SEQ ID NO: 252);
(c) a GH3 BGL β/α sandwich domain comprising a sequence as set forth in: Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253); and/or
(d) a GH3 BGL β/α sandwich domain comprising a sequence as set forth in: A1A2A3A4A5 (SEQ ID NO: 254) and a sequence as set forth in B1B2B3B4B5 (SEQ ID NO: 255),
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44. In a specific embodiment, the GH3 BGL polypeptide does not comprise a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and does not comprise a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
Item 2. The GH3 BGL polypeptide of item 1 or 1', wherein at least one of X6 to X8 is smaller than the corresponding reference amino acid residue.
Item 3. The GH3 BGL polypeptide of item 1 or 1', wherein at least one of X6 to X8 is independently C, V, A, G, S, P, T, D or N.
Item 4. The GH3 BGL polypeptide of item 1 or 1', wherein X8 is C, V, A, G, S, P, T, D or N.
Item 5. The GH3 BGL polypeptide of item 1 or 1', wherein X8 is C, V, A, or G.
Item 6. The GH3 BGL polypeptide of any one of items 1 and 1' to 5, wherein
GLDMX1refMPX2refX3refX4refX5refX6refX7refX8refX9refX10refX11refX12refX13refX14refX15refX16ref (SEQ ID NO: 252) is:
(i) gldmsmpgsaydmfgdfyg (SEQ ID NO: 256);
(ii) gldmtmpgditfsndsyfg (SEQ ID NO: 257);
(iii) gldmtmpgditfsgdsyfg (SEQ ID NO: 258);
(iv) gldmtmpgditfsndsyfg (SEQ ID NO: 259);
(v) gdmdmpgdvsgdsstyfg (SEQ ID NO: 260);
(vi) gldmtmpgditfhsndsyfg (SEQ ID NO: 261);
(vii) gldmtmpgdivyhsnnsyfg (SEQ ID NO: 262);
(viii) gldmtmpggvtvtstdsyfg (SEQ ID NO: 263);
(ix) gldmtmpgdvlccsrqegslwg (SEQ ID NO: 264);
(x) gldmtmpgditfnsgtswwg (SEQ ID NO: 265);
(xi) gldmsmpgdvyfnsntsywr (SEQ ID NO: 266);
(xii) gldmtmpgglnfdgsgpywr (SEQ ID NO: 267);
(xiii) gldmdmpceaqyfg (SEQ ID NO: 268);
(xiv) gldmsmpgevyggwntgtsfwg (SEQ ID NO: 269);
(xv) gldmsmpgellggwntgksywg (SEQ ID NO: 270);
(xvi) gldmsmpgdglhwadgrslwg (SEQ ID NO: 271 );
(xvii) gldmsmpgdisfddglsfwg (SEQ ID NO: 272);
(xviii) gldmsmpgdvtfdsgtsfwg (SEQ ID NO: 273);
(xix) gldmsmpgditfdsatsfwg (SEQ ID NO: 274);
(xx) gldmsmpgdvdydsgtsywg (SEQ ID NO: 275);
(xxi) gldmtmpgdtefntgfsfwg (SEQ ID NO: 276);
(xxii) gldmsmpgdtmfnsgrsywg (SEQ ID NO: 277); or
(xxiii) gldmsmpgdtefntglsfwg (SEQ ID NO: 278).
Item 7. The GH3 BGL polypeptide of any one of items 1 to 6, which, except for residues defined in any one of items 1 to 6 and for the proviso defined in item 1 , is as set forth in any of the sequences of FIG. 13A-H and 15A-B or is a secreted form thereof (SEQ ID NOs:22-167).
Item 8. The GH3 BGL polypeptide of any one of items 1 to 6, which, except for residues defined in any one of items 1 to 6 and for the proviso defined in item 1, is as set forth in Umay2: Ustilago maydis (SEQ ID NO: 37); Ccinl: Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30); Umay Ustilago maydis (SEQ ID NO: 29); Rory5: Rhizopus oryzae (SEQ ID NO: 36); Pbla1: Phycomyces blakesleeanus (SEQ ID NO: 28); Pbla2: Phycomyces blakesleeanus (SEQ ID NO: 31); Roryl: Rhizopus oryzae (SEQ ID NO: 32); Roryl: Rhizopus oryzae (SEQ ID NO: 33); Rory3: Rhizopus oryzae (SEQ ID NO: 34); RoryA: Rhizopus oryzae (SEQ ID NO: 35); Anid: Aspergillus nidulans (SEQ ID NO: 24); Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); Mgrisea: Magnaporthe grisea (SEQ ID NO: 38); Ccinl: Coprinopsis cinerea (also designated herein CcinBGL1020 0.1 1558) (SEQ ID NO: 39); PchrBgl1010 (SEQ ID NO: 40); Aory: Aspergillus oryzae (SEQ ID NO: 41); CpelBglX0290: Wickerhamomyces anomalus (Pichia anomala) (SEQ ID NO: 42); SfibBglM22475: Saccharomycopsis fibuligera M22475 (SEQ ID NO: 43); SfibBglM22476: Saccharomycopsis fibuligera M22476 (SEQ ID NO: 44); or any of the predicted secreted forms defined in FIG. 13A-H.
Item 7'. The GH3 BGL polypeptide of any one of items 1 and 1' to 6, which, except for residues defined in any one of items 1 and 1' to 6 and for the proviso defined in item 1 or 1', comprises a polypeptide as set forth in any one of SEQ ID NOs: 163-167, or comprises a secreted form of a polypeptide as set forth in any one of SEQ ID NOs: 163-167.
Item 8'. The GH3 BGL polypeptide of any one of items 1 and 1' to 6, which, except for amino acid residues defined in any one of items 1 and 1' to 6 and for the proviso defined in item 1 or 1', comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-44, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-44.
Item 9'. The GH3 BGL polypeptide of item 1 or 1 ', wherein:
(a) in KGY1Y2Y3Y4LGP (SEQ ID NO: 251 ):
Y3 is I or V;
(b) the GH3 BGL coordinating loop domain comprises a sequence as set forth in: GLDMX1'MPGX2'X3'X4'X5'X6'X7'X8'X9'X10'X11'X12'X13'X14'X15'X16' (SEQ ID NO: 279), except that at least one of X1' to X16', when present, is smaller than a corresponding reference amino acid residue in a corresponding GH3 BGL reference coordinating loop domain comprising a sequence as set forth in GLDMX1'refMPGX2'refX3'refX4'refX5'refX6'refX7'refX8'refX9'refX10'refX11'refX12'refX13'refX14'refX15'refX16'ref (SEQ ID NO: 279);
(c) in Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253):
Z1 is N, V, A, K, Q, T, S, E, F, or Y;
Z2 is L, E, Y, S, K, Q, W, P, A or T;
Figure imgf000008_0001
Z5 is D, K, R or N; and
Z6 is N, D or A; and/or
(d) in: A1A2A3A4A5 (SEQ ID NO: 254) and B1B2B3B4B5 (SEQ ID NO: 255):
Figure imgf000008_0002
A4 is S, F, M, D, G, Q or T;
B2 is V, M, I or L;
B4' is Q, P, S, R, E or absent;
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
Item 9. The GH3 BGL polypeptide of item 1 , wherein:
(a) in KGY1Y2Y3Y4LGP (SEQ ID NO: 251 ):
Y2 is D or T;
Yi and Y4 are as defined in item 1 ; and
(b) the GH3 BGL coordinating loop domain comprises a sequence as set forth in: GLDMX1'MPGX2'X3'X4'X5'X6'X7'X8'X9'X10'X11'X12'X13'X14'X15'X16' (SEQ ID NO: 279), wherein at least one of X1' to X16' is smaller than a corresponding reference amino acid residue, wherein the reference amino acid residues are:
X1'ref is S, T or D;
Figure imgf000009_0001
X3'ref is A, V, I, G or T;
X4'ref is Y, L, or absent;
X5'ref is Y, L, T, S, V, H, D, M, E, or absent;
X6'ref is D, F, C, G, Y, V, W, D, M, E, or absent;
X7'ref is G, N, C, H, L, S, T, A, D, or N;
X8'ref is M, S, D, or T;
X9'ref is R or absent;
X10'ref is Q or absent;
X11'ref is F, N, E, G, S, T or A;
X12'ref is G, T, D, S, N, R, L, T or F;
Figure imgf000009_0002
(c) in Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253):
Z1 is N, V, A, K, Q, T, S, E, F, or Y;
Z2 is L, E, Y, S, K, Q, W, P, A or T;
Figure imgf000009_0003
Z5 is D, K, R or N; and
Z6 is N, D or A; and/or
(d) the GH3 BGL β/α sandwich domain comprises a sequence as set forth in: A1A2A3A4A5 (SEQ ID NO: 254) and a sequence as set forth in B1B2B3B4B5 (SEQ ID NO: 255) wherein:
Figure imgf000009_0004
A2 is as defined in item 1;
A3 is as defined in item 1;
A4 is S, F, M, D, G, Q or T;
A5 is as defined in item 1;
B2 is V, M, I or L;
B3 is D or E;
B4' is Q, P, S, R, E or absent; and
B5 is as defined in item 1, with the proviso that the GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
Item 10. The GH3 BGL polypeptide of item 9 or 9', wherein at least one of X to X$ is smaller than the corresponding reference amino acid residue.
Item 11. The GH3 BGL polypeptide of item 9 or 9', wherein at least one of the amino acid residues X* to X$ is C, V, A, G, S, P, T, D or N.
Item 12. The GH3 BGL polypeptide of item 9 or 9', wherein X6' is C, V, A, G, S, P, T, D or N.
Item 13. The GH3 BGL polypeptide of item 9 or 9', wherein X6' is C, V, A or G.
Item 14. The GH3 BGL polypeptide of any one of items 9 and 9' to 13, wherein
GLDMX1'refMPGX2'refX3'refX4'refX5'refX6'refX7'refX8'refX9'refX10'refX11'refX12'refX13'refX14'refX15'refX16'ref (SEQ ID NO: 279) is:
gldmsmpgsay-dgm-fgdfyg (SEQ ID NO:256);
gldmsmpgdv-yfns-ntsywr (SEQ ID NO: 266);
gldmtmpgdv-lccsrqegslwg (SEQ ID NO: 264);
gldmtmpgdi-tfhs-ndsyfg (SEQ ID NO: 257);
gldmtmpgdi-tfls-gdsyfg (SEQ ID NO: 258);
gldmtmpgdi-tfns-ndsyfg (SEQ ID NO: 259);
gldmdmpgdv-sgsd-sstyfg (SEQ ID NO: 260);
gldmtmpgdi-tfhs-ndsyfg (SEQ ID NO: 261 );
gldmtmpgdi-vyhs-nnsyfg (SEQ ID NO: 262);
gldmtmpggv-tvts-tdsyfg (SEQ ID NO: 263);
gldmsmpgdglhwad-grslwg (SEQ ID NO: 271);
gldmsmpgdi-sfdd-glsfwg (SEQ ID NO: 272);
gldmsmpgdi-tfds-atsfwg (SEQ ID NO: 274);
gldmsmpgdv-dyds-gtsywg (SEQ ID NO: 275);
gldmsmpgdt-mfns-grsywg (SEQ ID NO: 277);
gldmtmpgdt-efnt-gfsfwg (SEQ ID NO: 276); or
gldmsmpgdvtfds-gtsfwg (SEQ ID NO: 273).
Item 15. The GH3 BGL polypeptide of any one of items 9 and 9' to 14, which, except for residues defined in any one of items 9 to 14 and for the proviso defined in item 9 or 9', is as set forth in SEQ ID NO: 164 or is a secreted form thereof (FIGs. 15A-B).
Item 16. The GH3 BGL polypeptide of any one of items 9 and 9' to 15, which, except for residues defined in any one of items 9 to 14 and for the proviso defined in item 9 or 9', is as set forth in sequence Umay2: Ustilago maydis (SEQ ID NO: 37); Ccin1: Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30); Umay1: Ustilago maydis (SEQ ID NO: 29); Rory5: Rhizopus oryzae (SEQ ID NO: 36); Pbla1: Phycomyces blakesleeanus (SEQ ID NO: 28); Pbla2: Phycomyces blakesleeanus (SEQ ID NO: 31); Rory1: Rhizopus oryzae (SEQ ID NO: 32); Rory2: Rhizopus oryzae (SEQ ID NO: 33); Rory3: Rhizopus oryzae (SEQ ID NO: 34); Rory4: Rhizopus oryzae (SEQ ID NO: 35); Anid: Aspergillus nidulans (SEQ ID NO: 24); Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); and Aory: Aspergillus oryzae (SEQ ID NO: 41); or any of their predicted secreted forms defined in FIGs. 13A-H.
Item 15'. The GH3 BGL polypeptide of any one of items 9 and 9' to 14, which, except for residues defined in any one of items 9 and 9' to 14 and for the proviso defined in item 9 or 9', comprises a polypeptide as set forth in SEQ ID NO: 164, or comprises a secreted form of the polypeptide as set forth in SEQ ID NO: 164.
Item 16'. The GH3 BGL polypeptide of any one of items 9, 9' to 15 and 15', which, except for residues defined in any one of items 9, 9' to 15 and 15', and for the proviso defined in item 9 or 9', comprises a polypeptide as set forth in any one of SEQ ID NO: 22-37 and 41, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NO: 22-37 and 41.
Item 17'. The GH3 BGL of item 1 or 1', wherein:
(a) in KGY1Y2Y3Y4LGP (SEQ ID NO: 251 ):
Yi and Y4 are as defined in item 1 or 1'; and
Y2 and Y3 are as defined in item 9 or 9';
(b) the GH3 BGL coordinating loop domain comprises a sequence as set forth in: GLDMX1"MPGDX2"X3"X4"X5"X6"X7"X8"SX9"WG (SEQ ID NO: 280), except that at least one of X1" to X9", when present, is smaller than a corresponding reference amino acid residue in a corresponding GH3 BGL reference coordinating loop domain comprising a sequence as set forth in GLDMX1"refMPGDX2"refX3"refX4"refX5"refX6"refX7"refX8"refSX9"refWG (SEQ ID NO: 280);
(c) in Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253):
Z2 is A, T or S;
Z3 is as defined in item 1 or 1 ';
Z5 is D, R or N; and
Z6 is N or S; and/or
(d) in A1A2A3A4A5 (SEQ ID NO: 254) and in B1B2B3B4B5 (SEQ ID NO: 255):
A2 is A or V;
A4 is Q or T;
A5 is S, A, Q or P,
B2 is I, V or L; B3 is D or E; B5 is W,
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of the sequences of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
Item 17. The GH3 BGL of item 1 , wherein:
(a) in KGY1Y2Y3Y4LGP (SEQ ID NO: 251 ):
Yi and Y4 are as defined in item 1 ; and
Y2 and Y3 are as defined in item 9;
(b) the GH3 BGL coordinating loop domain comprises a sequence as set forth in: GLDMX1"MPGDX2"X3"X4"X5"X6"X7"X8"SX9"WG (SEQ ID NO: 280), wherein at least one of X1" to X9" is smaller than a corresponding reference amino acid residue, wherein the reference amino acid residues are:
X1" is S or T;
X2" is I, V or T;
X3" is S, T, D, M or E;
X5" is D or N;
X6" is D, S or T;
X8" is L, T, R or F; and
X9" is F or Y;
(c) in Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253), wherein:
Z2 is A, T or S;
Z3 is as defined in item 1 ;
Z5 is D, R or N; and
Z6 is N or S; and/or
(d) in A1A2A3A4A5 (SEQ ID NO: 254) and in B1B2B3B4B5 (SEQ ID NO: 255):
A2 is A or V;
A3 is as defined in item 1 ;
A5 is S, A, Q or P,
B2 is I, V or L;
B3 is D or E;
B4 is R or E; and B5 is W,
with the proviso that the GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
Item 18. The GH3 BGL polypeptide of item 17 or 17', wherein at least one of X2" to X4" is smaller than the corresponding amino acid residue.
Item 19. The GH3 BGL polypeptide of item 17 or 17', wherein at least one of X2" to X4" is C, V, A, G, S, P, T, D or N.
Item 20. The GH3 BGL polypeptide of item 17 or 17', wherein X4" is C, V, A, G, S, P, T, D or N.
Item 21. The GH3 BGL polypeptide of item 17 or 17', wherein X4" is C, V, A, or G.
Item 22. The GH3 BGL polypeptide of any one of items 17 and 17' to 21 , wherein
GLDMX1"refMPGDX2"refX3"refX4"refX5"refX6"refX7"refX8"refSX9"refWG (SEQ ID NO: 280) is:
(i) gldmsmpgdisfddglsfwg (SEQ ID NO: 272);
(ii) gldmsmpgdvtfdsgtsfwg (SEQ ID NO: 273);
(iii) gldmsmpgditfdsatsfwg (SEQ ID NO: 274);
(iv) gldmsmpgdvdydsgtsywg (SEQ ID NO: 275);
(v) gldmsmpgdtmfnsgrsywg (SEQ ID NO: 277); or
(vi) gldmtmpgdtefntgfsfwg (SEQ ID NO: 276).
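The positional notation of SEQ ID NO: 280 can be illustrated with a short script. The following is only a sketch, assuming that the pattern GLDM-X1"-MPGD-X2"...X8"-S-X9"-WG maps one residue to each position in the ungapped 20-residue loops listed in Item 22; the function names and the candidate variant are hypothetical and are not part of the disclosure. It labels each residue of a loop and reports which positions differ between a reference loop and a candidate loop. It does not evaluate whether a residue is "smaller", because that comparison depends on the size definition used elsewhere in the specification.

```python
import re

# SEQ ID NO: 280 layout as written in the items:
# GLDM X1" MPGD X2"..X8" S X9" WG (20 residues, no alignment gaps).
LOOP_PATTERN = re.compile(r"^GLDM([A-Z])MPGD([A-Z]{7})S([A-Z])WG$")

# Residues listed in Item 20 for position X4".
ITEM_20_X4 = set("CVAGSPTDN")

# Reference loops listed in Item 22 (i) to (vi).
ITEM_22_REFERENCES = [
    "gldmsmpgdisfddglsfwg",
    "gldmsmpgdvtfdsgtsfwg",
    "gldmsmpgditfdsatsfwg",
    "gldmsmpgdvdydsgtsywg",
    "gldmsmpgdtmfnsgrsywg",
    "gldmtmpgdtefntgfsfwg",
]


def parse_loop(loop: str) -> dict[str, str]:
    """Return a mapping such as {'X1"': 'S', ..., 'X9"': 'F'} for one loop."""
    match = LOOP_PATTERN.match(loop.upper())
    if match is None:
        raise ValueError(f"loop does not fit the SEQ ID NO: 280 layout: {loop}")
    x1, middle, x9 = match.groups()
    residues = [x1, *middle, x9]  # X1", then X2" to X8", then X9"
    return {f'X{i}"': aa for i, aa in enumerate(residues, start=1)}


def changed_positions(reference: str, candidate: str) -> dict[str, tuple[str, str]]:
    """List positions where the candidate differs, as (reference, candidate)."""
    ref = parse_loop(reference)
    cand = parse_loop(candidate)
    return {pos: (ref[pos], cand[pos]) for pos in ref if ref[pos] != cand[pos]}


if __name__ == "__main__":
    reference = ITEM_22_REFERENCES[0]   # Item 22 (i)
    candidate = "gldmsmpgdisaddglsfwg"  # hypothetical variant, X4" changed from F to A
    print(changed_positions(reference, candidate))
    print(parse_loop(candidate)['X4"'] in ITEM_20_X4)
```

Under these assumptions, the same parser accepts all six reference loops of Item 22, which is a quick consistency check on the reconstructed pattern.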
Item 23'. The GH3 BGL polypeptide of any one of items 17 and 17' to 22, which, except for residues defined in any one of items 17 and 17' to 22 or for the proviso defined in item 17 or 17', comprises a polypeptide as set forth in SEQ ID NO: 165, or comprises a secreted form of the polypeptide as set forth in SEQ ID NO: 165.
Item 24'. The GH3 BGL polypeptide of any one of items 17 and 17' to 23, which, except for residues defined in any one of items 17 and 17' to 22 or for the proviso defined in item 17 or 17', comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-23, 25-27 and 41 , or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-23, 25-27 and 41.
Item 25'. The GH3 BGL of item 17 or 17', wherein:
(a) KGY1Y2Y3Y4LGP (SEQ ID NO: 251) is as defined in item 17 or 17';
(b) GLDMX1"MPGDX2"X3"X4"X5"X6"X7"X8"SX9"WG (SEQ ID NO: 280) is as defined in any one of items 17 and 17' to 24;
(c) in Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253),
Z1-Z5 are as defined in item 17 or 17'; and
(d) in A1A2A3A4A5 (SEQ ID NO: 254) and in B1B2B3B4B5 (SEQ ID NO: 255):
A2-A5 are as defined in item 17 or 17'; and B1-B5 are as defined in item 17 or 17',
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
Item 23. The GH3 BGL polypeptide of any one of items 17 to 22, which, except for residues defined in any one of items 17 and 17' to 22 or for the proviso defined in item 17 or 17', is as set forth in SEQ ID NO: 165 or is a secreted form thereof (FIGs. 15A-B).
Item 24. The GH3 BGL polypeptide of any one of items 17 to 23, which, except for residues defined in any one of items 17 to 22 or for the proviso defined in item 17, is as set forth in Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Aory: Aspergillus oryzae (SEQ ID NO: 41); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); or Fgra: Fusarium graminearum (SEQ ID NO: 27); or any of their predicted secreted forms defined in FIGs. 13A-H.
Item 25. The GH3 BGL of item 17, wherein:
(a) KGY1Y2Y3Y4LGP (SEQ ID NO: 251) is as defined in item 17;
(b) GLDMX1"MPGDX2"X3"X4"X5"X6"X7"X8"SX9"WG (SEQ ID NO: 280) is as defined in any one of items 17 to 24;
(c) in Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253):
Z1-Z5 are as defined in item 17;
(d) in A1A2A3A4A5 (SEQ ID NO: 254) and in B1B2B3B4B5 (SEQ ID NO: 255):
A2-A5 are as defined in item 17; and
B1-B5 are as defined in item 17,
with the proviso that the GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
Item 26. The GH3 BGL polypeptide of item 25 or 25', wherein
GLDMX1"refMPGDX2"refX3"refX4"refX5"refX6"refX7"refX8"refSX9"refWG (SEQ ID NO: 280) is:
(i) gldmsmpgdisfddglsfwg (SEQ ID NO: 273);
(ii) gldmsmpgditfdsatsfwg (SEQ ID NO: 275);
(iii) gldmsmpgdvdydsgtsywg (SEQ ID NO: 276);
(iv) gldmsmpgdtmfnsgrsywg (SEQ ID NO: 278); or
(v) gldmtmpgdtefntgfsfwg (SEQ ID NO: 277).
Item 27'. The GH3 BGL polypeptide of any one of items 25, 25' and 26, which, except for residues as defined in any one of items 25, 25' and 26 or for the proviso defined in item 25 or 25', comprises a polypeptide as set forth in SEQ ID NO: 166, or comprises a secreted form of the polypeptide as set forth in SEQ ID NO: 166.
Item 28'. The GH3 BGL polypeptide of any one of items 25, 25', 26-27 and 27', which, except for residues defined in any one of items 25, 25' and 26, comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-23 and 25-27, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-23 and 25-27.
Item 29'. The GH3 BGL of item 17 or 17', wherein:
(a) in KGY1Y2Y3Y4LGP (SEQ ID NO: 251 ):
Y3 is as defined in item 9 or 9';
(b) in GLDMX1"MPGDX2"X3"X4"X5"X6"X7"X8"SX9"WG (SEQ ID NO: 280):
X1" is S;
X2" is I or V;
X3" is S, T or D;
X4" is F or Y;
X5" is D;
X6" is D or S;
X7" is G or A;
Figure imgf000015_0001
(c) in Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253), wherein:
Figure imgf000015_0002
Z6 is N or S; and/or
(d) in A1A2A3A4A5 (SEQ ID NO: 254) and in B1B2B3B4B5 (SEQ ID NO: 255):
Figure imgf000015_0003
A5 is S or A, B2 is I or V;
B3 is D or E; B5 is W,
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
Item 27. The GH3 BGL polypeptide of item 25 or 26, which, except for residues defined in item 25 or 26 or for the proviso defined in item 25, is as set forth in SEQ ID NO: 166 or is a secreted form thereof (FIGs. 15A-B).
Item 28. The GH3 BGL polypeptide of any one of items 25 to 27, which, except for residues defined in any one of items 25 to 27, is as set forth in Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); or Fgra: Fusarium graminearum (SEQ ID NO: 27); or any of their predicted secreted forms defined in FIGs. 13A-H.
Item 29. The GH3 BGL of item 17, wherein:
(a) in KGY1Y2Y3Y4LGP (SEQ ID NO: 251 ):
Yi and Y4 are as defined in item 1 ;
Y3 is as defined in item 9;
(b) in GLDMX1"MPGDX2"X3"X4"X5"X6"X7"X8"SX9"WG (SEQ ID NO: 280):
X1" is S;
X2" is I or V;
X3" is S, T or D;
X4" is F or Y;
X5" is D;
X6" is D or S;
X7" is G or A;
Figure imgf000016_0001
(c) in Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253), wherein:
Figure imgf000016_0002
Z6 is N or S; and/or
(d) in A1A2A3A4A5 (SEQ ID NO: 254) and in B1B2B3B4B5 (SEQ ID NO: 255):
Figure imgf000016_0003
A5 is S or A, B2 is I or V;
B3 is D or E; B5 is W,
with the proviso that the GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
Item 30. The GH3 BGL polypeptide of item 29 or 29', wherein at least one of X2" to X4" is smaller than the corresponding amino acid residue.
Item 31. The GH3 BGL polypeptide of item 29 or 29', wherein at least one of X2" to X4" is C, V, A, G, S, P, T, D or N.
Item 32. The GH3 BGL polypeptide of item 29 or 29', wherein X4" is C, V, A, G, S, P, T, D or N.
Item 33. The GH3 BGL polypeptide of item 29 or 29', wherein X4" is C, V, A, or G.
Item 34. The GH3 BGL polypeptide of any one of items 29 and 29' to 33 wherein
GLDMX1"refMPGDX2"refX3"refX4"refX5"refX6"refX7"refX8"refSX9"refWG (SEQ ID NO: 280) is:
(i) gldmsmpgdisfddglsfwg (SEQ ID NO: 272);
(ii) gldmsmpgdvtfdsgtsfwg (SEQ ID NO: 273);
(iii) gldmsmpgditfdsatsfwg (SEQ ID NO: 274); or
(iv) gldmsmpgdvdydsgtsywg (SEQ ID NO: 275).
Item 35'. The GH3 BGL polypeptide of any one of items 29 and 29' to 34, which, except for residues defined in any one of items 29 and 29' to 34 or for the proviso defined in item 29 or 29', comprises a polypeptide as set forth in SEQ ID NO: 167, or comprises a secreted form of the polypeptide as set forth in SEQ ID NO: 167.
Item 36'. The GH3 BGL polypeptide of any one of items 29 and 29' to 34, which, except for residues defined in any one of items 29 and 29' to 34 or for the proviso defined in item 29 or 29', comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-23, 25 and 41 , or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-23, 25 and 41.
Item 35. The GH3 BGL polypeptide of any one of items 29 to 34, which, except for residues defined in any one of items 29 to 34 or for the proviso defined in item 29, is as set forth in SEQ ID NO: 167 or is a secreted form thereof (FIGs. 15A-B).
Item 36. The GH3 BGL polypeptide of any one of items 29 to 34, which, except for residues defined in any one of items 29 to 34 or for the proviso defined in item 29, is as set forth in Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aory: Aspergillus oryzae (SEQ ID NO: 41 ); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); or Anig: Aspergillus niger (SEQ ID NO: 23); or any of their predicted secreted forms defined in FIGs. 13A-H.
Item 37. The GH3 BGL polypeptide of any one of the preceding items, wherein Y4 is A, L, I or V.
Item 38. The GH3 BGL polypeptide of any one of the preceding items, wherein Y4 is L.
Item 39. The GH3 BGL polypeptide of any one of the preceding items, wherein Z3 is V.
Item 40. The GH3 BGL polypeptide of any one of the preceding items, wherein A3 is Q.
Item 41. The GH3 BGL polypeptide of any one of the preceding items, wherein B3 is D.
Item 42. The GH3 BGL polypeptide of item 1, as set forth in FIGs. 15A-B (SEQ ID NO: 163), wherein:
(i) amino acid residue at position 340 of SEQ ID NO: 163 is R, K, A, L, I, F, V or P;
(ii) amino acid residue at position 515 of SEQ ID NO: 163 is C, V, A, G, S, P, T, D or N;
(iii) amino acid residue at position 734 of SEQ ID NO: 163 is V or L; and/or
(iv) amino acid residue at position 748 of SEQ ID NO: 163 is Q or N and amino acid residue at position 813 of SEQ ID NO: 163 is D or E,
with the proviso that the GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
Item 42'. The GH3 BGL polypeptide of item 1 or 1 ', comprising a polypeptide as set forth in SEQ ID NO: 163, or a secreted form thereof, wherein:
(i) amino acid residue at position 340 of SEQ ID NO: 163 is R, K, A, L, I, F, V or P;
(ii) amino acid residue at position 515 of SEQ ID NO: 163 is C, V, A, G, S, P, T, D or N;
(iii) amino acid residue at position 734 of SEQ ID NO: 163 is V or L; and/or
(iv) amino acid residue at position 748 of SEQ ID NO: 163 is Q or N and amino acid residue at position 813 of SEQ ID NO: 163 is D or E,
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
Item 43. The GH3 BGL polypeptide of item 42 or 42', wherein:
(i) amino acid residue at position 340 of SEQ ID NO: 163 is A, L, I or V;
(ii) amino acid residue at position 515 of SEQ ID NO: 163 is C, G, A or V;
(iii) amino acid residue at position 734 of SEQ ID NO: 163 is V; and/or
(iv) amino acid residue at position 748 of SEQ ID NO: 163 is Q and amino acid residue at position 813 of SEQ ID NO: 163 is D.
Item 44. The GH3 BGL polypeptide of item 42 or 42', wherein:
(i) amino acid residue at position 340 of SEQ ID NO: 163 is L;
(ii) amino acid residue at position 515 of SEQ ID NO: 163 is C, G, A or V;
(iii) amino acid residue at position 734 of SEQ ID NO: 163 is V; and/or
(iv) amino acid residue at position 748 of SEQ ID NO: 163 is Q and amino acid residue at position 813 of SEQ ID NO: 163 is D.
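The positional limitations of Items 42 to 44 can be read as a simple lookup of allowed residues at 1-based positions, combined by "and/or". The sketch below illustrates that reading for Item 42 only. It is an illustration under stated assumptions: the sequence used is a synthetic placeholder (SEQ ID NO: 163 itself is not reproduced here), the helper names are hypothetical, and the proviso excluding SEQ ID NOs: 22 to 62 is not checked.

```python
# 1-based positions in SEQ ID NO: 163 and the residues Item 42 allows at each.
ITEM_42_ALLOWED = {
    340: set("RKALIFVP"),   # element (i)
    515: set("CVAGSPTDN"),  # element (ii)
    734: set("VL"),         # element (iii)
    748: set("QN"),         # element (iv), first residue
    813: set("DE"),         # element (iv), second residue
}


def element_status(seq: str) -> dict[str, bool]:
    """Report which of elements (i) to (iv) of Item 42 a sequence satisfies."""

    def ok(pos: int) -> bool:
        # Positions are 1-based in the items; Python strings are 0-based.
        return len(seq) >= pos and seq[pos - 1].upper() in ITEM_42_ALLOWED[pos]

    return {
        "i": ok(340),
        "ii": ok(515),
        "iii": ok(734),
        "iv": ok(748) and ok(813),  # element (iv) requires both positions
    }


def meets_item_42_residues(seq: str) -> bool:
    """'and/or' in Item 42: at least one of elements (i) to (iv) holds."""
    return any(element_status(seq).values())


def toy_sequence(substitutions: dict[int, str], length: int = 820, filler: str = "W") -> str:
    """Build a placeholder sequence; 'W' is outside every allowed set above."""
    residues = [filler] * length
    for pos, aa in substitutions.items():
        residues[pos - 1] = aa
    return "".join(residues)


if __name__ == "__main__":
    variant = toy_sequence({340: "L", 748: "Q"})  # hypothetical, not a real BGL
    print(element_status(variant))
    print(meets_item_42_residues(variant))
```

Items 43 and 44 narrow the same positions to smaller residue sets, so the same check would apply with a different lookup table.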
Item 45. The GH3 BGL polypeptide of item 9, as set forth in FIGs. 15A-B (SEQ ID NO: 164), wherein:
(i) amino acid residue at position 336 of SEQ ID NO: 164 is R, K, A, L, I, F, V or P;
(ii) amino acid residue at position 507 of SEQ ID NO: 164 is C, V, A, G, S, P, T, D or N;
(iii) amino acid residue at position 727 of SEQ ID NO: 164 is V or L; and/or
(iv) amino acid residue at position 741 of SEQ ID NO: 164 is Q or N and amino acid residue at position 806 of SEQ ID NO: 164 is D or E,
with the proviso that the GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
Item 45'. The GH3 BGL polypeptide of item 9 or 9', comprising a polypeptide as set forth in SEQ ID NO: 164 or a secreted form thereof, wherein:
(i) amino acid residue at position 336 of SEQ ID NO: 164 is R, K, A, L, I, F, V or P;
(ii) amino acid residue at position 507 of SEQ ID NO: 164 is C, V, A, G, S, P, T, D or N;
(iii) amino acid residue at position 727 of SEQ ID NO: 164 is V or L; and/or
(iv) amino acid residue at position 741 of SEQ ID NO: 164 is Q or N and amino acid residue at position 806 of SEQ ID NO: 164 is D or E,
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
Item 46. The GH3 BGL polypeptide of item 45 or 45', wherein:
(i) amino acid residue at position 336 of SEQ ID NO: 164 is A, L, I or V;
(ii) amino acid residue at position 507 of SEQ ID NO: 164 is C, G, A or V;
(iii) amino acid residue at position 727 of SEQ ID NO: 164 is V; and/or
(iv) amino acid residue at position 741 of SEQ ID NO: 164 is Q and amino acid residue at position 806 of SEQ ID NO: 164 is D.
Item 47. The GH3 BGL polypeptide of item 45 or 45', wherein:
(i) amino acid residue at position 336 of SEQ ID NO: 164 is L;
(ii) amino acid residue at position 507 of SEQ ID NO: 164 is C, G, A or V;
(iii) amino acid residue at position 727 of SEQ ID NO: 164 is V; and/or
(iv) amino acid residue at position 741 of SEQ ID NO: 164 is Q and amino acid residue at position 806 of SEQ ID NO: 164 is D.
Item 48. The GH3 BGL polypeptide of item 17 or 25, as set forth in FIGs. 15A-B (SEQ ID NOs: 165 or 166), wherein:
(i) amino acid residue at position 157 of SEQ ID NO: 165 or 166 is R, K, A, L, I, F, V or P;
(ii) amino acid residue at position 322 of SEQ ID NO: 165 or 166 is C, V, A, G, S, P, T, D or N;
(iii) amino acid residue at position 498 of SEQ ID NO: 165 or 166 is V or L; and/or
(iv) amino acid residue at position 512 of SEQ ID NO: 165 or 166 is Q or N and amino acid residue at position 576 of SEQ ID NO: 165 or 166 is D or E,
with the proviso that the GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
Item 48'. The GH3 BGL polypeptide of any one of items 17, 17', 25 and 25', comprising a polypeptide as set forth in SEQ ID NOs: 165 or 166 or a secreted form thereof, wherein:
(i) amino acid residue at position 157 of SEQ ID NO: 165 or 166 is R, K, A, L, I, F, V or P;
(ii) amino acid residue at position 322 of SEQ ID NO: 165 or 166 is C, V, A, G, S, P, T, D or N;
(iii) amino acid residue at position 498 of SEQ ID NO: 165 or 166 is V or L; and/or
(iv) amino acid residue at position 512 of SEQ ID NO: 165 or 166 is Q or N and amino acid residue at position 576 of SEQ ID NO: 165 or 166 is D or E,
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
Item 49. The GH3 BGL polypeptide of item 48 or 48', wherein:
(i) amino acid residue at position 157 of SEQ ID NO: 165 or 166 is A, L, I or V;
(ii) amino acid residue at position 322 of SEQ ID NO: 165 or 166 is C, G, A or V;
(iii) amino acid residue at position 498 of SEQ ID NO: 165 or 166 is V; and/or
(iv) amino acid residue at position 512 of SEQ ID NO: 165 or 166 is Q and amino acid residue at position 576 of SEQ ID NO: 165 or 166 is D.
Item 50. The GH3 BGL polypeptide of item 48 or 48', wherein:
(i) amino acid residue at position 157 of SEQ ID NO: 165 or 166 is L;
(ii) amino acid residue at position 322 of SEQ ID NO: 165 or 166 is C, G, A or V;
(iii) amino acid residue at position 498 of SEQ ID NO: 165 or 166 is V; and/or
(iv) amino acid residue at position 512 of SEQ ID NO: 165 or 166 is Q and amino acid residue at position 576 of SEQ ID NO: 165 or 166 is D.
Item 51. The GH3 BGL polypeptide of item 29, as set forth in FIGs. 15A-B (SEQ ID NO: 167), wherein:
(i) amino acid residue at position 151 of SEQ ID NO: 167 is R, K, A, L, I, F, V or P;
(ii) amino acid residue at position 316 of SEQ ID NO: 167 is C, V, A, G, S, P, T, D or N;
(iii) amino acid residue at position 491 of SEQ ID NO: 167 is V or L; and/or
(iv) amino acid residue at position 505 of SEQ ID NO: 167 is Q or N and amino acid residue at position 568 of SEQ ID NO: 167 is D or E,
with the proviso that the GH3 BGL polypeptide (does not comprise and) is not as set forth in any of the sequences of FIGs. 13A-H (any one of SEQ ID NOs: 22 to 62).
Item 51 '. The GH3 BGL polypeptide of item 29 or 29', comprising a polypeptide as set forth in SEQ ID NO: 167 or a secreted form thereof, wherein:
(i) amino acid residue at position 151 of SEQ ID NO: 167 is R, K, A, L, I, F, V or P;
(ii) amino acid residue at position 316 of SEQ ID NO: 167 is C, V, A, G, S, P, T, D or N;
(iii) amino acid residue at position 491 of SEQ ID NO: 167 is V or L; and/or
(iv) amino acid residue at position 505 of SEQ ID NO: 167 is Q or N and amino acid residue at position 568 of SEQ ID NO: 167 is D or E,
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
Item 52. The GH3 BGL polypeptide of item 51 or 51 ', wherein:
(i) amino acid residue at position 151 of SEQ ID NO: 167 is A, L, I or V;
(ii) amino acid residue at position 316 of SEQ ID NO: 167 is C, G, A or V;
(iii) amino acid residue at position 491 of SEQ ID NO: 167 is V; and/or
(iv) amino acid residue at position 505 of SEQ ID NO: 167 is Q and amino acid residue at position 568 of SEQ ID NO: 167 is D.
Item 53. The GH3 BGL polypeptide of item 51 or 51 ', wherein:
(i) amino acid residue at position 151 of SEQ ID NO: 167 is L;
(ii) amino acid residue at position 316 of SEQ ID NO: 167 is C, G, A or V;
(iii) amino acid residue at position 491 of SEQ ID NO: 167 is V; and/or
(iv) amino acid residue at position 505 of SEQ ID NO: 167 is Q and amino acid residue at position 568 of SEQ ID NO: 167 is D.
Item 54. The GH3 BGL polypeptide of any one of items 42 to 44, which, except for residues defined in any one of items 42 to 44 and for the proviso defined in item 42, is as set forth in any one of Umay2: Ustilago maydis (SEQ ID NO: 37); Ccin1: Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30); Umay1: Ustilago maydis (SEQ ID NO: 29); Rory5: Rhizopus oryzae (SEQ ID NO: 36); Pbla1: Phycomyces blakesleeanus (SEQ ID NO: 28); Pbla2: Phycomyces blakesleeanus (SEQ ID NO: 31); Rory1: Rhizopus oryzae (SEQ ID NO: 32); Rory2: Rhizopus oryzae (SEQ ID NO: 33); Rory3: Rhizopus oryzae (SEQ ID NO: 34); Rory4: Rhizopus oryzae (SEQ ID NO: 35); Anid: Aspergillus nidulans (SEQ ID NO: 24); Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); Mgrisea: Magnaporthe grisea (SEQ ID NO: 38); Ccin2: Coprinopsis cinerea (also designated herein CcinBGL1020 0.1 1558) (SEQ ID NO: 39); PchrBgl1010 (SEQ ID NO: 40); Aory: Aspergillus oryzae (SEQ ID NO: 41); CpelBglX0290: Wickerhamomyces anomalus (Pichia anomala) (SEQ ID NO: 42); SfibBglM22475: Saccharomycopsis fibuligera M22475 (SEQ ID NO: 43); SfibBglM22476: Saccharomycopsis fibuligera M22476 (SEQ ID NO: 44), or any of their predicted secreted forms defined in FIGs. 13A-H.
Item 55. The GH3 BGL polypeptide of any one of items 45 to 47, which, except for residues defined in any one of items 45 to 47 and for the proviso defined in item 45, is as set forth in any one of Umay2: Ustilago maydis (SEQ ID NO: 37); Ccin1: Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30); Umay1: Ustilago maydis (SEQ ID NO: 29); Rory5: Rhizopus oryzae (SEQ ID NO: 36); Pbla1: Phycomyces blakesleeanus (SEQ ID NO: 28); Pbla2: Phycomyces blakesleeanus (SEQ ID NO: 31); Rory1: Rhizopus oryzae (SEQ ID NO: 32); Rory2: Rhizopus oryzae (SEQ ID NO: 33); Rory3: Rhizopus oryzae (SEQ ID NO: 34); Rory4: Rhizopus oryzae (SEQ ID NO: 35); Anid: Aspergillus nidulans (SEQ ID NO: 24); Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); and Aory: Aspergillus oryzae (SEQ ID NO: 41); or any of their predicted secreted forms defined in FIGs. 13A-H.
Item 56. The GH3 BGL polypeptide of any one of items 48 to 50, which, except for residues defined in any one of items 48 to 50 and for the proviso defined in item 48, is as set forth in any one of Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Aory: Aspergillus oryzae (SEQ ID NO: 41 ); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); or Fgra: Fusarium graminearum (SEQ ID NO: 27); or any of their predicted secreted forms defined in FIGs. 13A-H.
Item 57. The GH3 BGL polypeptide of any one of items 51 to 53, which, except for residues defined in any one of items 51 to 53 and for the proviso defined in item 51 , is as set forth in any one of Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); or Fgra: Fusarium graminearum (SEQ ID NO: 27); or any of their predicted secreted forms defined in FIGs. 13A-H.
Item 58. The GH3 BGL polypeptide of item 1 , as set forth in any one of the sequences of FIGs. 14A-U (SEQ ID NOs: 63 to 162).
Item 54'. The GH3 BGL polypeptide of any one of items 42 and 42' to 44, which except for residues defined in any one of items 42 and 42' to 44 and for the proviso defined in item 42 or 42', comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-44, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-44.
Item 55'. The GH3 BGL polypeptide of any one of items 45 and 45' to 47, which, except for residues defined in any one of items 45 and 45' to 47 and for the proviso defined in item 45 or 45', comprises a polypeptide as set forth in any one of SEQ ID NO: 22-37 and 41 , or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NO: 22-37 and 41.
Item 56'. The GH3 BGL polypeptide of any one of items 48 and 48' to 50, which, except for residues defined in any one of items 48 and 48' to 50 and for the proviso defined in item 48 or 48', comprises a polypeptide as set forth in any one of SEQ ID NO: 22-23, 25-27 and 41 , or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NO: 22-23, 25-27 and 41.
Item 57'. The GH3 BGL polypeptide of any one of items 51 and 51 ' to 53, which, except for residues defined in any one of items 51 and 51 ' to 53 and for the proviso defined in item 51 or 51 ', comprises a polypeptide as set forth in any one of SEQ ID NO: 22-23, 25 and 41 , or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NO: 22-23, 25 and 41.
Item 58'. The GH3 BGL polypeptide of item 1 or 1 ', comprising a polypeptide as set forth in any one of SEQ ID NOs: 63 to 144, or comprising a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 63 to 144.
Item 59'. The GH3 BGL polypeptide of any one of the preceding items, comprising a signal peptide.
Item 60'. The GH3 BGL polypeptide of any one of the preceding items, comprising a signal peptide including an MFα pre sequence which, except for a substitution of the alanine residue at position 9 and/or a substitution of the proline residue at position 21 and/or a substitution of the valine residue at position 22, is as set forth in SEQ ID NO: 9.
Item 61'. The GH3 BGL polypeptide of item 59 or 59', wherein the signal peptide is an MFα pre sequence which, except for a substitution of the alanine residue at position 9 for a threonine residue and/or a substitution of the proline residue at position 21 for a threonine or a serine residue and/or a substitution of the valine residue at position 22 for an alanine or an aspartate residue, is as set forth in SEQ ID NO: 9.
Item 62'. The GH3 BGL polypeptide of any one of the preceding items, comprising a signal peptide including an MFα pre sequence as set forth in any one of SEQ ID NOs: 9 and 168-172.
Item 63'. The GH3 BGL polypeptide of any one of the preceding items, which is a secreted polypeptide form.
Item 59. A secreted form of the GH3 BGL polypeptide defined in any one of the preceding items.
In a specific embodiment, the GH3 BGL polypeptide of the present invention comprises at least two of the elements (a) to (d) defined in any one of the preceding items (i.e. (a) the triosephosphateisomerase domain; (b) the coordinating loop domain; (c) part Z1-Z6 of the β/α sandwich domain; and (d) parts A1-A5 and B1-B5 of the β/α sandwich domain). In another specific embodiment, it contains at least 3 of such elements. In another specific embodiment, it contains all 4 of such elements.
In another specific embodiment, the GH3 BGL polypeptide of the present invention does not comprise a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and does not comprise a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
Item 60. A vector comprising a nucleic acid encoding the GH3 BGL polypeptide defined herein.
Item 61. The vector of item 60, further comprising a nucleic acid encoding an endoglucanase (EGLs; EC 3.2.1.4).
Item 62. The vector of item 60 or 61, further comprising a nucleic acid encoding a cellobiohydrolase.
Item 63. The vector of any one of items 60 to 62, further comprising a terminator and/or a promoter.
Item 64. A host cell expressing (a) the GH3 BGL polypeptide defined in any one of the preceding items; or (b) the vector defined in any one of items 60 to 63 and 60' to 63'.
Item 65. A composition comprising (a) (i) the GH3 BGL polypeptide defined in any one of the preceding items; (ii) the vector defined in any one of items 60 to 63 and 60' to 63'; (iii) the host cell defined in item 64 or 64'; or (iv) a cell lysate or a culture medium of (iii); and (b) (i) a carrier; and/or (ii) at least one other cellulase.
Item 66. A method of converting a cellulosic substrate into a fermentable sugar comprising contacting (a) the GH3 BGL polypeptide defined in any one of the preceding items; or (b) the composition defined in item 65 or 65', with the cellulosic substrate, whereby a fermentable sugar is generated.
Item 67. The method of item 66, wherein the cellulosic substrate is soluble cellodextrin.
Item 68. The method of item 67, wherein the soluble cellodextrin is cellobiose.
Item 69. The method of any one of items 66 to 68, wherein the sugar is glucose.
Other objects, advantages and features of the present invention will become more apparent upon reading of the following non-restrictive description of specific embodiments thereof, given by way of example only with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
In the appended drawings:
FIGs. 1A-B. Directed evolution of a-BGL1. (A) Schematic of the assembly and expression cassettes from pKL022, pKL024 and pKL029 (SEQ ID NO: 1); (B) Activities of a wild type population and selection pool from 96 well plate activity assays. (C) Activity assays for the improved BGLs. Error bars represent mean ± 95% confidence interval of triplicate experiments.
FIGs. 2A-B. Representative V0 (i.e. the initial rate of product formation during the linear phase of the reaction progress curve) versus substrate concentration plots for the production of pNP (product of hydrolysis) at a range of pNPG concentrations in the presence of inhibitor for WT anBGL.1 (also designated a-BGL-1), Y305C, V22D, Q140L, A480V, K494Q and N557D. Curve legends refer to glucose concentration (mM). The ordinate axis is V0 (pmole pNP L⁻¹ min⁻¹); the abscissa axis is pNPG concentration (mM).
FIGs. 3A-D. Structure/function analysis of beneficial mutations in GH3 BGLs. FIG. 3A. Molecular mapping of mutations using the A. aculeatus BGL1 crystal structure (PDB 4IIB). Chain A, pale grey; Chain B, dark grey. Substitutions identified by mutagenesis and functional selection are shown and labelled on Chain A. FIG. 3B. Residues contributing to the substrate binding pocket of A. aculeatus BGL1. FIG. 3C. Gly294 - Gly313 residues coordinate Phe305 in the +1 subsite. FIG. 3D. Alignments of native GH3 residues. Beneficial residues identified by directed evolution in the present invention are underlined. Asterisks (*) indicate beneficial substitutions found in nature. A. acu: Aspergillus aculeatus BGL1 (SEQ ID NO: 22); A. nig: Aspergillus niger BGL1 (SEQ ID NO: 23); A. nid: Aspergillus nidulans (SEQ ID NO: 24); A. fum: Aspergillus fumigatus (SEQ ID NO: 25); N. cra: Neurospora crassa (SEQ ID NO: 26); F. gra: Fusarium graminearum (SEQ ID NO: 27); P. bla1: Phycomyces blakesleeanus (SEQ ID NO: 28); U. may1: Ustilago maydis (SEQ ID NO: 29); C. cin: Coprinopsis cinerea (SEQ ID NO: 30); P. bla2: Phycomyces blakesleeanus (SEQ ID NO: 31); R. ory1: Rhizopus oryzae (SEQ ID NO: 32); R. ory2: Rhizopus oryzae (SEQ ID NO: 33); R. ory3: Rhizopus oryzae (SEQ ID NO: 34); R. ory4: Rhizopus oryzae (SEQ ID NO: 35); R. ory5: Rhizopus oryzae (SEQ ID NO: 36); U. may2: Ustilago maydis (SEQ ID NO: 37). Positions highlighted in various shades of grey are part of the substrate binding pocket as shown in FIG. 3B, while positions in white are not.
FIGs. 4A-B. FIG. 4A. Position of residue 305 in the substrate binding pocket (using numbering of A. niger BGL1) of various GH3 BGL1 members. FIG. 4B. Alignments of GH3 BGL1 of different molds including coordinating loop 294-313 (using numbering of A. niger BGL1) (A. niger GH3 BGL1 fragment (SEQ ID NO: 2), A. fumigatus GH3 BGL1 fragment (SEQ ID NO: 3), A. oryzae GH3 BGL1 fragment (SEQ ID NO: 4), Penicillium brasilianum GH3 BGL1 fragment (SEQ ID NO: 5), Magnaporthe grisea GH3 BGL1 fragment (SEQ ID NO: 6), and Neurospora crassa GH3 BGL1 fragment (SEQ ID NO: 7)).
FIGs. 5A-B. Analysis of β-glucosidase reaction rates for the production of glucose at a range of cellobiose concentrations. (FIG. 5A) a-BGL1; (FIG. 5B) engineered variants.
FIGs. 6A-B. Thin layer chromatography of BGL reactions using (FIG. 6A) 40 mM pNPG; and (FIG. 6B) 50 mM cellobiose. Standards (1 μl) were 40 mM pNPG, 50 mM cellobiose (C), 50 mM glucose (G), and 25 mM gentiobiose (Ge).
FIG. 7. Amino acid sequence of Mfalpha-AnBGL1 (SEQ ID NO: 8) consisting of underlined Mfalpha fragment sequence (amino acids 1 to 24) (SEQ ID NO: 9), AnBGL1 22-860 (SEQ ID NO: 10) and bolded C-terminal fragment of 25 amino acids (SEQ ID NO: 11) including polyhistidine.
FIGs. 8A-B. Nucleic acid sequence of Mfalpha-AnBGL1 (SEQ ID NO: 12) consisting of underlined Mfalpha fragment sequence (SEQ ID NO: 13), AnBGL1 22-860 (SEQ ID NO: 14) and bolded C-terminal fragment (SEQ ID NO: 15) including polyhistidine. Specific mutations identified in examples presented herein below are listed as miscellaneous features in FIGs. 8A-B.
FIGs. 9A-C. Nucleic acid sequence of pKL022 (SEQ ID NO: 16).
FIGs. 10A-C. Nucleic acid sequence of pKL029 (SEQ ID NO: 17).
FIGs. 11A-C. Nucleic acid sequence of pKL024 (SEQ ID NO: 18).
FIGs. 12A-E. Nucleotide sequence of plasmid pGREG503 (SEQ ID NO: 19); of TDH3 promoter (SEQ ID NO: 20); and of CYC1 terminator (SEQ ID NO: 21).
FIGs. 13A-H. Amino acid sequences of various GH3 BGLs including those disclosed in FIG. 3D, namely those from Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Anid: Aspergillus nidulans (SEQ ID NO: 24); Afum: Aspergillus fumigatus (SEQ ID NO: 25); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); Pbla1: Phycomyces blakesleeanus (SEQ ID NO: 28); Umay1: Ustilago maydis (SEQ ID NO: 29); Ccin1: Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30); Pbla2: Phycomyces blakesleeanus (SEQ ID NO: 31); Rory1: Rhizopus oryzae (SEQ ID NO: 32); Rory2: Rhizopus oryzae (SEQ ID NO: 33); Rory3: Rhizopus oryzae (SEQ ID NO: 34); Rory4: Rhizopus oryzae (SEQ ID NO: 35); Rory5: Rhizopus oryzae (SEQ ID NO: 36); Umay2: Ustilago maydis (SEQ ID NO: 37); Mgrisea: Magnaporthe grisea (SEQ ID NO: 38); Ccin2: Coprinopsis cinerea (also designated herein CcinBGL1020 0.11558) (SEQ ID NO: 39); PchrBgl1010: Phanerochaete chrysosporium (SEQ ID NO: 40); Aory: Aspergillus oryzae (SEQ ID NO: 41); CpelBglX0290: Wickerhamomyces anomalus (Pichia anomala) (SEQ ID NO: 42); SfibBglM22475: Saccharomycopsis fibuligera M22475 (SEQ ID NO: 43); SfibBglM22476: Saccharomycopsis fibuligera M22476 (SEQ ID NO: 44); and predicted secreted forms thereof (SEQ ID NOs: 45-62). Sequences highlighted in grey are signal peptides as predicted by SignalP (http://www.cbs.dtu.dk/services/SignalP/).
FIGs. 14A-U. I- Malpha Anig mutants with heterologous C-terminal sequences (SEQ ID NOs: 63-85); II- Malpha Anig mutants without heterologous C-terminal sequences (SEQ ID NOs: 86-108); III- Anig mutants with predicted native signal peptide and with heterologous C-terminal sequences (SEQ ID NOs: 109-126); IV- Anig mutants with predicted native signal peptide and without heterologous C-terminal sequences (SEQ ID NOs: 127-144); and V- Anig mutants without native signal peptide and without heterologous C-terminal sequences (SEQ ID NOs: 145-162). Sequences highlighted in grey are signal peptides as predicted by SignalP (http://www.cbs.dtu.dk/services/SignalP/).
FIGs. 15A-B. Consensus derived from alignment of BGL orthologues presented in FIGs. 16A-H (SEQ ID NO: 163); consensus derived from alignment of BGL orthologues presented in FIGs. 17A-G (SEQ ID NO: 164); consensus derived from alignment of BGL orthologues presented in FIGs. 18A-C (SEQ ID NO: 165); consensus derived from alignment of BGL orthologues presented in FIG. 19A-C (SEQ ID NO: 166); and consensus derived from alignment of BGL orthologues presented in FIGs. 20A-B (SEQ ID NO: 167). Sequences highlighted in grey are signal peptides as predicted by SignalP (http://www.cbs.dtu.dk/services/SignalP/).
FIGs. 16A-H presents an alignment of the BGL1 amino acid sequences of Umay2: Ustilago maydis (SEQ ID NO: 37); Ccin1: Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30); Umay1: Ustilago maydis (SEQ ID NO: 29); Rory5: Rhizopus oryzae (SEQ ID NO: 36); Pbla1: Phycomyces blakesleeanus (SEQ ID NO: 28); Pbla2: Phycomyces blakesleeanus (SEQ ID NO: 31); Rory1: Rhizopus oryzae (SEQ ID NO: 32); Rory2: Rhizopus oryzae (SEQ ID NO: 33); Rory3: Rhizopus oryzae (SEQ ID NO: 34); Rory4: Rhizopus oryzae (SEQ ID NO: 35); Anid: Aspergillus nidulans (SEQ ID NO: 24); Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); Mgrisea: Magnaporthe grisea (SEQ ID NO: 38); Ccin2: Coprinopsis cinerea (also designated herein CcinBGL1020 0.11558) (SEQ ID NO: 39); PchrBgl1010 (SEQ ID NO: 40); Aory: Aspergillus oryzae (SEQ ID NO: 41); CpelBglX0290: Wickerhamomyces anomalus (Pichia anomala) (SEQ ID NO: 42); SfibBglM22475: Saccharomycopsis fibuligera M22475 (SEQ ID NO: 43); SfibBglM22476: Saccharomycopsis fibuligera M22476 (SEQ ID NO: 44); and consensus derived therefrom (SEQ ID NO: 163). In this alignment, "*" denotes that the residues in that column are identical in all sequences of the alignment, ":" denotes that conserved substitutions have been observed, and "." denotes that semi-conserved substitutions have been observed. Consensus sequences derived from these alignments are also presented wherein X is any amino acid. Sequences highlighted in grey are signal peptides as predicted by SignalP (http://www.cbs.dtu.dk/services/SignalP/). Boxes define BGLs domains, namely, in the order of their appearances, the triosephosphateisomerase domain; the coordinating loop domain; and three portions of the β/α sandwich domain.
FIGs. 17A-G presents an alignment of the BGL1 amino acid sequences of Umay2: Ustilago maydis (SEQ ID NO: 37); Ccin1: Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30); Umay1: Ustilago maydis (SEQ ID NO: 29); Rory5: Rhizopus oryzae (SEQ ID NO: 36); Pbla1: Phycomyces blakesleeanus (SEQ ID NO: 28); Pbla2: Phycomyces blakesleeanus (SEQ ID NO: 31); Rory1: Rhizopus oryzae (SEQ ID NO: 32); Rory2: Rhizopus oryzae (SEQ ID NO: 33); Rory3: Rhizopus oryzae (SEQ ID NO: 34); Rory4: Rhizopus oryzae (SEQ ID NO: 35); Anid: Aspergillus nidulans (SEQ ID NO: 24); Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); and consensus derived therefrom (SEQ ID NO: 164). In this alignment, "*" denotes that the residues in that column are identical in all sequences of the alignment, ":" denotes that conserved substitutions have been observed, and "." denotes that semi-conserved substitutions have been observed. Consensus sequences derived from these alignments are also presented wherein X is any amino acid. Sequences highlighted in grey are signal peptides as predicted by SignalP (http://www.cbs.dtu.dk/services/SignalP/). Boxes define BGLs domains, namely, in the order of their appearances, the triosephosphateisomerase domain; the coordinating loop domain; and three portions of the β/α sandwich domain.
FIGs. 18A-C presents an alignment of the amino acid sequences of BGL1 from Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aory: Aspergillus oryzae (SEQ ID NO: 41); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); and consensus derived therefrom (SEQ ID NO: 165). In this alignment, "*" denotes that the residues in that column are identical in all sequences of the alignment, ":" denotes that conserved substitutions have been observed, and "." denotes that semi-conserved substitutions have been observed. Consensus sequences derived from these alignments are also presented wherein X is any amino acid. Sequences highlighted in grey are signal peptides as predicted by SignalP (http://www.cbs.dtu.dk/services/SignalP/). Boxes define BGLs domains, namely, in the order of their appearances, the triosephosphateisomerase domain; the coordinating loop domain; and three portions of the β/α sandwich domain.
FIGs. 19A-C presents an alignment of the amino acid sequences of BGL1 from Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); and consensus derived therefrom (SEQ ID NO: 166). In this alignment, "*" denotes that the residues in that column are identical in all sequences of the alignment, ":" denotes that conserved substitutions have been observed, and "." denotes that semi-conserved substitutions have been observed. Consensus sequences derived from these alignments are also presented wherein X is any amino acid. Sequences highlighted in grey are signal peptides as predicted by SignalP (http://www.cbs.dtu.dk/services/SignalP/). Boxes define BGLs domains, namely, in the order of their appearances, the triosephosphateisomerase domain; the coordinating loop domain; and three portions of the β/α sandwich domain.
FIGs. 20A-B presents an alignment of the amino acid sequences of BGL1 from Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aory: Aspergillus oryzae (SEQ ID NO: 41); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); and Anig: Aspergillus niger (SEQ ID NO: 23); and consensus derived therefrom (SEQ ID NO: 167). In this alignment, "*" denotes that the residues in that column are identical in all sequences of the alignment, ":" denotes that conserved substitutions have been observed, and "." denotes that semi-conserved substitutions have been observed. Consensus sequences derived from these alignments are also presented wherein X is any amino acid. Sequences highlighted in grey are signal peptides as predicted by SignalP (http://www.cbs.dtu.dk/services/SignalP/). Boxes define BGLs domains, namely, in the order of their appearances, the triosephosphateisomerase domain; the coordinating loop domain; and three portions of the β/α sandwich domain.
FIG. 21 presents a phylogenetic tree of the GH3 BGLs of FIGs. 13A-H.
FIGs. 22A-B presents the percent identities of the GH3 BGLs of FIGs. 13A-H.
DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
General Definitions
Headings, and other identifiers, e.g., (a), (b), (i), (ii), etc., are presented merely for ease of reading the specification and claims. The use of headings or other identifiers in the specification or claims does not necessarily require the steps or elements to be performed in alphabetical or numerical order or the order in which they are presented.
In the present description, a number of terms are extensively utilized. In order to provide a clear and consistent understanding of the specification and claims, including the scope to be given such terms, the following definitions are provided.
The use of the word "a" or "an" when used in conjunction with the term "comprising" in the claims and/or the specification may mean "one" but it is also consistent with the meaning of "one or more", "at least one", and "one or more than one".
Throughout this application, the term "about" is used to indicate that a value includes the standard deviation of error for the device or method being employed to determine the value. In general, the terminology "about" is meant to designate a possible variation of up to 10%. Therefore, a variation of 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10% of a value is included in the term "about". Unless indicated otherwise, use of the term "about" before a range applies to both ends of the range.
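By way of illustration only, the 10% reading of "about" given above can be sketched as a simple numeric check. The function names `within_about` and `about_range` are hypothetical and are not part of the specification; the sketch covers only the "up to 10%" variation and does not model the standard deviation of error of any particular device or method.

```python
def within_about(value, nominal, max_variation=0.10):
    """Return True if `value` lies within +/- max_variation of `nominal`."""
    return abs(value - nominal) <= max_variation * abs(nominal)


def about_range(low, high, max_variation=0.10):
    """Widen a range so that "about" applies to both of its ends."""
    return (low * (1 - max_variation), high * (1 + max_variation))


if __name__ == "__main__":
    print(within_about(54, 50))  # 8% away from the nominal value
    print(within_about(56, 50))  # 12% away from the nominal value
    print(about_range(20, 40))   # "about 20 to 40" widened at both ends
```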
As used in this specification and claim(s), the words "comprising" (and any form of comprising, such as "comprise" and "comprises"), "having" (and any form of having, such as "have" and "has"), "including" (and any form of including, such as "includes" and "include") or "containing" (and any form of containing, such as "contains" and "contain") are inclusive or open-ended and do not exclude additional, un-recited elements or method steps.
As used herein, the term "consists of" or "consisting of" means including only the elements, steps, or ingredients specifically recited in the particular claimed embodiment or claim.
Enzymes
The present invention relates to novel (mutated/recombinant) cellulase enzymes (e.g., β-glucosidase) that can be used e.g., for more efficiently transforming cellulosic material (e.g., cellobiose) into sugar /degrading cellulosic material (e.g., cellobiose). The enzymes may be encoded by plasmids or chromosomes in a host cell, the host cell being contacted with the substrate; or can be used in vitro directly on the substrate.
Without being so limited, enzymes encompassed by the present invention include: native or synthetic β-glucosidases (BGLs; EC 3.2.1.21), and optionally native or synthetic endoglucanases (EGLs; EC 3.2.1.4) and/or native or synthetic cellobiohydrolases (CBHs; EC 3.2.1.91). Useful enzymes for the present invention may be derived from Umay2: Ustilago maydis (SEQ ID NO: 37); Ccin1: Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30); Umay1: Ustilago maydis (SEQ ID NO: 29); Rory5: Rhizopus oryzae (SEQ ID NO: 36); Pbla1: Phycomyces blakesleeanus (SEQ ID NO: 28); Pbla2: Phycomyces blakesleeanus (SEQ ID NO: 31); Rory1: Rhizopus oryzae (SEQ ID NO: 32); Rory2: Rhizopus oryzae (SEQ ID NO: 33); Rory3: Rhizopus oryzae (SEQ ID NO: 34); Rory4: Rhizopus oryzae (SEQ ID NO: 35); Anid: Aspergillus nidulans (SEQ ID NO: 24); Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); Mgrisea: Magnaporthe grisea (SEQ ID NO: 38); Ccin2: Coprinopsis cinerea (also designated herein CcinBGL1020 0.11558) (SEQ ID NO: 39); PchrBgl1010: Phanerochaete chrysosporium (SEQ ID NO: 40); Aory: Aspergillus oryzae (SEQ ID NO: 41); CpelBglX0290: Wickerhamomyces anomalus (Pichia anomala) (SEQ ID NO: 42); SfibBglM22475: Saccharomycopsis fibuligera M22475 (SEQ ID NO: 43); SfibBglM22476: Saccharomycopsis fibuligera M22476 (SEQ ID NO: 44); etc. The present invention encompasses secreted forms of any of the foregoing. Predicted secreted (e.g., devoid of signal peptide) forms and full (translated) forms of illustrative examples of these enzymes are presented in FIGs. herein (e.g., FIGs. 7-8 and 13-20) (SEQ ID NOs: 22-167).
In a specific embodiment, the invention encompasses enzymes of any one of SEQ ID NOs: 22-44 comprising a substitution of their native signal peptide with a mutated MF-α signal peptide as described herein (e.g., SEQ ID NO: 9 comprising one or more mutations at positions 9 and/or 21 and/or 22, such as those depicted in SEQ ID NOs: 168-172).
Useful GH3 BGLs of the present invention have one or more beneficial mutations in that these mutations e.g., (i) increase their resistance to substrate inhibition; and/or (ii) increase their resistance to product inhibition.
As used herein the term "mutation" is meant to refer to a naturally occurring or artificially induced alteration in a nucleotide sequence resulting in a change in the encoded amino acid sequence. It also refers herein to the change in the amino acid sequence per se. It may result in a substitution of (at least one) amino acid residue by another; a removal of (at least one) amino acid residue; or an addition of (at least one) amino acid residue as compared to the original sequence (e.g., native GH3 BGL sequence).
Useful GH3 BGLs of the present invention may have one or more mutations in each of (i) the position 305 coordinating loop; (ii) the triosephosphateisomerase domain; and/or (iii) BGL β/α sandwich domain.
More particularly, useful GH3 BGLs of the present invention may have one or more mutations in the position 305 coordinating loop (i.e. spanning residues 294 to 313 using the numbering of the Aspergillus aculeatus BGL1 polypeptide sequence) that result in a more open substrate binding pocket (i.e. with less steric hindrance) thereby reducing substrate (e.g., cellobiose) affinity. The present invention encompasses mutations replacing at least one original (e.g., native) amino acid residue from the original (e.g., native) GH3 BGL coordinating loop with at least one less bulky/smaller amino acid residue thereby resulting in a more open substrate binding pocket. In a preferred embodiment, the smaller amino acid residue is in the variable region of the coordinating loop. In a more specific embodiment, it is located in the substrate binding pocket (e.g., amino acid at position 305 in A. niger or A. aculeatus - see FIGs. 3B and 4A). Small amino acid residues in accordance with the present invention include C, V, A, G, S, P, T, D or N.
Useful GH3 BGLs of the present invention may also have one or more mutations in the triosephosphateisomerase domain (spanning Leu19-Ser356 in AaBGL1). In specific embodiments, the amino acid residue at position 140 (using the A. niger or A. aculeatus BGL numbering) is replaced with an amino acid residue that is less likely to form a hydrogen bond. Certain amino acid residues are more likely to form a hydrogen bond, such as Glutamine, Asparagine, Histidine, Serine, Threonine, Tyrosine, Cysteine, Methionine, Tryptophan, Aspartate, Glutamate and Glycine. Amino acids other than the above are preferred for position 140. More preferred amino acid residues are R, K, A, L, I, F, V or P.
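The residue preferences stated above for positions 305 and 140 can be illustrated with a short sketch. It assumes substitutions are written in the one-letter "Y305C" style used in the figure legends; the function names and the returned dictionary layout are hypothetical and serve only to show how the stated residue sets could be applied, not how any mutant was actually selected.

```python
import re

# Residue sets copied from the description (one-letter codes).
SMALL_RESIDUES = set("CVAGSPTDN")      # small residues named for the coordinating loop
HBOND_PRONE = set("QNHSTYCMWDEG")      # residues described as more likely to hydrogen bond
MORE_PREFERRED_140 = set("RKALIFVP")   # "more preferred" residues at position 140


def parse_substitution(label):
    """Split a label such as 'Q140L' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def assess(label):
    """Report whether a substitution at position 305 or 140 fits the stated preference."""
    original, position, replacement = parse_substitution(label)
    if position == 305:
        fits = replacement in SMALL_RESIDUES
    elif position == 140:
        fits = replacement not in HBOND_PRONE
    else:
        fits = None  # the description states no residue-class rule for this position
    return {"position": position, "replacement": replacement, "fits": fits}


if __name__ == "__main__":
    for label in ("Y305C", "Q140L", "A480V"):
        print(label, assess(label))
```

Both Y305C and Q140L, which appear in the legend of FIGs. 2A-B, fit the respective preference under this reading; L also belongs to the "more preferred" set for position 140.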
Useful GH3 BGLs of the present invention may also have one or more mutations in the BGL β/α sandwich domain (spanning Gln385 - Gly588 in AaBGL1) such as those disclosed in FIGs. 14A-U.
Consensuses derived from the alignments of certain of these orthologues are also presented in FIGs. 15-20. In a specific embodiment of these consensuses, each X in the consensus sequences (e.g., consensuses in FIGs. 15-20) is defined as being any amino acid, or absent when this position is absent in one or more of the orthologues presented in the alignment. In another specific embodiment of these consensuses, each X in the consensus sequences is defined as being any amino acid that constitutes a conserved or semi-conserved substitution of any of the amino acids in the corresponding position in the orthologues presented in the alignment, or absent when this position is absent in one or more of the orthologues presented in the alignment. In FIGs. 16-20, conservative substitutions are denoted by the symbol ":" and semi-conservative substitutions are denoted by the symbol ".". See definitions for "conservative" and "semi-conservative" below. In another embodiment, each X refers to any amino acid belonging to the same class as any of the amino acid residues in the corresponding position in the orthologues presented in the alignment, or absent when this position is absent in one or more of the orthologues presented in the alignment. In another embodiment, each X refers to any amino acid in the corresponding position of the orthologues presented in the alignment, or absent when this position is absent in one or more of the orthologues presented in the alignment.
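As a minimal sketch of the first embodiment above (each X being any amino acid, or absent where the position is absent in at least one orthologue), a consensus can be derived column by column from an alignment and then used to test a candidate. The three-sequence alignment below is a toy example invented for illustration and is not taken from FIGs. 16-20; the treatment of gapped columns as X is an assumption, and the function names are hypothetical.

```python
def derive_consensus(aligned):
    """Return (consensus, gap_allowed) for equal-length aligned sequences.

    A column identical in every sequence keeps its residue; any other
    column becomes X. gap_allowed marks columns where at least one
    sequence has a gap ('-').
    """
    consensus, gap_allowed = [], []
    for column in zip(*aligned):
        residues = set(column) - {"-"}
        has_gap = "-" in column
        if len(residues) == 1 and not has_gap:
            consensus.append(next(iter(residues)))
        else:
            consensus.append("X")
        gap_allowed.append(has_gap)
    return "".join(consensus), gap_allowed


def satisfies(candidate, consensus, gap_allowed):
    """Check an aligned candidate against the consensus (X = any residue, or absent if allowed)."""
    if len(candidate) != len(consensus):
        return False
    for residue, expected, gap_ok in zip(candidate, consensus, gap_allowed):
        if expected == "X":
            if residue == "-" and not gap_ok:
                return False
        elif residue != expected:
            return False
    return True


if __name__ == "__main__":
    toy_alignment = ["MKVLA-G", "MRVLAEG", "MKVIA-G"]  # hypothetical sequences
    cons, gaps = derive_consensus(toy_alignment)
    print(cons)
    print(satisfies("MHVMA-G", cons, gaps))
    print(satisfies("M-VMAQG", cons, gaps))
```

The narrower embodiments (X restricted to conserved or semi-conserved substitutions, or to residues of the same class) would replace the unconditional acceptance at X positions with a per-column set of permitted residues.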
Hence enzymes in accordance with the present invention include enzymes having the specific nucleotide or amino acid sequences described in FIGs. 7-8 and 13-20, or an amino acid sequence that satisfies any of the consensuses as defined above (e.g., FIGs. 15-20), wherein the one or more Xs are defined as above, including translated and secreted forms thereof. A secreted form of an enzyme is devoid of signal peptide. Predicted secreted forms are sometimes shaded or delineated with a vertical line in the attached FIGs. 7-8 and 13-20. Without being so limited, secreted forms of the enzymes of the present invention include predicted secreted forms as illustrated herein, namely polypeptides as set forth in SEQ ID NOs: 45-62, 145-162, amino acid residues at positions 199-1156 of SEQ ID NO: 163, amino acid residues at positions 198-1147 of SEQ ID NO: 164, amino acid residues at positions 37-902 of SEQ ID NO: 165, amino acid residues at positions 37-901 of SEQ ID NO: 166 and amino acid residues at positions 32-874 of SEQ ID NO: 167. Enzymes in accordance with the present invention may also include amino acid sequences that satisfy consensus sequences of catalytic domains of these enzymes. Enzyme sequences in accordance with the present invention also include the specific sequences described in FIGs. 7-8 and 13-20 with up to 10 amino acids (9, 8, 7, 6, 5, 4, 3, 2 or 1) truncated at the N- and/or C-terminal thereof.
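The position ranges and terminal truncations recited above can be illustrated as follows, assuming the usual convention that residue positions are 1-based and inclusive. The placeholder sequence is a dummy string used only to check lengths; it is not any of the SEQ ID NOs, and the function names are hypothetical.

```python
def residues(sequence, start, end):
    """Return residues start..end of a sequence (1-based, inclusive positions)."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("positions fall outside the sequence")
    return sequence[start - 1:end]


def terminal_truncations(sequence, max_truncated=10):
    """List (n_removed, c_removed, variant) for truncations at the N- and/or C-terminal.

    Each terminus loses between 0 and max_truncated residues; the
    untruncated sequence itself (0, 0) is left out.
    """
    variants = []
    for n_removed in range(max_truncated + 1):
        for c_removed in range(max_truncated + 1):
            if n_removed == 0 and c_removed == 0:
                continue
            variants.append((n_removed, c_removed,
                             sequence[n_removed:len(sequence) - c_removed]))
    return variants


if __name__ == "__main__":
    placeholder = "A" * 902                  # dummy stand-in, not a real sequence
    secreted = residues(placeholder, 37, 902)
    print(len(secreted))                     # length of a 37-902 span
    print(len(terminal_truncations(secreted)))
```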
In a more specific embodiment, the GH3 BGL1 of the present invention is derived from Umay2: Ustilago maydis (SEQ ID NO: 37); Ccin1: Coprinopsis cinerea (also designated herein Ccin 0.120163) (SEQ ID NO: 30); Umay1: Ustilago maydis (SEQ ID NO: 29); Rory5: Rhizopus oryzae (SEQ ID NO: 36); Pbla1: Phycomyces blakesleeanus (SEQ ID NO: 28); Pbla2: Phycomyces blakesleeanus (SEQ ID NO: 31); Rory1: Rhizopus oryzae (SEQ ID NO: 32); Rory2: Rhizopus oryzae (SEQ ID NO: 33); Rory3: Rhizopus oryzae (SEQ ID NO: 34); Rory4: Rhizopus oryzae (SEQ ID NO: 35); Anid: Aspergillus nidulans (SEQ ID NO: 24); Afum: Aspergillus fumigatus (SEQ ID NO: 25); Aacu: Aspergillus aculeatus (SEQ ID NO: 22); Anig: Aspergillus niger (SEQ ID NO: 23); Ncra: Neurospora crassa (SEQ ID NO: 26); Fgra: Fusarium graminearum (SEQ ID NO: 27); Mgrisea: Magnaporthe grisea (SEQ ID NO: 38); Ccin2: Coprinopsis cinerea (also designated herein CcinBGL1020 0.11558) (SEQ ID NO: 39); PchrBgl1010 (SEQ ID NO: 40); Aory: Aspergillus oryzae (SEQ ID NO: 41); CpelBglX0290: Wickerhamomyces anomalus (Pichia anomala) (SEQ ID NO: 42); SfibBglM22475: Saccharomycopsis fibuligera M22475 (SEQ ID NO: 43); SfibBglM22476: Saccharomycopsis fibuligera M22476 (SEQ ID NO: 44); etc. In a more specific embodiment, the BGL1 is derived from Aspergillus niger and is as disclosed in FIGs. 14A-U or is an enzymatically active (i.e. has GH3 BGL activity) variant thereof.
As used herein the "reference amino acid residue" and the "corresponding reference amino acid residue" in the contexts of domains (e.g., coordinating loop domain, triosephosphateisomerase domain, β/α sandwich domain and subdomains) of GH3 BGL polypeptides of the present invention refer to the amino residue present at the specified position of any such domain in any native fungus (e.g., filamentous fungus) GH3 BGL, including but not limited to the native fungi listed herein. In specific embodiments, the reference coordinating loop domain, the reference triosephosphateisomerase domain, the reference β/α sandwich domain and subdomains correspond to any of the sequences as set forth in the boxes as shown in FIGs. 16-20.
The enzymes could also be modified for better expression/stability/yield in the host cell, e.g., by replacing the native N-terminal membrane-spanning domain by the N-terminal membrane-spanning domain from another plant or yeast gene (e.g., Lactuca sativa (lettuce) germacrene A oxidase) or from a yeast ER bound protein (e.g., erg1 or erg8); using a heterologous signal peptide promoting increased expression (e.g., MFα pre sequence and mutants thereof and native sequences of orthologues); codon optimization for expression in the heterologous host; using a heterologous C-terminal sequence that may or may not comprise a poly-histidine fragment (e.g., gsaagsgefmskgeel (SEQ ID NO: 190) or gsaagsgefmskgeelhhhhhh (SEQ ID NO: 11)); use of different combinations of promoter/terminators for optimal coexpression of multiple enzymes; or spatial colocalization of sequential enzymes using a linker system or organelle-specific membrane domain. In a more specific embodiment, useful enzymes are as shown in FIGs. 7-8 and 13-20.
Without being so limited, useful signal peptides for the present invention include Mrfpsiftavlfaassalaapvnt (MFα native) (SEQ ID NO: 9); Mrfpsiftavlfaassalaapant (MFα mutant) (SEQ ID NO: 168); Mrfpsiftavlfaassalaapdnt (MFα mutant) (SEQ ID NO: 169); Mrfpsifttvlfaassalaatvnt (MFα mutant) (SEQ ID NO: 170); Mrfpsiftavlfaassalaasvnt (MFα mutant) (SEQ ID NO: 171); Mrfpsiftavlfaassalaatvnt (MFα mutant) (SEQ ID NO: 172); Mrftlieavaltavslasade (Anig) (SEQ ID NO: 173); Mrfgwlevaaltaasvana (Afum) (SEQ ID NO: 174); Mkfaiplallasgnlala (Ncra) (SEQ ID NO: 175); Mkanwlaaavylaagtda (Fgra) (SEQ ID NO: 176); M I sly tsatf 11 gf ssiyf an hay al (Pbla1) (SEQ ID NO: 177); Mkfstflplaalacaaqsla (Umay1) (SEQ ID NO: 178); Mvqflsltssalillvacasa (Pbla2) (SEQ ID NO: 179); Myipslsaiaitallaasttvqvaqa (Rory1) (SEQ ID NO: 180); Myipslsaiavtvllaantaiqitea (Rory2) (SEQ ID NO: 181); Myipslsavavavliatntvlpiaea (Rory3) (SEQ ID NO: 182); Mqyagklhgnysiavtvllaanavlpgaea (Rory4) (SEQ ID NO: 183); Mrfsgivatlvagagvsa (Mgrisea) (SEQ ID NO: 184); Mpygtalpiivlvlstaisiltsa (Pchr) (SEQ ID NO: 185); Mklgwievaalaaasvvsa (Aory) (SEQ ID NO: 186); Mllplyglasflvlsqa (Cpel) (SEQ ID NO: 187); Mlmivqllvfalglava (Sfib1) (SEQ ID NO: 188); and Mllilellvliiglgva (Sfib2) (SEQ ID NO: 189). Other appropriate signal sequences may be routinely selected using software such as SignalP (http://www.cbs.dtu.dk/services/SignalP/) (see also refs 96 and 99).
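The relationship between the native MFα pre sequence (SEQ ID NO: 9) and the mutants of SEQ ID NOs: 168-172 listed above can be made explicit by a position-by-position comparison. The sketch below uses the sequences exactly as recited; the function name `substitutions` and the "A9T"-style output labels are illustrative choices, not notation defined by the specification.

```python
# Sequences as recited in the description, keyed by SEQ ID NO.
MFA_NATIVE = "Mrfpsiftavlfaassalaapvnt"  # SEQ ID NO: 9
MFA_MUTANTS = {
    168: "Mrfpsiftavlfaassalaapant",
    169: "Mrfpsiftavlfaassalaapdnt",
    170: "Mrfpsifttvlfaassalaatvnt",
    171: "Mrfpsiftavlfaassalaasvnt",
    172: "Mrfpsiftavlfaassalaatvnt",
}


def substitutions(reference, variant):
    """List substitutions between two equal-length sequences, e.g. ['V22A']."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    changes = []
    for position, (ref, var) in enumerate(zip(reference, variant), start=1):
        if ref.upper() != var.upper():
            changes.append(f"{ref.upper()}{position}{var.upper()}")
    return changes


if __name__ == "__main__":
    for seq_id, sequence in sorted(MFA_MUTANTS.items()):
        print(f"SEQ ID NO: {seq_id}:", ", ".join(substitutions(MFA_NATIVE, sequence)))
```

The comparison touches only positions 9, 21 and 22, which is consistent with the substitutions recited for the MFα pre sequence in the items above.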
A substantially identical sequence may comprise one or more conservative amino acid mutations. It is known in the art that one or more conservative amino acid mutations to a reference sequence may yield a mutant peptide with no substantial change in physiological, chemical, or functional properties compared to the reference sequence; in such a case, the reference and mutant sequences would be considered "substantially identical" polypeptides. Conservative amino acid mutation may include addition, deletion, or substitution of an amino acid; a conservative amino acid substitution is defined herein as the substitution of an amino acid residue for another amino acid residue with similar chemical properties (e.g., size, charge, or polarity).
In a non-limiting example, a conservative mutation may be an amino acid substitution. A conservative amino acid substitution is defined herein as the substitution of an amino acid residue for another amino acid residue with similar chemical properties (e.g., size, charge, or polarity). Such a conservative amino acid substitution may be a basic, neutral, hydrophobic, or acidic amino acid for another of the same group (See e.g., Table I below). By the term "basic amino acid" it is meant hydrophilic amino acids having a side chain pK value of greater than 7, which are typically positively charged at physiological pH. Basic amino acids include histidine (His or H), arginine (Arg or R), and lysine (Lys or K). By the term "neutral amino acid" (also "polar amino acid"), it is meant hydrophilic amino acids having a side chain that is uncharged at physiological pH, but which has at least one bond in which the pair of electrons shared in common by two atoms is held more closely by one of the atoms. Polar amino acids include serine (Ser or S), threonine (Thr or T), cysteine (Cys or C), tyrosine (Tyr or Y), asparagine (Asn or N), and glutamine (Gln or Q). The term "hydrophobic amino acid" (also "non-polar amino acid") is meant to include amino acids exhibiting a hydrophobicity of greater than zero according to the normalized consensus hydrophobicity scale of Eisenberg (1984). Hydrophobic amino acids include proline (Pro or P), isoleucine (Ile or I), phenylalanine (Phe or F), valine (Val or V), leucine (Leu or L), tryptophan (Trp or W), methionine (Met or M), alanine (Ala or A), and glycine (Gly or G). "Acidic amino acid" refers to hydrophilic amino acids having a side chain pK value of less than 7, which are typically negatively charged at physiological pH. Acidic amino acids include glutamate (Glu or E), and aspartate (Asp or D).
Certain amino acid residues are more likely to form a hydrogen bond, such as glutamine, asparagine, histidine, serine, threonine, tyrosine, cysteine, methionine, tryptophan, aspartate, glutamate, and glycine.
A semi-conservative amino acid substitution replaces one residue with another that has a similar steric conformation but does not share its chemical properties. Examples of semi-conservative substitutions would include substituting cysteine for alanine or leucine; substituting serine for asparagine; substituting valine for threonine; or substituting proline for alanine.
Other classifications are also possible as shown in the Table I below.
Table I. Amino acid classification.
Class | Name of the amino acids
Aliphatic | Glycine, Alanine, Valine, Leucine, Isoleucine
Hydroxyl or Sulfur/Selenium-containing | Serine, Cysteine, Selenocysteine, Threonine, Methionine
Cyclic | Proline
Aromatic | Phenylalanine, Tyrosine, Tryptophan
Basic | Histidine, Lysine, Arginine
Acidic and their Amide | Aspartate, Glutamate, Asparagine, Glutamine
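As an illustration only, the four groups used above to define a conservative substitution (basic, neutral, hydrophobic, acidic) can be encoded and queried as in the following minimal sketch. The function names `group_of` and `is_conservative` are hypothetical and are not part of the disclosure; the group memberships are copied from the definitions given above, and other classifications (such as Table I) would give different answers.

```python
# Residue groups exactly as listed in the text (one-letter codes).
GROUPS = {
    "basic": set("HRK"),
    "neutral": set("STCYNQ"),
    "hydrophobic": set("PIFVLWMAG"),
    "acidic": set("ED"),
}


def group_of(residue: str) -> str:
    """Return the name of the group containing the given residue."""
    r = residue.upper()
    for name, members in GROUPS.items():
        if r in members:
            return name
    raise ValueError(f"unclassified residue: {residue!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True when both residues differ but belong to the same group."""
    if original.upper() == replacement.upper():
        return False
    return group_of(original) == group_of(replacement)


if __name__ == "__main__":
    for wt, mut in [("V", "A"), ("V", "D"), ("K", "R"), ("P", "T")]:
        label = "conservative" if is_conservative(wt, mut) else "non-conservative"
        print(f"{wt}->{mut}: {label}")
```

Under this grouping, a valine-to-alanine change counts as conservative, whereas a valine-to-aspartate change does not.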
Sequence identity is used to evaluate the similarity of two sequences; it is determined by calculating the percent of residues that are the same when the two sequences are aligned for maximum correspondence between residue positions. Any known method may be used to calculate sequence identity; for example, computer software is available to calculate sequence identity. Without wishing to be limiting, sequence identity can be calculated by software such as NCBI BLAST2, BLAST-P, BLAST-N, COBALT or FASTA-N, CLUSTAL OMEGA or any other appropriate software/tool that is known in the art (Johnson M, et al. (2008) Nucleic Acids Res. 36: W5-W9; Papadopoulos JS and Agarwala R (2007) Bioinformatics 23: 1073-79).
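By way of a hedged illustration, the percent-identity calculation described above can be sketched for two sequences that have already been aligned (gaps written as "-"). The function name `percent_identity` is hypothetical, and the choice of the alignment length as denominator is an assumption: the tools named above may normalize differently (e.g., by the shorter sequence or by gap-free columns).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Both inputs must be pre-aligned and of equal length; '-' marks a gap.
    Denominator: total number of alignment columns (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    # MFa native (SEQ ID NO: 9) versus one MFa mutant (SEQ ID NO: 168).
    native = "Mrfpsiftavlfaassalaapvnt"
    mutant = "Mrfpsiftavlfaassalaapant"
    print(f"{percent_identity(native, mutant):.1f}% identity")
```

The two signal peptides in the usage example have the same length and differ at a single position, so no gap handling is exercised there.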
Percent identities between amino acid sequences of certain enzymes of the present invention are also presented (see e.g., FIGs. 22A-B showing percent identities of pairs of BGLs of the present invention). Hence enzyme sequences in accordance with the present invention include enzymes with amino acid sequences having high percent identities (e.g., at least 39%, 40%, 41%, 42%, 43%, 44%, 45%, 46%, 47%, 48%, 49%, 50%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98% and 99% identity) with enzymes specifically disclosed in the present invention and in particular with those shown to display useful activity (see e.g., FIGs. 13-15). The substantially identical sequences retain substantially the activity (i.e., BGL activity) and specificity of the reference sequence.
Nucleic acids, host cells
The present invention also relates to nucleic acids comprising nucleotide sequences encoding the above-mentioned enzymes. The nucleic acid may be codon-optimized. The nucleic acid can be a DNA or an RNA. The nucleic acid sequence can be deduced by the skilled artisan on the basis of the disclosed amino acid sequences. In a specific embodiment, the nucleic acid encodes one of the amino acid sequences as presented in any one of FIGs. 7-8 and 13-20 (orthologues and/or consensuses). In another specific embodiment, the nucleic acid for one or more enzymes is as shown in FIG. 8 and/or, e.g., Aacu in Genbank D64088 or P48825; Anig G3YDY8; Anid Q5AYH8; Afum Q4WJJ3; Ncra Q7RWP2; Fgra I1RR89; Umay1 A0A0D1ECP0; Ccin1 A8NIX3; Rory1 I1BM51; Rory2 I1BTN0; Rory3 I1BUH2; Rory4 I1BUH3; Rory5 I1C896; Umay2 A0A0D1CMS9; Mgrisea Q5EMW3; Ccin2 A8NIX0; Aory Q2UUD6; Cpel P06835; Sfib1 P22506; and Sfib2 P22507.
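As a minimal sketch of how a nucleotide sequence may be deduced from a disclosed amino acid sequence, the following back-translates a peptide using one codon per amino acid and checks the result by translating it back with the standard genetic code. The names `CODON_CHOICE`, `back_translate` and `translate` are hypothetical, and the codon chosen for each residue is an arbitrary valid codon picked for illustration; it is not the codon-optimization scheme of the present disclosure.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One illustrative codon per amino acid (assumption, not an optimized table).
CODON_CHOICE = {
    "A": "GCT", "R": "AGA", "N": "AAT", "D": "GAT", "C": "TGT",
    "Q": "CAA", "E": "GAA", "G": "GGT", "H": "CAT", "I": "ATT",
    "L": "TTG", "K": "AAG", "M": "ATG", "F": "TTT", "P": "CCA",
    "S": "TCT", "T": "ACT", "W": "TGG", "Y": "TAT", "V": "GTT",
}


def back_translate(protein: str) -> str:
    """Return one DNA sequence that encodes the given protein."""
    return "".join(CODON_CHOICE[aa] for aa in protein.upper())


def translate(dna: str) -> str:
    """Translate a DNA sequence whose length is a multiple of three."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    peptide = "Mrfpsiftavlfaassalaapvnt"  # MFa native, SEQ ID NO: 9
    dna = back_translate(peptide)
    print(dna)
    print(translate(dna) == peptide.upper())
```

Because the genetic code is degenerate, many different nucleic acids encode the same polypeptide; a codon-optimized sequence would simply use a host-specific table in place of `CODON_CHOICE`.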
The present invention also encompasses vectors (plasmids) comprising the above-mentioned nucleic acids. The vectors can be of any type suitable, e.g., for expression of said polypeptides or propagation of genes encoding said polypeptides in a particular organism. The organism may be of eukaryotic or prokaryotic origin (e.g., yeast). The specific choice of vector depends on the host organism and is known to a person skilled in the art. In an embodiment, the vector comprises transcriptional regulatory sequences or a promoter operably-linked to a nucleic acid comprising a sequence encoding an enzyme involved in the saccharolytic pathway of the invention. A first nucleic acid sequence is "operably-linked" with a second nucleic acid sequence when the first nucleic acid sequence is placed in a functional relationship with the second nucleic acid sequence. For instance, a promoter is operably-linked to a coding sequence if the promoter affects the transcription or expression of the coding sequence. Generally, operably-linked DNA sequences are contiguous and, where necessary to join two protein coding regions, in reading frame. However, since for example enhancers generally function when separated from the promoters by several kilobases and intronic sequences may be of variable lengths, some polynucleotide elements may be operably-linked but not contiguous. "Transcriptional regulatory sequences" or "transcriptional regulatory elements" are generic terms that refer to DNA sequences, such as initiation and termination signals (terminators), enhancers, and promoters, splicing signals, polyadenylation signals, etc., which induce or control transcription of protein coding sequences with which they are operably-linked.
Plasmids useful to express the enzymes of the present invention include the modified centromeric plasmids pGREG503 (FIGs. 12A-C; SEQ ID NO: 19), pGREG504, pGREG505 and pGREG506 from the pGREG series (55), the 2μ plasmids pYES2 (Invitrogen), pESC-leu2 derivative pESC-leu2d (Erhart E. and Hollenberg C.P., J. Bacteriol. 1983, p. 625), pGC550, pGC552, pGC1322, pBOT-TRP, pBOT-URA, pBOT-HIS and pBOT-LEU. Yeast artificial chromosomes (YACs) able to clone fragments of 100-1000 kbp could also be used to express multiple enzymes (e.g., 10). Many other useful yeast expression vectors, either autonomously replicating low copy-number vectors (YCp or centromeric) or autonomously replicating high copy-number vectors (YEp or 2μ) are commercially available, e.g., from Invitrogen (www.lifetechnologies.com), the American Type Culture Collection (ATCC; www.atcc.org) or the Euroscarf collection (http://web.uni-frankfurt.de/fb15/mikro/euroscarf/).
Plasmids in accordance with the present invention may also include nucleic acid molecule(s) encoding one or more of the polypeptides as shown in FIGs. 7-8 and 13-20 (orthologues or consensuses).
Promoters useful to express the enzymes of the present invention include the constitutive promoters from the following S. cerevisiae CEN.PK2-1D genes: glyceraldehyde-3-phosphate dehydrogenase 3 (PTDH3) (FIG. 12D, SEQ ID NO: 20), fructose 1,6-bisphosphate aldolase (PFBA1), pyruvate decarboxylase 1 (PPDC1) and plasma membrane H+-ATPase 1 (PPMA1) (5). The inducible promoters from galactokinase (PGAL1) and UDP-glucose-4-epimerase (PGAL10) from pESC-leu2d are also useful for the present invention. The present invention also encompasses using other available promoters (e.g., yeast promoters), with different strengths and different expression profiles. Examples are the PTEF1 and PTEF2 promoters from the translational elongation factor EF-1 alpha paralogs TEF1 and TEF2; and promoters of genes coding for enzymes involved in glycolysis such as 3-phosphoglycerate kinase (PPGK1), pyruvate kinase (PPYK1), triose-phosphate isomerase (PTPI1), glyceraldehyde-3-phosphate dehydrogenase (PTDH2), enolase II (PENO2) or hexose transporter 9 (PHXT9). Other useful promoters in accordance with the present invention encompass those found through the promoter database of S. cerevisiae (http://rulai.cshl.edu/cgi-bin/SCPD/getgenelist).
Terminators useful for the present invention include terminators from the following S. cerevisiae CEN.PK2-1D genes: cytochrome C1 (TCYC1) (FIG. 12E, SEQ ID NO: 21), alcohol dehydrogenase 1 (TADH1), and phosphoglucoisomerase 1/glucose-6-phosphate isomerase (TPGI1). The present invention also encompasses using other suitable yeast terminators, e.g., terminators from genes encoding enzymes involved in glycolysis and gluconeogenesis such as alcohol dehydrogenase 2 (TADH2), enolase II (TENO2), fructose 1,6-bisphosphate aldolase (TFBA1), glyceraldehyde-3-phosphate dehydrogenase (TTDH2) and triose-phosphate isomerase (TTPI1). Other useful terminators in accordance with the present invention encompass those found from genes indicated in the promoter database of S. cerevisiae (http://rulai.cshl.edu/cgi-bin/SCPD/getgenelist).
The term "heterologous coding sequence" refers herein to a nucleic acid molecule that is not normally produced by the host cell in nature.
A recombinant expression vector (plasmid) comprising a nucleic acid sequence of the present invention (e.g., expressing a GH3 BGL) may be introduced into a cell, e.g., a host cell, which may include a living cell capable of expressing the protein coding region from the defined recombinant expression vector. Optionally, beta-glucosidase expression in the cell is under the control of a heterologous promoter. Accordingly, the present invention also relates to cells (host cells) comprising the nucleic acid and/or vector as described above. The suitable host cell may be any cell of eukaryotic (e.g., fungal (e.g., yeast or mold), algal, plant) or prokaryotic (bacterial) origin that is suitable, e.g., for expression of the enzymes or propagation of genes/nucleic acids encoding said enzyme. The specific choice of cell line is routinely determined by a person skilled in the art. The terms "host cell" and "recombinant host cell" are used interchangeably herein. Such terms refer not only to the particular subject cell, but also to the progeny or potential progeny of such a cell. Because certain modifications may occur in succeeding generations due to either mutation or environmental influences, such progeny may not, in fact, be identical to the parent cell, but are still included within the scope of the term as used herein. Vectors can be introduced into cells via conventional transformation or transfection techniques. The terms "transformation" and "transfection" refer to techniques for introducing foreign nucleic acid into a host cell, including calcium phosphate or calcium chloride co-precipitation, DEAE-dextran-mediated transfection, lipofection, electroporation, microinjection and viral-mediated transfection. Suitable methods for transforming or transfecting host cells can for example be found in Sambrook et al., Sambrook and Russell and other laboratory manuals.
Methods for introducing nucleic acids into mammalian cells in vivo are also known, and may be used to deliver the vector DNA of the invention to a subject for gene therapy.
Suitable fungal host cells include, but are not limited to, Ascomycota, Basidiomycota, Deuteromycota, Zygomycota, Fungi imperfecti. Particularly preferred fungal host cells are yeast cells and filamentous fungal cells. The filamentous fungal host cells of the present invention include all filamentous forms of the subdivision Eumycotina and Oomycota (see, for example, Hawksworth et al., in Ainsworth and Bisby's Dictionary of the Fungi, 8th edition, 1995, CAB International, University Press, Cambridge, UK, which is incorporated herein by reference). Filamentous fungi are characterized by a vegetative mycelium with a cell wall composed of chitin, cellulose and other complex polysaccharides. In one embodiment the host cell is a cell of a Myceliophthora species, such as Myceliophthora thermophila. In some embodiments the filamentous fungal host cell may be a cell of a species of, but not limited to, Achlya, Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Cephalosporium, Chrysosporium, Cochliobolus, Corynascus, Cryphonectria, Cryptococcus, Coprinus, Coriolus, Diplodia, Endothia, Fusarium, Gibberella, Gliocladium, Humicola, Hypocrea, Myceliophthora, Mucor, Neurospora, Penicillium, Podospora, Phlebia, Piromyces, Pyricularia, Rhizomucor, Rhizopus, Schizophyllum, Scytalidium, Sporotrichum, Talaromyces, Thermoascus, Thielavia, Trametes, Tolypocladium, Trichoderma, Verticillium, Volvariella, or teleomorphs, or anamorphs, and synonyms or taxonomic equivalents thereof.
In a specific embodiment, the host cell can be a yeast, a mold or a bacterium (E. coli). In a more specific embodiment, it can be an Aspergillus, such as Aspergillus niger or any other mold as listed in FIGs. 13-15, for example, or a Saccharomycetaceae such as a Saccharomyces, Pichia or Zygosaccharomyces. In a more specific embodiment, it can be a Saccharomyces. In a more specific embodiment, it can be a Saccharomyces cerevisiae (S. cerevisiae). The present invention encompasses the use of yeast strains that are haploid and contain auxotrophies for selection that facilitate manipulation with plasmids. Yeast strains that can be used in the invention include, but are not limited to, CEN.PK, S288C, W303, A363A and YPH499, strains derived from S288C (FY4, DBY12020, DBY12021, XJ24-249) and strains isogenic to S288C (FY1679, AB972, DC5). The identity and genotype of additional examples of yeast strains can be found at EUROSCARF, available through the World Wide Web at web.uni-frankfurt.de/fb15/mikro/euroscarf/col_index.html or through the Saccharomyces Genome Database (www.yeastgenome.org).
In specific examples, the yeast strain is CEN.PK110-10C (MATa his3Δ1 MAL2-8C SUC2) or any other strain used herein, or any of their single, double or triple auxotrophic derivatives. In another specific embodiment, the particular strain of yeast cell is S288C (MAT alpha SUC2 mal mel gal2 CUP1 flo1 flo8-1 hap1), which is commercially available. In another specific embodiment, the particular strain of yeast cell is W303.alpha (MAT.alpha; his3-11,15 trp1-1 leu2-3 ura3-1 ade2-1), which is commercially available.
The above-mentioned nucleic acid or vector may be delivered to cells in vivo (to induce the expression of the enzymes and generate ethanol in accordance with the present invention) using methods well known in the art such as direct injection of DNA, receptor-mediated DNA uptake, viral-mediated transfection or non-viral transfection and lipid-based transfection. Direct injection has been used to introduce naked DNA into cells in vivo. A delivery apparatus (e.g., a "gene gun") for injecting DNA into cells in vivo may be used. Such an apparatus may be commercially available (e.g., from BioRad). Naked DNA may also be introduced into cells by complexing the DNA to a cation, such as polylysine, which is coupled to a ligand for a cell-surface receptor. Binding of the DNA-ligand complex to the receptor may facilitate uptake of the DNA by receptor-mediated endocytosis. A DNA-ligand complex linked to adenovirus capsids which disrupt endosomes, thereby releasing material into the cytoplasm, may be used to avoid degradation of the complex by intracellular lysosomes.
Compositions
The present invention encompasses compositions comprising at least one GH3 BGL enzyme of the present invention (in any of its various forms) with one or more optional ingredients (e.g., excipient carriers).
The optional ingredients may be, without being so limited, a buffer, a surfactant, and/or a scouring agent. A buffer may be used with a GH3 BGL polypeptide of the present invention (optionally combined with other cellulases, including one or more other BGLs) to maintain a desired pH within the solution in which the GH3 BGL is employed. The exact concentration of buffer employed will depend on several factors which the skilled artisan can determine. Suitable buffers are well known in the art. A surfactant may further be used in combination with the GH3 BGL of the present invention. Suitable surfactants include any surfactant compatible with the GH3 BGL and optional other enzymes (e.g., cellulases) being utilized. Exemplary surfactants include anionic, non-ionic, and ampholytic surfactants.
Suitable anionic surfactants include, but are not limited to, linear or branched alkylbenzenesulfonates; alkyl or alkenyl ether sulfates having linear or branched alkyl groups or alkenyl groups; alkyl or alkenyl sulfates; olefinsulfonates; alkanesulfonates, etc. Suitable counter ions for anionic surfactants include, for example, alkali metal ions, such as sodium and potassium; alkaline earth metal ions, such as calcium and magnesium; ammonium ion; and alkanolamines having from 1 to 3 alkanol groups of carbon number 2 or 3. Ampholytic surfactants suitable for use in the practice of the present invention include, for example, quaternary ammonium salt sulfonates, betaine-type ampholytic surfactants, etc. Suitable non-ionic surfactants generally include polyoxyalkylene ethers, as well as higher fatty acid alkanolamides or alkylene oxide adducts thereof, fatty acid glycerine monoesters, etc. Mixtures of surfactants can also be employed as is known in the art.
The GH3 beta-glucosidase enzyme compositions of the present invention may be in the form of an aqueous solution or of a solid (e.g., powder). When aqueous solutions are employed, the beta-glucosidase solution can easily be diluted to allow accurate concentrations. A concentrate can be in any form recognized in the art including, for example, liquids, emulsions, suspensions, gels, pastes, granules, powders, an agglomerate, a solid disk, as well as other forms that are well known in the art. Other materials can also be used with or included in the beta-glucosidase composition of the present invention as desired, including stones, pumice, fillers, solvents, enzyme activators, and anti-redeposition agents depending on the intended use of the composition.
One or more of the GH3 BGL enzymes of the present invention can also be combined with other enzymes (e.g., useful to degrade cellulosic material and/or to produce sugar (e.g., glucose)). Without being so limited, compositions of the present invention may further include enzyme cocktails such as Novozymes Celluclast® 1.5L, Cellulase from Trichoderma reesei ATCC 2692; Novozymes Carezyme 1000L®, Cellulase from Aspergillus sp.; or Novozymes Viscozyme® L cellulolytic enzyme mixture. In a more specific embodiment, compositions may include, in addition to at least one GH3 BGL of the present invention, at least two different enzyme types, namely (1) endoglucanase, which cleaves internal beta-1,4 linkages resulting in shorter glucooligosaccharides, and (2) cellobiohydrolase, which acts in an "exo" manner processively releasing cellobiose units (beta-1,4 glucose-glucose disaccharide).
Enzyme activity
Enzymes (e.g., GH3 BGL polypeptides) of the present invention have GH3 BGL activity. As used herein, the term "GH3 BGL activity" is meant to refer to the ability to catalyze the hydrolysis of the 1,4-beta-D-glycosidic linkages in cellulose. In specific embodiments, it also refers to at least one of reduced transglycosidation reaction, increased resistance to product inhibition and increased resistance to substrate inhibition.
Methods of using polypeptides/enzymes of the present invention
GH3 BGL enzymes of the present invention, as well as any composition, culture medium, or cell lysate comprising at least one such GH3 BGL enzyme, may be used in the production of monosaccharides, disaccharides, or oligomers of a mono- or di-saccharide as chemical or fermentation feedstock from biomass. As used herein, the term "biomass" refers to biological material that contains a polysaccharide substrate, such as, for example, cellulose, starch, etc. The present invention hence provides a method of converting a biomass substrate into a fermentable sugar, the method comprising contacting (i) a GH3 BGL polypeptide according to the invention; (ii) a cell expressing the GH3 BGL polypeptide; (iii) a culture medium containing the GH3 BGL polypeptide; (iv) a cell lysate containing the GH3 BGL polypeptide; or (v) a composition comprising any one of (i) to (iv), with the biomass substrate (e.g., cellobiose) under conditions suitable for the production of the fermentable sugar. The present invention further provides a method of converting a biomass substrate to a fermentable sugar by (a) pretreating a cellulosic material to increase its susceptibility to hydrolysis; and (b) contacting the pretreated cellulosic material of step (a) with a cell, culture medium, cell lysate or composition containing/expressing at least one GH3 BGL polypeptide of the present invention (and optionally at least one other enzyme such as a cellulase) under conditions suitable for the production of the fermentable sugar.
In some embodiments, the biomass includes cellulosic substrates including but not limited to, wood, wood pulp, paper pulp, paper and pulp processing waste, corn stover, corn fiber, rice, rice hulls, woody or herbaceous plants, fruit or vegetable pulp, distillers grain, wheat straw, cotton, hemp, flax, sisal, corn cobs, sugar cane bagasse, grasses, switch grass and mixtures thereof. The biomass may optionally be pretreated to increase the susceptibility of cellulose to hydrolysis using methods known in the art such as chemical, physical and biological pretreatments (e.g., steam explosion, pulping, grinding, acid hydrolysis, solvent exposure, etc., as well as combinations thereof). In some embodiments, the biomass comprises transgenic plants that express ligninase and/or cellulase enzymes which degrade lignin and cellulose. The biomass may include cellobiose and/or may be treated enzymatically to generate cellobiose for conversion to a soluble sugar (e.g., glucose).
The term "substrate" as used herein refers to any substrate that the BGL of the present invention can hydrolyze directly or indirectly. Such BGL substrates include cellulosic (e.g., lignocellulosic) material (cellulose; hemicellulose; cellulose hydrolysate; or any soluble cellodextrin including cellobiose, cellotriose, cellotetraose, cellopentaose, cellohexaose, etc.) and synthetic substrates such as p-nitrophenyl-β-D-glucopyranoside (pNPG). Certain of the foregoing substrates (e.g., cellulose) may optimally be pre-treated before the enzymes of the present invention hydrolyze them.
In some embodiments, the GH3 BGL polypeptide/enzyme and GH3 BGL polypeptide-containing compositions, cell culture media, and cell lysates may be reacted with the biomass or pretreated biomass at a temperature in the range of about 25°C to about 100°C, about 30°C to about 90°C, about 30°C to about 80°C, about 40°C to about 80°C and about 35°C to about 75°C. Also, the biomass may be reacted with the GH3 BGL polypeptides and GH3 BGL polypeptide-containing compositions, cell culture media, and cell lysates at a temperature of about 25°C, at about 30°C, at about 35°C, at about 40°C, at about 45°C, at about 50°C, at about 55°C, at about 60°C, at about 65°C, at about 70°C, at about 75°C, at about 80°C, at about 85°C, at about 90°C, at about 95°C and at about 100°C. In addition to the temperatures described above, conditions suitable for converting a biomass substrate to a fermentable sugar that employ a GH3 BGL polypeptide of the present invention (optionally in a composition, cell culture medium, or cell lysate) include carrying out the process at a pH in a range from about pH 3.0 to about 8.5, about pH 3.5 to about 8.5, about pH 4.0 to about 7.5, about pH 4.0 to about 7.0, about pH 4.0 to about 6.5, about pH 4.0 to about 6.0, or about pH 4.0 to about 5.0. The reaction times for converting a particular biomass substrate to a fermentable sugar may vary but the optimal reaction time can be readily determined. Exemplary reaction times may be in the range of from about 1.0 to about 240 hours, from about 5.0 to about 180 hours and from about 10.0 to about 150 hours. For example, the incubation time may be at least 1 hour, at least 5 hours, at least 10 hours, at least 15 hours, at least 25 hours, at least 50 hours, at least 100 hours, and at least 180 hours.
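Purely as an illustration, the broadest temperature, pH and reaction-time ranges recited above can be captured in a small configuration check. The names `Range`, `BROADEST` and `out_of_range` are hypothetical, and treating "about" as a hard boundary is a simplifying assumption made only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


# Broadest ranges recited in the text ("about" is treated as exact here).
BROADEST = {
    "temperature_c": Range(25.0, 100.0),
    "ph": Range(3.0, 8.5),
    "time_h": Range(1.0, 240.0),
}


def out_of_range(temperature_c: float, ph: float, time_h: float) -> list:
    """Return the names of the parameters falling outside the broadest ranges."""
    values = {"temperature_c": temperature_c, "ph": ph, "time_h": time_h}
    return [name for name, value in values.items()
            if not BROADEST[name].contains(value)]


if __name__ == "__main__":
    print(out_of_range(50.0, 5.0, 48.0))   # a condition inside all ranges
    print(out_of_range(110.0, 5.0, 48.0))  # temperature above the range
```

Narrower embodiments (e.g., about pH 4.0 to about 5.0) could be represented by swapping in a different dictionary of `Range` values.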
The soluble sugars produced by the methods of the present invention may be used to produce an alcohol (such as, for example, ethanol, butanol, etc.). The present invention therefore provides a method of producing an alcohol, wherein the method comprises (a) providing a fermentable sugar produced using a GH3 BGL polypeptide of the present invention in one of the methods described supra; (b) contacting the fermentable sugar with a fermenting microorganism to produce the alcohol or other metabolic product; and (c) recovering the alcohol or other metabolic product.
In some embodiments, the GH3 BGL polypeptide of the present invention, or composition, cell culture medium, or cell lysate containing the GH3 BGL polypeptide may be used to catalyze the hydrolysis of a biomass substrate to a fermentable sugar in the presence of a fermenting microorganism (e.g., yeast, mold, etc.), to produce an end-product such as ethanol. In this simultaneous saccharification and fermentation (SSF) process, the fermentable sugars (e.g., glucose and/or xylose) are removed from the system by the fermentation.
The soluble sugars produced by the use of a GH3 BGL polypeptide of the present invention may also be used in the production of other end-products such as, for example, acetone, an amino acid (e.g., glycine, lysine, etc.), an organic acid (e.g., lactic acid, etc.), glycerol, a diol (e.g., 1,3-propanediol, butanediol, etc.) and animal feeds.
The present invention is illustrated in further detail by the following non-limiting examples.
EXAMPLE 1: Materials and methods
Molecular biology
Experiments in yeast were performed using S. cerevisiae CEN.PK110-10C (MATa his3Δ1 MAL2-8C SUC2). Yeast cultures were grown at 30°C in YPD or yeast nitrogen base (YNB) supplemented as required to maintain the auxotrophic selection marker. Plasmids were constructed by in vivo homologous DNA recombination using a pGREG503 derivative with a unique KpnI site and the HIS3 auxotrophic marker (87). The plasmids were assembled in S. cerevisiae by co-transforming linearized plasmid and DNA fragments using the lithium acetate/carrier DNA method. DNA parts were designed with at least 50-bp regions of homology to mediate recombination. Transformants were cultured on solid media for selection (YNB+2% glucose containing 1.5% agar and supplemented with synthetic dropout media without histidine). After transformation, assembled plasmids were extracted from S. cerevisiae and propagated in E. coli for sequencing. Verified constructs were transformed back into S. cerevisiae for subsequent experiments. DNA fragments used in assembling plasmids were amplified by PCR using Phusion™ High-Fidelity DNA polymerase (Thermo Scientific) and primers listed in Table II below. For site-directed mutagenesis, full-length genes were constructed by PCR overlap extension using DNA parts amplified with primers containing the desired nucleotide substitutions. The DNA parts used for in vivo recombination were purified by gel extraction with the GeneJET™ Gel Extraction Kit (Thermo Scientific). Plasmids were purified from E. coli and S. cerevisiae using the GeneJET™ plasmid mini-prep kit (Thermo Scientific). Yeast cells were treated with lytic enzyme (MP Biomedicals) for one hour prior to the lysis step for plasmid extractions.
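The primer design used for site-directed mutagenesis by overlap extension can be illustrated with a short, hedged sketch. It checks that a forward and a reverse mutagenic primer of Table II (KL107 and KL133) are reverse complements of each other, and derives the amino acid change produced by a single-nucleotide substitution in the MFa pre sequence. The function names are hypothetical. Two assumptions are made: nucleotide numbering starts at the A of the ATG start codon, and the fragment `MFA_TAIL` (the first 51 nucleotides of primer KL67) encodes residues 8 to 24 of SEQ ID NO: 9, which is inferred here from its translation rather than stated in the text. Only the signal-peptide substitutions are exercised.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Primer sequences as listed in Table II (lowercase marks the mutated base).
KL107 = "CAGCATCCTCCGCATTAGCTGCTaCAGTCAACACTTTGGCCTACTCC"
KL133 = "GGAGTAGGCCAAAGTGTTGACTGtAGCAGCTAATGCGGAGGATGCTG"

# First 51 nt of KL67; assumed to cover coding positions 22-72 (residues 8-24).
MFA_TAIL = "ACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCTGCTCCAGTCAACACT"
MFA_TAIL_FIRST_NT = 22


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (uppercase)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def annotate_substitution(fragment: str, first_nt: int, pos: int,
                          ref: str, alt: str) -> str:
    """Describe the residue change caused by pos ref>alt, e.g. 'P21T'.

    fragment is a piece of coding sequence whose first base sits at the
    1-based coding position first_nt, which must start a codon.
    """
    if (first_nt - 1) % 3 != 0:
        raise ValueError("fragment must start on a codon boundary")
    frag = fragment.upper()
    offset = pos - first_nt
    start = offset - offset % 3
    if offset < 0 or start + 3 > len(frag):
        raise ValueError("position not covered by a complete codon")
    if frag[offset] != ref.upper():
        raise ValueError("reference base does not match the fragment")
    mutated = frag[:offset] + alt.upper() + frag[offset + 1:]
    wild_type = CODON_TABLE[frag[start:start + 3]]
    mutant = CODON_TABLE[mutated[start:start + 3]]
    residue = (pos - 1) // 3 + 1
    return f"{wild_type}{residue}{mutant}"


if __name__ == "__main__":
    print(reverse_complement(KL107) == KL133.upper())
    print(annotate_substitution(MFA_TAIL, MFA_TAIL_FIRST_NT, 61, "C", "A"))
    print(annotate_substitution(MFA_TAIL, MFA_TAIL_FIRST_NT, 65, "T", "C"))
```

A complementary primer pair of this kind lets two overlapping PCR products carry the same substitution, so that they can be joined into a full-length mutant gene in a subsequent extension step.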
Table II. Primers used in examples presented herein.
Primer Sequence Description
C1 :506 TAACCCTCACTAAAGGGAACAAAAGCTGGAGCTCGT Forward primer with homology to
TTAAACGGCGCGCCGAGACTGCAGCATTACTTTGAG pGREG vectors and 5' end of AAG (SEQ ID NO: 191) SCPTDH3-
GC22 CGATACTAACGCCGCCATCC (SEQ ID NO: 192) Reverse primer with homology to
pGREG vectors. Used to amplify
SCTCYCL
KL46 GTCTTTTTTTTAGTTTTAAAACACCAAGAACTTAGTTT Forward primer with homology to A
CGAAAAACAATGTTGGCCTACTCCCCGCCGTATTAC niger bgl1 gene. Removes natural (SEQ ID NO: 193) signal sequence. Used for
recombination into pKL022.
KL47 GTTCTTCTCCTTTACTCATGAATTCGCCAGAACCAGC Reverse primer with homology to A
AGCGGAGCCAGCGGATCCGTGAACAGTAGGCAGAG niger bgl1 gene. Used for ACGCC (SEQ ID NO: 194) recombination into pKL022.
Removes stop codon and adds a homology sequence to linker region of the assembly sequence.
KL50 GTCTTTTTTTTAGTTTTAAAACACCAAGAACTTAGTTT Forward primer to with homology to
CGAAAAACAATG (SEQ ID NO: 195) 3' region of SCPTDHS in pKL022,
pKL024, and pKL029.
KL51 GTTCTTCTCCTTTACTCATGAATTCGCCAGAACCAGC Reverse primer to with homology to
AGCGGAGCCAGCGGATCC (SEQ ID NO: 196) the linker region in the assembly
sequence of pKL022 and pKL024 and 3' end of a-bgl1 encoded in pKL029.
KL67 ACTGCAGTTTTATTCGCAGCATCCTCCGCATTAGCTG Used with KL51 to add 50 bp
CTCCAGTCAACACTTTGGCCTACTCCCCGCCGTATTA encoding MFa pre sequence to A C (SEQ ID NO: 197) niger bgl1 gene using pKL012 as a
template to construct pKL029 from pKL024. KL93 TCTGGCGAATTCATGAGTAAAGGAGAAGAACTTCATC Forward primer used to construct ATCATCATCATCATTAATCATGTAATTAGTTATGTCAC pKL022. Use with GC22. Other GC (SEQ ID NO: 198) PCR reaction uses C 1 :506 and
KL51 pair.
KL107 CAGCATCCTCCGCATTAGCTGCTaCAGTCAACACTTT Forward primer to generate 61 C>A
GGCCTACTCC (SEQ ID NO: 199) mutation. Encodes P21 T.
KL108 CAGCATCCTCCGCATTAGCTGCTtCAGTCAACACTTT Forward primer to generate 61C>T
GGCCTACTCC (SEQ ID NO: 200) mutation. Encodes P21 S.
KL109 CATCCTCCGCATTAGCTGCTCCAGcCAACACTTTGGC Forward primer to generate 65T>C
CTACTCCCCGCCG (SEQ ID NO: 201 ) mutation. Encodes V22A.
KL1 10 CATCCTCCGCATTAGCTGCTCCAGaCAACACTTTGGC Forward primer to generate 65T>A
CTACTCCCCGCCG (SEQ ID NO: 202) mutation. Encodes V22D.
KL112 GTGACAAGGGTGCTGATATCCtATTGGGTCCAGCTG Forward primer to generate
CCGGCCCTC (SEQ ID NO: 203) 428A>T mutation. Encodes Q140L.
KL113 GCGCGAACCTCGACGATAAGACcATGCATGAGCTGT Forward primer to generate
ACCTCTGGCC (SEQ ID NO: 204) 681T>C mutation. Silent.
KL114 GGACATGTCTATGCCGGGAGACGcCGATTACGACAG Forward primer to generate
TGGCACGTC (SEQ ID NO: 205) 917T>C mutation. Encodes V303A.
KL115 GTCTATGCCGGGAGACGTCGATTgCGACAGTGGCAC Forward primer to generate
GTCTTACTGGG (SEQ ID NO: 206) 923A>G mutation. Encodes
Y305C.
KL116 CGTGCAACGCAACCATAGCGAGcTGATCCGCCGTAT Forward primer to generate
TGGAGCAGAC (SEQ ID NO: 207) 1180T>C mutation. Silent.
KL117 GAACAAGAATGGCGTATTCACTGtGACCGATAACTGG Forward primer to generate
GCTATTGATC (SEQ ID NO: 208) 1448C>T mutation. Encodes
A480V.
KL118 GATCAGATTGAGGCGCTTGCTcAGACCGCCAGTGTC Forward primer to generate
TCTCTTGTC (SEQ ID NO: 209) 1489A>C mutation. Encodes
K494Q.
KL119 CTTGCTAAGACCGCCAGTGTCTCaCTTGTCTTTGTCA Forward primer to generate
ACGCCGACTC (SEQ ID NO: 210) 1506T>A mutation. Silent.
KL120 GAGGGTTATATCAATGTCGACGGtAACCTGGGTGAC Forward primer to generate
CGCAGGAACC (SEQ ID NO: 211) 1557A>T mutation. Silent.
KL122 CTCTGTCGGCCCAGTCTTGGTTgACGAGTGGTACGA Forward primer to generate
CAACCCCAATG (SEQ ID NO: 212) 1678A>G mutation. Encodes
N557D.
KL123 GGTACGACAACCCCAATGTTACtGCTATTCTCTGGGG Forward primer to generate
TGGTCTTC (SEQ ID NO: 213) 1707C>T mutation. Silent.
KL126 CGCCCTTCACCTGGGGCAAGACcCGTGAGGCCTACC Forward primer to generate
AAGATTAC (SEQ ID NO: 214) 1818T>C mutation. Silent.
KL127 CATTGACTACCGCGGATTTGACAtGCGCAACGAGACT Forward primer to generate
CCTATCTATG (SEQ ID NO: 281) 1925A>T mutation. Encodes
K639M.
KL128 CCGCGGATTTGACAAGCGCAACGtGACTCCTATCTAT Forward primer to generate
GAGTTCGGC (SEQ ID NO: 215) 1934A>T mutation. Encodes
E642V.
KL129 CTGAGGCAGCGCCGACTTTCGGtGAGGTCGGAAATG Forward primer to generate
CGTCGGATTAC (SEQ ID NO: 216) 2067A>T mutation. Silent.
KL130 GCGTCGGATTACCTCTACCCCGAaGGACTGCAGAGA Forward primer to generate
ATCACCAAGTTC (SEQ ID NO: 217) 2103T>A mutation. Silent.
KL131 GGCAAGGTTGCGGGTGATGAAGTaCCTCAACTGTAT Forward primer to generate
GTTTCTCTTGG (SEQ ID NO: 218) 2349T>A mutation. Silent.
KL133 GGAGTAGGCCAAAGTGTTGACTGtAGCAGCTAATGC Reverse primer to generate 61C>A
GGAGGATGCTG (SEQ ID NO: 219) mutation. Encodes P21T.
KL134 GGAGTAGGCCAAAGTGTTGACTGaAGCAGCTAATGC Reverse primer to generate 61C>T
GGAGGATGCTG (SEQ ID NO: 220) mutation. Encodes P21S.
KL135 CGGCGGGGAGTAGGCCAAAGTGTTGgCTGGAGCAG Reverse primer to generate 65T>C
CTAATGCGGAGGATG (SEQ ID NO: 221) mutation. Encodes V22A.
KL136 CGGCGGGGAGTAGGCCAAAGTGTTGtCTGGAGCAGC Reverse primer to generate 65T>A
TAATGCGGAGGATG (SEQ ID NO: 222) mutation. Encodes V22D.
KL138 GAGGGCCGGCAGCTGGACCCAATaGGATATCAGCA Reverse primer to generate
CCCTTGTCAC (SEQ ID NO: 223) 428A>T mutation. Encodes Q140L.
KL139 GGCCAGAGGTACAGCTCATGCATgGTCTTATCGTCG Reverse primer to generate
AGGTTCGCGC (SEQ ID NO: 224) 681T>C mutation. Silent.
KL140 GACGTGCCACTGTCGTAATCGgCGTCTCCCGGCATA Reverse primer to generate
GACATGTCC (SEQ ID NO: 225) 917T>C mutation. Encodes V303A.
KL141 CCCAGTAAGACGTGCCACTGTCGcAATCGACGTCTC Reverse primer to generate
CCGGCATAGAC (SEQ ID NO: 226) 923A>G mutation. Encodes
Y305C.
KL142 GTCTGCTCCAATACGGCGGATCAgCTCGCTATGGTT Reverse primer to generate
GCGTTGCACG (SEQ ID NO: 227) 1180T>C mutation. Silent.
KL143 GATCAATAGCCCAGTTATCGGTCaCAGTGAATACGC Reverse primer to generate
CATTCTTGTTC (SEQ ID NO: 228) 1448C>T mutation. Encodes
A480V.
KL144 GACAAGAGAGACACTGGCGGTCTgAGCAAGCGCCTC Reverse primer to generate
AATCTGATC (SEQ ID NO: 229) 1489A>C mutation. Encodes
K494Q.
KL145 GAGTCGGCGTTGACAAAGACAAGtGAGACACTGGCG Reverse primer to generate
GTCTTAGCAAG (SEQ ID NO: 230) 1506T>A mutation. Silent.
KL146 GGTTCCTGCGGTCACCCAGGTTaCCGTCGACATTGA Reverse primer to generate
TATAACCCTC (SEQ ID NO: 231) 1557A>T mutation. Silent.
KL148 CATTGGGGTTGTCGTACCACTCGTcAACCAAGACTG Reverse primer to generate
GGCCGACAGAG (SEQ ID NO: 232) 1678A>G mutation. Encodes
N557D.
KL149 GAAGACCACCCCAGAGAATAGCaGTAACATTGGGGT Reverse primer to generate
TGTCGTACC (SEQ ID NO: 233) 1707C>T mutation. Silent.
KL152 GTAATCTTGGTAGGCCTCACGgGTCTTGCCCCAGGT Reverse primer to generate
GAAGGGCG (SEQ ID NO: 234) 1818T>C mutation. Silent.
KL153 CATAGATAGGAGTCTCGTTGCGCaTGTCAAATCCGC Reverse primer to generate
GGTAGTCAATG (SEQ ID NO: 235) 1925A>T mutation. Encodes
K639M.
KL154 GCCGAACTCATAGATAGGAGTCaCGTTGCGCTTGTC Reverse primer to generate
AAATCCGCGG (SEQ ID NO: 236) 1934A>T mutation. Encodes
E642V.
KL155 GTAATCCGACGCATTTCCGACCTCaCCGAAAGTCGG Reverse primer to generate
CGCTGCCTCAG (SEQ ID NO: 237) 2067A>T mutation. Silent.
KL156 GAACTTGGTGATTCTCTGCAGTCCtTCGGGGTAGAG Reverse primer to generate
GTAATCCGACGC (SEQ ID NO: 238) 2103T>A mutation. Silent.
KL157 CCAAGAGAAACATACAGTTGAGGtACTTCATCACCCG Reverse primer to generate
CAACCTTGCC (SEQ ID NO: 239) 2349T>A mutation. Silent.
KL173 GTCTATGCCGGGAGACGTCGATTtCGACAGTGGCAC Forward primer used to generate
GTCTTACTGGG (SEQ ID NO: 240) Y305F.
KL174 CCCAGTAAGACGTGCCACTGTCGaAATCGACGTCTC Reverse primer used to generate
CCGGCATAGAC (SEQ ID NO: 241) Y305F.
KL175 GTCTATGCCGGGAGACGTCGATTggGACAGTGGCAC Forward primer used to generate
GTCTTACTGGG (SEQ ID NO: 242) Y305W.
KL176 CCCAGTAAGACGTGCCACTGTCccAATCGACGTCTCC Reverse primer used to generate
CGGCATAGAC (SEQ ID NO: 243) Y305W.
KL177 GTCTATGCCGGGAGACGTCGATggCGACAGTGGCAC Forward primer used to generate
GTCTTACTGGG (SEQ ID NO: 244) Y305G.
KL178 CCCAGTAAGACGTGCCACTGTCGccATCGACGTCTC Reverse primer used to generate
CCGGCATAGAC (SEQ ID NO: 245) Y305G.
KL179 GTCTATGCCGGGAGACGTCGATgtCGACAGTGGCAC Forward primer used to generate
GTCTTACTGGG (SEQ ID NO: 246) Y305V.
KL180 CCCAGTAAGACGTGCCACTGTCGacATCGACGTCTC Reverse primer used to generate
CCGGCATAGAC (SEQ ID NO: 247) Y305V.
KL181 GTCTATGCCGGGAGACGTCGATgcCGACAGTGGCAC Forward primer used to generate
GTCTTACTGGG (SEQ ID NO: 248) Y305A.
KL182 CCCAGTAAGACGTGCCACTGTCGgcATCGACGTCTC Reverse primer used to generate
CCGGCATAGAC (SEQ ID NO: 249) Y305A.
*Mutations are written in lower case.
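The forward and reverse mutagenic primers in the table above appear to be reverse complements of each other, with the mutated base kept in lower case on both strands. The following is a minimal sketch for checking a primer pair; the function name `reverse_complement` is illustrative and not part of the source, and the two sequences are copied from the KL107 and KL133 rows.

```python
# Reverse-complement a primer while preserving case, so that a lower-case
# (mutated) base stays lower case on the complementary strand.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


# Sequences copied from the primer table (SEQ ID NO: 199 and SEQ ID NO: 219).
KL107 = "CAGCATCCTCCGCATTAGCTGCTaCAGTCAACACTTTGGCCTACTCC"
KL133 = "GGAGTAGGCCAAAGTGTTGACTGtAGCAGCTAATGCGGAGGATGCTG"

if __name__ == "__main__":
    # Reports whether the reverse primer matches the forward primer's
    # reverse complement, including the position of the lower-case base.
    print(reverse_complement(KL107) == KL133)
```

The same check can be applied to any other forward/reverse pair in the table to catch transcription errors in a primer sequence.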
Expression of a secreted β-glucosidase from A. niger (AnBGL1) in S. cerevisiae
Three plasmids were constructed to express BGL1 and assemble mutagenized gene libraries in S. cerevisiae (FIG. 1A). The assembly plasmid, pKL022 (FIGs. 9A-C), was constructed by co-transforming AscI/KpnI linearized pGREG503 (FIGs. 12A-C), the ScPTDH3 promoter (FIG. 12D), and the ScTCYC1 terminator (FIG. 12E) into S. cerevisiae. The promoter and terminator were amplified by PCR using pGREG503-based constructs as templates such that the 5' end of the ScPTDH3 promoter and the 3' end of the ScTCYC1 terminator had at least 50 bp of homology with the linearized plasmid. The region between the ScPTDH3 promoter and the ScTCYC1 terminator was designed with an 86 nucleotide sequence that included an XhoI restriction site, linker, His6 tag, and a stop codon. A second plasmid, pKL024 (FIGs. 11A-C), was constructed similarly using the XhoI linearized pKL022 and a 500-bp synthetic DNA fragment (Integrated DNA Technologies) designed to insert nucleotides encoding the MFα pre sequence in pKL022. The bgl1 expression plasmid, pKL029 (FIGs. 10A-C), was constructed by co-transforming S. cerevisiae with XhoI linearized pKL024 and the PCR-amplified bgl1 gene (43). Secretion was engineered to be under control of the S. cerevisiae MFα pre peptide by replacing the first 21 codons from the wild type gene with the first 24 codons from the MFα gene to construct an α-bgl1 fusion gene (experimental wild type). The primers used to amplify bgl1 were designed to replace the native signal sequence with the MFα sequence at the 5' end of the gene, and to add the linker region of the pKL022 and pKL024 plasmids at the 3' end (FIG. 1A). PCR amplification of the α-bgl1 gene from pKL029 with the primer pair KL50 and KL51 produced a DNA fragment that would recombine with XhoI linearized pKL022 when co-transformed in S. cerevisiae, and would express a functionally secreted BGL (see sequences in FIGs. 7-8).
Error-prone PCR of bgl1
Error-prone PCR was used to generate sequence diversity of the α-bgl1 gene. The region between the start codon and the linker region of pKL029 was mutagenized by PCR using Taq polymerase under high MgCl2 conditions (5 mM), with an unbalanced dNTP mixture (0.2 mM dATP, 0.2 mM dGTP, 1 mM dTTP, and 1 mM dCTP) and in the presence of MnCl2 (0.15 mM) using the KL50/KL51 primer pair. To generate a library of α-bgl1 mutants, the PCR products from several reactions were pooled, mixed with XhoI linearized pKL022 and transformed into S. cerevisiae by electroporation. Electrocompetent S. cerevisiae cells were prepared from 250 ml of a YPD culture (O.D.600 = 1.6) in 50 ml of a 1 M sorbitol solution containing 0.1 M LiAc and 10 mM DTT incubated for one hour at room temperature with gentle mixing. After the LiAc/DTT treatment, cells were washed twice in 50 ml of 1 M sorbitol and suspended to a final volume of 5 ml in ice-cold 1 M sorbitol. Aliquots (0.2 ml) of the electrocompetent cell suspension were removed, mixed with 1.5 μg of PCR product and 1.5 μg of XhoI linearized pKL022, and electroporated (2.5 kV, 25 μF) using an ice-chilled cuvette. Cells from ten transformations were pooled, and a small quantity of the mixture was removed to determine the transformation efficiency. The remainder of the mixture was transferred to 100 ml of selective media and grown for two days at 30°C with shaking. The transformation pool was then harvested by centrifugation, suspended in 10 ml of YNB containing 20% glycerol and stored at -80°C in 0.5 ml aliquots.
Selection of improved AnBGL1 using cellobiose as a substrate
Improved AnBGL1s were identified using a growth selection with cellobiose as the sole carbon source followed by an endpoint activity assay using p-nitrophenyl-β-D-glucopyranoside (pNPG) as a substrate. For growth selection, an overnight culture was used to inoculate 50 ml of fresh media containing glucose at an O.D.600 = 0.05 and after 6 hours, cells were spread on solid YNB+1% cellobiose. The cell density for growth on solid media containing cellobiose was optimized to approximately 13 cells/cm². Cells were also grown on solid YNB+2% glucose for comparison. After 3-4 days of growth, the largest colonies were picked and used to inoculate 200 μl of YNB+2% glucose in 96 well plates to supply enzymes for the BGL endpoint activity assay. Four wells from each 96 well plate were inoculated with cells containing pKL029 to express α-BGL1 as a control. The 96 well plates were sealed in plastic bags and incubated at 30°C without shaking for two days. Twenty μl of the culture was transferred to 180 μl of fresh media and grown for an additional two days at 30°C without shaking. The O.D.600 was measured for each culture and 50 μl of supernatant was transferred to 150 μl of a 66.6 mM citrate buffer, pH 5.0, containing 2.66 mM pNPG in 96 well plates. Reactions were incubated at room temperature for 30 minutes and quenched with 20 μl of 1 M NaOH. The amount of p-nitrophenol (pNP) released was determined by measuring the absorbance at 405 nm using the extinction coefficient 18 mM⁻¹ cm⁻¹ and normalized to cell density. Strains with BGL activities greater than WT+2σ were streaked on solid YNB+2% glucose media, and three colonies were re-tested using the same endpoint assay. Strains producing BGLs that exceeded a WT+2SD threshold of activity were chosen for further analysis.
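The endpoint calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the applicants' actual analysis: the optical path length of the plate wells is not given in the text and is left as a parameter (`path_cm`), the use of the sample standard deviation for the wild type controls is an assumption, and all function names are hypothetical.

```python
from statistics import mean, stdev

# Extinction coefficient of pNP at 405 nm given in the text (mM^-1 cm^-1).
EPSILON_PNP = 18.0


def pnp_mM(a405: float, path_cm: float = 1.0) -> float:
    """Beer-Lambert conversion of absorbance at 405 nm to pNP concentration (mM).

    path_cm is an assumption; the source does not state the well path length.
    """
    return a405 / (EPSILON_PNP * path_cm)


def normalized_activity(a405: float, od600: float, path_cm: float = 1.0) -> float:
    """pNP released, normalized to cell density (O.D.600)."""
    return pnp_mM(a405, path_cm) / od600


def passes_threshold(variant_activity: float, wt_activities: list) -> bool:
    """True when a variant exceeds the wild type mean plus two standard deviations.

    Uses the sample standard deviation of the wild type control wells
    (an assumption; the source only says "WT+2SD").
    """
    return variant_activity > mean(wt_activities) + 2 * stdev(wt_activities)
```

With four wild type control wells per plate, as described in the text, `wt_activities` would hold the four normalized control values from the same plate as the variant.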
AnBGL1 activity assays using pNPG as a substrate
The activities of mutant BGLs were compared to wild type α-BGL1 using a time course assay measuring the release of pNP from pNPG. For protein expression, fresh YNB+2% glucose media supplemented with synthetic dropout media lacking histidine was inoculated with 0.5 ml of overnight culture and grown at 30°C for 48 hours. Cells were removed by centrifugation and the supernatants were filtered using a 0.2 μm nylon membrane. Reactions were performed at 30°C in 2 ml microcentrifuge tubes by adding 200 μl of 16 mM pNPG to a mixture of 1 ml of 80 mM citrate buffer, pH 5.0, and 400 μl of culture supernatant. Two hundred μl samples were removed from the reactions at time intervals over a ten-minute period (5 data points) and transferred to 20 μl of 1 M NaOH. The amount of pNP released was measured at 405 nm and activities were calculated using the linear portion of the curve containing at least three time points. Reactions were performed in biological triplicate using cultures inoculated from individual colonies. Assay components were pre-incubated at 30°C for 60 minutes prior to adding substrate.
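Calculating an activity from the linear portion of a time course amounts to taking the slope of pNP released against time. A minimal sketch using ordinary least squares is shown here; the function name and the example values in the usage note are hypothetical, and the choice of which points count as "linear" is left to the caller.

```python
def initial_rate(times_min, pnp_mM):
    """Least-squares slope of pNP (mM) versus time (min).

    Pass only the time points judged to lie on the linear portion of the
    curve; the text requires at least three of them.
    """
    if len(times_min) != len(pnp_mM):
        raise ValueError("times and concentrations must have the same length")
    if len(times_min) < 3:
        raise ValueError("at least three time points are required")
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_p = sum(pnp_mM) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(times_min, pnp_mM))
    return sxy / sxx  # mM pNP per minute
```

For example, with hypothetical readings of 0, 0.1 and 0.2 mM pNP at 0, 2.5 and 5 minutes, `initial_rate([0, 2.5, 5], [0, 0.1, 0.2])` gives a rate of about 0.04 mM per minute.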
Enzyme kinetics using pNPG or cellobiose as the substrate
Kinetics experiments using pNPG as the substrate were performed in 96 well plates at 30°C. Reactions were initiated by adding 50 μl of culture supernatant or 50 μl of a 0.01 μg/μl concentrated protein preparation to 150 μl of a 66 mM citrate buffer solution (pH 5.0) containing pNPG in a range from 0.1-10 mM (final concentration). Glucose was tested in a range from 0-100 mM (final concentration) at each pNPG concentration. Reactions were stopped 5 minutes after adding enzyme by transferring 100 μl of the assay mixture to 10 μl of 1 M NaOH in 96 well plates.
Assays using cellobiose as the substrate were performed in 96 well plates at 30°C with a final reaction volume of 200 μl in 5 mM citrate buffer, pH 5.0. Cellobiose was tested at a range from 2-75 mM (final concentration). α-BGL1 was also tested at 0.5 and 1 mM cellobiose (final concentration). Reactions were started by adding 50 μl of culture supernatant to 150 μl of the citrate/cellobiose solution and stopped after 5 minutes by transferring 100 μl of the mixture to 400 μl of glucose assay solution (62.5 mM Tris-HCl, pH 8.3, 1.25 mM ATP, 1.875 mM NAD, 12.5 mM MgCl2, 12.5 U/ml hexokinase (Sigma), and 12.5 U/ml glucose-6-phosphate dehydrogenase (Sigma)) and incubated for 30 minutes at room temperature. The amount of glucose released was determined by measuring the amount of NADH produced at 340 nm, using the extinction coefficient 6220 M⁻¹ cm⁻¹.
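The glucose readout in the coupled hexokinase/glucose-6-phosphate dehydrogenase assay can be sketched as below. The sketch assumes one NADH is formed per glucose and that the coupled reaction runs to completion; the path length and blank correction are parameters not specified in the text, and the function name is hypothetical.

```python
# Extinction coefficient of NADH at 340 nm from the text, expressed per mM.
EPSILON_NADH = 6.22  # mM^-1 cm^-1 (6220 M^-1 cm^-1)

# 100 ul of reaction mixture is transferred into 400 ul of assay solution.
DILUTION_FACTOR = (100 + 400) / 100


def glucose_mM(a340: float, path_cm: float = 1.0, blank: float = 0.0) -> float:
    """Glucose concentration (mM) in the original 200 ul reaction.

    Assumes 1 NADH per glucose; path_cm and blank are assumptions because
    the source does not state them.
    """
    nadh_mM = (a340 - blank) / (EPSILON_NADH * path_cm)
    return nadh_mM * DILUTION_FACTOR
```

Because each hydrolysed cellobiose yields two glucose molecules, a rate expressed per cellobiose consumed would be half the glucose-based value.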
For experiments using concentrated protein preparations, 100 ml of culture supernatant was filtered using a 10 kDa cutoff membrane (VivaSpin) to a final volume of 1 ml. The retentate was diluted to 15 ml using 5 mM citrate buffer, pH 5.0, and the buffer exchange process was repeated twice before reducing the sample volume to 500-800 μl. Protein concentrations were determined using a Coomassie Protein Assay Kit (Thermo Scientific) with BSA as a standard. Concentrated samples were analysed by SDS PAGE with and without PNGase F (New England BioLabs) treatment. Kinetic parameters for steady-state reactions were determined using the GraphPad™ Prism enzyme kinetics module.
Analysis of reaction products using pNPG or cellobiose as the substrate
Reactions using 40 mM pNPG and 50 mM cellobiose as substrate were performed using the same conditions as for enzyme kinetics. Ten μl samples were removed from 200 μl reactions at time intervals and stopped using 1 μl of 1 M NaOH. One μl aliquots from each time interval were applied to silica TLC plates (Whatman), eluted with n-butanol, ethyl acetate, 2-propanol, acetic acid, and water (1:3:2:1:1) and developed as previously described (25).
Multiple sequence alignments and structural mapping
Multiple sequence alignments were performed using Clustal Omega (88). BGL homology models were generated using the Phyre2 server (89). Structural analyses were performed using PyMOL™.
EXAMPLE 2: Directed evolution of AnBGL1
Applicants targeted a β-glucosidase from A. niger using directed evolution to adapt it towards heterologous expression and secretion in S. cerevisiae. The strategy utilized the native homologous DNA recombination machinery in S. cerevisiae to assemble a library of mutagenized bgl1 genes, followed by a two-step selection to identify improved mutants. Because wild type S. cerevisiae lacks β-glucosidase activity, growth on cellobiose was used as the primary selection method. An endpoint assay using pNPG as a substrate was then employed for quantitative measurement of β-glucosidase activity. To generate sequence diversity, the α-bgl1 gene was mutagenized using error-prone PCR and transformants were cultured as a mutant pool in liquid media to maximize the library size. It was determined that the pooled library contained approximately 1.6-2 × 10⁷ recombinant mutants by growing small quantities (0.0001-0.1%) of the transformation mixture on solid media immediately after electroporation. Approximately 3 × 10⁵ variants, or 1.5-2% of the total library, were subsequently screened on solid media containing cellobiose as the sole carbon source. Ninety-five percent of the mutant AnBGL1 library clones did not grow on cellobiose, showing that most mutations were deleterious. Colonies from the mutant pool varied in size whereas those expressing α-BGL1 showed no observable variation. This allowed colony size to be used as a semiquantitative screen for BGL activity, as has been previously reported (90-92).
The BGL activity secreted from cells originating from the largest colonies was measured using pNPG as the substrate and compared to cultures producing the BGL1 protein. The mean activity of the mutant pool decreased while increasing in variability (FIG. 1B), following the predicted trend for a library of mutagenized enzymes (93). Of 1371 variants screened using the pNPG activity assay, twelve mutants met the threshold cut-off for selection at wild type + 2SD (designated v3-v8, v10, v11, v16, v18, v19 and v20) (FIG. 1C).
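The library-size and coverage figures quoted above follow from simple proportions, sketched here. The function names are hypothetical, and the colony count in the usage note is an invented illustration, not a value from the source.

```python
def library_size(colonies: float, fraction_plated: float) -> float:
    """Estimate total transformants from colonies counted on a plated fraction."""
    return colonies / fraction_plated


def fraction_screened(variants_screened: float, total_library: float) -> float:
    """Fraction of the pooled library that was screened."""
    return variants_screened / total_library


if __name__ == "__main__":
    # Bounds from the text: 3 x 10^5 variants screened out of a library of
    # 1.6 x 10^7 to 2 x 10^7 recombinant mutants.
    low = fraction_screened(3e5, 2e7)
    high = fraction_screened(3e5, 1.6e7)
    print(f"{low:.2%} to {high:.2%}")
```

As a hypothetical illustration of the first function, 180 colonies arising from 0.001% of the transformation mixture (a fraction of 1e-5) would correspond to about 1.8 × 10⁷ transformants.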
Sequencing showed that the mutant bgl1 genes each had 3-9 nucleotide substitutions (see Table III below), corresponding to a mutation rate of 1.6±0.4 bp per kb. Certain mutations occurred in multiple variants (65T>A: v7 & v18; 65T>C: v4 & v19; 428A>T: v5 & v20 and 1707C>T: v4 & v6). Six variants had mutations encoding amino acid substitutions immediately following the predicted MFα signal peptide cleavage site at Pro21 or Val22. MFα was used as a signal peptide to standardize expression in the context of other tested BGLs. These two amino acids were targeted by four substitutions (P21T, P21S, V22D and V22A). Using the prevalence of mutations at positions 21 and 22 of the MFα pre sequence, the evolved AnBGL1s were divided into two groups based on the presence or absence of substitutions at Pro21 and Val22. For the group of BGLs with signal peptide mutations, genes were constructed with single mutations encoding the P21T, P21S, V22D, and V22A substitutions. For the remaining AnBGL1 variants, genes were constructed to test all of the mutations individually. Silent mutations were included in the experiments to investigate if codon optimization had occurred. All of the mutations tested at positions 21 and 22 produced increases in activity similar to those of their parental evolved enzymes containing two or more mutations (Table III below), indicating that these substitutions were driving the improvements of the selected parental enzymes. None of the silent mutations tested produced observable increases in activity. It was possible to establish a relationship between the increase in activity observed for several of the evolved AnBGL1s and a single amino acid substitution (v3: Y305C; v5 & v20: Q140L; v10: A480V), but none of the mutations tested individually could account for the activities of v6 and v16 (Table III below).
The K494Q and N557D mutations did produce significant and reproducible improvements (117% and 119%, respectively), suggesting a cumulative effect that contributed to the 147% increase in activity observed for v16. The A480V mutation (1448C>T) was characterized in the genetic context of v10 since the other mutations present in the gene (1180T>C and 1506T>A) were silent and did not produce any improvements when tested alone.
Table III. Characterization of improved BGLs.
Variant Mutation Amino acid Substitution Relative activity
v3 923A>G 305 Tyr→Cys 1.68±0.02
1557A>T 516 - 1.05±0.01
1934A>T 642 Glu→Val 0.80±0.03
v4* 65T>C 22 Val→Ala 1.57±0.06
174A>T 55 -
465C>T 152 -
1707C>T 566 - 0.98±0.03
v5 428A>T 140 Gln→Leu 1.56±0.05
2067A>T 686 - 1.03±0.02
2103T>A 698 - 1.07±0.03
v6 917T>C 303 Val→Ala 1.09±0.04
1707C>T 566 - 0.98±0.03
1818T>C 603 - 1.05±0.04
v7* 65T>A 22 Val→Asp 2.03±0.08
297T>C 96 -
643A>G 212 Ile→Val
892T>C 295 -
1814A>G 602 Lys→Arg
2019T>C 670 -
2079T>A 690 -
v8* 25G>A 9 Ala→Thr
60T>A 20 -
61C>A 21 Pro→Thr 1.77±0.02
93T>C 28 -
462T>C 151 -
981C>T 324 -
1158C>T 383 -
1953C>T 648 -
2389A>C 794 Lys→Gln
v10 1180T>C 391 - 0.90±0.02
1448C>T 480 Ala→Val n/a
1506T>A 499 - 0.88±0.05
v11* 61C>T 21 Pro→Ser 1.93±0.02
366T>C 119 -
954C>T 315 -
1773C>T 588 -
v16 681T>C 224 - 0.86±0.01
1489A>C 494 Lys→Gln 1.17±0.06
1678A>G 557 Asn→Asp 1.19±0.06
v18* 65T>A 22 Val→Asp 2.03±0.08
1983C>T 658 -
v19* 65T>C 22 Val→Ala 1.57±0.06
219G>A 70 -
1275A>G 422 -
2367T>G 786 -
2441C>T 811 Thr→Met
v20 428A>T 140 Gln→Leu 1.56±0.05
1925A>T 639 Lys→Met 0.73±0.04
2349T>A 780 - 0.94±0.01
*variants for which only signal sequence mutations at position 21 or 22 were investigated.
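The nucleotide positions and amino acid numbers in Table III appear to follow a simple rule, sketched below. This rule is an inference from the table and from the statement that the first 21 native codons were replaced by the first 24 MFα codons; the source does not state it explicitly. Under that reading, codons 1-24 of the α-bgl1 fusion gene are numbered as MFα residues, and later codons are shifted by three so that they keep the native AnBGL1 numbering. The constant and function names are hypothetical.

```python
MFA_CODONS = 24       # MFα codons at the 5' end of the fusion gene (from the text)
NATIVE_REPLACED = 21  # native codons they replace (from the text)
OFFSET = MFA_CODONS - NATIVE_REPLACED  # inferred numbering shift of 3


def residue_number(nt_position: int) -> int:
    """Map a 1-based nucleotide position in α-bgl1 to the residue number
    used in Table III (inferred rule, see the assumptions above)."""
    if nt_position < 1:
        raise ValueError("nucleotide positions are 1-based")
    codon = (nt_position - 1) // 3 + 1
    if codon <= MFA_CODONS:
        return codon        # within the MFα pre sequence: fusion numbering
    return codon - OFFSET   # mature region: native AnBGL1 numbering


if __name__ == "__main__":
    for nt in (61, 65, 428, 923, 1448):
        print(nt, residue_number(nt))
```

The rule reproduces the signal-peptide rows (61 and 65 map to residues 21 and 22) as well as the mature-region rows (428, 923 and 1448 map to residues 140, 305 and 480).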
EXAMPLE 3: Kinetic characterization of AnBGL1 mutants using pNPG
The mutations producing the greatest improvements were further characterized with kinetics experiments. Initial reaction velocities were measured for each mutant at different substrate concentrations (0.1-10 mM pNPG) and at different inhibitor concentrations (0-100 mM glucose) (see Table IV below). Wild type and all single substitution mutants, except Y305C, were fitted to a substrate inhibition model (FIGs. 2A-B). Y305C showed no inhibition at high substrate concentrations (compare, e.g., the curves at 10 mM pNPG for α-BGL1 vs. Y305C) and the reaction velocities were fitted to the Michaelis-Menten equation using a competitive inhibition model (FIGs. 2A-B). Most of the mutants showed an increase in appVmax (apparent maximum reaction velocity) similar to the relative activities reported from activity assays at 2 mM pNPG (Table III above). No other significant differences were observed in reaction kinetics between α-BGL1 and the mutant enzymes.
Based on the change in kinetic profile caused by the Cys substitution at position 305, Y305C was used as a background to further explore the contributions of the V22D, Q140L, and A480V mutations. V22D/Y305C, Q140L/Y305C, and Y305C/A480V double substitution mutants and a quadruple DLCV (V22D/Q140L/Y305C/A480V) mutant were engineered. Reaction velocities for all of the combinatorial mutants were modeled using the Michaelis-Menten equation and kinetics parameters were determined (see Table V below). As expected, the addition of V22D, Q140L and A480V to Y305C increased the appVmax for each enzyme. appKm and appKi glucose (the inhibition constant for glucose in a Michaelis-Menten competitive inhibition model) for V22D/Y305C and Y305C/A480V were similar to those of Y305C, while the inhibitory effect of glucose was slightly reduced for Q140L/Y305C and DLCV.
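For reference, the two rate laws named above are commonly written as follows. The source does not print the equations, so these are the standard textbook forms of a substrate inhibition model and of Michaelis-Menten kinetics with a competitive inhibitor, given here as an assumption about what was fitted. [S] is the substrate (pNPG) concentration and [I] the inhibitor (glucose) concentration.

```latex
% Substrate inhibition model (wild type and most single-substitution mutants)
v_0 = \frac{V_{\max}^{\mathrm{app}}\,[S]}
           {K_m^{\mathrm{app}} + [S]\left(1 + \dfrac{[S]}{K_i^{\mathrm{app}}}\right)}

% Michaelis-Menten with competitive inhibition by glucose (Y305C)
v_0 = \frac{V_{\max}^{\mathrm{app}}\,[S]}
           {K_m^{\mathrm{app}}\left(1 + \dfrac{[I]}{K_{i,\mathrm{glucose}}^{\mathrm{app}}}\right) + [S]}
```

In the first form the velocity passes through a maximum and then declines as [S] increases, whereas in the second it saturates at the apparent Vmax, which matches the qualitative difference reported between α-BGL1 and Y305C.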
Table IV. Kinetic parameters of BGL1 and evolved variants for synthetic substrate.
Enzyme appKm (mM) appVmax (μmol L⁻¹ min⁻¹) appKi glucose (mM) appKi pNPG (mM)
α-BGL1 0.82 ± 0.12 118.5 ± 10.6 - 2.98 ± 0.46
V22D 0.93 ± 0.23 210.5 ± 31.6 - 2.98 ± 0.75
Q140L 1.09 ± 0.06 249.6 ± 9.2 - 3.41 ± 0.21
Y305C 0.77 ± 0.03 149.7 ± 1.7 1.81 ± 0.08 -
A480V 0.98 ± 0.25 217.7 ± 34.3 - 3.26 ± 0.87
K494Q 0.89 ± 0.12 141.5 ± 11.1 - 3.06 ± 0.41
N557D 0.88 ± 0.09 186.2 ± 11.0 - 3.16 ± 0.32
Table V. Kinetic parameters of Y305C substituted variants for synthetic substrate.
Enzyme appKm (mM) appVmax (μmol mg⁻¹ min⁻¹) appKi glucose (mM)
Y305C 0.93 ± 0.04 71.0 ± 0.9 1.92 ± 0.10
V22D/Y305C 1.06 ± 0.04 127.4 ± 1.3 2.22 ± 0.09
Q140L/Y305C 1.32 ± 0.05 111.1 ± 1.4 3.18 ± 0.15
Y305C/A480V 1.06 ± 0.05 124.3 ± 1.8 2.22 ± 0.12
DLCV 1.69 ± 0.08 221.3 ± 3.6 3.37 ± 0.18
EXAMPLE 4: Mapping of AnBGL1 mutations
To investigate the structure/function relationship between the position 305 residue and the transglycosidation reaction, the applicants mapped the positions of the mutations identified through directed evolution (FIG. 3A), using the available crystal structure of a GH3 BGL from Aspergillus aculeatus (AaBGL1) (72). Transglycosidation reactions are based on the affinity for acceptor in the +1 subsite.
Multiple sequence alignments suggest that AnBGL1 and other related fungal GH3 BGLs would adopt a three-dimensional structure similar to AaBGL1. AnBGL1 shares 82% sequence identity with AaBGL1 (83% for the secreted form of AaBGL1), and residues forming the active site are well conserved (FIGs. 3B and 3D). The substrate binding pocket of GH3 BGLs is formed by highly conserved residues in the -1 and +1 subsites, though the distal subsites are less well conserved.
The Q140L mutation maps to a region of the triosephosphate isomerase domain (AaBGL1 Leu19-Ser356), is 9.1 Å from the closest substrate-binding residue (AaBGL1 Trp280) and is approximately 10 Å from the β/α sandwich domain. Q140L increased hydrolysis activity by 156% compared to α-BGL1 at 2 mM pNPG (see e.g., v5 in FIG. 1C). The double substituted Q140L/Y305C variant had slightly higher Km and Ki glucose values compared to the Y305C background (see Table V), suggesting decreased affinity for substrate and product in the active site. Since Ki glucose values could not be calculated for either wild type or the Q140L mutant using a substrate inhibition model, the reaction rates were also fit for a substrate range of 0.1-1 mM pNPG to the Michaelis-Menten equation under the competitive inhibition model, allowing the Ki glucose for both enzymes to be approximated. This analysis showed an increase in Km (α-BGL1: 0.51±0.04 mM pNPG; Q140L: 0.74±0.03 mM pNPG) that was consistent with the increase in Km observed using data fit to the substrate inhibition model (α-BGL1: 0.82±0.12 mM pNPG; Q140L: 1.09±0.06 mM pNPG) (see Table IV) and an increased Ki glucose value (α-BGL1: 1.72±0.09 mM; Q140L: 2.73±0.07 mM) for the Q140L mutant compared to wild type. It is therefore possible that the Km transglycosidation for Q140L would be slightly higher than wild type, as an increase in Ki pNPG was observed (α-BGL1: 2.98±0.46 mM; Q140L: 3.41±0.21 mM) (see Table IV). Even though the Q140L mutation is not directly involved in forming the substrate binding pocket, the proximity of the active site could explain the change in Km, Ki glucose and Ki pNPG values. In the AaBGL1 crystal structure and a homology model of AnBGL1, the side-chain nitrogen of Gln140 forms a hydrogen bond with the backbone amide oxygen of Ser93.
Since Asp92 forms part of the -1 subsite, without being limited by this hypothesis, it is possible that the loss of a hydrogen bond between Ser93 and Gln140 in Q140L would cause a subtle change in the affinity for cellobiose and glucose in the substrate binding pocket, as well as a slight reduction in affinity for acceptor in the +1 subsite.
The A480V, K494Q, and N557D mutations are located on the β/α sandwich domain (AaBGL1 Gln385-Gly588) that contributes two active site residues (AaBGL1 Glu509 and Tyr511) and also mediates a protein-protein interaction between subunits of the functional dimer. The A480V (AaBGL1 Ile480) mutation is located on the surface of each subunit, buried in the dimer interface. Both the K494Q and the N557D mutations (AaBGL1 Lys494 and Asp557) are located on the surface of the molecule. Lys494 is also proximal to the dimer interface, though it does not contact the opposite subunit directly.
When tested alone or in combination with Y305C, no significant changes were observed in the kinetic constants for A480V and N557D. Interestingly, the GH3 BGL from A. fumigatus has three of the discovered substitutions naturally (Q140L, A480V, and N557D; see FIG. 3D), and the GH3 BGL from N. crassa has two of the residues (Leu140 and Val480; see FIG. 3D). Because kcat values were not determined, the functional relationship between these substitutions and the observed increases in apparent Vmax remains unclear. Without being limited by such hypothesis, it is suspected that the interaction between subunits could modulate activity through changes in the catalytic turnover number.
The Y305C mutation maps to a short loop (AaBGL1 Gly294-Gly313) that inserts directly into the +1 subsite of AaBGL1 along with Trp68 and Tyr511. The structure of AaBGL1 in complex with thiocellobiose shows the contribution of Phe305 in the substrate-binding pocket, where the ligand docks in a narrow cleft between the three +1 subsite residues. Since the Cys functional group is less bulky than Tyr, it is assumed that the +1 subsite of the A. niger Y305C mutant is more open and has a lower affinity for substrate.
EXAMPLE 5: Targeted mutagenesis
To determine if residue 305 controls substrate inhibition in BGL1 homologs, its relative location and effect on enzyme kinetics in other GH3 BGLs was investigated (FIG. 3D). Phe is the most common residue at position 305 (9/16 sequences) and has an aromatic functional group similar to that of the Tyr residue from AnBGL1. Interestingly, the position was variable in the remaining seven homologs and included a Cys residue in the BGL from Ustilago maydis, which would suggest that its active site is similar to the Y305C mutant. It was sought to further test the functional significance of this position by constructing Y305F, Y305W, Y305G, Y305V, and Y305A A. niger mutants. The Phe residue had very similar kinetics to the wild type Tyr residue, demonstrating inhibition at high substrate concentrations (see Table VI below). However, Y305F had a lower appKm, i.e. higher substrate affinity, and higher appKi than the wild type enzyme (see Table IV above), demonstrating a slight change in affinity at the +1 subsite. Substituting Tyr305 with Ala, Val or Gly produced kinetics similar to the Y305C mutant, demonstrating saturation at high substrate concentrations. Sequence alignments identified a Trp (W) residue in the BGL from Aspergillus nidulans, but the Y305W mutant was non-functional, suggesting that the bulky functional group blocked the substrate binding pocket.
Closer inspection of sequence alignments shows that the loop coordinating the position 305 residue (Gly294-Gly313) (FIG. 3C and FIG. 4B) contains a short variable region (residues 303-309) between two highly conserved sequences (Gly294-Asp302 and Ser310-Gly313). The corresponding variable region from the A. nidulans BGL (438GLHWADG444; SEQ ID NO: 250) includes an additional residue. It is likely that the Trp residue does not occupy the +1 subsite in the A. nidulans homolog and either a His or Ala is present in the substrate binding pocket of the enzyme. Alternatively, the additional residue in the A. nidulans sequence could change the orientation of the Trp residue in the +1 subsite such that the substrate binding pocket is not blocked.
Table VI. Kinetic parameters of Tyr305 variants for synthetic substrate.
Enzyme appKm (mM) appVmax (μmol L⁻¹ min⁻¹) appKi (mM)
Y305F 0.46 ± 0.04 126.5 ± 5.4 4.78 ± 0.44
Y305W* n/a n/a n/a
Y305G 1.16 ± 0.06 239.7 ± 3.9 -
Y305A 1.55 ± 0.06 238.2 ± 4.1 -
Y305V 1.14 ± 0.08 233.9 ± 5.6 -
*Inactive under all tested conditions
EXAMPLE 6: Enzyme kinetics using natural substrate
Since selection in the screening strategy was based on the hydrolysis of cellobiose, reaction kinetics were also investigated using the natural substrate for wild type and mutant enzymes. The production of glucose was measured at different cellobiose concentrations for α-BGL1, Y305C, Y305G, and DLCV enzymes and kinetic parameters were determined (see Table VII below). The results were consistent with experiments using synthetic substrate, where the wild type enzyme showed inhibition at high cellobiose concentrations (FIG. 5A). Y305C, Y305G, and DLCV were modelled using Michaelis-Menten kinetics, showing saturation at high substrate concentrations (FIG. 5B).
Table VII. Kinetic parameters of BGLs for natural substrate.
Enzyme appKm (mM) appVmax (μmol L⁻¹ min⁻¹) appKi cellobiose (mM)
α-BGL1 1.2 ± 0.1 61.9 ± 2.5 32.4 ± 4.5
Y305C 5.3 ± 0.6 106.4 ± 3.4 -
Y305G 5.6 ± 0.7 122.4 ± 4.3 -
DLCV 3.6 ± 0.3 207.5 ± 4.3 -
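To illustrate what the Table VII parameters imply, the sketch below evaluates the wild type enzyme under a substrate inhibition rate law and Y305C under the Michaelis-Menten equation. The rate-law forms are the standard ones and are an assumption, since the source does not print the fitted equations; the function and variable names are hypothetical, and only the parameter values are taken from Table VII.

```python
import math


def substrate_inhibition(s, vmax, km, ki):
    """Standard substrate inhibition rate law: v = Vmax*S / (Km + S*(1 + S/Ki))."""
    return vmax * s / (km + s * (1 + s / ki))


def michaelis_menten(s, vmax, km):
    """Michaelis-Menten rate law: v = Vmax*S / (Km + S)."""
    return vmax * s / (km + s)


# Apparent parameters for cellobiose from Table VII (mM and umol L^-1 min^-1).
WT = {"vmax": 61.9, "km": 1.2, "ki": 32.4}
Y305C = {"vmax": 106.4, "km": 5.3}

# Under the substrate inhibition model the rate peaks at S = sqrt(Km * Ki).
WT_OPTIMUM_MM = math.sqrt(WT["km"] * WT["ki"])

if __name__ == "__main__":
    for s in (2, 10, 50, 75):  # cellobiose concentrations within the tested range
        v_wt = substrate_inhibition(s, **WT)
        v_mut = michaelis_menten(s, **Y305C)
        print(f"{s:>3} mM  wild type {v_wt:6.1f}  Y305C {v_mut:6.1f}")
```

Under these assumptions the wild type rate peaks near 6 mM cellobiose and declines at higher concentrations, while the Y305C rate keeps rising toward its apparent Vmax, which is the qualitative behaviour the text attributes to FIGs. 5A and 5B.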
EXAMPLE 7: Thin layer chromatography (TLC) analysis of reaction products
Transglycosidation reactions catalyzed by BGLs with natural and synthetic acceptors are thought to be the cause of inhibition at high substrate concentrations (7-9, 12, 23, 25, 26, 68, 94). Since kinetics experiments suggested that the transglycosidation activity of BGL1 was lost by substituting Tyr305 with either Cys, Gly, Ala or Val residues, TLC analysis of reactions was performed at high substrate concentrations (FIGs. 6A and 6B). Kinetics experiments showed that α-BGL1 is strongly inhibited by synthetic substrate, where V0 was reduced by approximately 50% at 10 mM pNPG (FIG. 6A). Analysis of reaction products confirmed that α-BGL1 and the Y305F variant were inhibited at 40 mM pNPG, and a pNPG transglycosidation product was found to accumulate through the duration of the experiment. Without being limited by this hypothesis, this kinetic profile is likely caused by a competing transglycosidation reaction in which substrate also acts as an acceptor in the +1 subsite (8, 26). In contrast, both the Y305C and Y305G substituted variants, which showed significantly less transglycosidation, completely consumed the substrate within 3 hours. A transient transglycosidation species was detected, but glucose was the major product by 4 hours, showing that the transglycosidation reaction was reduced but not completely eliminated.
All 305 substituted variants reached saturation above the wild type appVmax. Alignments show that position 305 is the least well conserved residue in the active site. Without being limited by this hypothesis, the position 305 residue might therefore act as a "molecular tuning dial" between the hydrolysis and transglycosidase activities. Analysis of the amino acid sequences of these six BGLs and other homologs with either Phe305 or Tyr305 shows that a variable sequence forms part of the loop that coordinates this residue (FIG. 3C and FIG. 4B). Without being limited by this hypothesis, it is possible that this variable sequence creates subtle changes in the orientation of the position 305 residue and could provide the basis for finer tuning between hydrolytic and transglycosidation reactions.
BGL reactions with the natural substrate cellobiose maintain a dynamic equilibrium since reaction species can participate in either hydrolytic or transglycosidation reactions (7, 8). TLC analysis of reactions using α-BGL1, Y305C, Y305F and Y305G at 50 mM cellobiose showed that transglycosidation reactions were present for all of the tested enzymes, which is consistent with the presence of transglycosidation product in reactions using pNPG (FIG. 6B). Since kinetic experiments showed lower inhibition of hydrolytic activity in reactions using a natural substrate than in reactions using a synthetic substrate, it was suspected that TLC analysis under these conditions lacks the sensitivity to differentiate wild type and mutant BGLs.
EXAMPLE 8: Effect of engineered BGLs on cellulosic biomass hydrolysates
Cellulosic biomass hydrolysates can contain varying amounts of oligosaccharides and glucose depending on how they were pre-treated and hydrolyzed. Strains producing GH3 BGLs of the present invention (engineered BGL1 derivatives) are tested on a collection of hydrolysates to determine their performance compared to the corresponding native BGL1.
EXAMPLE 9: Effect of varying amounts of engineered BGLs in standard enzyme cocktails
Enzyme cocktails contain varying supplemented amounts of BGL. Standard cocktails are tested with various amounts of GH3 BGLs of the present invention.
The scope of the claims should not be limited by the preferred embodiments set forth in the examples, but should be given the broadest interpretation consistent with the description as a whole.
REFERENCES
1. Ragauskas, A. J., Williams, C. K., Davison, B. H., Britovsek, G., Cairney, J., Eckert, C. A., Frederick, W. J., Hallett, J. P., Leak, D. J., Liotta, C. L., Mielenz, J. R., Murphy, R., Templer, R., and Tschaplinski, T. (2006) The path forward for biofuels and biomaterials. Science. 311, 484-9
2. Doran-Peterson, J., Jangid, A., Brandon, S. K., DeCrescenzo-Henriksen, E., Dien, B., and Ingram, L. O. (2009) Simultaneous saccharification and fermentation and partial saccharification and co-fermentation of lignocellulosic biomass for ethanol production. Methods Mol. Biol. 581, 263-80
3. Lynd, L. R., van Zyl, W. H., McBride, J. E., and Laser, M. (2005) Consolidated bioprocessing of cellulosic biomass: an update. Curr. Opin. Biotechnol. 16, 577-83
4. Lynd, L. R., Weimer, P. J., van Zyl, W. H., and Pretorius, I. S. (2002) Microbial cellulose utilization: fundamentals and biotechnology. Microbiol. Mol. Biol. Rev. 66, 506-77
5. Bhat, M. K., and Bhat, S. (1997) Cellulose degrading enzymes and their potential industrial applications. Biotechnol. Adv. 15, 583-620
6. Shewale, J. G. (1982) Beta-Glucosidase: its role in cellulase synthesis and hydrolysis of cellulose. Int. J. Biochem. 14, 435-43
7. Bohlin, C., Praestgaard, E., Baumann, M. J., Borch, K., Praestgaard, J., Monrad, R. N., and Westh, P. (2012) A comparative study of hydrolysis and transglycosylation activities of fungal β-glucosidases. Appl. Microbiol. Biotechnol. 97, 159-169
8. Kawai, R., Igarashi, K., Kitaoka, M., Ishii, T., and Samejima, M. (2004) Kinetics of substrate transglycosylation by glycoside hydrolase family 3 glucan (1->3)-beta-glucosidase from the white-rot fungus Phanerochaete chrysosporium. Carbohydr. Res. 339, 2851-7
9. Christakopoulos, P., Goodenough, P. W., Kekos, D., Macris, B. J., Claeyssens, M., and Bhat, M. K. (1994) Purification and Characterisation of an Extracellular beta-Glucosidase with Transglycosylation and Exo-glucosidase Activities from Fusarium oxysporum. Eur. J. Biochem. 224, 379-385
10. Xiao, Z., Zhang, X., Gregg, D. J., and Saddler, J. N. (2004) Effects of sugar inhibition on cellulases and beta-glucosidase during enzymatic hydrolysis of softwood substrates. Appl. Biochem. Biotechnol. 113-116, 1115-26
11. Chauve, M., Mathis, H., Huc, D., Casanave, D., Monot, F., Ferreira, N. L., and Lopes Ferreira, N. (2010) Comparative kinetic analysis of two fungal beta-glucosidases. Biotechnol. Biofuels. 3, 3
12. Bohlin, C., Olsen, S. N., Morant, M. D., Patkar, S., Borch, K., and Westh, P. (2010) A comparative study of activity and apparent inhibition of fungal β-glucosidases. Biotechnol. Bioeng. 107, 943-52
13. Rajasree, K. P., Mathew, G. M., Pandey, A., and Sukumaran, R. K. (2013) Highly glucose tolerant β-glucosidase from Aspergillus unguis: NII 08123 for enhanced hydrolysis of biomass. J. Ind. Microbiol. Biotechnol. 40, 967-75
14. Riou, C., Salmon, J. M., Vallier, M. J., Günata, Z., and Barre, P. (1998) Purification, characterization, and substrate specificity of a novel highly glucose-tolerant beta-glucosidase from Aspergillus oryzae. Appl. Environ. Microbiol. 64, 3607-14
15. Günata, Z., and Vallier, M. Production of a highly glucose-tolerant extracellular β-glucosidase by three Aspergillus strains. Biotechnol. Lett. 21, 219-223
16. Saha, B. C., and Bothast, R. J. (1996) Production, purification, and characterization of a highly glucose-tolerant novel beta-glucosidase from Candida peltata. Appl. Environ. Microbiol. 62, 3165-70
17. Calsavara, L. P., De Moraes, F. F., and Zanin, G. M. (1999) Modeling cellobiose hydrolysis with integrated kinetic models. Appl. Biochem. Biotechnol. 77-79, 789-806
18. Decker, C. H., Visser, J., and Schreier, P. (2000) β-Glucosidases from Five Black Aspergillus Species: Study of Their Physico-Chemical and Biocatalytic Properties. J. Agric. Food Chem. 48, 4929-4936
19. Grous, W., Converse, A., Grethlein, H., and Lynd, L. (1985) Kinetics of cellobiose hydrolysis using cellobiase composites from Trichoderma reesei and Aspergillus niger. Biotechnol. Bioeng. 27, 463-70
20. Han, Y., and Chen, H. (2008) Characterization of beta-glucosidase from corn stover and its application in simultaneous saccharification and fermentation. Bioresour. Technol. 99, 6081-7
21. Hong, J., Ladisch, M. R., Gong, C., Wankat, P. C., and Tsao, G. T. (1981) Combined product and substrate inhibition equation for cellobiase. Biotechnol. Bioeng. 23, 2779-2788
22. Jeoh, T., Baker, J. O., Ali, M. K., Himmel, M. E., and Adney, W. S. (2005) Beta-D-glucosidase reaction kinetics from isothermal titration microcalorimetry. Anal. Biochem. 347, 244-53
23. Korotkova, O. G., Semenova, M. V., Morozova, V. V., Zorov, I. N., Sokolova, L. M., Bubnova, T. M., Okunev, O. N., and Sinitsyn, A. P. (2009) Isolation and properties of fungal β-glucosidases. Biochem. 74, 569-577
24. Krogh, K. B. R. M., Harris, P. V., Olsen, C. L., Johansen, K. S., Hojer-Pedersen, J., Borjesson, J., and Olsson, L. (2010) Characterization and kinetic analysis of a thermostable GH3 beta-glucosidase from Penicillium brasilianum. Appl. Microbiol. Biotechnol. 86, 143-54
25. Seidle, H. F., Marten, I., Shoseyov, O., and Huber, R. E. (2004) Physical and kinetic properties of the family 3 beta-glucosidase from Aspergillus niger which is important for cellulose breakdown. Protein J. 23, 11-23
26. Seidle, H. F., and Huber, R. E. (2005) Transglucosidic reactions of the Aspergillus niger family 3 beta-glucosidase: qualitative and quantitative analyses and evidence that the transglucosidic rate is independent of pH. Arch. Biochem. Biophys. 436, 254-64
27. Seidle, H. F., Allison, S. J., George, E., and Huber, R. E. (2006) Trp-49 of the family 3 beta-glucosidase from Aspergillus niger is important for its transglucosidic activity: creation of novel beta-glucosidases with low transglucosidic efficiencies. Arch. Biochem. Biophys. 455, 110-8
28. Wallecha, A., and Mishra, S. (2003) Purification and characterization of two β-glucosidases from a thermo-tolerant yeast Pichia etchellsii. Biochim. Biophys. Acta - Proteins Proteomics. 1649, 74-84
29. Singhania, R. R., Patel, A. K., Sukumaran, R. K., Larroche, C., and Pandey, A. (2013) Role and significance of beta-glucosidases in the hydrolysis of cellulose for bioethanol production. Bioresour. Technol. 127, 500-7
30. Saitoh, S., Tanaka, T., and Kondo, A. (2011) Co-fermentation of cellulose/xylan using engineered industrial yeast strain OC-2 displaying both β-glucosidase and β-xylosidase. Appl. Microbiol. Biotechnol. 91, 1553-9
31. Guo, Z.-P., Zhang, L., Ding, Z.-Y., Gu, Z.-H., and Shi, G.-Y. (2011) Development of an industrial ethanol-producing yeast strain for efficient utilization of cellobiose. Enzyme Microb. Technol. 49, 105-12
32. Fujita, Y., Takahashi, S., Ueda, M., Tanaka, A., Okada, H., Morikawa, Y., Kawaguchi, T., Arai, M., Fukuda, H., and Kondo, A. (2002) Direct and efficient production of ethanol from cellulosic material with a yeast strain displaying cellulolytic enzymes. Appl. Environ. Microbiol. 68, 5136-41
33. Kotaka, A., Bando, H., Kaya, M., Kato-Murai, M., Kuroda, K., Sahara, H., Hata, Y., Kondo, A., and Ueda, M. (2008) Direct ethanol production from barley beta-glucan by sake yeast displaying Aspergillus oryzae beta-glucosidase and endoglucanase. J. Biosci. Bioeng. 105, 622-7
34. Jeon, E., Hyeon, J. eun, Eun, L. S., Park, B.-S., Kim, S. W., Lee, J., and Han, S. O. (2009) Cellulosic alcoholic fermentation using recombinant Saccharomyces cerevisiae engineered for the production of Clostridium cellulovorans endoglucanase and Saccharomycopsis fibuligera beta-glucosidase. FEMS Microbiol. Lett. 301, 130-6
35. Tsai, S.-L., Goyal, G., and Chen, W. (2010) Surface display of a functional minicellulosome by intracellular complementation using a synthetic yeast consortium and its application to cellulose hydrolysis and ethanol production. Appl. Environ. Microbiol. 76, 7514-20
36. Kim, S., Baek, S.-H., Lee, K., and Hahn, J.-S. (2013) Cellulosic ethanol production using a yeast consortium displaying a minicellulosome and β-glucosidase. Microb. Cell Fact. 12, 14
37. Katahira, S., Mizuike, A., Fukuda, H., and Kondo, A. (2006) Ethanol fermentation from lignocellulosic hydrolysate by a recombinant xylose- and cellooligosaccharide-assimilating yeast strain. Appl. Microbiol. Biotechnol. 72, 1136-43
38. Fujita, Y., Ito, J., Ueda, M., Fukuda, H., and Kondo, A. (2004) Synergistic Saccharification, and Direct Fermentation to Ethanol, of Amorphous Cellulose by Use of an Engineered Yeast Strain Codisplaying Three Types of Cellulolytic Enzyme. Appl. Environ. Microbiol. 70, 1207-12
39. Jeon, E., Hyeon, J.-E., Suh, D. J., Suh, Y.-W., Kim, S. W., Song, K. H., and Han, S. O. (2009) Production of cellulosic ethanol in Saccharomyces cerevisiae heterologous expressing Clostridium thermocellum endoglucanase and Saccharomycopsis fibuligera beta-glucosidase genes. Mol. Cells. 28, 369-73
40. Chang, J.-J., Ho, F.-J., Ho, C.-Y., Wu, Y.-C., Hou, Y.-H., Huang, C.-C., Shih, M.-C., and Li, W.-H. (2013) Assembling a cellulase cocktail and a cellodextrin transporter into a yeast host for CBP ethanol production. Biotechnol. Biofuels. 6, 19
41. Wen, F., Sun, J., and Zhao, H. (2010) Yeast surface display of trifunctional minicellulosomes for simultaneous saccharification and fermentation of cellulose to ethanol. Appl. Environ. Microbiol. 76, 1251-60
42. Van Rooyen, R., Hahn-Hagerdal, B., La Grange, D. C., and van Zyl, W. H. (2005) Construction of cellobiose-growing and fermenting Saccharomyces cerevisiae strains. J. Biotechnol. 120, 284-95
43. Wilde, C., Gold, N. D., Bawa, N., Tambor, J. H. M., Mougharbel, L., Storms, R., and Martin, V. J. J. (2012) Expression of a library of fungal β-glucosidases in Saccharomyces cerevisiae for the development of a biomass fermenting strain. Appl. Microbiol. Biotechnol. 95, 647-59
44. Yamada, R., Taniguchi, N., Tanaka, T., Ogino, C., Fukuda, H., and Kondo, A. (2011) Direct ethanol production from cellulosic materials using a diploid strain of Saccharomyces cerevisiae with optimized cellulase expression. Biotechnol. Biofuels. 4, 8
45. Tsai, S.-L., Oh, J., Singh, S., Chen, R., and Chen, W. (2009) Functional assembly of minicellulosomes on the Saccharomyces cerevisiae cell surface for cellulose hydrolysis and ethanol production. Appl. Environ. Microbiol. 75, 6087-93
46. Machida, M., Ohtsuki, I., Fukui, S., and Yamashita, I. (1988) Nucleotide sequences of Saccharomycopsis fibuligera genes for extracellular beta-glucosidases as expressed in Saccharomyces cerevisiae. Appl. Environ. Microbiol. 54, 3147-55
47. Wood, B. E., and Ingram, L. O. (1992) Ethanol production from cellobiose, amorphous cellulose, and crystalline cellulose by recombinant Klebsiella oxytoca containing chromosomally integrated Zymomonas mobilis genes for ethanol production and plasmids expressing thermostable cellulase genes fr. Appl. Environ. Microbiol. 58, 2103-10
48. Liu, M., and Yu, H. (2012) Cocktail production of an endo-β-xylanase and a β-glucosidase from Trichoderma reesei QM 9414 in Escherichia coli. Biochem. Eng. J. 68, 1-6
49. Tang, H., Hou, J., Shen, Y., Xu, L., Yang, H., Fang, X., and Bao, X. (2013) High β-glucosidase secretion in Saccharomyces cerevisiae improves the efficiency of cellulase hydrolysis and ethanol production in simultaneous saccharification and fermentation. J. Microbiol. Biotechnol. 23, 1577-85
50. Van Zyl, W. H., Bloom, M., and Viktor, M. J. (2012) Engineering yeasts for raw starch conversion. Appl. Microbiol. Biotechnol. 95, 1377-88
51. Nakazawa, H., Kawai, T., Ida, N., Shida, Y., Kobayashi, Y., Okada, H., Tani, S., Sumitani, J.-I., Kawaguchi, T., Morikawa, Y., and Ogasawara, W. (2012) Construction of a recombinant Trichoderma reesei strain expressing Aspergillus aculeatus β-glucosidase 1 for efficient biomass conversion. Biotechnol. Bioeng. 109, 92-9
52. McBride, J. E., Zietsman, J. J., Van Zyl, W. H., and Lynd, L. R. (2005) Utilization of cellobiose by recombinant β-glucosidase-expressing strains of Saccharomyces cerevisiae: characterization and evaluation of the sufficiency of expression. Enzyme Microb. Technol. 37, 93-101
53. Yanase, S., Yamada, R., Kaneko, S., Noda, H., Hasunuma, T., Tanaka, T., Ogino, C., Fukuda, H., and Kondo, A. (2010) Ethanol production from cellulosic materials using cellulase-expressing yeast. Biotechnol. J. 5, 449-55
54. Den Haan, R., Rose, S. H., Lynd, L. R., and van Zyl, W. H. (2007) Hydrolysis and fermentation of amorphous cellulose by recombinant Saccharomyces cerevisiae. Metab. Eng. 9, 87-94
55. Ma, L., Zhang, J., Zou, G., Wang, C., and Zhou, Z. (2011) Improvement of cellulase activity in Trichoderma reesei by heterologous expression of a beta-glucosidase gene from Penicillium decumbens. Enzyme Microb. Technol. 49, 366-71
56. Dashtban, M., and Qin, W. (2012) Overexpression of an exotic thermotolerant β-glucosidase in Trichoderma reesei and its significant increase in cellulolytic activity and saccharification of barley straw. Microb. Cell Fact. 11, 63
57. Park, E.-H., Shin, Y.-M., Lim, Y.-Y., Kwon, T.-H., Kim, D.-H., and Yang, M.-S. (2000) Expression of glucose oxidase by using recombinant yeast. J. Biotechnol. 81, 35-44
58. Yoneda, A., Kuo, H.-W. D., Ishihara, M., Azadi, P., Yu, S.-M., and Ho, T. D. (2014) Glycosylation variants of a β-glucosidase secreted by a Taiwanese fungus, Chaetomella raphigera, exhibit variant-specific catalytic and biochemical properties. PLoS One. 9, e106306
59. Tiwari, P., Misra, B. N., and Sangwan, N. S. (2013) β-Glucosidases from the fungus Trichoderma: an efficient cellulase machinery in biotechnological applications. Biomed Res. Int. 2013, 203735
60. Sorensen, A., Lubeck, P. S., Lübeck, M., Teller, P. J., and Ahring, B. K. (2011) β-glucosidases from a new Aspergillus species can substitute commercial β-glucosidases for saccharification of lignocellulosic biomass. Can. J. Microbiol. 57, 638-50
61. Parry, N. J., Beever, D. E., Owen, E., Vandenberghe, I., Beeumen, J. V. A. N., and Bhat, M. K. (2001) β-glucosidase purified from Thermoascus aurantiacus. 127, 117-127
62. Perez-Pons, J. A., Cayetano, A., Rebordosa, X., Lloberas, J., Guasch, A., and Querol, E. (1994) A beta-glucosidase gene (bgl3) from Streptomyces sp. strain QM-B814. Molecular cloning, nucleotide sequence, purification and characterization of the encoded enzyme, a new member of family 1 glycosyl hydrolases. Eur. J. Biochem. 223, 557-565
63. Himmel, M. E., Adney, W. S., Fox, J. W., Mitchell, D. J., and Baker, J. O. (1993) Isolation and characterization of two forms of β-D-glucosidase from Aspergillus niger. Appl. Biochem. Biotechnol. 39-40, 213-225
64. Dan, S., Marton, I., Dekel, M., Bravdo, B. A., He, S., Withers, S. G., and Shoseyov, O. (2000) Cloning, expression, characterization, and nucleophile identification of family 3, Aspergillus niger beta-glucosidase. J. Biol. Chem. 275, 4973-80
65. Araujo, E. F., Barros, E. G., Caldas, R. A., and Silva, D. O. (1983) Beta-glucosidase activity of a thermophilic cellulolytic fungus, Humicola sp. Biotechnol. Lett. 5, 781-784
66. Sorensen, A., Ahring, B. K., Lubeck, M., Ubhayasekera, W., Bruno, K. S., Culley, D. E., and Lubeck, P. S. (2012) Identifying and characterizing the most significant β-glucosidase of the novel species Aspergillus saccharolyticus. Can. J. Microbiol. 58, 1035-46
67. Chen, H.-L., Chen, Y.-C., Lu, M.-Y. J., Chang, J.-J., Wang, H.-T. C., Ke, H.-M., Wang, T.-Y., Ruan, S.-K., Wang, T.-Y., Hung, K.-Y., Cho, H.-Y., Lin, W.-T., Shih, M.-C., and Li, W.-H. (2012) A highly efficient β-glucosidase from the buffalo rumen fungus Neocallimastix patriciarum W5. Biotechnol. Biofuels. 5, 24
68. Langston, J., Sheehy, N., and Xu, F. (2006) Substrate specificity of Aspergillus oryzae family 3 beta-glucosidase. Biochim. Biophys. Acta. 1764, 972-8
69. Gundllapalli, S. B., Pretorius, I. S., and Cordero Otero, R. R. (2007) Effect of the cellulose-binding domain on the catalytic activity of a beta-glucosidase from Saccharomycopsis fibuligera. J. Ind. Microbiol. Biotechnol. 34, 413-21
70. Kotaka, A., Sahara, H., Kuroda, K., Kondo, A., Ueda, M., and Hata, Y. (2010) Enhancement of beta-glucosidase activity on the cell-surface of sake yeast by disruption of SED1. J. Biosci. Bioeng. 109, 442-6
71. Inokuma, K., Hasunuma, T., and Kondo, A. (2014) Efficient yeast cell-surface display of exo- and endo-cellulase using the SED1 anchoring region and its original promoter. Biotechnol. Biofuels. 7, 8
72. Suzuki, K., Sumitani, J., Nam, Y., Nishimaki, T., Tani, S., Wakagi, T., Kawaguchi, T., and Fushinobu, S. (2013) Crystal structures of glycoside hydrolase family 3 β-glucosidase 1 from Aspergillus aculeatus. Biochem. J. 452, 211-221
73. Pozzo, T., Pasten, J. L., Karlsson, E. N., and Logan, D. T. (2010) Structural and functional analyses of beta-glucosidase 3B from Thermotoga neapolitana: a thermostable three-domain representative of glycoside hydrolase 3. J. Mol. Biol. 397, 724-39
74. Jeng, W.-Y., Wang, N.-C., Lin, M.-H., Lin, C.-T., Liaw, Y.-C., Chang, W.-J., Liu, C.-I., Liang, P.-H., and Wang, A. H.-J. (2011) Structural and functional analysis of three β-glucosidases from bacterium Clostridium cellulovorans, fungus Trichoderma reesei and termite Neotermes koshunensis. J. Struct. Biol. 173, 46-56
75. Sorensen, A., Lubeck, M., Lubeck, P. S., and Ahring, B. K. (2013) Fungal Beta-glucosidases: a bottleneck in industrial use of lignocellulosic materials. Biomolecules. 3, 612-31
76. Johannes, T. W., and Zhao, H. (2006) Directed evolution of enzymes and biosynthetic pathways. Curr. Opin. Microbiol. 9, 261-7
77. Farinas, E. T., Bulter, T., and Arnold, F. H. (2001) Directed enzyme evolution. Curr. Opin. Biotechnol. 12, 545-51
78. Murashima, K., Kosugi, A., and Doi, R. H. (2002) Thermostabilization of cellulosomal endoglucanase EngB from Clostridium cellulovorans by in vitro DNA recombination with non-cellulosomal endoglucanase EngD. Mol. Microbiol. 45, 617-626
79. Kim, Y.-S., Jung, H.-C., and Pan, J.-G. (2000) Bacterial Cell Surface Display of an Enzyme Library for Selective Screening of Improved Cellulase Variants. Appl. Environ. Microbiol. 66, 788-793
80. Wang, T., Liu, X., Yu, Q., Zhang, X., Qu, Y., Gao, P., and Wang, T. (2005) Directed evolution for engineering pH profile of endoglucanase III from Trichoderma reesei. Biomol. Eng. 22, 89-94
81. Arrizubieta, M. J., and Polaina, J. (2000) Increased thermal resistance and modification of the catalytic properties of a beta-glucosidase by random mutagenesis and in vitro recombination. J. Biol. Chem. 275, 28843-8
82. Lebbink, J. H., Kaper, T., Bron, P., van der Oost, J., and de Vos, W. M. (2000) Improving low-temperature catalysis in the hyperthermostable Pyrococcus furiosus beta-glucosidase CelB by directed evolution. Biochemistry. 39, 3656-65
83. Gonzalez-Blasco, G., Sanz-Aparicio, J., Gonzalez, B., Hermoso, J. A., and Polaina, J. (2000) Directed evolution of beta-glucosidase A from Paenibacillus polymyxa to thermal resistance. J. Biol. Chem. 275, 13708-12
84. Hardiman, E., Gibbs, M., Reeves, R., and Bergquist, P. (2010) Directed evolution of a thermophilic beta-glucosidase for cellulosic bioethanol production. Appl. Biochem. Biotechnol. 161, 301-12
85. McCarthy, J. K., Uzelac, A., Davis, D. F., and Eveleigh, D. E. (2004) Improved catalytic efficiency and active site modification of 1,4-beta-D-glucan glucohydrolase A from Thermotoga neapolitana by directed evolution. J. Biol. Chem. 279, 11495-502
86. Pei, X.-Q., Yi, Z.-L., Tang, C.-G., and Wu, Z.-L. (2011) Three amino acid changes contribute markedly to the thermostability of β-glucosidase BglC from Thermobifida fusca. Bioresour. Technol. 102, 3337-42
87. Fossati, E., Ekins, A., Narcross, L., Zhu, Y., Falgueyret, J.-P., Beaudoin, G. A. W., Facchini, P. J., and Martin, V. J. J. (2014) Reconstitution of a 10-gene pathway for synthesis of the plant alkaloid dihydrosanguinarine in Saccharomyces cerevisiae. Nat. Commun. 5, 3283
88. Sievers, F., Wilm, A., Dineen, D., Gibson, T. J., Karplus, K., Li, W., Lopez, R., McWilliam, H., Remmert, M., Soding, J., Thompson, J. D., and Higgins, D. G. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539
89. Kelley, L. A., Mezulis, S., Yates, C. M., Wass, M. N., and Sternberg, M. J. E. (2015) The Phyre2 web portal for protein modeling, prediction and analysis. Nat. Protoc. 10, 845-858
90. Eriksen, D. T., Hsieh, P. C. H., Lynn, P., and Zhao, H. (2013) Directed evolution of a cellobiose utilization pathway in Saccharomyces cerevisiae by simultaneously engineering multiple proteins. Microb. Cell Fact. 12, 61
91. Du, J., Yuan, Y., Si, T., Lian, J., and Zhao, H. (2012) Customized optimization of metabolic pathways by combinatorial transcriptional engineering. Nucleic Acids Res. 40, e142
92. Kim, B., Du, J., Eriksen, D. T., and Zhao, H. (2013) Combinatorial design of a highly efficient xylose-utilizing pathway in Saccharomyces cerevisiae for the production of cellulosic biofuels. Appl. Environ. Microbiol. 79, 931-41
93. Romero, P. A., and Arnold, F. H. (2009) Exploring protein fitness landscapes by directed evolution. Nat. Rev. Mol. Cell Biol. 10, 866-76
94. Crombie, H. J., Chengappa, S., Hellyer, A., and Reid, J. S. G. (1998) A xyloglucan oligosaccharide-active, transglycosylating β-D-glucosidase from the cotyledons of nasturtium (Tropaeolum majus L) seedlings - purification, properties and characterization of a cDNA clone. Plant J. 15, 27-38
95. Brake, A. J., Merryweather, J. P., Coit, D. G., Heberlein, U. A., Masiarz, F. R., Mullenbach, G. T., Urdea, M. S., Valenzuela, P., and Barr, P. J. (1984) Alpha-factor-directed synthesis and secretion of mature foreign proteins in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. U. S. A. 81, 4642-6
96. Kjaerulff, S., and Jensen, M. R. (2005) Comparison of different signal peptides for secretion of heterologous proteins in fission yeast. Biochem. Biophys. Res. Commun. 336, 974-82
97. Lee, M. A., Cheong, K. H., Shields, D., Park, S. D., and Hong, S. H. (2002) Intracellular trafficking and metabolic turnover of yeast prepro-alpha-factor-SRIF precursors in GH3 cells. Exp. Mol. Med. 34, 285-93
98. Fitzgerald, I., and Glick, B. S. (2014) Secretion of a foreign protein from budding yeasts is enhanced by cotranslational translocation and by suppression of vacuolar targeting. Microb. Cell Fact. 13, 125
99. Lin-Cereghino, G. P., Stark, C. M., Kim, D., Chang, J., Shaheen, N., Poerwanto, H., Agari, K., Moua, P., Low, L. K., Tran, N., Huang, A. D., Nattestad, M., Oshiro, K. T., Chang, J. W., Chavan, A., Tsai, J. W., and Lin-Cereghino, J. (2013) The effect of α-mating factor secretion signal mutations on recombinant protein expression in Pichia pastoris. Gene. 519, 311-7
100. Ahmad, M., Hirz, M., Pichler, H., and Schwab, H. (2014) Protein expression in Pichia pastoris: recent achievements and perspectives for heterologous protein production. Appl. Microbiol. Biotechnol. 98, 5301-17
101. Rakestraw, J. A., Sazinsky, S. L, Piatesi, A., Antipov, E., and Wittrup, K. D. (2009) Directed evolution of a secretory leader for the improved expression of heterologous proteins and full-length antibodies in Saccharomyces cerevisiae. Biotechnol. Bioeng. 103, 1192-1201.

CLAIMS:
1. A glycoside hydrolase family 3 (GH3) beta-glucosidase (BGL) polypeptide, comprising:
(e) a GH3 BGL triosephosphateisomerase domain comprising a sequence as set forth in KGY1Y2Y3Y4LGP (SEQ ID NO: 251);
(f) a GH3 BGL coordinating loop domain comprising a sequence as set forth in:
GLDMX1MPX2X3X4X5X6X7X8X9X10X11X12X13X14X15X16 (SEQ ID NO: 252), except that at least one of X1 to X16, when present, is smaller than a corresponding reference amino acid residue in a corresponding GH3 BGL reference coordinating loop domain comprising a sequence as set forth in
GLDMX1refMPX2refX3refX4refX5refX6refX7refX8refX9refX10refX11refX12refX13refX14refX15refX16ref (SEQ ID NO: 252);
(g) a GH3 BGL β/α sandwich domain comprising a sequence as set forth in: Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253); and/or
(h) a GH3 BGL β/α sandwich domain comprising a sequence as set forth in: A1A2A3A4A5 (SEQ ID NO: 254) and a sequence as set forth in B1B2B3B4B5 (SEQ ID NO: 255),
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
2. The GH3 BGL polypeptide of claim 1, wherein at least one of X6 to X8 is smaller than the corresponding reference amino acid residue.
3. The GH3 BGL polypeptide of claim 1, wherein at least one of X6 to X8 is independently C, V, A, G, S, P, T, D or N.
4. The GH3 BGL polypeptide of claim 1, wherein X8 is C, V, A, G, S, P, T, D or N.
5. The GH3 BGL polypeptide of claim 1, wherein X8 is C, V, A, or G.
6. The GH3 BGL polypeptide of any one of claims 1 to 5, wherein
GLDMX1refMPX2refX3refX4refX5refX6refX7refX8refX9refX10refX11refX12refX13refX14refX15refX16ref (SEQ ID NO: 252) is:
(xxiv) gldmsmpgsaydmfgdfyg (SEQ ID NO: 256);
(xxv) gldmtmpgditfsndsyfg (SEQ ID NO: 257);
(xxvi) gldmtmpgditfsgdsyfg (SEQ ID NO: 258);
(xxvii) gldmtmpgditfsndsyfg (SEQ ID NO: 259);
(xxviii) gdmdmpgdvsgdsstyfg (SEQ ID NO: 260);
(xxix) gldmtmpgditfhsndsyfg (SEQ ID NO: 261);
(xxx) gldmtmpgdivyhsnnsyfg (SEQ ID NO: 262);
(xxxi) gldmtmpggvtvtstdsyfg (SEQ ID NO: 263);
(xxxii) gldmtmpgdvlccsrqegslwg (SEQ ID NO: 264);
(xxxiii) gldmtmpgditfnsgtswwg (SEQ ID NO: 265);
(xxxiv) gldmsmpgdvyfnsntsywr (SEQ ID NO: 266);
(xxxv) gldmtmpgglnfdgsgpywr (SEQ ID NO: 267);
(xxxvi) gldmdmpceaqyfg (SEQ ID NO: 268);
(xxxvii) gldmsmpgevyggwntgtsfwg (SEQ ID NO: 269);
(xxxviii) gldmsmpgellggwntgksywg (SEQ ID NO: 270);
(xxxix) gldmsmpgdglhwadgrslwg (SEQ ID NO: 271);
(xl) gldmsmpgdisfddglsfwg (SEQ ID NO: 272);
(xli) gldmsmpgdvtfdsgtsfwg (SEQ ID NO: 273);
(xlii) gldmsmpgditfdsatsfwg (SEQ ID NO: 274);
(xliii) gldmsmpgdvdydsgtsywg (SEQ ID NO: 275);
(xliv) gldmtmpgdtefntgfsfwg (SEQ ID NO: 276);
(xlv) gldmsmpgdtmfnsgrsywg (SEQ ID NO: 277); or
(xlvi) gldmsmpgdtefntglsfwg (SEQ ID NO: 278).
7. The GH3 BGL polypeptide of any one of claims 1 to 6, which, except for residues as defined in any one of claims 1 to 6 and for the proviso defined in claim 1, comprises a polypeptide as set forth in any one of SEQ ID NOs: 163-167, or comprises a secreted form of a polypeptide as set forth in any one of SEQ ID NOs: 163-167.
8. The GH3 BGL polypeptide of any one of claims 1 to 6, which, except for amino acid residues defined in any one of claims 1 to 6 and for the proviso defined in claim 1, comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-44, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-44.
9. The GH3 BGL polypeptide of claim 1, wherein:
(e) in KGY1Y2Y3Y4LGP (SEQ ID NO: 251):
Y3 is I or V;
(f) the GH3 BGL coordinating loop domain comprises a sequence as set forth in: GLDMX1'MPGX2'X3'X4'X5'X6'X7'X8'X9'X10'X11'X12'X13'X14'X15'X16' (SEQ ID NO: 279), except that at least one of X1' to X16', when present, is smaller than a corresponding reference amino acid residue in a corresponding GH3 BGL reference coordinating loop domain comprising a sequence as set forth in
GLDMX1'refMPGX2'refX3'refX4'refX5'refX6'refX7'refX8'refX9'refX10'refX11'refX12'refX13'refX14'refX15'refX16'ref (SEQ ID NO: 279);
(g) in Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253):
Z1 is N, V, A, K, Q, T, S, E, F, or Y;
Z2 is L, E, Y, S, K, Q, W, P, A or T;
Figure imgf000062_0001
Z5 is D, K, R or N; and
Z6 is N, D or A; and/or
(h) in: A1A2A3A4A5 (SEQ ID NO: 254) and B1B2B3B4B5 (SEQ ID NO: 255):
Figure imgf000062_0002
A4 is S, F, M, D, G, Q or T; B2 is V, M, I or L;
Figure imgf000063_0001
B4 is Q, P, S, R, E or absent;
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
10. The GH3 BGL polypeptide of claim 9, wherein at least one of X4' to X6' is smaller than the corresponding reference amino acid residue.
11. The GH3 BGL polypeptide of claim 9, wherein at least one of the amino acid residues X4' to X6' is C, V, A, G, S, P, T, D or N.
12. The GH3 BGL polypeptide of claim 9, wherein X6' is C, V, A, G, S, P, T, D or N.
13. The GH3 BGL polypeptide of claim 9, wherein X6' is C, V, A or G.
14. The GH3 BGL polypeptide of any one of claims 9 to 13, wherein
GLDMX1'refMPGX2'refX3'refX4'refX5'refX6'refX7'refX8'refX9'refX10'refX11'refX12'refX13'refX14'refX15'refX16'ref (SEQ ID NO: 279) is:
(i) gldmsmpgsaydgmfgdfyg (SEQ ID NO: 256);
(ii) gldmsmpgdvyfnsntsywr (SEQ ID NO: 266);
(iii) gldmtmpgdvlccsrqegslwg (SEQ ID NO: 264);
(iv) gldmtmpgditfhsndsyfg (SEQ ID NO: 257);
(v) gldmtmpgditflsgdsyfg (SEQ ID NO: 258);
(vi) gldmtmpgditfnsndsyfg (SEQ ID NO: 259);
(vii) gldmdmpgdvsgsdsstyfg (SEQ ID NO: 260);
(viii) gldmtmpgditfhsndsyfg (SEQ ID NO: 261);
(ix) gldmtmpgdivyhsnnsyfg (SEQ ID NO: 262);
(x) gldmtmpggvtvtstdsyfg (SEQ ID NO: 263);
(xi) gldmsmpgdglhwadgrslwg (SEQ ID NO: 271);
(xii) gldmsmpgdisfddglsfwg (SEQ ID NO: 272);
(xiii) gldmsmpgditfdsatsfwg (SEQ ID NO: 274);
(xiv) gldmsmpgdvdydsgtsywg (SEQ ID NO: 275);
(xv) gldmsmpgdtmfnsgrsywg (SEQ ID NO: 277);
(xvi) gldmtmpgdtefntgfsfwg (SEQ ID NO: 276); or
(xvii) gldmsmpgdvtfdsgtsfwg (SEQ ID NO: 273).
15. The GH3 BGL polypeptide of any one of claims 9 to 14, which except, for residues defined in any one of claims 9 to 14 and for the proviso defined in claim 9, comprises a polypeptide as set forth in SEQ ID NO: 164, or comprises a secreted form the polypeptide as set forth in SEQ ID NO: 164.
16. The GH3 BGL polypeptide of any one of claims 9 to 15, which, except for residues defined in any one of claims 9 to 14 and for the proviso defined in claim 9, comprises a polypeptide as set forth in any one of SEQ ID NO: 22-37 and 41 , or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NO: 22-37 and 41.
17. The GH3 BGL of claim 1, wherein:
(e) in KGY1Y2Y3Y4LGP (SEQ ID NO: 251):
Y1 and Y4 are as defined in claim 1; and
Y2 and Y3 are as defined in claim 9;
(f) the GH3 BGL coordinating loop domain comprises a sequence as set forth in: GLDMX1"MPGDX2"X3"X4"X5"X6"X7"X8"SX9"WG (SEQ ID NO: 280), except that at least one of X1" to X9", when present, is smaller than a corresponding reference amino acid residue in a corresponding GH3 BGL reference coordinating loop domain comprising a sequence as set forth in
Figure imgf000064_0001
(g) in Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253):
Z2 is A, T or S;
Z3 is as defined in claim 1;
Z5 is D, R or N; and
Z6 is N or S; and/or
(h) in A1A2A3A4A5 (SEQ ID NO: 254) and in B1B2B3B4B5 (SEQ ID NO: 255):
A2 is A or V; A5 is S, A, Q or P; B2 is I, V or L;
B3 is D or E; B5 is W,
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of the sequences of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
18. The GH3 BGL polypeptide of claim 17, wherein at least one of X2" to X4" is smaller than the corresponding amino acid residue.
19. The GH3 BGL polypeptide of claim 17, wherein at least one of X2" to X4" is C, V, A, G, S, P, T, D or N.
20. The GH3 BGL polypeptide of claim 17, wherein X4" is C, V, A, G, S, P, T, D or N.
21. The GH3 BGL polypeptide of claim 17, wherein X4" is C, V, A, or G.
22. The GH3 BGL polypeptide of any one of claims 17 to 21, wherein
Figure imgf000065_0001
(i) gldmsmpgdisfddglsfwg (SEQ ID NO: 272);
(ii) gldmsmpgdvtfdsgtsfwg (SEQ ID NO: 273);
(iii) gldmsmpgditfdsatsfwg (SEQ ID NO: 274);
(iv) gldmsmpgdvdydsgtsywg (SEQ ID NO: 275);
(v) gldmsmpgdtmfnsgrsywg (SEQ ID NO: 277); or
(vi) gldmtmpgdtefntgfsfwg (SEQ ID NO: 276).
23. The GH3 BGL polypeptide of any one of claims 17 to 22, which, except for residues defined in any one of claims 17 to 22 or for the proviso defined in claim 17, comprises a polypeptide as set forth in SEQ ID NO: 165, or comprises a secreted form of the polypeptide as set forth in SEQ ID NO: 165.
24. The GH3 BGL polypeptide of any one of claims 17 to 23, which, except for residues defined in any one of claims 17 to 22 or for the proviso defined in claim 17, comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-23, 25-27 and 41, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-23, 25-27 and 41.
25. The GH3 BGL of claim 17, wherein:
(e) KGY1Y2Y3Y4LGP (SEQ ID NO: 251) is as defined in claim 17;
(f) GLDMX1"MPGDX2"X3"X4"X5"X6"X7"X8"SX9"WG (SEQ ID NO: 280) is as defined in any one of claims 17 to 24;
(g) in Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253),
Z1-Z5 are as defined in claim 17; and
(h) in A1A2A3A4A5 (SEQ ID NO: 254) and in B1B2B3B4B5 (SEQ ID NO: 255):
A2-A5 are as defined in claim 17; and
B1-B5 are as defined in claim 17,
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
26. The GH3 BGL polypeptide of claim 25, wherein GLDMX1"refMPGDX2"refX3"refX4"refX5"refX6"refX7"refX8"refSX9"refWG (SEQ ID NO: 280) is:
(i) gldmsmpgdisfddglsfwg (SEQ ID NO: 273);
(ii) gldmsmpgditfdsatsfwg (SEQ ID NO: 275);
(iii) gldmsmpgdvdydsgtsywg (SEQ ID NO: 276);
(iv) gldmsmpgdtmfnsgrsywg (SEQ ID NO: 278); or
(v) gldmtmpgdtefntgfsfwg (SEQ ID NO: 277).
27. The GH3 BGL polypeptide of claim 25 or 26, which, except for residues defined in claim 25 or 26 or for the proviso defined in claim 25, comprises a polypeptide as set forth in SEQ ID NO: 166, or comprises a secreted form of the polypeptide as set forth in SEQ ID NO: 166.
28. The GH3 BGL polypeptide of any one of claims 25 to 27, which, except for residues defined in claim 25 or 26, comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-23 and 25-27, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-23 and 25-27.
29. The GH3 BGL of claim 17, wherein:
(e) in KGY1Y2Y3Y4LGP (SEQ ID NO: 251):
Y3 is as defined in claim 9;
(f) in GLDMX1"MPGDX2"X3"X4"X5"X6"X7"X8"SX9"WG (SEQ ID NO: 280):
X1" is S;
X2" is I or V;
X3" is S, T or D;
X4" is F or Y;
X5" is D;
X6" is D or S;
X7" is G or A;
Figure imgf000066_0001
(g) in Z1Z2Z3Z4Z5Z6 (SEQ ID NO: 253), wherein:
Figure imgf000066_0002
Z6 is N or S; and/or
(h) in A1A2A3A4A5 (SEQ ID NO: 254) and in B1B2B3B4B5 (SEQ ID NO: 255):
Figure imgf000066_0003
A5 is S or A, B2 is I or V; B3 is D or E; B5 is W,
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
30. The GH3 BGL polypeptide of claim 29, wherein at least one of X2" to X4" is smaller than the corresponding amino acid residue.
31. The GH3 BGL polypeptide of claim 29, wherein at least one of X2" to X4" is C, V, A, G, S, P, T, D or N.
32. The GH3 BGL polypeptide of claim 29, wherein X4" is C, V, A, G, S, P, T, D or N.
33. The GH3 BGL polypeptide of claim 29, wherein X4" is C, V, A, or G.
34. The GH3 BGL polypeptide of any one of claims 29 to 33, wherein
Figure imgf000067_0001
(SEQ ID NO: 280) is:
(i) gldmsmpgdisfddglsfwg (SEQ ID NO: 272);
(ii) gldmsmpgdvtfdsgtsfwg (SEQ ID NO: 273);
(iii) gldmsmpgditfdsatsfwg (SEQ ID NO: 274); or
(iv) gldmsmpgdvdydsgtsywg (SEQ ID NO: 275).
35. The GH3 BGL polypeptide of any one of claims 29 to 34, which, except for residues defined in any one of claims 29 to 34 or for the proviso defined in claim 29, comprises a polypeptide as set forth in SEQ ID NO: 167, or comprises a secreted form of the polypeptide as set forth in SEQ ID NO: 167.
36. The GH3 BGL polypeptide of any one of claims 29 to 34, which, except for residues defined in any one of claims 29 to 34 or for the proviso defined in claim 29, comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-23, 25 and 41, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-23, 25 and 41.
37. The GH3 BGL polypeptide of any one of claims 1 to 36, wherein Y4 is A, L, I or V.
38. The GH3 BGL polypeptide of any one of claims 1 to 36, wherein Y4 is L.
39. The GH3 BGL polypeptide of any one of claims 1 to 38, wherein Z3 is V.
40. The GH3 BGL polypeptide of any one of claims 1 to 39, wherein A3 is Q.
41. The GH3 BGL polypeptide of any one of claims 1 to 40, wherein B3 is D.
42. The GH3 BGL polypeptide of claim 1, comprising a polypeptide as set forth in SEQ ID NO: 163, or a secreted form thereof, wherein:
(v) amino acid residue at position 340 of SEQ ID NO: 163 is R, K, A, L, I, F, V or P;
(vi) amino acid residue at position 515 of SEQ ID NO: 163 is C, V, A, G, S, P, T, D or N;
(vii) amino acid residue at position 734 of SEQ ID NO: 163 is V or L; and/or
(viii) amino acid residue at position 748 of SEQ ID NO: 163 is Q or N and amino acid residue at position 813 of SEQ ID NO: 163 is D or E,
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
43. The GH3 BGL polypeptide of claim 42, wherein:
(i) amino acid residue at position 340 of SEQ ID NO: 163 is A, L, I or V;
(ii) amino acid residue at position 515 of SEQ ID NO: 163 is C, G, A or V;
(iii) amino acid residue at position 734 of SEQ ID NO: 163 is V; and/or
(iv) amino acid residue at position 748 of SEQ ID NO: 163 is Q and amino acid residue at position 813 of SEQ ID NO: 163 is D.
44. The GH3 BGL polypeptide of claim 42, wherein:
(i) amino acid residue at position 340 of SEQ ID NO: 163 is L;
(ii) amino acid residue at position 515 of SEQ ID NO: 163 is C, G, A or V;
(iii) amino acid residue at position 734 of SEQ ID NO: 163 is V; and/or
(iv) amino acid residue at position 748 of SEQ ID NO: 163 is Q and amino acid residue at position 813 of SEQ ID NO: 163 is D.
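By way of illustration only, and forming no part of the claims, the positional limitations of claim 42 can be sketched as a simple residue check. The sketch assumes a candidate sequence that is already numbered according to SEQ ID NO: 163 (alignment is not performed), and the names `SINGLE_SITE_LIMBS`, `PAIRED_LIMB`, `residue_at` and `satisfied_limbs` are hypothetical. The proviso excluding SEQ ID NOs: 22 to 44 is not evaluated, since those sequences are not reproduced here.

```python
from typing import Dict, List, Set, Tuple

# Allowed residues at each 1-based position, transcribed from claim 42.
# Assumption: the input sequence uses the numbering of SEQ ID NO: 163.
SINGLE_SITE_LIMBS: Dict[str, Tuple[int, Set[str]]] = {
    "v": (340, set("RKALIFVP")),
    "vi": (515, set("CVAGSPTDN")),
    "vii": (734, set("VL")),
}
# Limb (viii) requires BOTH positions to carry an allowed residue.
PAIRED_LIMB: Tuple[str, List[Tuple[int, Set[str]]]] = (
    "viii",
    [(748, set("QN")), (813, set("DE"))],
)


def residue_at(seq: str, position: int) -> str:
    """Return the residue at a 1-based position, or '' if out of range."""
    if 1 <= position <= len(seq):
        return seq[position - 1].upper()
    return ""


def satisfied_limbs(seq: str) -> List[str]:
    """List the limbs (v)-(viii) of claim 42 that the sequence satisfies."""
    hits = [
        label
        for label, (pos, allowed) in SINGLE_SITE_LIMBS.items()
        if residue_at(seq, pos) in allowed
    ]
    label, sites = PAIRED_LIMB
    if all(residue_at(seq, pos) in allowed for pos, allowed in sites):
        hits.append(label)
    return hits
```

Because the limbs are joined by "and/or", a non-empty result indicates that at least one positional limitation is met; the narrower residue sets of claims 43 and 44 could be checked by substituting the corresponding sets.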
45. The GH3 BGL polypeptide of claim 9, comprising a polypeptide as set forth in SEQ ID NO: 164 or a secreted form thereof, wherein:
(v) amino acid residue at position 336 of SEQ ID NO: 164 is R, K, A, L, I, F, V or P;
(vi) amino acid residue at position 507 of SEQ ID NO: 164 is C, V, A, G, S, P, T, D or N;
(vii) amino acid residue at position 727 of SEQ ID NO: 164 is V or L; and/or
(viii) amino acid residue at position 741 of SEQ ID NO: 164 is Q or N and amino acid residue at position 806 of SEQ ID NO: 164 is D or E,
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
46. The GH3 BGL polypeptide of claim 45, wherein:
(i) amino acid residue at position 336 of SEQ ID NO: 164 is A, L, I or V;
(ii) amino acid residue at position 507 of SEQ ID NO: 164 is C, G, A or V;
(iii) amino acid residue at position 727 of SEQ ID NO: 164 is V; and/or
(iv) amino acid residue at position 741 of SEQ ID NO: 164 is Q and amino acid residue at position 806 of SEQ ID NO: 164 is D.
47. The GH3 BGL polypeptide of claim 45, wherein:
(i) amino acid residue at position 336 of SEQ ID NO: 164 is L;
(ii) amino acid residue at position 507 of SEQ ID NO: 164 is C, G, A or V;
(iii) amino acid residue at position 727 of SEQ ID NO: 164 is V; and/or
(iv) amino acid residue at position 741 of SEQ ID NO: 164 is Q and amino acid residue at position 806 of SEQ ID NO: 164 is D.
48. The GH3 BGL polypeptide of claim 17 or 25, comprising a polypeptide as set forth in SEQ ID NOs: 165 or 166 or a secreted form thereof, wherein:
(v) amino acid residue at position 157 of SEQ ID NO: 165 or 166 is R, K, A, L, I, F, V or P;
(vi) amino acid residue at position 322 of SEQ ID NO: 165 or 166 is C, V, A, G, S, P, T, D or N;
(vii) amino acid residue at position 498 of SEQ ID NO: 165 or 166 is V or L; and/or
(viii) amino acid residue at position 512 of SEQ ID NO: 165 or 166 is Q or N and amino acid residue at position 576 of SEQ ID NO: 165 or 166 is D or E,
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
49. The GH3 BGL polypeptide of claim 48, wherein:
(i) amino acid residue at position 157 of SEQ ID NO: 165 or 166 is A, L, I or V;
(ii) amino acid residue at position 322 of SEQ ID NO: 165 or 166 is C, G, A or V;
(iii) amino acid residue at position 498 of SEQ ID NO: 165 or 166 is V; and/or
(iv) amino acid residue at position 512 of SEQ ID NO: 165 or 166 is Q and amino acid residue at position 576 of SEQ ID NO: 165 or 166 is D.
50. The GH3 BGL polypeptide of claim 48, wherein:
(i) amino acid residue at position 157 of SEQ ID NO: 165 or 166 is L;
(ii) amino acid residue at position 322 of SEQ ID NO: 165 or 166 is C, G, A or V;
(iii) amino acid residue at position 498 of SEQ ID NO: 165 or 166 is V; and/or
(iv) amino acid residue at position 512 of SEQ ID NO: 165 or 166 is Q and amino acid residue at position 576 of SEQ ID NO: 165 or 166 is D.
51. The GH3 BGL polypeptide of claim 29, comprising a polypeptide as set forth in SEQ ID NO: 167 or a secreted form thereof, wherein:
(v) amino acid residue at position 151 of SEQ ID NO: 167 is R, K, A, L, I, F, V or P;
(vi) amino acid residue at position 316 of SEQ ID NO: 167 is C, V, A, G, S, P, T, D or N;
(vii) amino acid residue at position 491 of SEQ ID NO: 167 is V or L; and/or
(viii) amino acid residue at position 505 of SEQ ID NO: 167 is Q or N and amino acid residue at position 568 of SEQ ID NO: 167 is D or E,
with the proviso that the GH3 BGL polypeptide is not a polypeptide as set forth in any one of SEQ ID NOs: 22 to 44, and is not a secreted form of the polypeptide of any one of SEQ ID NOs: 22 to 44.
52. The GH3 BGL polypeptide of claim 51, wherein:
(i) amino acid residue at position 151 of SEQ ID NO: 167 is A, L, I or V;
(ii) amino acid residue at position 316 of SEQ ID NO: 167 is C, G, A or V;
(iii) amino acid residue at position 491 of SEQ ID NO: 167 is V; and/or
(iv) amino acid residue at position 505 of SEQ ID NO: 167 is Q and amino acid residue at position 568 of SEQ ID NO: 167 is D.
53. The GH3 BGL polypeptide of claim 51, wherein:
(i) amino acid residue at position 151 of SEQ ID NO: 167 is L;
(ii) amino acid residue at position 316 of SEQ ID NO: 167 is C, G, A or V;
(iii) amino acid residue at position 491 of SEQ ID NO: 167 is V; and/or
(iv) amino acid residue at position 505 of SEQ ID NO: 167 is Q and amino acid residue at position 568 of SEQ ID NO: 167 is D.
54. The GH3 BGL polypeptide of any one of claims 42 to 44, which, except for residues defined in any one of claims 42 to 44 and for the proviso defined in claim 42, comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-44, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-44.
55. The GH3 BGL polypeptide of any one of claims 45 to 47, which, except for residues defined in any one of claims 45 to 47 and for the proviso defined in claim 45, comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-37 and 41, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-37 and 41.
56. The GH3 BGL polypeptide of any one of claims 48 to 50, which, except for residues defined in any one of claims 48 to 50 and for the proviso defined in claim 48, comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-23, 25-27 and 41, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-23, 25-27 and 41.
57. The GH3 BGL polypeptide of any one of claims 51 to 53, which, except for residues defined in any one of claims 51 to 53 and for the proviso defined in claim 51, comprises a polypeptide as set forth in any one of SEQ ID NOs: 22-23, 25 and 41, or comprises a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 22-23, 25 and 41.
58. The GH3 BGL polypeptide of claim 1, comprising a polypeptide as set forth in any one of SEQ ID NOs: 63 to 144, or comprising a secreted form of the polypeptide as set forth in any one of SEQ ID NOs: 63 to 144.
59. The GH3 BGL polypeptide of any one of claims 1 to 58, comprising a signal peptide.
60. The GH3 BGL polypeptide of any one of claims 1 to 57, comprising a signal peptide including an MFα pre sequence which, except for a substitution of the alanine residue at position 9 and/or a substitution of the proline residue at position 21 and/or a substitution of the valine residue at position 22, is as set forth in SEQ ID NO: 9.
61. The GH3 BGL polypeptide of claim 59, wherein the signal peptide is an MFα pre sequence which, except for a substitution of the alanine residue at position 9 for a threonine residue and/or a substitution of the proline residue at position 21 for a threonine or a serine residue and/or a substitution of the valine residue at position 22 for an alanine or an aspartate residue, is as set forth in SEQ ID NO: 9.
62. The GH3 BGL polypeptide of any one of claims 1 to 57, comprising a signal peptide including an MFα pre sequence as set forth in any one of SEQ ID NOs: 9 and 168-172.
63. The GH3 BGL polypeptide of any one of claims 1 to 62, which is a secreted polypeptide form.
64. A vector comprising a nucleic acid encoding the GH3 BGL polypeptide defined in any one of claims 1 to 63.
65. The vector of claim 64, further comprising a nucleic acid encoding an endoglucanase (EGL; EC 3.2.1.4).
66. The vector of claim 64 or 65, further comprising a nucleic acid encoding a cellobiohydrolase.
67. The vector of any one of claims 64 to 66, further comprising a terminator and/or a promoter.
68. A host cell expressing (a) the GH3 BGL polypeptide defined in any one of claims 1 to 63; or (b) the vector defined in any one of claims 64 to 67.
69. A composition comprising (a) (i) the GH3 BGL polypeptide defined in any one of claims 1 to 63; (ii) the vector defined in any one of claims 64 to 67; (iii) the host cell defined in claim 68; or (iv) a cell lysate or a culture medium of (iii); and (b) (i) a carrier; and/or (ii) at least one other cellulase.
70. A method of converting a cellulosic substrate into a fermentable sugar comprising contacting (a) the GH3 BGL polypeptide defined in any one of claims 1 to 63; or (b) the composition defined in claim 69, with the cellulosic substrate, whereby a fermentable sugar is generated.
71. The method of claim 70, wherein the cellulosic substrate is soluble cellodextrin.
72. The method of claim 71, wherein the soluble cellodextrin is cellobiose.
73. The method of any one of claims 70 to 72, wherein the sugar is glucose.
PCT/CA2017/050219 2016-03-02 2017-02-22 Engineered beta-glucosidases and methods of use thereof WO2017147690A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201662302379P 2016-03-02 2016-03-02
US62/302,379 2016-03-02

Publications (1)

Publication Number Publication Date
WO2017147690A1 (en) 2017-09-08

Family

ID=59742417

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CA2017/050219 WO2017147690A1 (en) 2016-03-02 2017-02-22 Engineered beta-glucosidases and methods of use thereof

Country Status (1)

Country Link
WO (1) WO2017147690A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023225459A2 (en) 2022-05-14 2023-11-23 Novozymes A/S Compositions and methods for preventing, treating, supressing and/or eliminating phytopathogenic infestations and infections

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010135836A1 (en) * 2009-05-29 2010-12-02 Iogen Energy Corporation Novel beta-glucosidase enzymes
WO2014202623A2 (en) * 2013-06-19 2014-12-24 Dsm Ip Assets B.V. Rasamsonia gene and use thereof
WO2015042064A1 (en) * 2013-09-17 2015-03-26 Edeniq, Inc. Improved beta-glucosidase enzymes for increased biomass saccharification

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
AHMAD, M. ET AL.: "Protein expression in Pichia pastoris: recent achievements and perspectives for heterologous protein production", APPLIED MICROBIOLOGY AND BIOTECHNOLOGY, vol. 98, no. 12, June 2014 (2014-06-01), pages 5301 - 5317, XP002745344 *
LARUE, K. ET AL.: "Directed evolution of a fungal beta-glucosidase in Saccharomyces cerevisiae", BIOTECHNOLOGY FOR BIOFUELS, vol. 9, no. 52, 3 March 2016 (2016-03-03), pages 1 - 15, XP055371204 *

Legal Events

Date Code Title Description
NENP Non-entry into the national phase (Ref country code: DE)
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 17759029; Country of ref document: EP; Kind code of ref document: A1)
122 Ep: PCT application non-entry in European phase (Ref document number: 17759029; Country of ref document: EP; Kind code of ref document: A1)