WO2015042543A2 - Biofuel production enzymes and uses thereof - Google Patents


Info

Publication number
WO2015042543A2
Authority
WO
WIPO (PCT)
Application number
PCT/US2014/056827
Other languages
French (fr)
Other versions
WO2015042543A3 (en)
Inventor
Julio M. Fernandez
Raul Perez-Jimenez
Original Assignee
The Trustees Of Columbia University In The City Of New York
Application filed by The Trustees Of Columbia University In The City Of New York
Publication of WO2015042543A2
Publication of WO2015042543A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P7/00 Preparation of oxygen-containing organic compounds
    • C12P7/02 Preparation of oxygen-containing organic compounds containing a hydroxy group
    • C12P7/04 Preparation of oxygen-containing organic compounds containing a hydroxy group acyclic
    • C12P7/06 Ethanol, i.e. non-beverage
    • C12P7/08 Ethanol, i.e. non-beverage produced as by-product or from waste or cellulosic material substrate
    • C12P7/10 Ethanol, i.e. non-beverage produced as by-product or from waste or cellulosic material substrate substrate containing cellulosic material
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14 Hydrolases (3)
    • C12N9/24 Hydrolases (3) acting on glycosyl compounds (3.2)
    • C12N9/2402 Hydrolases (3) acting on glycosyl compounds (3.2) hydrolysing O- and S- glycosyl compounds (3.2.1)
    • C12N9/2405 Glucanases
    • C12N9/2434 Glucanases acting on beta-1,4-glucosidic bonds
    • C12N9/2437 Cellulases (3.2.1.4; 3.2.1.74; 3.2.1.91; 3.2.1.150)
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Y ENZYMES
    • C12Y302/00 Hydrolases acting on glycosyl compounds, i.e. glycosylases (3.2)
    • C12Y302/01 Glycosidases, i.e. enzymes hydrolysing O- and S-glycosyl compounds (3.2.1)
    • C12Y302/01004 Cellulase (3.2.1.4), i.e. endo-1,4-beta-glucanase
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P2203/00 Fermentation products obtained from optionally pretreated or hydrolyzed cellulosic or lignocellulosic material as the carbon source
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E50/00 Technologies for the production of fuel of non-fossil origin
    • Y02E50/10 Biofuels, e.g. bio-diesel

Definitions

  • The invention relates to a modified method to predictably alter and optimize enzymes, mainly by identifying and resurrecting suitable ancestral strains.
  • The ancestral cellulases can be classified as cellobiohydrolase II enzymes.
  • Cellobiohydrolases from fungi are used in the biofuel industry to produce cellulosic ethanol.
  • Some aspects of the present invention provide for ancestral fungal cellulases, which are useful for the production of cellulosic ethanol for biofuels. In some embodiments, ancestral cellulases can be used for the hydrolysis of carbohydrate polymers that comprise cellulose. Some aspects of the present invention provide for microorganisms that express an ancestral cellulase; such microorganisms are also useful for the production of cellulosic ethanol for biofuels. In some embodiments, the microorganisms can be used for the hydrolysis and/or fermentation of cellulose.
  • The present invention provides for an isolated polypeptide comprising an amino acid sequence with about 90% identity to any one of the amino acid sequences of SEQ ID NOS: 58 to 85, …
  • In some embodiments, the signal peptide of the isolated polypeptide is removed.
  • The present invention provides for a nucleic acid encoding a polypeptide of the present invention.
  • The present invention provides for a recombinant microorganism, wherein said microorganism expresses a nucleic acid of the present invention.
  • The present invention provides for a recombinant microorganism, wherein said microorganism expresses a nucleic acid encoding a polypeptide of the present invention, or a combination thereof.
  • In some embodiments, the recombinant microorganism is a fungus.
  • In some embodiments, the recombinant microorganism is from the phylum Basidiomycota, from the phylum Ascomycota, from the subkingdom Dikarya, or from the class Sordariomycetes.
  • In some embodiments, the recombinant microorganism is a yeast. In some embodiments, the recombinant microorganism is a bacterium. In some embodiments, the recombinant microorganism is Saccharomyces cerevisiae. In some embodiments, the recombinant microorganism is selected from the group consisting of Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., and Penicillium sp.
  • In some embodiments, the recombinant microorganism is selected from the group consisting of E. coli, Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., …
  • The present invention provides for a method for the production of cellulosic ethanol, comprising adding an isolated polypeptide of the present invention, or a combination thereof, to a source material of cellulose for cellulose processing.
  • In some embodiments, the method further comprises adding a recombinant microorganism of the present invention, or a combination thereof.
  • In some embodiments, the isolated polypeptide and recombinant microorganism are added sequentially, in any order.
  • In other embodiments, the isolated polypeptide and recombinant microorganism are added simultaneously.
  • In some embodiments, carbohydrate polymers are depolymerized.
  • The present invention provides for a method for the production of cellulosic ethanol, comprising adding a recombinant microorganism of the present invention, or a combination thereof, to a source material of cellulose for cellulose processing.
  • In some embodiments, the method further comprises adding a polypeptide of the present invention, or a combination thereof.
  • In some embodiments, the isolated polypeptide and recombinant microorganism are added sequentially, in any order.
  • In other embodiments, the isolated polypeptide and recombinant microorganism are added simultaneously.
  • In some embodiments, carbohydrate polymers are depolymerized.
  • The present invention provides for a method for cellulose processing, comprising adding a polypeptide of the present invention, or a combination thereof, to a source material of cellulose.
  • In some embodiments, the method further comprises adding a recombinant microorganism of the present invention, or a combination thereof.
  • In some embodiments, the isolated polypeptide and recombinant microorganism are added sequentially, in any order.
  • In other embodiments, the isolated polypeptide and recombinant microorganism are added simultaneously.
  • In some embodiments, carbohydrate polymers are depolymerized.
  • The present invention provides for a method for cellulose processing, comprising adding a recombinant microorganism of the present invention, or a combination thereof, to a source material of cellulose.
  • In some embodiments, the method further comprises adding a polypeptide of the present invention, or a combination thereof.
  • In some embodiments, the isolated polypeptide and recombinant microorganism are added sequentially, in any order.
  • In other embodiments, the isolated polypeptide and recombinant microorganism are added simultaneously.
  • In some embodiments, carbohydrate polymers are depolymerized.
  • FIGS. 1A-C show a phylogenetic tree of fungal cellulases obtained using BEAST.
  • FIGS. 2A-C show a phylogenetic tree of fungal cellulases obtained using MrBayes.
  • Cellulases are enzymes that can catalyze the hydrolysis of the β-1,4-glucosidic bonds in cellulose, the predominant component of plant matter. In nature, cellulases facilitate the microbial conversion of insoluble cellulose contained within biomass into soluble sugars (E. A. Bayer et al., Current Opinion in Structural Biology, 8:548-557, 1998).
  • Cellobiohydrolases from fungi can be used in the biofuel industry to produce cellulosic ethanol. Before the sugars in lignocellulosic biomass, such as wood, can be fermented into ethanol, both the lignin that encapsulates the cellulose and the cellulose's unique structural conformation must be overcome, which can be done by either acid or enzymatic hydrolysis (P. C. Badger, in: J. Janick and A. Whipkey (eds.), Trends in New Crops and New Uses, ASHS Press, Alexandria, VA, 2002).
  • Bayesian analysis is a statistical method that combines prior information (a prior distribution) with the likelihood of the observed data to produce a posterior probability for each possible result. This method of analysis was used together with phylogenetic studies in this technology.
  • The sequences used in the trees and the sequence alignment are shown in List #1 (Appendix 1); the phylogenetic trees are shown in FIGS. 1A-C (BEAST) and FIGS. 2A-C (MrBayes).
  • Resurrected sequences for each tree were generated by two different statistical methods: marginal reconstruction (Lists #2 and #4) and joint reconstruction (Lists #3 and #5). These sequences represent reconstructed ancestral fungal cellulase enzymes for use in the production of bioethanol as a green fuel source.
  • As used herein, an “ancestral cellulase molecule” refers to an ancestral cellulase protein, or a fragment thereof.
  • An “ancestral cellulase molecule” can also refer to a nucleic acid (including, for example, genomic DNA, complementary DNA (cDNA), synthetic DNA, as well as any form of corresponding RNA) which encodes a polypeptide corresponding to an ancestral cellulase protein, or fragment thereof.
  • In some embodiments, an ancestral cellulase molecule comprises the amino acid sequence shown in SEQ ID NOS: 58 to 87, …
  • An ancestral cellulase molecule can be encoded by a recombinant nucleic acid encoding an ancestral cellulase protein, or fragment thereof.
  • The ancestral cellulase molecules of the invention can be obtained from various sources and can be produced according to various techniques known in the art.
  • For example, a nucleic acid that encodes an ancestral cellulase molecule can be obtained by synthetic or semi-synthetic methods, by screening DNA libraries, or by amplification from a natural source.
  • An ancestral cellulase molecule can include a fragment or portion of an ancestral cellulase protein.
  • An ancestral cellulase molecule can include a variant of the above-described examples, such as a fragment thereof.
  • In some embodiments, an ancestral cellulase molecule comprises a variant of an ancestral cellulase protein, or of a polypeptide encoded by an ancestral cellulase nucleic acid sequence, wherein the variant has an amino acid identity to SEQ ID NOS: 58 to 111, …
  • Such variants can include those having at least from about 46% to about 50% identity to SEQ ID NOS: 58 to 140, …
  • Protein variants can include amino acid sequence modifications.
  • Amino acid sequence modifications fall into one or more of three classes: substitutional, insertional, or deletional variants.
  • Insertions can include amino- and/or carboxyl-terminal fusions as well as intrasequence insertions of single or multiple amino acid residues. Intrasequence insertions will ordinarily be smaller than amino- or carboxyl-terminal fusions, for example, on the order of one to four residues. Deletions are characterized by the removal of one or more amino acid residues from the protein sequence.
  • Variants ordinarily are prepared by site-specific mutagenesis of nucleotides in the DNA encoding the protein, thereby producing DNA encoding the variant, and thereafter expressing the DNA in recombinant cell culture.
  • In some embodiments, an ancestral cellulase molecule can be modified by deletion of the sequence encoding the signal peptide.
  • Signal peptides are polypeptide sequences variable in length and amino acid …
  • Signal peptides direct the secretion of polypeptide molecules through a prokaryotic or eukaryotic cell membrane.
  • Signal peptides have a tripartite structure consisting of a hydrophobic core flanked by a positively charged n-region on the N-terminal side and a neutral but polar c-region on the C-terminal side (Tuteja, R. (2005) Arch. Biochem. Biophys. 441:107-111).
  • Signal peptide sequences can be identified by various methods known to one of skill in the art.
  • For example, signal peptide sequences within a polypeptide sequence can be identified using various prediction tools including, but not limited to, Phobius (http://phobius.sbc.su.se/), Predotar (http://urgi.versailles.inra.fr/predotar/predotar.html), SignalP (www.cbs.dtu.dk/services/SignalP/), and TargetP (www.cbs.dtu.dk/services/TargetP/).
  • In some embodiments, an ancestral cellulase molecule comprises a protein or polypeptide encoded by a nucleic acid sequence encoding an ancestral cellulase protein, such as the sequences shown in SEQ ID NOS: 58 to 127, …
  • The nucleic acid can be any type of nucleic acid, including genomic DNA, complementary DNA (cDNA), synthetic or semi-synthetic DNA, as well as any form of corresponding RNA.
  • In some embodiments, a nucleic acid encoding an ancestral cellulase protein can comprise a recombinant nucleic acid encoding such a protein.
  • In some embodiments, the nucleic acid can be a non-naturally occurring nucleic acid created artificially (such as by assembling, cutting, ligating, or amplifying sequences).
  • Restriction enzymes can be used to cut nucleic acid sequences in a sequence-specific manner, as is known in the art. Restriction enzyme recognition sequences can be added to the ends of a nucleic acid sequence encoding an ancestral cellulase protein.
  • In some embodiments, the nucleic acid sequence of a restriction enzyme site can encode amino acids. Amino acids encoded by a restriction enzyme site can form part of the sequence of an ancestral cellulase protein, or can add amino acids at the ends of the polypeptide sequence of an ancestral cellulase protein. Nucleic acid sequences can be double-stranded or single-stranded.
  • The invention further provides for nucleic acids that are complementary to an ancestral cellulase molecule.
  • Complementary nucleic acids can hybridize to the nucleic acid sequences described above under stringent hybridization conditions.
  • In some embodiments, stringent hybridization conditions include temperatures above 30°C, above 35°C, or in excess of 42°C, and/or a salt concentration of less than about 500 mM or less than 200 mM.
  • Hybridization conditions can be adjusted by the skilled artisan by modifying the temperature, salinity, and/or the concentration of other reagents such as SDS or SSC.
  • In some embodiments, an ancestral cellulase molecule can be added to a source material of cellulose for cellulose processing.
  • An ancestral cellulase molecule can be added as an isolated recombinant protein.
  • An ancestral cellulase molecule can also be added as an isolated, modified recombinant protein.
  • For example, an ancestral cellulase protein, or fragment thereof, can be modified by removal of the signal peptide.
  • An isolated polypeptide comprising an amino acid sequence with about 90% identity to the amino acid sequence of SEQ ID NOS: 58 to 140, …
  • In some embodiments, an ancestral cellulase molecule can be added to a source material of cellulose for cellulose processing by addition of a recombinant microorganism that expresses a nucleic acid encoding an ancestral cellulase protein, or fragment thereof.
  • In some embodiments, an ancestral cellulase molecule can be added to a source material of cellulose for cellulose processing by addition of a recombinant microorganism that expresses a nucleic acid encoding an amino acid sequence of SEQ ID NOS: 58 to 122, …
  • An ancestral cellulase molecule can be added to a source material of cellulose for cellulose processing by addition of a recombinant protein, by addition of a recombinant microorganism that expresses a nucleic acid encoding an ancestral cellulase protein, or by a combination thereof.
  • The recombinant protein and the recombinant microorganism can be added sequentially, in any order, or simultaneously.
  • The invention utilizes conventional molecular biology, microbiology, and recombinant DNA techniques available to one of ordinary skill in the art. Such techniques are well known to the skilled worker and are explained fully in the literature. See, e.g., Maniatis, Fritsch & Sambrook; “DNA Cloning: A Practical Approach,” Volumes I and II (D. N. Glover, ed., 1985); …
  • An ancestral cellulase, e.g., a molecule comprising the amino acid sequence shown in SEQ ID NOS: 58 to 134, …
  • In some embodiments, the invention provides for ancestral cellulase molecules that are encoded by nucleotide sequences.
  • The ancestral cellulase molecule can be a polypeptide encoded by a nucleic acid (including genomic DNA, complementary DNA (cDNA), synthetic DNA, as well as any form of corresponding RNA).
  • The ancestral cellulase molecule of the invention can be produced via recombinant DNA technology, and such recombinant nucleic acids can be prepared by conventional techniques, including chemical synthesis, genetic engineering, enzymatic techniques, or a combination thereof.
  • For example, a nucleic acid that encodes an ancestral cellulase molecule can be obtained by screening DNA libraries or by amplification from a natural source.
  • In some embodiments, a nucleic acid amplified from a natural source is modified by various mutagenesis methods known in the art to obtain the ancestral cellulase molecules of the invention.
  • In some embodiments, an ancestral cellulase molecule can be “codon-optimized,” as is known in the art.
  • An ancestral cellulase molecule can be a fragment of an ancestral cellulase protein.
  • The ancestral cellulase protein fragment can encompass any portion of at least about 8 consecutive amino acids of SEQ ID NOS: 58 to 128, …
  • For example, the fragment can comprise at least about 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 200, 250, 300, 350, or 400 consecutive amino acids of SEQ ID NOS: 58 to 87, …
  • Fragments include all possible amino acid lengths between about 8 and about 400 amino acids, for example, lengths between about 10 and about 400 amino acids, between about 15 and about 400 amino acids, between about 20 and about 400 amino acids, between about 35 and about 400 amino acids, between about 40 and about 400 amino acids, between about 50 and about 400 amino acids, between about 70 and about 400 amino acids, between about 100 and about 400 amino acids, between about 200 and about 400 amino acids, between about 300 and about 400 amino acids, or between about 350 and about 400 amino acids.
  • A fragment of a nucleic acid sequence that comprises an ancestral cellulase molecule can encompass any portion of at least about 8 consecutive nucleotides. In one embodiment, the fragment can comprise at least about 10, 15, 20, or 30 nucleotides.
  • The ancestral cellulase molecules can be recombinant enzymes and can be produced in a variety of ways known in the art.
  • Polypeptides, e.g., a molecule comprising the amino acid sequence shown in SEQ ID NOS: 58 to 123, …
  • In some embodiments, the nucleic acid is expressed from an expression cassette, for example, to achieve overexpression in a cell.
  • The nucleic acids of the invention can be an RNA, cDNA, cDNA-like, or DNA of interest in an expressible format, such as an expression cassette, which can be expressed from a natural promoter or an entirely heterologous promoter.
  • The nucleic acid of interest can encode a protein and may or may not include introns. Any recombinant expression system can be used, including, but not limited to, the recombinant microorganisms of the invention, as well as other bacterial, fungal, mammalian, insect, or plant cell expression systems.
  • Nucleic acid sequences comprising an ancestral cellulase molecule that encode a polypeptide can be synthesized, in whole or in part, using chemical methods known in the art.
  • An ancestral cellulase molecule can also be produced by chemically synthesizing its amino acid sequence, such as by direct peptide synthesis using solid-phase techniques. Protein synthesis can be performed either manually or by automation. Automated synthesis can be achieved, for example, using an Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer).
  • Fragments of an ancestral cellulase molecule can be separately synthesized and combined using chemical methods to produce a full-length molecule.
  • Host cells transformed with a nucleic acid sequence encoding an ancestral cellulase molecule such as, e.g., a molecule comprising the amino acid sequence shown in SEQ ID NOS: 58 to 129, …
  • The polypeptide produced by a transformed cell can be secreted or contained intracellularly, depending on the sequence and/or the vector used.
  • Methods for protein production by recombinant technology in different host systems are well known in the art (Sambrook, et al., "Molecular Cloning: a Laboratory Manual” (2001); Gellissen, G., “Novel Microbial and Eukaryotic Expression Systems” (2005)).
  • Expression vectors containing a nucleic acid sequence encoding an ancestral cellulase molecule can be designed to contain signal sequences which direct secretion of soluble polypeptide molecules encoded by an ancestral cellulase molecule, through a prokaryotic or eukaryotic cell membrane.
  • An ancestral cellulase molecule can be produced as an extracellular enzyme that is secreted into the culture medium, from which it can easily be recovered and isolated.
  • The spent culture medium of the production host can be used as such, or the host cells can be removed from it, and/or it can be concentrated, filtered, or fractionated. It can also be dried.
  • In some embodiments, an ancestral cellulase molecule, or fragment thereof, can be modified by removal of the signal peptide, which can allow the polypeptide molecules to be contained intracellularly.
  • An isolated polypeptide of the present invention includes, but is not limited to, culture medium containing the polypeptide from which cells and cell debris have been removed.
  • Polypeptides can be isolated, e.g., by adding anionic and/or cationic polymers to the spent culture medium to enhance precipitation of cells, cell debris, and other unwanted enzymes.
  • The medium can be filtered using an inorganic filtering agent and a filter to remove the …
  • The filtrate can be further processed using a semi-permeable membrane to remove excess salts, sugars, and metabolic products.
  • A synthetic peptide can be substantially purified via high performance liquid chromatography (HPLC).
  • The composition of a synthetic ancestral cellulase molecule can be confirmed by amino acid analysis or sequencing. Additionally, any portion of an amino acid sequence comprising a protein encoded by an ancestral cellulase molecule can be altered during direct synthesis and/or combined using chemical methods with sequences from other proteins to produce a variant polypeptide or a fusion protein.
  • The invention further encompasses methods for using a protein or polypeptide encoded by a nucleic acid sequence of an ancestral cellulase molecule.
  • In some embodiments, the polypeptide can be modified, such as by glycosylation and/or acetylation and/or chemical reaction or coupling, and can contain one or several non-natural or synthetic amino acids.
  • An example of an ancestral cellulase molecule comprises the amino acid sequence shown in SEQ ID NOS: 58 to 140, …
  • The invention encompasses variants of a protein encoded by an ancestral cellulase molecule.
  • Some aspects of the present invention provide for recombinant microorganisms that express a nucleic acid encoding an ancestral cellulase enzyme (e.g., the amino acid sequence shown in SEQ ID NOS: 58 to 128, …
  • Microorganisms can include both prokaryotic and eukaryotic microorganisms, such as bacteria and yeast.
  • In some embodiments, the microorganism is a fungus.
  • In some embodiments, the microorganism is from the phylum Basidiomycota, from the phylum Ascomycota, from the subkingdom Dikarya, or from the class Sordariomycetes.
  • In another embodiment, the microorganism is a yeast. In yet another embodiment, the microorganism is a bacterium. In another embodiment, the microorganism is E. coli, Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp.,
  • In further embodiments, the microorganism is Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., or Penicillium sp.
  • Any microorganism may be utilized according to the present invention.
  • The microorganism can be a eukaryotic or a prokaryotic microorganism.
  • For example, the microorganism can be a yeast, such as Saccharomyces cerevisiae.
  • The microorganism can also be a bacterium, such as a gram-positive or a gram-negative bacterium.
  • Other microorganisms may also be used according to the present invention.
  • Other organisms, such as those from the genera Achaetomium, Acremonium, Aspergillus, Botrytis, Chaetomium, Chrysosporium, Collybia, Fomes, Fusarium, Humicola, Hypocrea, Lentinus, Melanocarpus, Myceliophthora, Myriococcum, Neurospora, Penicillium, Phanerochaete, Phlebia, Pleurotus, Podospora, Polyporus, Pycnoporus, Rhizoctonia, Scytalidium, Thermoascus, Thielavia, Trametes, and Trichoderma, can also be used.
  • Additional organisms include, but are not limited to, Acetobacter aceti, Achromobacter, Acidiphilium, Acinetobacter, Actinomadura, Actinoplanes, Aeropyrum pernix, Agrobacterium, Alcaligenes, Ananas comosus (M), Arthrobacter, Aspergillus niger, Aspergillus oryzae, Aspergillus melleus, Aspergillus pulverulentus, Aspergillus saitoi, Aspergillus sojae, Aspergillus usamii, Bacillus alcalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus lentus, Bacillus licheniformis, Bacillus macerans, Bacillus stearothermophilus, Bacillus subtilis, Bifidobacterium, Brevibacillus brevis, Burkholderia cepacia, Candida cylindracea, Candida rugosa, Carica papay
  • Methanobacterium bryantii, Microbacterium imperiale, Micrococcus lysodeikticus, Microlunatus, Mucor javanicus, Mycobacterium, Myrothecium, Nitrobacter, Nitrosomonas, Nocardia, Papaya carica, Pediococcus, Pediococcus halophilus, Penicillium, Penicillium camemberti, Penicillium citrinum, Penicillium emersonii, Penicillium roqueforti, Penicillium lilacinum, Penicillium multicolor, Paracoccus pantotrophus, Propionibacterium, Pseudomonas, Pseudomonas fluorescens, Pseudomonas denitrificans, Pyrococcus, Pyrococcus furiosus, Pyrococcus horikoshii, Rhizobium, Rhizomucor miehei, Rhizomucor pusill
  • These organisms can be utilized as the recombinant microorganisms provided herein and can be utilized according to the various methods of the present invention.
  • A recombinant microorganism may be engineered to secrete an ancestral cellulase molecule into the culture media, such as by incorporating a signal peptide or an autotransporter domain into the ancestral cellulase molecule.
  • Ancestral cellulase molecules can be fused with any combination of signal peptides and/or autotransporter domains found in secreted proteins, as is known in the art.
  • Ancestral cellulase molecules can be designed to maximize their secretion into the culture media, and may also include different linker sequences that fuse signal peptides, ancestral cellulase molecules, and autotransporters to improve the efficiency of secretion or of cell-surface presentation.
  • An ancestral cellulase molecule can be modified by deletion of the sequence encoding the signal peptide.
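As a hedged illustration of deleting the signal-peptide-encoding sequence, the sketch below trims the codons for an N-terminal signal peptide from a coding sequence (CDS) while keeping an ATG start codon in front of the mature coding region. The function name, the assumption that the signal peptide length is already known (for example, from a separate prediction step), and the toy sequence are all hypothetical and are not taken from this document.

```python
def delete_signal_peptide(cds: str, signal_peptide_aa: int) -> str:
    """Return a CDS lacking the codons of an N-terminal signal peptide.

    Assumptions (hypothetical, for illustration only):
    - `cds` is a DNA coding sequence that starts with ATG and whose length
      is a multiple of 3;
    - the signal peptide occupies the first `signal_peptide_aa` residues,
      counting the initiator methionine.
    A fresh ATG is prepended so the truncated ORF can still initiate.
    """
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    if not cds.startswith("ATG"):
        raise ValueError("CDS must start with an ATG start codon")
    cut = 3 * signal_peptide_aa  # nucleotides encoding the signal peptide
    if signal_peptide_aa < 1 or cut >= len(cds):
        raise ValueError("signal peptide length is out of range")
    return "ATG" + cds[cut:]


# Toy CDS: Met-Ala-Ala (treated as a 3-residue "signal peptide"), Trp, stop.
toy_cds = "ATGGCTGCTTGGTAA"
print(delete_signal_peptide(toy_cds, 3))  # ATGTGGTAA
```

Keeping the initiator ATG is a design choice for intracellular expression of the mature enzyme; a construct intended for a different secretion route would replace, rather than simply remove, the native signal sequence.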
  • In some embodiments, an ancestral cellulase molecule is purified from the culture media. In other embodiments, an ancestral cellulase molecule is not purified from the culture media.
  • Any other recombinant expression system can be used to obtain an isolated ancestral cellulase molecule.
Bacterial Expression Systems
  • One skilled in the art understands that expression of desired protein products in prokaryotes is most often carried out in E. coli with vectors that contain constitutive or inducible promoters. Some non-limiting examples of bacterial cells for transformation include E. coli, Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp.,
  • E. coli strains such as DH5 or MC1061/p3 (Invitrogen Corp., San Diego, Calif.) can also be used.
  • Colonies can then be screened for the appropriate plasmid expression.
  • A number of expression vectors can be selected.
  • Non-limiting examples of such vectors include multifunctional E. coli cloning and expression vectors such as BLUESCRIPT (Stratagene).
  • Some E. coli expression vectors (also known in the art as fusion-vectors) are designed to add a number of amino acid residues, usually to the N-terminus of the expressed recombinant protein.
  • Such fusion vectors can serve three functions: 1) to increase the solubility of the desired recombinant protein; 2) to increase expression of the recombinant protein of interest; and 3) to aid in recombinant protein purification by acting as a ligand in affinity purification.
  • Vectors that direct the expression of high levels of readily purified fusion protein products may also be used.
  • Fusion expression vectors include pGEX, which fuses glutathione S-transferase (GST) to the desired protein; pcDNA 3.1/V5-His A, B & C (Invitrogen Corp., Carlsbad, CA), which fuses a 6x-His tag to the recombinant protein of interest; pMAL (New England Biolabs, MA), which fuses maltose-binding protein (MBP) to the target recombinant protein; the E. coli expression vector pUR278 (Ruther et al., (1983) EMBO J. 2:1791), wherein the coding sequence may be ligated individually into the vector in frame with the lacZ coding region in order to generate a fusion protein; and pIN vectors (Inouye et al., (1985) Nucleic Acids Res. 13:3101-3109; Van Heeke et al., (1989) J. Biol. Chem. 264:5503-5509). Fusion proteins generated by vectors such as these are generally soluble and can be purified easily from lysed cells via adsorption and binding of the fusion protein to an affinity matrix.
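The requirement that a coding sequence be ligated "in frame" with the lacZ (or other fusion-partner) coding region can be checked computationally. The following is a minimal sketch, assuming both parts are plain DNA strings in the 5' to 3' sense orientation and that the standard genetic code applies; the function names and toy sequences are hypothetical and do not describe any specific vector.

```python
# Standard genetic code; codons are enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate the complete codons of a DNA string ('*' marks a stop codon).

    Raises KeyError for ambiguous bases such as N.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def is_in_frame_fusion(partner_cds: str, insert_cds: str) -> bool:
    """Check that insert_cds, ligated 3' of partner_cds, stays in frame.

    Hypothetical criteria: the fusion partner contributes whole codons,
    the joined sequence is a whole number of codons, and the only stop
    codon (if any) is the final one.
    """
    fused = partner_cds + insert_cds
    if len(partner_cds) % 3 or len(fused) % 3:
        return False
    protein = translate(fused)
    return "*" not in protein[:-1]


partner = "ATGGCTGCT"  # toy stand-in for a fusion-partner coding region
print(translate(partner + "TGGTAA"))           # MAAW*
print(is_in_frame_fusion(partner, "TGGTAA"))   # True
print(is_in_frame_fusion(partner, "GTGGTAA"))  # False (frame shifted)
```

A real cloning check would also look at the exact junction created by the restriction sites used; this sketch only tests the reading frame and premature stops.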
  • GST fusion proteins can be purified from lysed cells by adsorption and binding to a matrix of glutathione agarose beads, followed by elution in the presence of free glutathione.
  • The pGEX vectors are designed to include thrombin or factor Xa protease cleavage sites so that the cloned target can be released from the GST moiety.
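Release of a cloned target from the GST moiety by site-specific proteolysis can be modelled at the sequence level. The sketch below assumes the commonly cited recognition motifs LVPR/GS for thrombin and IEGR/ for factor Xa (cleavage after the arginine); the dictionary, the function, and the toy fusion sequence are hypothetical illustrations, and real cleavage efficiency also depends on structural context that a string search cannot capture.

```python
# Recognition motifs with the scissile bond marked by "^" (assumed motifs).
PROTEASE_SITES = {
    "thrombin": "LVPR^GS",
    "factor_xa": "IEGR^",
}


def cleave_fusion(fusion: str, protease: str) -> tuple[str, str]:
    """Split a fusion protein at the first recognition site of `protease`.

    Returns (tag_side, target_side). Raises ValueError if no site is found.
    """
    motif = PROTEASE_SITES[protease]
    site = motif.replace("^", "")
    offset = motif.index("^")  # motif residues that stay on the tag side
    start = fusion.find(site)
    if start == -1:
        raise ValueError(f"no {protease} site in fusion protein")
    cut = start + offset
    return fusion[:cut], fusion[cut:]


# Toy fusion: arbitrary tag residues, a thrombin site, then target residues.
tag_side, target_side = cleave_fusion("MKTAYIAKLVPRGSQNCQTLW", "thrombin")
print(tag_side)     # MKTAYIAKLVPR
print(target_side)  # GSQNCQTLW
```

Under these assumed motifs, thrombin cleavage leaves two extra residues (Gly-Ser) on the N-terminus of the released target, whereas cleavage after IEGR leaves none.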
  • Microorganisms such as bacteria (e.g., E. coli and B. subtilis) transformed with recombinant bacteriophage DNA, plasmid DNA, or cosmid DNA expression vectors containing coding sequences for an ancestral cellulase molecule may alternatively be used to produce the molecule of interest.
  • Another non-limiting example is plant cell systems infected with recombinant virus expression vectors (for example, tobacco mosaic virus, TMV; cauliflower mosaic virus, CaMV) or transformed with recombinant plasmid expression vectors (e.g., Ti plasmid) containing coding sequences for an ancestral cellulase molecule.
  • Sequences encoding an ancestral cellulase molecule can be driven by any of a number of promoters.
  • For example, viral promoters such as the 35S and 19S promoters of CaMV can be used alone or in combination with the omega leader sequence from TMV.
  • Alternatively, plant promoters, such as the RUBISCO small subunit promoter or heat shock promoters, can be used. These constructs can be introduced into plant cells by direct DNA transformation or by pathogen-mediated transfection.
  • An insect system can also be used to express an ancestral cellulase molecule.
  • For example, Autographa californica nuclear polyhedrosis virus (AcNPV) can be used as a vector to express foreign genes in Spodoptera frugiperda cells or in Trichoplusia larvae. Sequences encoding an ancestral cellulase molecule can be cloned into a nonessential region of the virus, such as the polyhedrin gene, and placed under the control of the polyhedrin promoter.
  • A fungal system can also be used to express an ancestral cellulase molecule.
  • Fungi can be transformed with recombinant fungal expression vectors containing coding sequences for an ancestral cellulase molecule.
  • Some non-limiting examples of fungi for transformation include Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., and Penicillium sp.
  • Fungi from the subkingdom Dikarya, from the phylum Basidiomycota, from the phylum Ascomycota, or from the class Sordariomycetes can be used.
Mammalian Expression Systems
  • Mammalian cells can also contain an expression vector (for example, one that harbors a nucleotide sequence encoding an ancestral cellulase molecule) for expression of a desired product.
  • Expression vectors containing such a nucleic acid sequence linked to at least one regulatory sequence in a manner that allows expression of the nucleotide sequence in a host cell can be introduced via methods known in the art.
  • The vector can be a recombinant DNA or RNA vector, including DNA plasmids or viral vectors.
  • A number of viral-based expression systems can be used to express an ancestral cellulase molecule in mammalian host cells, e.g., adeno-associated virus, retrovirus, adenovirus, lentivirus, or alphavirus.
  • Regulatory sequences are well known in the art, and can be selected to direct the expression of a protein or polypeptide of interest (such as an ancestral cellulase molecule) in an appropriate host cell as described in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990).
  • Non-limiting examples of regulatory sequences include polyadenylation signals, promoters, enhancers, and other expression control elements. Practitioners in the art understand that designing an expression vector can depend on factors such as the choice of host cell to be transfected and/or the type and/or amount of desired protein to be expressed.
  • Enhancer regions, which are sequences found upstream or downstream of the promoter region in non-coding DNA, are also known in the art to be important in optimizing expression. If needed, origins of replication from viral sources can be employed, such as if a prokaryotic host is utilized for introduction of plasmid DNA. However, in eukaryotic organisms, chromosomal integration is a common mechanism for replicating the introduced DNA.
  • A gene that encodes a selectable marker (for example, resistance to antibiotics or drugs such as ampicillin, neomycin, G418, and hygromycin) can be introduced into host cells along with the gene of interest in order to identify and select clones that stably express a gene encoding a protein of interest.
  • The gene encoding a selectable marker can be introduced into a host cell on the same plasmid as the gene of interest or on a separate plasmid. Cells containing the gene of interest can be identified by drug selection: cells that have incorporated the selectable marker gene survive in the presence of the drug, while cells that have not incorporated it die. Surviving cells can then be screened for the production of the desired protein molecule (for example, an ancestral cellulase molecule).
  • A host cell strain can be chosen for its ability to modulate the expression of the inserted sequences or to process the expressed ancestral cellulase molecule in the desired fashion.
  • Modifications of the polypeptide include, but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, lipidation, and acylation.
  • Post-translational processing which cleaves a "prepro" form of the polypeptide also can be used to facilitate correct insertion, folding and/or function.
  • Different host cells which have specific cellular machinery and characteristic mechanisms for post-translational activities can be chosen to ensure the correct modification and processing of the foreign protein.
  • An exogenous nucleic acid can be introduced into a cell via a variety of techniques known in the art, such as lipofection, microinjection, calcium phosphate or calcium chloride precipitation, DEAE-dextran-mediated transfection, or electroporation. Electroporation is carried out at an appropriate voltage and capacitance to result in entry of the DNA construct(s) into the cells of interest. Other methods used to transfect cells include modified calcium phosphate precipitation, polybrene precipitation, liposome fusion, and receptor-mediated gene delivery.
  • A host cell strain that modulates the expression of the inserted sequences, or that modifies and processes the gene product in the specific fashion desired, also may be chosen. Such modifications (for example, glycosylation and other post-translational modifications) and processing (for example, cleavage) of protein products may be important for the function of the protein.
  • Different host cell strains have characteristic and specific mechanisms for the post-translational processing and modification of proteins and gene products. As such, appropriate host systems or cell lines can be chosen to ensure the correct modification and processing of the foreign protein expressed, such as an ancestral cellulase molecule.
  • Eukaryotic host cells possessing the cellular machinery for proper processing of the primary transcript, glycosylation, and phosphorylation of the gene product may be used.
  • Non-limiting examples of host cells include E. coli, Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp., Microbispora sp., Streptomyces sp., Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., and Penicillium sp.
  • Various culturing parameters can be used with respect to the host cell being cultured. Appropriate culture conditions for host cells are well known in the art or can be determined by the skilled artisan (see, for example, Madigan M. et al., "Brock Biology of Microorganisms", 2012). Cell culturing conditions can vary according to the type of host cell selected. Commercially available media can be utilized.
  • Cells suitable for culturing can contain introduced expression vectors, such as plasmids or viruses.
  • The expression vector constructs can be introduced via transformation, microinjection, transfection, lipofection, electroporation, or infection.
  • The expression vectors can contain coding sequences, or portions thereof, encoding the proteins for expression and production.
  • Expression vectors containing sequences encoding the produced proteins and polypeptides, as well as the appropriate transcriptional and translational control elements, can be generated using methods well known to and practiced by those skilled in the art. These methods include synthetic techniques, in vitro recombinant DNA techniques, and in vivo genetic recombination which are described in J.
  • An ancestral cellulase molecule (such as, e.g., a molecule comprising the amino acid sequence shown in SEQ ID NOS: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 13
  • A purified ancestral cellulase molecule can be separated from other compounds that normally associate with it in the cell, such as certain proteins, carbohydrates, or lipids, using methods practiced in the art.
  • The cell culture medium or cell lysate can be centrifuged to remove cells and particulate cell debris.
  • The desired polypeptide molecule (for example, an ancestral cellulase molecule) is then isolated or purified away from contaminating soluble proteins and polypeptides by suitable purification techniques.
  • Non-limiting purification methods for proteins include: size exclusion chromatography; affinity chromatography; ion exchange chromatography; ethanol precipitation; reverse phase HPLC; chromatography on a resin such as silica or an anion exchange resin (e.g., DEAE); chromatofocusing; SDS-PAGE; ammonium sulfate precipitation; gel filtration using, e.g., Sephadex G-75 or Sepharose; and the like.
  • Other additives, such as protease inhibitors (e.g., PMSF), can be used to inhibit proteolytic degradation during purification.
  • Purification procedures that can select for carbohydrates can also be used, e.g., ion-exchange soft gel chromatography, or HPLC using cation- or anion-exchange resins, in which the more acidic fraction(s) is/are
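For ammonium sulfate precipitation, the amount of solid salt needed to move a solution from one saturation level to another is usually read from published tables. Below is a hedged sketch of a widely used empirical approximation; the constants (533 and 0.3 at about 20 °C, 515 and 0.27 at 0 °C) are assumptions drawn from standard nomograms rather than from this document, so results should be checked against a reference table before use. The function name is hypothetical.

```python
def ammonium_sulfate_to_add(volume_l: float, s1: float, s2: float,
                            cold: bool = False) -> float:
    """Approximate grams of solid (NH4)2SO4 to raise saturation from s1% to s2%.

    Empirical approximation (assumed constants):
        g per litre = 533 (s2 - s1) / (100 - 0.3 s2)    at about 20 C
        g per litre = 515 (s2 - s1) / (100 - 0.27 s2)   at 0 C (cold=True)
    `volume_l` is the starting solution volume in litres.
    """
    if not 0 <= s1 < s2 <= 100:
        raise ValueError("require 0 <= s1 < s2 <= 100 (percent saturation)")
    if cold:
        grams_per_l = 515 * (s2 - s1) / (100 - 0.27 * s2)
    else:
        grams_per_l = 533 * (s2 - s1) / (100 - 0.3 * s2)
    return grams_per_l * volume_l


# Example: bring 0.5 L of clarified lysate from 0% to 50% saturation at ~20 C.
print(round(ammonium_sulfate_to_add(0.5, 0, 50), 1))  # 156.8
```

For a second cut (for example, 50% to 80%), the same function is applied to the supernatant volume with s1 set to the first cut's saturation.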
  • Some aspects of the present invention provide for ancestral fungal cellulases.
  • Cellulase enzymes are useful for the production of cellulosic ethanol for biofuels.
  • Ancestral cellulases can be used for the hydrolysis of carbohydrate polymers that comprise cellulose.
  • Some aspects of the present invention also provide for microorganisms that express an ancestral cellulase.
  • Such microorganisms are useful for the production of cellulosic ethanol for biofuels.
  • The microorganisms can be used for the hydrolysis and/or fermentation of cellulose.
  • The production of cellulosic ethanol biofuels from cellulosic materials can be performed by various techniques known in the art. See, for example, Canilha L.
  • The starting material for the production of cellulosic biofuels can be cellulosic materials (i.e., any material comprising lignocellulose, cellulose, hemicellulose, or a combination thereof).
  • A source material of cellulose can be any cellulosic material.
  • Cellulosic materials include, but are not limited to, fruits, plants, vegetables, woods, grasses, inedible parts of plants, byproducts of lawn and tree maintenance, corn stover, Panicum virgatum (switchgrass), Miscanthus grass species, wood chips, sugarcane residues, sugarcane bagasse, straw, pulp and paper residues, waste paper, textile fibers (e.g., cotton, linen, hemp, jute) and cellulosic fibers (e.g., modal, viscose, lyocell).
  • Cellulosic materials can be processed by various techniques known in the art. Most of the carbohydrates in cellulosic material are in the form of lignocellulose, which can comprise cellulose, hemicellulose, pectin and/or lignin. Cellulosic material can be pretreated by physical and/or chemical means; pretreatment can make the cellulose fraction more accessible to hydrolysis. The cellulose and/or hemicellulose in the cellulosic materials can then be hydrolysed into sugars (e.g., glucose). In some embodiments, ancestral cellulases can be used for the hydrolysis of cellulose. In other embodiments, the carbohydrate polymers of cellulose are depolymerized by an ancestral cellulase. Sugars made available by hydrolysis can be used by microorganisms to produce ethanol by fermentation.
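The hydrolysis and fermentation steps above imply a simple mass balance: each anhydroglucose unit of cellulose (C6H10O5) takes up one water molecule to give glucose (C6H12O6), and fermentation converts each glucose into two ethanol and two CO2. The sketch below computes the resulting theoretical maximum ethanol yield. It ignores hemicellulose-derived sugars, and the efficiency parameters are hypothetical placeholders rather than values reported for ancestral cellulases.

```python
# Molar masses in g/mol (standard atomic weights).
ANHYDROGLUCOSE = 162.141  # C6H10O5, repeat unit of cellulose
GLUCOSE = 180.156         # C6H12O6
ETHANOL = 46.069          # C2H5OH


def theoretical_ethanol_g(cellulose_g: float,
                          hydrolysis_efficiency: float = 1.0,
                          fermentation_efficiency: float = 1.0) -> float:
    """Grams of ethanol obtainable from `cellulose_g` grams of cellulose.

    (C6H10O5)n + n H2O -> n C6H12O6     (enzymatic hydrolysis)
    C6H12O6 -> 2 C2H5OH + 2 CO2         (fermentation)
    Efficiencies are fractions in [0, 1]; 1.0 gives the stoichiometric maximum.
    """
    glucose_g = cellulose_g * (GLUCOSE / ANHYDROGLUCOSE) * hydrolysis_efficiency
    return glucose_g * (2 * ETHANOL / GLUCOSE) * fermentation_efficiency


print(round(theoretical_ethanol_g(1.0), 3))              # 0.568
print(round(theoretical_ethanol_g(1000.0, 0.8, 0.9), 1))  # 409.1
```

The glucose molar mass cancels in the product, so the maximum is simply 2 × 46.069 / 162.141 ≈ 0.568 g of ethanol per gram of cellulose; the intermediate glucose step is kept to mirror the two-stage process.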
  • The present invention provides a method for the production of cellulosic ethanol from a source material of cellulose.
  • Cellulase enzymes can be added to cellulosic materials by various techniques.
  • In some embodiments, ancestral cellulases are added to cellulosic materials as isolated polypeptides.
  • In other embodiments, recombinant microorganisms that express ancestral cellulases are added to cellulosic materials.
  • Microorganisms that do not express cellulase enzymes can be genetically modified to express ancestral cellulases.
  • Microorganisms can be modified in a variety of ways, for example to express cellulases, to express large volumes of cellulases, to express modified cellulases, or to express ancestral cellulases.
  • Ancestral cellulases can be added to cellulosic materials both as isolated polypeptides and as recombinant microorganisms that express ancestral cellulases. Isolated polypeptides and recombinant microorganisms can be added simultaneously or sequentially, in any order.
  • Ancestral cellulases needed for the hydrolysis of the cellulosic material according to the invention may be added in an enzymatically effective amount either simultaneously e.g.
  • any combination of the ancestral cellulase molecules comprising an amino acid sequence having about 90% identity to SEQ ID NOS: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124,
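Statements framed as "about 90% identity" presuppose a way to compute percent identity. Several conventions exist (for example, dividing by the alignment length, by the length of the shorter sequence, or by the number of aligned residue pairs), and the convention is not specified at this point in the document. The minimal sketch below assumes two sequences already aligned to equal length, with "-" as the gap character, and divides identical positions by the number of alignment columns that are not gaps in both sequences; the function name and the choice of convention are assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Columns that are gaps in both sequences are ignored; a gap opposite a
    residue counts as a mismatch (one of several possible conventions).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # column absent from both sequences
        compared += 1
        if x == y and x != gap:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


# First ten residues of SEQ ID NO: 215 and SEQ ID NO: 159 as listed below.
a = "QACASQWGQC"
b = "QACASVWGQC"
print(percent_identity(a, b))          # 90.0
print(percent_identity(a, b) >= 90.0)  # True
```

Identity over a ten-residue window says little about full-length identity; in practice the full sequences would first be aligned with a standard pairwise or multiple-alignment tool.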
  • Bayesian analysis is a statistical method that combines prior information with the observed data to estimate the probability of a result. This method of analysis was used in tandem with phylogenetic studies in this technology. Inconsistencies were corrected using the literature and by running additional tests with the MEGA and PAUP software. The final tree was then used, with the PAML software, to reconstruct the most probable ancestral sequence for each node of the tree. Resurrected sequences for each tree were generated by two different statistical methods, marginal reconstruction and joint reconstruction.
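To make marginal ancestral reconstruction concrete, the sketch below computes, for a single alignment column, the posterior probability of each amino acid at the root of a small tree. It uses Felsenstein's pruning algorithm with a Jukes-Cantor-type substitution model and a uniform prior. This is a simplified stand-in, not the model or software (BEAST, PAML) used in the work described here, which would typically apply an empirical amino-acid matrix and rate variation among sites. The tree, branch lengths, leaf states, and all names in the code are hypothetical. Joint reconstruction, which picks the single most probable combination of states over all internal nodes at once, is not shown.

```python
import math
from dataclasses import dataclass, field

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


@dataclass
class Node:
    name: str
    branch: float = 0.0  # length of the branch above this node (subs/site)
    children: list["Node"] = field(default_factory=list)


def jc_prob(same: bool, t: float, k: int) -> float:
    """Jukes-Cantor-type transition probability for a k-state alphabet."""
    decay = math.exp(-k * t / (k - 1))
    return 1.0 / k + (k - 1) / k * decay if same else 1.0 / k - decay / k


def partials(node: Node, leaf_states: dict[str, str], alphabet: str) -> list[float]:
    """Felsenstein pruning: P(data below node | state at node) for each state."""
    if not node.children:
        observed = leaf_states[node.name]
        return [1.0 if s == observed else 0.0 for s in alphabet]
    k = len(alphabet)
    result = [1.0] * k
    for child in node.children:
        child_l = partials(child, leaf_states, alphabet)
        for i in range(k):
            result[i] *= sum(jc_prob(i == j, child.branch, k) * child_l[j]
                             for j in range(k))
    return result


def marginal_root(root: Node, leaf_states: dict[str, str],
                  alphabet: str = AMINO_ACIDS) -> tuple[str, dict[str, float]]:
    """Most probable root state and posterior over states (uniform prior)."""
    lik = partials(root, leaf_states, alphabet)
    total = sum(lik)  # the uniform prior 1/k cancels on normalisation
    posterior = {s: lik[i] / total for i, s in enumerate(alphabet)}
    return max(posterior, key=posterior.get), posterior


# Hypothetical tree ((A,B),(C,D)) with all branch lengths 0.1.
tree = Node("root", children=[
    Node("n1", 0.1, [Node("A", 0.1), Node("B", 0.1)]),
    Node("n2", 0.1, [Node("C", 0.1), Node("D", 0.1)]),
])
best, post = marginal_root(tree, {"A": "Q", "B": "Q", "C": "Q", "D": "V"})
print(best, round(post[best], 3))  # prints Q and its posterior probability
```

Repeating this for every column gives a marginal ancestral sequence for the root; obtaining marginal states at non-root nodes additionally needs a second, top-down pass (or re-rooting), which is omitted here.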
  • The phylogenetic trees are shown in FIGS. 1 and 2, and the resurrected sequences for each tree are listed as SEQ ID NOS: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, and 113 (BEAST tree; marginal reconstruction), and SEQ ID NOS: 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
  • tr_B2ZZ24_Basidiomycota_Irpex_lacteus: 0.11830) : 0.06639) : 0.04137) : 0.03784) : 0.04405) : 0.02083) : 0.01904) : 0.05531) : 0.03039) : 0.11441, (tr_G2XV25_Ascomycota_Botryotinia_fuckeliana: 0.17147,
  • VTALIVVYDLPNRDCFAEASNGELHLDQNGTQRYREYIAPIKQILAAHSGQRIAAVIEPDSLP IATNLGGK RCDETTASYRDNVAHTLKELNMPHVYQYIDAAHSGWLGWPDNQKKGAKIFAEVIKAAGSPANVRGFATNVAN YTQLSYTAESYDQQDNPCFGEFDYVDAMASALSAEGLGDKHFI I DTSRNGVGNI - REDWGYWCNNKGAGMGQRPKANGGATNLDAFVWVKPPGDSDGVGQEGQPRYDLFCGKE- NADTRAPQAGQWFHEYFVECVKNANPAL
  • SEQ ID NO: 5 (tr_Q2U2I8_Ascomycota_Aspergillus_oryzae) SASWHHLTDSSFTDRVCCISDGDQQSLKPVTTVPSPEFQSSVNSKQALVALSP—
  • SEQ ID NO: 15 (tr_E3Q540_Ascomycota_Colletotrichum_graminicola) QACASQWGQCGGQGWTGPSCCAAGSVCTVSNPFYSQCLPGSTVASSTSTVRTSSTPWSPSRTSTVTGSVST TSAGTGTTPP—PTGGATYTGPFVGVNLWANSYYASEISTLAIPSLS- PALATAAAKVAKVPTFMWMDTRSKIPLVDATLADIRKANQAGA —
  • SEQ ID NO: 20 (tr_H9C5T1_Ascomycota_Hypocrea_orientalis) QACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG-AASSSSSTRASSTTA—RASSTT- SRSSATPPPGSSTTRVPPVGSGTATYSGPFVGVTPWANAYYASEVSSLAIPSLT- GAMATAAAAVAKVPSFMWLDTFDKTPLMEQTLADIRTANKNGG —
  • SEQ ID NO: 21 (tr_D1MGM6_Ascomycota_Trichoderma_longibrachiatum)
  • SEQ ID NO: 23 (tr_I1RIJ1_Ascomycota_Gibberella_zeae)
  • SEQ ID NO: 24 (tr_Q8NIB5_Ascomycota_Talaromyces_emersonii) QSLWGQCGGSSWTGATSCAAGATCSTINPYYAQCVPAT-TTLTTTTKPTSTGG
  • SEQ ID NO: 26 (tr_B6QMM6_Ascomycota_Penicillium_marneffei) QSVWGQCGGQGYTGATSCAAGSTCSTQNPYYAQCIPA
  • SEQ ID NO: 27 (tr_G7XQ80_Ascomycota_Aspergillus_kawachii) QTLWGQCGGQGYSGATSCVAGATCSTINEYYAQCTPAT-SATTLKTTTSTTTA
  • SEQ ID NO: 28 (sp_A2QYR9_Ascomycota_Aspergillus_niger) QTLWGQCGGQGYSGATSCVAGATCATVNEYYAQCTPA-AGTSSATTLKTTTSS
  • SEQ ID NO: 30 (sp_Q0CFP1_Ascomycota_Aspergillus_terreus)
  • SEQ ID NO: 33 (sp_A1CCN4_Ascomycota_Aspergillus_clavatus) QTMWGQCGGAGWSGATDCVAGGVCSTQNAYYAQCLPG ATTATTLSTTSKG—
  • SEQ ID NO: 34 (tr_O93837_Ascomycota_Acremonium_cellulolyticus) QSVWGQCGGQGWSGATSCAAGSTCSTLNPYYAQCIPG TATSTTLVKTTSS
  • SEQ ID NO: 35 (tr_B5TMG4_Ascomycota_Penicillium_funiculosum) QSVWGQCGGQGWSGATSCAAGSTCSTLNPYYAQCIPG TATSTTLVKTTSS
  • SEQ ID NO: 36 (sp_Q4WFK4_Ascomycota_Neosartorya_fumigata) QTVWGQCGGQGWSGPTSCVAGAACSTLNPYYAQCIPGA--STTLTTTTAATTT
  • SEQ ID NO: 37 (sp_Q5B2E8_Ascomycota_Emericella_nidulans) QTLYGQCGGSGWTGATSCVAGAACSTLNQWYAQCLPA—ATTTSTTLTTTTSS
  • SEQ ID NO: 38 (tr_B8MHF4_Ascomycota_Talaromyces_stipitatus) VWGQCGGQGWTGATICAAGATCSAINSYYAQCTPA AAASTTLVTKTSS
  • SEQ ID NO: 39 (sp_A1DJQ7_Ascomycota_Neosartorya_fischeri) QTVWGQCGGQGWSGPTNCVAGAACSTLNPYYAQCIPG ATATSTTLSTTTT
  • SEQ ID NO: 40 (gi_367023495_Ascomycota_Myceliophthora_thermophila) QNCGAVWTQCGGNGWQGPTCCASGSTCVAQNEWYSQCLPNSPSSTSTSQRSTSTSSSTTRSGSS- SSSSTTPPPVSSPTSIPGGATSTASYSGPFSGVRLFANDYYRSEVHNLAIPSMT-
  • SEQ ID NO: 41 (gi_310790274_Ascomycota_Glomerella_graminicola)
  • SEQ ID NO: 42 (gi_302405457_Ascomycota_Verticillium_albo-atrum)
  • SEQ ID NO: 43 (gi_345565889_Ascomycota_Arthrobotrys_oligospora) LWGQCGGIGWTGATNCVAGAACSTLNPYYAQCLSAAATTPRTTTTPATTTR—
  • SEQ ID NO: 45 (tr_Q66PN1_Ascomycota_Trichoderma_parceramosum) QACSSVWGQCGGQNWSGPTCCAAGSTCVYSNDYYSQCPPG-AASSSSSTRASSTT —RVSSTT- STSSATPPPGSTTTRVPPVGSGTATYSGPFVGVTPWANAYYASEVSSLAIPSLT- GAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGG —
  • SEQ ID NO: 46 (tr_G4TC42_Basidiomycota_Piriformospora_indica) AGQWGQCGGNGYTGPTQCPSGWVCTPVSPWYYQCLQGTRSSSSSSSRSTSS—
  • SEQ ID NO: 47 (tr_Q9Y894_Basidiomycota_Volvariella_volvacea) QRPWGQCGGPGWTGPTCCVTGCTCPVTND-YSQCLPG—TTTTTPGPPSTTTT
  • SEQ ID NO: 48 (tr_Q96TP4_Basidiomycota_Pleurotus_sajor-caju) VGEWGQCGGINYTGSTTCDAGLVCNVINDYYHQCLP
  • SEQ ID NO: 50 (tr_C4B8I1_Basidiomycota_Coniophora_puteana) VAAYGQCGGQDWTGATACASGTACTKVNDYYYQCLPG
  • SEQ ID NO: 51 (tr_E2JAJ2_Basidiomycota_Neolentinus_lepideus) SPIYGQCGGTGWTGATTCASGSTCVFSNPYYSQCLPGA-TTTTTSPQPTTTTTTT
  • SEQ ID NO: 52 (tr_A8CED8_Basidiomycota_Polyporus_arcularius) APVYGQCGGIGWSGATTCVSGSVCTKQNDYYSQCLPG-AASSAPTSPPTTSAP
  • SEQ ID NO: 53 (tr_A8NEJ3_Basidiomycota_Coprinopsis_cinerea) RPLYAQCGGTGWTGETTCVSGAVCEVINQWYHQCLPGS QPPVTTQPPV
  • WPTTSQPPVWPTNPP GGTPVPSTGPFEGYDIYLSPYYAEEVEA- AAAMIDDPVLKAKALKVKEIPTFIWFDVVRKTPDLGRYLADATAIQQRTGRK-
  • SEQ ID NO: 54 (tr_B2ZZ24_Basidiomycota_Irpex_lacteus) AQTWAQCGGIGFTGPTTCVAGSVCTKQNDYYSQCIPGS TTPTSAPT
  • SEQ ID NO: 56 (tr_Q6E5B1_Basidiomycota_Volvariella_volvacea) SPLYGQCGGNGWTGPKTCVSGATCTVINDWYWQCLPG NGPTS
  • SEQ ID NO: 57 (tr_F8Q7V9_Basidiomycota_Serpula_lacrymans) ASLYGQCGGVGWTGATTCDSGSSCQEINSYYSQCLPG STTVPTTPTTQPA
  • QNCQTLWGQC GGQGWTGATS CVAGATCSTL NPYYAQCLPA TATTTTSTTT TTTTSSSTTT TSAATATTTT TPTSTTTTTS APSGPTTTAT ASGPFSGYQL YVNPYYSSEV QSLAIPSLTD GSLAAKASAA AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI FWYDLPDRD CAALASNGEY SIADNGVENY KAYIDSIREQ LVKYSDVHII LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKNAGSPAA VRGLATNVAN YNAWSISSCP SYTQGNSVCD EKRYINALAP LLKAQGFSDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALVDAFVWVK PGGESDGTSD TSA
  • QACASQWGQC GGQGWSGPTC CASGSTCVVS NAFYSQCLPG SATTSTSSTR STTTTSVTST SSTTTATTSV SPPPGTTVTS PPAGPSGGAT YTGPFAGVNL WANSYYRSEV STLAIPSLSD GALATAAAKV AKVPTFQWMD TTAKIPLMDG TLADIRKANK AGGNPPYAGQ FWYNLPDRD CAAAASNGEL SIADDGVAKY KAYIDSIRAI LVKYSDIRII LVIEPDSLAN LVTNMNVAKC ANAQAAYLEC TNYAVTQLNL PNVAMYLDAG HAGWLGWPAN LPPAAALFAN VYKDAGKPKA LRGLVTNVSN YNGWNISSAP SYTQGNPNYD EKHYIDALAP LLSQEGWSDA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGD ALVDAFVWVK PGGESDGTSD T
  • QACASQWGQC GGQGWTGPSC CAAGSVCTVS NPFYSQCLPG STVASSTSTV RTSSTPVVSP SRTSTVTGSV STTSAGTGTT PPAGPTGGAT YTGPFVGVNL WANSYYASEI STLAIPSLSD PALATAAAKV AKVPTFMWMD TRSKIPLVDA TLADIRKANQ AGANPPYAGE FWYNLPDRD CAAAASNGEL SIADGGVAKY KQYIDDIRAM VVKYSDIRII LTIEPDSLAN LVTNLNVPKC AGAQAAYLEG TNYAVTQLNL PNVAMYLDGG HAGWLGWPAN LPPAAAMYAK VYKDAGKPKA LRGLVTNVSN YNGYSISTAP SYTQGNANYD EKHYIEALAP LLSAEGWSDA KFIVDQGRSG KQPTGQLAWG DWCNAIGTGF GVRPTANTGS TLVDAFVWVK PGGESDG
  • SEQ ID NO: 134 (node #78) QNCQTLWGQC GGQGWTGATS CVAGATCSTL NPYYAQCLPA TASTTTSTTT PTTTSSSTTT TSAATTTTTT TPTPTTTTTS APSGPTTTAT ASGPFSGYQL YVNPYYSSEV QSLAIPSLTD GSLAAKASAA AKVPSFVWLD TAAKVPTMGT YLADIRAK A AGANPPIAGI FWYDLPDRD CAALASNGEY SIADNGVEKY KAYIDSIREQ LKKYSDVHII LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAADLFAS VYKNAGSPAA VRGLATNVAN YNAWSISSCP SYTQGNSVCD EKRYINALAP LLKAQGFSDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNT
  • SEQ ID NO: 139 (node #83) QNCQTVWGQC GGQGWSGPTS CVAGAACSTL NPYYAQCIPG AATSTTTTTT TATTTTSTTT TSTTTTQTTT KPTTTGPTTS APSGPTITVT ASGPFSGYQL YANPYYSSEV HTLAMPSLPD SSLQPKASAV AEVPSFVWLD VAAKVPTMGT YLADIQAKNK AGANPPIAGI FWYDLPDRD CAALASNGEY S IAN GVANY KAYIDAIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VDYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAATLFAK VYTDAGSPAA VRGLATNVAN YNAWSLSTCP SYTQGDPNCD EKKYINAMAP LLKEAGFSDA HFIMDTSRNG VQPTKQNAWG DWCNVIGTGF GVRPSTNTGD
  • SEQ ID NO: 144 (node #88) QNCQSVWGQC GGQGWTGATS CAAGSTCSTL NPYYAQCIPA TATTATSTTL VKTTSSTSVG TSTATTSTTT TPTTTTTTTT ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIEAANK AGASPPIAGI FWYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWSAN LSPAAQLFAT VYKNASSPAS LRGLATNVAN YNAWSISSAP SYTSGDSNYD EKLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFV
  • SEQ ID NO: 154 (node #98) QNCGAVWTQC GGNGWQGPTC CASGSTCVAQ NEWYSQCLPN SPSSTSTSQR STSTSSSTTR SGSSTSSSST TPPPVSSPTS IPGGATSTAS YSGPFSGVRL FANDYYRSEV HNLAIPSMTD GTLAAKASAV AEVPSFQWLD RNVTIDTMVQ TLSQVRALNK AGANPPYAAQ LWYDLPDRD CAAAASNGEF SIANGGAANY RSYIDAIRKH IIEYSDIRII LVIEPDSMAN MVTNMNVAKC SNAASTYHEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAG IYNDAGKPAA VRGLATNVAN YNAWSIASAP SYTSPNPNYD EKHYIEAFSP LLNSAGFSPA RFIVDTGRNG KQPTGQQQWG DWCNVKGTGF GVRPTANTGH ELV
  • SEQ ID NO: 159 (node #103) QACASVWGQC GGQGWSGATC CASGSTCVVS NDFYSQCLPG SATTSTSSTV STTTTSVTTT SSTTTATTST STPPGTTVTS APSGPSGTAT YTGPFSGVNL WANSYYRSEV STLAIPSLSD GAMATAAAKV AKVPSFQWMD TAAKVPLMEG TLADIRKANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADDGVAKY KAYIDSIRAI LVKYSDIRII LVIEPDSLAN LVTNMNVAKC ANAQAAYLEC TNYAITQLNL PNVAMYLDAG HAGWLGWPAN LPPAAQLFAK VYKDAGKPRA LRGLVTNVSN YNGWNISSAP SYTQGNPNYD EKHYIHALSP LLTQEGWSDA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANT
  • QACASQWGQC GGQGWSGPTC CASGSTCVVS NAFYSQCLPG SATTSTSSTV STTTTSVTST SSTTTATTSV STPPGTTVTS PPSGPSGGAT YTGPFAGVNL WANSYYRSEV STLAIPSLSD GALATAAAKV AKVPTFQWMD TAAKVPLMDG TLADIRKANK AGGNPPYAGQ FWYNLPDRD CAAAASNGEL SIADDGVAKY KAYIDSIRAI LVKYSDIRII LVIEPDSLAN LVTNMNVAKC ANAQAAYLEC TNYAVTQLNL PNVAMYLDAG HAGWLGWPAN LPPAAALFAK VYKDAGKPKA LRGLVTNVSN YNGWNISSAP SYTQGNPNYD EKHYIDALAP LLTQEGWSDA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGD ALVDAFVWVK PGGESDGTSD T
  • QACASQWGQC GGQGWTGPSC CAAGSVCTVS NPFYSQCLPG STVASSTSTV RTSSTPVVSP SRTSTVTGSV STTSAGTGTT PPSGPTGGAT YTGPFVGVNL WANSYYASEI STLAIPSLSD PALATAAAKV AKVPTFMWMD TRSKIPLVDA TLADIRKANQ AGANPPYAGE FWYNLPDRD CAAAASNGEL SIADGGVAKY KQYIDDIRAM VVKYSDIRII LTIEPDSLAN LVTNLNVPKC AGAQAAYLEG TNYAVTQLNL PNVAMYLDGG HAGWLGWPAN LPPAAAMYAK VYKDAGKPKA LRGLVTNVSN YNGYSISTAP SYTQGNANYD EKHYIEALAP LLSAEGWSDA KFIVDQGRSG KQPTGQLAWG DWCNAIGTGF GVRPTANTGS TLVDAFVWVK PGGESDG
  • SEQ ID NO: 164 (node #108) QACASVWGQC GGQGWSGATC CASGSTCVVS NDYYSQCLPG SATTSTSSTS SSTTTSSTRA SSTTTSSSST TPPPGSTTTS APPVGSGTAT YSGPFSGVNP WANSYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLAKTPLMES TLADIRAANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KNYIDTIRAI LVKYSDIRTI LVIEPDSLAN LVTNLSVAKC ANAQAAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN QQPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSAP SYTQGNSVYN EKLYIHAISP LLTQQGWSNA YFITDQGRSG KQPTGQQAWG DWCNVIGTGF GIRPSANTGD
  • SEQ ID NO: 185 (node #73) QNCAPVWGQC GGIGWTGPTT CVSGSTCTKQ NDYYSQCLPG SAATTTVTTS PTSSASGSSV SSHSGSSTTS SSPTTPTTTS APSGPSSTPP AAGPWTGYQI YLSPYYANEV AALAAKQITD PTLAAKAASV ANIPTFTWLD SVAKIPDLGT YLADASALGK SSGQKPQLVQ IWYDLPDRD CAAKASNGEF SIADNGQANY QNYIDQIVAQ IKQFPDVRVV AVIEPDSLAN LVTNLNVQKC ANAKTTYLAC VNYALKQLSS VGVYMYMDAG HAGWLGWPAN LSPAAQLFAQ VYKNAGKSPF IKGLATNVAN YNALSAASPD PITQGDPNYD EIHYINALAP LLQQAGFFPA TFIVDQGRSG VQNIGRQQWG D
  • QNCQTLWGQC GGQGWTGATS CVAGAACSTL NPYYAQCLPA TATTTTTTTT TTTTTSSTTT TSATTTSTTT TPTTTTTTTT APSSVTTTAT ASGPFSGYQL YANPYYSSEV HSLAIPSLTD GSLAPAATAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI FWYDLPDRD CAALASNGEY SIANNGVANY KAYIDSIRAI LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LSPAAQLFAS VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSVCD EKRYINALAP LLKAQGFPDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALLDAFVWVK PGGESDGTSD TSAARY
  • SEQ ID NO: 195 (node #83) QNCQTLWGQC GGQGWTGATS CVAGAACSTL NPYYAQCLPA TATTTTSTTT TTTTSSSTTT TSAATATTTT TPTSTTTTTS APSSVTTTAT ASGPFSGYQL YVNPYYSSEV QSLAIPSLTD GSLAPAATAA AKVPSFVWLD TAAKVPTMGT YLADIRSQNA AGANPPIAGQ FWYDLPDRD CAALASNGEF SIADNGVEHY KAYIDSIREI LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LQPAANLFAS VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGNSVCD EKQYINALAP LLKAQGFPDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF
  • SEQ ID NO: 200 (node #88) QNCQTVWGQC GGIGWSGPTS CVAGAACSTQ NPYYAQCLPG TATTTTTTTT TSTTTSSTTT TPSTGTTTTS APSSTTITAT PSGPFSGYQL YANPYYSSEV HTLAI PSLAD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIKAKNA AGANPPIAGI FWYDLPDRD CAALASNGEY SIANGGVANY KKYI DAIRAQ LLKYPDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALKQLNL PNVAMYLDAG HAGWLGWPAN IGPAAQLFAS VYKDAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKAQGFPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD GVRPTTNTGD
  • QACASQWGQC GGQGWSGPTC CASGSTCVVQ NAFYSQCLPG SATTATSSTR STTTTSVTST SSTSTATTSV STPPATTVTT PPAGPSGGAT YTGPFAGVNL WANSYYRSEV STLAIPSLSD GALATAAAKV AKVPTFQWMD TAAKVPLMDG TLADIRKANK AGGNPPYAGQ FWYNLPDRD CAAAASNGEL SIADDGVAKY KAYIDSIRAI LVKYSDIRTI LVIEPDSLAN MVTNMNVPKC ANAQAAYKEC TNYAVKQLNL PNVAMYLDAG HAGWLGWPAN LPPAAALFAK IYKDAGKPKA LRGLATNVSN YNAWNISSAP SYTQPNPNYD EKHYIEAFAP LLSQEGWSDA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGH ALVDAFVWVK PGGESDGTSD TTAARY
  • SEQ ID NO: 215 (node #103) QACASQWGQC GGQGWTGPSC CAAGSVCTVS NPFYSQCLPG STVASSTSTV RTSSTPVVSP SRTSTVTGSV STTSAGTGTT PPAGPTGGAT YTGPFVGVNL WANSYYASEI STLAIPSLSD PALATAAAKV AKVPTFMWMD TRSKIPLVDA TLADIRKANQ AGANPPYAGE FWYNLPDRD CAAAASNGEL SIADGGVAKY KQYIDDIRAM VVKYSDIRII LTIEPDSLAN LVTNLNVPKC AGAQAAYLEG TNYAVTQLNL PNVAMYLDGG HAGWLGWPAN LPPAAAMYAK VYKDAGKPKA LRGLVTNVSN YNGYSISTAP SYTQGNANYD EKHYIEALAP LLSAEGWSDA KFIVDQGRSG KQPTGQLAWG DWCNAIGTGF

Abstract

The invention discloses enzymes for use in biofuel production.

Description

BIOFUEL PRODUCTION ENZYMES AND USES THEREOF
[0001] This application claims priority to U.S. Provisional Application No. 61/880,466, filed September 20, 2013, the contents of which are hereby incorporated by reference in their entirety.
[0002] All patents, patent applications and publications cited herein are hereby incorporated by reference in their entirety. The disclosures of these publications in their entireties are hereby incorporated by reference into this application in order to more fully describe the state of the art as known to those skilled therein as of the date of the invention described and claimed herein.
[0003] This patent disclosure contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the U.S. Patent and Trademark Office patent file or records, but otherwise reserves any and all copyright rights.
BACKGROUND OF THE INVENTION
[0004] Industrial enzymes have many applications, including detergents, textile production, pharmaceuticals, and biofuel production. Their usefulness can be limited by their stability across a wide range of temperatures and pH levels, and there is currently no reliable method to broaden that range without simultaneously affecting the enzyme's activity. A common practice involves randomly introducing mutations into existing enzymes and screening for variants that exhibit the desired characteristics; however, because of the enormous number of possible combinations, this approach can be costly and labor-intensive and does not guarantee success. The invention relates to a modified method to predictably alter and optimize enzymes, mainly by identifying and resurrecting suitable ancestral enzymes. By examining the molecular and evolutionary biology underlying the enzyme chemistry, this method establishes a fast and economically efficient system to develop and/or improve enzymes, especially industrial enzymes used in harsh environments such as high temperature or low pH.
[0005] The ancestral cellulases can be classified as cellobiohydrolase II enzymes. Modern cellobiohydrolases from fungi are used in the biofuel industry to produce cellulosic ethanol.
SUMMARY OF THE INVENTION
[0006] These enzymes are useful for the production of cellulosic ethanol for biofuels.
[0007] Some aspects of the present invention provide for ancestral fungal cellulases. Cellulase enzymes are useful for the production of cellulosic ethanol for biofuels. In some embodiments, ancestral cellulases can be used for the hydrolysis of carbohydrate polymers that comprise cellulose. Some aspects of the present invention provide for microorganisms that express an ancestral cellulase. Such microorganisms are useful for the production of cellulosic ethanol for biofuels. In some embodiments, the microorganisms can be used for the hydrolysis and/or fermentation of cellulose.
[0008] In another aspect, the present invention provides for an isolated polypeptide comprising an amino acid sequence having about 90% identity to any one of the amino acid sequences of SEQ ID NO: 58 through SEQ ID NO: 281, inclusive.
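Paragraph [0008], and the variant definitions in [0044], are framed in terms of percent identity to SEQ ID NOs: 58-281, but the text does not state how identity is computed. The sketch below assumes one common convention: the two sequences are first aligned (alignment not shown), and identity is the number of identical residues divided by the number of columns in which neither sequence has a gap. The helper names are hypothetical, and other conventions (dividing by the full alignment length or by the length of the shorter sequence) give different numbers.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences (hypothetical helper).

    Assumed convention: identical residues divided by the number of
    alignment columns in which neither sequence has a gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def meets_identity_threshold(aligned_a: str, aligned_b: str, threshold: float = 90.0) -> bool:
    """True if identity is at or above `threshold` percent (hypothetical helper)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # First 10-residue blocks of SEQ ID NO: 195 and SEQ ID NO: 200 as printed
    # in the listing, treated here as already aligned; they differ at one position.
    a = "QNCQTLWGQC"
    b = "QNCQTVWGQC"
    print(percent_identity(a, b))          # 90.0
    print(meets_identity_threshold(a, b))  # True
```

Because the threshold check is separate from the identity calculation, the same helper can be reused for the tiered identity ranges (about 70%, 75%, and so on) listed in [0044].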
[0009] In some embodiments the signal peptide of the isolated polypeptide is removed.
[0010] In another aspect, the present invention provides for a nucleic acid encoding a polypeptide of the present invention.
[0011] In another aspect, the present invention provides for a recombinant microorganism, wherein said microorganism expresses a nucleic acid of the present invention. In another aspect, the present invention provides for a recombinant microorganism, wherein said microorganism expresses a nucleic acid encoding a polypeptide of the present invention, or a combination thereof. In some embodiments, the recombinant microorganism is a fungus. In some embodiments, the recombinant microorganism is from the phylum Basidiomycota, from the phylum Ascomycota, from the subkingdom Dikarya, or from the class Sordariomycetes. In some embodiments, the recombinant microorganism is a yeast. In some embodiments, the recombinant microorganism is a bacterium. In some embodiments, the recombinant microorganism is Saccharomyces cerevisiae. In some embodiments, the recombinant microorganism is selected from the group consisting of Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., and Penicillium sp. In some embodiments, the recombinant microorganism is selected from the group consisting of E. coli, Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp., Microbispora sp., and Streptomyces sp.
[0012] In another aspect, the present invention provides for a method for the production of cellulosic ethanol, comprising adding an isolated polypeptide of the present invention, or a combination thereof, to a source material of cellulose for cellulose processing. In some embodiments, the method further comprises adding a recombinant microorganism of the present invention, or a combination thereof. In some embodiments, the isolated polypeptide and recombinant microorganism are added sequentially, in any order. In some embodiments, the isolated polypeptide and recombinant microorganism are added simultaneously. In some embodiments, carbohydrate polymers are depolymerized.
[0013] In another aspect, the present invention provides for a method for the production of cellulosic ethanol, comprising adding a recombinant microorganism of the present invention, or a combination thereof, to a source material of cellulose for cellulose processing. In some embodiments, the method further comprises adding a polypeptide of the present invention, or a combination thereof. In some embodiments, the isolated polypeptide and recombinant microorganism are added sequentially, in any order. In some embodiments, the isolated polypeptide and recombinant microorganism are added simultaneously. In some embodiments, carbohydrate polymers are depolymerized.
[0014] In another aspect, the present invention provides for a method for cellulose processing, comprising adding a polypeptide of the present invention, or a combination thereof, to a source material of cellulose. In some embodiments, the method further comprises adding a recombinant microorganism of the present invention, or a combination thereof. In some embodiments, the isolated polypeptide and recombinant microorganism are added sequentially, in any order. In some embodiments, the isolated polypeptide and recombinant microorganism are added simultaneously. In some embodiments, carbohydrate polymers are depolymerized.
[0015] In another aspect, the present invention provides for a method for cellulose processing, comprising adding a recombinant microorganism of the present invention, or a combination thereof, to a source material of cellulose. In some embodiments, the method further comprises adding a polypeptide of the present invention, or a combination thereof. In some embodiments, the isolated polypeptide and recombinant microorganism are added sequentially, in any order. In some embodiments, the isolated polypeptide and recombinant microorganism are added simultaneously. In some embodiments, carbohydrate polymers are depolymerized.
BRIEF DESCRIPTION OF THE FIGURES
[0016] FIG. 1A-C is a phylogenetic tree of fungal cellulases obtained using BEAST.
[0017] FIG. 2A-C is a phylogenetic tree of fungal cellulases obtained using MrBayes.
DETAILED DESCRIPTION OF THE INVENTION
[0018] Cellulases are enzymes that can catalyze the hydrolysis of the β-1,4 glucosidic bonds in cellulose, the predominant component of plant matter. In nature, cellulases facilitate microbial conversion of insoluble cellulose contained within biomass into soluble sugars (EA Bayer et al. Current Opinion in Structural Biology, 8:548-557, 1998).
[0019] Cellobiohydrolases from fungi can be used in the biofuel industry to produce cellulosic ethanol. Before the sugars in lignocellulosic biomass, such as wood, can be fermented into ethanol, two obstacles must be addressed with either acid or enzymatic hydrolysis: the lignin that encapsulates the cellulose, and the structural conformation of the cellulose itself (PC Badger, In: J. Janick and A. Whipkey (eds.), Trends in new crops and new uses. ASHS Press, Alexandria, VA, 2002).
[0020] The industry needs highly stable and active cellulases that can withstand the harsh conditions of ethanol production and processing, such as high temperature, low pH, and/or substrate pretreatment. Previous work demonstrated that ancestral enzymes from the thioredoxin superfamily were more thermally stable, more pH resistant, and more active than their modern relatives. The ancient organisms that carried these enzymes lived on the primordial Earth, in an environment that was much hotter and more acidic than today; consequently, their enzymes are expected to have higher thermal and acid stability than their modern counterparts (R Perez-Jimenez et al. Nature Structural & Molecular Biology, 18:592-596, 2011).
[0021] Previous methods for novel enzyme production include directed evolution, a time-consuming, trial-and-error approach. In the last two decades, however, methods based on statistical theory have been developed to computationally reconstruct ancestral protein sequences (MF Cole & EA Gaucher, Journal of Molecular Evolution, 72(2):193-203, 2011).
[0022] With rising oil prices and increased emphasis on mitigating the threat of global warming, additional green technologies are being investigated for the generation of biofuels and textiles. There is an increasing effort to produce highly stable and highly active cellulases that overcome the limitations imposed by the industrial process of ethanol production, for instance high temperature, low pH, or substrate pretreatment. The field of paleoenzymology relates to how ancient organisms subsisted and to the biomolecules that supported them in their environment. In previous work, the inventors demonstrated that ancestral enzymes from the thioredoxin superfamily were more thermally stable, more pH resistant, and more active than their modern relatives.
Resurrection of ancestral cellulases
[0023] Ancestral sequence reconstruction methods were applied to resurrect cellobiohydrolases (Lists #2-5) going back in time up to about 1200 million years (Myr). A set of 57 modern enzymes from different fungal groups was used, including enzymes from the phyla Basidiomycota and Ascomycota; most fungal cellulases used in industry belong to these phyla. The sequences were aligned using MUSCLE software, and further manual corrections were necessary to obtain a suitable alignment. The alignment was then used to construct phylogenies with two Bayesian phylogenetics packages, MrBayes and BEAST. Both programs use Bayesian Markov chain Monte Carlo (MCMC) analysis but differ in the procedure used to estimate posterior probabilities. Bayesian analysis combines prior information with the observed data to estimate posterior probabilities for the quantities of interest (here, the tree and its parameters), and it was used in tandem with the phylogenetic analysis. The sequences used in the trees and the sequence alignment are shown in List #1 (Appendix 1), and the phylogenetic trees are shown in FIG. 1A-C (BEAST) and FIG. 2A-C (MrBayes). Resurrected sequences for each tree were generated by two different statistical methods, marginal reconstruction (Lists #2 and #4) and joint reconstruction (Lists #3 and #5). These sequences represent reconstructed fungal cellulase enzymes to be used in the production of bioethanol as a green fuel source.
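The difference between marginal and joint reconstruction can be illustrated on a toy example. The sketch below is not the procedure used to generate Lists #2-5: it assumes a hypothetical three-leaf tree ((A,B)X,C)R with made-up branch lengths, a single alignment column, and the simplest equal-rates (Poisson) amino acid model in place of the empirical substitution matrices that programs such as PAML normally use. It also enumerates ancestral states by brute force instead of the pruning and dynamic-programming algorithms those programs implement. Marginal reconstruction reports, for one node, the posterior probability of each residue after summing over all states at the other internal nodes; joint reconstruction selects the single best combination of residues at all internal nodes together.

```python
import itertools
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO_ACIDS)

# Hypothetical tree ((A,B)X,C)R: R is the root, X an internal node.
# Branch lengths (expected substitutions per site) are invented for illustration.
BRANCHES = {"X": 0.1, "A": 0.1, "B": 0.1, "C": 0.3}


def p_sub(i: str, j: str, t: float) -> float:
    """P(i -> j) along a branch of length t under an equal-rates model with K states."""
    decay = math.exp(-K * t / (K - 1))
    return 1.0 / K + (K - 1.0) / K * decay if i == j else (1.0 - decay) / K


def joint_likelihood(r: str, x: str, leaves: dict, t: dict = BRANCHES) -> float:
    """P(observed leaves, R=r, X=x), assuming uniform root frequencies."""
    return (
        (1.0 / K)
        * p_sub(r, x, t["X"])
        * p_sub(x, leaves["A"], t["A"])
        * p_sub(x, leaves["B"], t["B"])
        * p_sub(r, leaves["C"], t["C"])
    )


def marginal_root(leaves: dict) -> dict:
    """Marginal reconstruction at R: posterior of each residue, summing over X."""
    scores = {
        r: sum(joint_likelihood(r, x, leaves) for x in AMINO_ACIDS) for r in AMINO_ACIDS
    }
    total = sum(scores.values())
    return {r: s / total for r, s in scores.items()}


def joint_best(leaves: dict) -> tuple:
    """Joint reconstruction: the (R, X) residue pair with the highest joint probability."""
    return max(
        itertools.product(AMINO_ACIDS, repeat=2),
        key=lambda rx: joint_likelihood(rx[0], rx[1], leaves),
    )


if __name__ == "__main__":
    leaves = {"A": "L", "B": "L", "C": "V"}  # one hypothetical alignment column
    post = marginal_root(leaves)
    best = max(post, key=post.get)
    print("marginal root residue:", best, "posterior:", round(post[best], 3))
    print("joint (R, X):", joint_best(leaves))
```

On small, well-supported columns like this one the two methods agree; they can diverge on poorly supported sites, which is one reason to report both.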
[0024] The cellulases that will be resurrected correspond to:
[0025] Last common ancestor of Dikarya (1208 Myr): DiCA
[0026] Last common ancestor of Basidiomycota (966 Myr): BasCA
[0027] Last common ancestor of Ascomycota (1144 Myr): AsCA
[0028] Last common ancestor of Sordariomycetes (433 Myr): SorCA
[0029] Divergence times were obtained from the literature at http://www.timetree.org.
[0030] List #1 - List of sequences used for the reconstruction, including the sequence alignment obtained using MUSCLE software
[0031] These sequences are depicted in Appendix 1.
[0032] List #2 - List of ancestral cellulases identified using PAML and BEAST tree and marginal reconstruction
[0033] These sequences are depicted in Appendix 2.
[0034] Overall accuracy of the 56 ancestral sequences:
For a site:
0.73286 0.77790 0.76008 0.91959 0.89280 0.90887 0.95255 0.93816
0.90432 0.95619 0.95425 0.95206 0.95104 0.92638 0.95074 0.95235
0.94447 0.94992 0.96438 0.96090 0.95961 0.96336 0.97409 0.96579
0.97062 0.98076 0.96522 0.96265 0.95532 0.97974 0.98115 0.98873
0.95290 0.94859 0.94683 0.94096 0.96499 0.99693 0.96739 0.96529
0.99798 0.97104 0.96363 0.97372 0.94962 0.93436 0.94652 0.92912
0.99438 0.99534 0.96712 0.99564 0.99671 0.99763 0.99732 0.99732
For the sequence:
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
0.00000 0.00002 0.00000 0.00000 0.00000 0.00002 0.00003 0.00157
0.00000 0.00000 0.00000 0.00000 0.00000 0.17416 0.00000 0.00000
0.30665 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
0.02582 0.07825 0.00000 0.11031 0.19425 0.30481 0.25855 0.25854
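The accuracy rows above come from PAML (per the List headings). As we understand PAML's ancestral-reconstruction output, the "for a site" value of each ancestral node is the average, over alignment sites, of the posterior probability of the best-supported residue, and the "for the sequence" value is the product of those per-site posteriors, i.e. the probability that the entire reconstructed sequence is correct if sites are treated as independent. The sketch below computes both quantities from hypothetical per-site posteriors and shows why most sequence-level entries round to 0.00000 for proteins several hundred residues long.

```python
import math
from typing import Sequence


def accuracy_for_site(posteriors: Sequence[float]) -> float:
    """Mean, over sites, of the posterior probability of the best-supported residue."""
    if not posteriors:
        raise ValueError("need at least one site")
    return sum(posteriors) / len(posteriors)


def accuracy_for_sequence(posteriors: Sequence[float]) -> float:
    """Product of per-site posteriors: P(entire sequence correct), assuming independent sites."""
    return math.prod(posteriors)  # math.prod requires Python 3.8+


if __name__ == "__main__":
    # Hypothetical per-site posteriors for a very short ancestral node.
    site_posteriors = [0.99, 0.95, 0.60, 0.99]
    print(round(accuracy_for_site(site_posteriors), 4))      # 0.8825
    print(round(accuracy_for_sequence(site_posteriors), 4))  # 0.5587

    # A 400-residue protein with a per-site posterior of 0.95 everywhere
    # is very unlikely to be reconstructed correctly at every single site.
    print(f"{accuracy_for_sequence([0.95] * 400):.5f}")       # 0.00000
```

A high "for a site" value paired with a near-zero "for the sequence" value is therefore expected, not a sign of a failed reconstruction.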
[0035] List #3 - List of ancestral cellulases identified using PAML and BEAST tree and joint reconstruction
[0036] These sequences are depicted in Appendix 3.
[0037] List #4 - List of ancestral cellulases identified using PAML and MrBayes tree and marginal reconstruction
[0038] These sequences are depicted in Appendix 4.
[0039] Overall accuracy of the 56 ancestral sequences:
For a site:
0.72173 0.77466 0.75315 0.91751 0.87852 0.91543 0.92806 0.94903
0.95256 0.95396 0.93356 0.95213 0.95098 0.92654 0.94973 0.95130
0.93970 0.95214 0.96248 0.97864 0.98074 0.98884 0.96116 0.96830
0.96101 0.96458 0.95809 0.97466 0.96879 0.97922 0.96335 0.95372
0.95050 0.95523 0.96708 0.99637 0.99717 0.99780 0.99749 0.99749
0.94868 0.93537 0.95008 0.93285 0.99555 0.99464 0.94892 0.94287
0.96922 0.99716 0.97010 0.96577 0.99788 0.97263 0.97318 0.96569
For the sequence:
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.00001 0.00002 0.00128 0.00000 0.00000
0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
0.00000 0.00000 0.00000 0.16745 0.24852 0.33624 0.28548 0.28548
0.00000 0.00000 0.00000 0.00000 0.08614 0.03539 0.00000 0.00000
0.00000 0.19691 0.00000 0.00000 0.28189 0.00000 0.00000 0.00000
0.96892 0.99664 0.96978 0.96542 0.99737 0.97228 0.97279 0.96532
[0040] List #5 - List of ancestral cellulases identified using PAML and MrBayes tree and joint reconstruction
[0041] These sequences are depicted in Appendix 5.
Molecules of the invention
[0042] The singular forms "a," "an," and "the" include plural reference unless the context clearly dictates otherwise.
[0043] The term "about" is used herein to mean approximately, in the region of, roughly, or around. When the term "about" is used in conjunction with a numerical range, it modifies that range by extending the boundaries above and below the numerical values set forth. In general, the term "about" is used herein to modify a numerical value above and below the stated value by a variance of 20%.
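Paragraph [0043] defines "about" as a 20% variance above and below the stated value. Reading that variance as relative to the stated value (an assumption; it could also be read as 20 percentage points), "about 90% identity" would span 72% to 108%, and for percent identity the upper bound is naturally capped at 100%. A minimal sketch with a hypothetical helper:

```python
from typing import Optional, Tuple


def about_range(value: float, variance: float = 0.20, cap: Optional[float] = None) -> Tuple[float, float]:
    """Bounds implied by 'about <value>', assuming a relative +/- variance (hypothetical helper)."""
    low = value * (1.0 - variance)
    high = value * (1.0 + variance)
    if cap is not None:
        high = min(high, cap)  # e.g. percent identity cannot exceed 100
    return low, high


if __name__ == "__main__":
    print(about_range(90.0))             # (72.0, 108.0)
    print(about_range(90.0, cap=100.0))  # (72.0, 100.0)
```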
[0044] As used herein, an "ancestral cellulase molecule" refers to an ancestral cellulase protein, or a fragment thereof. An "ancestral cellulase molecule" can also refer to a nucleic acid (including, for example, genomic DNA, complementary DNA (cDNA), synthetic DNA, as well as any form of corresponding RNA) which encodes a polypeptide corresponding to an ancestral cellulase protein, or fragment thereof. For example, an ancestral cellulase molecule comprises the amino acid sequence shown in any one of SEQ ID NO: 58 through SEQ ID NO: 281, inclusive, or comprises a nucleic acid sequence encoding the amino acid sequence shown in any one of SEQ ID NO: 58 through SEQ ID NO: 281, inclusive. For example, an ancestral cellulase molecule can be encoded by a recombinant nucleic acid encoding an ancestral cellulase protein, or fragment thereof. The ancestral cellulase molecules of the invention can be obtained from various sources and can be produced according to various techniques known in the art.
For example, a nucleic acid that encodes an ancestral cellulase molecule can be obtained by synthetic or semi-synthetic methods, by screening DNA libraries, or by amplification from a natural source. An ancestral cellulase molecule can include a fragment or portion of an ancestral cellulase protein. An ancestral cellulase molecule can include a variant of the above-described examples, such as a fragment thereof. Such a variant can comprise a naturally-occurring variant due to allelic variations between individuals (e.g., polymorphisms), mutated alleles, or alternative splicing forms. In one embodiment, an ancestral cellulase molecule comprises a variant of an ancestral cellulase protein or polypeptide encoded by an ancestral cellulase nucleic acid sequence, wherein the variant has an amino acid identity to any one of SEQ ID NOS: 58 through 281, inclusive, of about 70%, about 75%, about 80%, about 85%, about 90%, about 91%, about 92%, about 93%, about 94%, about 95%, about 96%, about 97%, about 98%, or about 99%. Such variants can include those having at least from about 46% to about 50% identity to any one of SEQ ID NOS: 58 through 281, inclusive, or having at least from about 50.1% to about 55% identity to any one of SEQ ID NOS: 58 through 281, inclusive, or having at least from about 55.1% to about 60% identity to any one of SEQ ID NOS: 58 through 281, inclusive, or having at least from about 60.1% to about 65% identity to any one of SEQ ID NOS: 58 through 281, inclusive, or having at least from about 65.1% to about 70% identity to any one of SEQ ID NOS: 58 through 281, inclusive, or having at least from about 70.1% to about 75% identity to SEQ ID NOS: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 
70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, or 281, or having at least from about 75.1% to about 80% identity to SEQ ID NOS: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 
245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, or 281, or having at least from about 80.1% to about 85% identity to SEQ ID NOS: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, or 281, or having at least from about 85.1% to about 90% identity to SEQ ID NOS: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 
183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, or 281, or having at least from about 90.1% to about 95% identity to SEQ ID NOS: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, or 281, or having at least from about 95.1% to about 97% identity to SEQ ID NOS: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 
121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, or 281, or having at least from about 97.1% to about 99% identity to SEQ ID NOS: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, or 281. 
In another embodiment, an ancestral cellulase molecule can be a fragment of an ancestral cellulase protein.
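The identity ranges above assume some way of computing percent identity, which this passage does not specify. As a minimal sketch, assuming the two sequences have already been aligned (for example by a global alignment tool) and assuming the convention that every alignment column except gap-against-gap counts toward the denominator, percent identity can be computed as follows. Other conventions (dividing by the length of the shorter or of the reference sequence) give different values. The function name and the toy alignment are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences carry a gap are ignored; every other column,
    including one with a gap in only one sequence, counts toward the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == gap and y == gap)
    ]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    # A match is an identical, non-gap residue in the same column.
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment, not taken from the sequence listing:
# 5 identical columns out of 7 counted columns.
print(round(percent_identity("AC-DEFG", "ACQDEFH"), 1))  # -> 71.4
```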
[0045] According to the invention, protein variants can include amino acid sequence modifications. Such modifications fall into one or more of three classes: substitutional, insertional, or deletional variants. Insertions can include amino- and/or carboxyl-terminal fusions as well as intrasequence insertions of single or multiple amino acid residues. Intrasequence insertions ordinarily will be smaller than amino- or carboxyl-terminal fusions, for example, on the order of one to four residues. Deletions are characterized by the removal of one or more amino acid residues from the protein sequence. These variants ordinarily are prepared by site-specific mutagenesis of nucleotides in the DNA encoding the protein, thereby producing DNA encoding the variant, and thereafter expressing that DNA in recombinant cell culture. In one embodiment, an ancestral cellulase molecule can be modified by deletion of the sequence encoding the signal peptide.
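The three variant classes can be illustrated at the level of the resulting protein sequence. The sketch below only shows what each class does to a sequence string; it does not model the site-specific mutagenesis used to produce the variants. Positions are 1-based, and the function names and the short parent sequence are hypothetical.

```python
def substitute(seq: str, position: int, new_residue: str) -> str:
    """Substitutional variant: replace the residue at a 1-based position."""
    if not 1 <= position <= len(seq):
        raise IndexError("position out of range")
    return seq[:position - 1] + new_residue + seq[position:]


def insert(seq: str, after_position: int, residues: str) -> str:
    """Insertional variant: insert residues after a 1-based position.

    after_position=0 gives an amino-terminal fusion and
    after_position=len(seq) gives a carboxyl-terminal fusion.
    """
    if not 0 <= after_position <= len(seq):
        raise IndexError("position out of range")
    return seq[:after_position] + residues + seq[after_position:]


def delete(seq: str, start: int, length: int = 1) -> str:
    """Deletional variant: remove `length` residues starting at a 1-based position."""
    if length < 1 or not 1 <= start <= len(seq) - length + 1:
        raise IndexError("deletion extends past the sequence")
    return seq[:start - 1] + seq[start - 1 + length:]


parent = "MAKLE"  # hypothetical parent sequence
print(substitute(parent, 2, "G"))  # -> MGKLE
print(insert(parent, 3, "GS"))     # -> MAKGSLE (two-residue intrasequence insertion)
print(delete(parent, 2, 2))        # -> MLE
```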
[0046] Signal peptides are polypeptide sequences of variable length and amino acid composition found at the amino-terminus of some proteins. Signal peptides direct the secretion of polypeptide molecules through a prokaryotic or eukaryotic cell membrane. Signal peptides have a tripartite structure consisting of a hydrophobic core (h-region) flanked by a positively charged n-region on its amino-terminal side and a neutral but polar c-region on its carboxyl-terminal side (Tuteja, R. (2005) Arch. Biochem. Biophys. 441:107-111). Signal peptide sequences can be identified by various methods known to one of skill in the art. For example, signal peptide sequences within a polypeptide sequence can be identified using prediction tools including, but not limited to, Phobius (http://phobius.sbc.su.se/), Predotar (http://urgi.versailles.inra.fr/predotar/predotar.html), SignalP (www.cbs.dtu.dk/services/SignalP/), and TargetP (www.cbs.dtu.dk/services/TargetP/).
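The hydrophobic core described above can be located, very roughly, with a sliding-window hydropathy profile over the N-terminal region. This is a crude heuristic, not a substitute for the prediction tools listed; the Kyte-Doolittle scale is a standard hydropathy scale, but the window size, search length, and threshold used here are illustrative assumptions, and the function name and test sequence are hypothetical. Residues outside the 20 standard amino acids raise a `KeyError`.

```python
# Kyte-Doolittle hydropathy scale
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def find_hydrophobic_core(seq, search_len=40, window=7, threshold=1.6):
    """Return (start, end) of the most hydrophobic window near the N-terminus.

    Coordinates are 0-based and end-exclusive. Returns None if no window in
    the first `search_len` residues reaches a mean hydropathy of `threshold`.
    """
    region = seq[:search_len].upper()
    if len(region) < window:
        return None
    # Mean hydropathy of each window of `window` consecutive residues.
    scores = [
        sum(KYTE_DOOLITTLE[aa] for aa in region[i:i + window]) / window
        for i in range(len(region) - window + 1)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)  # first maximal window
    if scores[best] < threshold:
        return None
    return best, best + window


# Hypothetical precursor: charged n-region (MKK), leucine-rich core, then AQA.
print(find_hydrophobic_core("MKKLLLLLLLLLLAQADEDEDEDEDE"))  # -> (3, 10)
```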
[0047] In one embodiment, an ancestral cellulase molecule comprises a protein or polypeptide encoded by a nucleic acid sequence encoding an ancestral cellulase protein, such as the sequences shown in SEQ ID NOS: 58-281. In another embodiment, the polypeptide can be modified, such as by glycosylation, acetylation, and/or chemical reaction or coupling, and can contain one or more non-natural or synthetic amino acids.
[0048] In one embodiment, an ancestral cellulase molecule can be encoded by a recombinant nucleic acid encoding an ancestral cellulase protein, or fragment thereof. The nucleic acid can be any type of nucleic acid, including genomic DNA, complementary DNA (cDNA), synthetic or semi-synthetic DNA, and any form of corresponding RNA. For example, a nucleic acid encoding an ancestral cellulase protein can comprise a recombinant nucleic acid encoding such a protein. The nucleic acid can be a non-naturally occurring nucleic acid created artificially (such as by assembling, cutting, ligating, or amplifying sequences). Restriction enzymes can be used to cut nucleic acid sequences in a sequence-specific manner, as is known in the art, and restriction enzyme recognition sequences can be added to the ends of a nucleic acid sequence encoding an ancestral cellulase protein. The nucleotides of a restriction enzyme site can themselves encode amino acids, which can either form part of the sequence of an ancestral cellulase protein or add extra amino acids at the ends of its polypeptide sequence. Nucleic acid sequences can be double-stranded or single-stranded.
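The point that a restriction site can encode amino acids can be made concrete by translating a construct with added sites. NdeI (CATATG) and XhoI (CTCGAG) are used here only as familiar examples; the document does not name particular enzymes. In this sketch the NdeI site overlaps the start codon, so its ATG becomes the initiator Met, and the XhoI site is appended in frame after the stop codon is dropped, adding Leu-Glu to the C-terminus. No stop codon remains in the toy construct; in practice one would be supplied downstream, for example by the vector. The function names and the toy ORF are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

NDEI = "CATATG"  # NdeI recognition site; contains the ATG start codon
XHOI = "CTCGAG"  # XhoI recognition site; read in frame as CTC GAG = Leu-Glu


def translate(dna: str) -> str:
    """Translate in-frame DNA; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def add_flanking_sites(orf: str) -> str:
    """Add an NdeI site overlapping the start codon and an in-frame 3' XhoI site."""
    orf = orf.upper()
    if not orf.startswith("ATG"):
        raise ValueError("ORF must start with ATG to overlap the NdeI site")
    if CODON_TABLE[orf[-3:]] == "*":
        orf = orf[:-3]  # drop the stop codon so the XhoI codons are translated
    return "CAT" + orf + XHOI


construct = add_flanking_sites("ATGGCTAAATAA")  # toy ORF: Met-Ala-Lys-stop
print(construct)                 # -> CATATGGCTAAACTCGAG
print(translate(construct[3:]))  # -> MAKLE (Leu-Glu encoded by the XhoI site)
```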
[0049] The invention further provides for nucleic acids that are complementary to a nucleic acid encoding an ancestral cellulase molecule. Complementary nucleic acids can hybridize to the nucleic acid sequence described above under stringent hybridization conditions. Non-limiting examples of stringent hybridization conditions include temperatures above 30°C, above 35°C, or above 42°C, and/or a salt concentration of less than about 500 mM or less than 200 mM. The skilled artisan can adjust hybridization conditions by modifying the temperature, the salt concentration, and/or the concentration of other reagents such as SDS or SSC.
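Why lower salt and higher temperature increase stringency can be illustrated with a melting-temperature estimate. The sketch below computes a complementary strand and uses one commonly cited empirical approximation, Tm = 81.5 + 16.6·log10([Na+]) + 0.41·(%GC) − 600/N. Other published variants use different constants, and this is not the method of the document. Lowering [Na+] lowers the duplex Tm, so a given hybridization temperature becomes more stringent. The probe sequence and function names are hypothetical.

```python
import math

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def salt_adjusted_tm(dna: str, na_molar: float) -> float:
    """Approximate duplex melting temperature in degrees C.

    Empirical form: 81.5 + 16.6*log10([Na+]) + 0.41*(%GC) - 600/N,
    with [Na+] in mol/L and N the probe length in nucleotides.
    """
    n = len(dna)
    gc_percent = 100.0 * sum(base in "GCgc" for base in dna) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / n


probe = "ATGCATGCATGCATGCATGC"  # hypothetical 20-nt probe, 50% GC
print(reverse_complement(probe))  # -> GCATGCATGCATGCATGCAT
for na in (0.5, 0.2, 0.05):       # 500 mM, 200 mM, 50 mM sodium
    print(na, round(salt_adjusted_tm(probe, na), 1))
```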
[0050] In one embodiment, an ancestral cellulase molecule, according to the methods described herein, can be added to a source material of cellulose for cellulose processing. In one embodiment, an ancestral cellulase molecule can be added as an isolated recombinant protein. In another embodiment, the molecule can be added as an isolated modified recombinant protein. For example, an ancestral cellulase protein, or fragment thereof, can be modified by removal of the signal peptide. In another embodiment, an isolated polypeptide having about 90% identity to the amino acid sequence of any one of SEQ ID NOS: 58-281, or a combination of such polypeptides, can be added to a source material of cellulose for cellulose processing.
[0051] In one embodiment, an ancestral cellulase molecule, according to the methods described herein, can be added to a source material of cellulose for cellulose processing by addition of a recombinant microorganism that expresses a nucleic acid encoding an ancestral cellulase protein, or fragment thereof. In another embodiment, such a molecule can be added to a source material of cellulose for cellulose processing by addition of a recombinant microorganism that expresses a nucleic acid encoding an amino acid sequence of any one of SEQ ID NOS: 58-281, or a combination thereof.
[0052] In one embodiment, an ancestral cellulase molecule can be added to a source material of cellulose for cellulose processing by addition of a recombinant protein, or by addition of a recombinant microorganism that expresses a nucleic acid encoding an ancestral cellulase protein, or a combination thereof. For example, the recombinant protein and the recombinant microorganism can be added sequentially, in any order, or simultaneously.
[0053] The invention utilizes conventional molecular biology, microbiology, and recombinant DNA techniques available to one of ordinary skill in the art. Such techniques are well known to the skilled worker and are explained fully in the literature. See, e.g., Maniatis, Fritsch & Sambrook, "DNA Cloning: A Practical Approach," Volumes I and II (D. N. Glover, ed., 1985); "Oligonucleotide Synthesis" (M. J. Gait, ed., 1984); "Nucleic Acid Hybridization" (B. D. Hames & S. J. Higgins, eds., 1985); "Transcription and Translation" (B. D. Hames & S. J. Higgins, eds., 1984); "Immobilized Cells and Enzymes" (IRL Press, 1986); B. Perbal, "A Practical Guide to Molecular Cloning" (1984); and Sambrook, et al., "Molecular Cloning: A Laboratory Manual" (2001).
[0054] One skilled in the art can obtain an ancestral cellulase (e.g., a molecule comprising the amino acid sequence shown in any one of SEQ ID NOS: 58-281) in several ways, which include, but are not limited to, isolating the protein via biochemical means or expressing a nucleotide sequence encoding the protein of interest by genetic engineering methods.
[0055] The invention provides for ancestral cellulase molecules that are encoded by nucleotide sequences. The ancestral cellulase molecule can be a polypeptide encoded by a nucleic acid (including genomic DNA, complementary DNA (cDNA), synthetic DNA, and any form of corresponding RNA). For example, an ancestral cellulase molecule can be encoded by a recombinant nucleic acid encoding an ancestral cellulase protein, or fragment thereof. The ancestral cellulase molecules of the invention can be obtained from various sources and can be produced according to various techniques known in the art. The ancestral cellulase molecule of the invention can be produced via recombinant DNA technology, and such recombinant nucleic acids can be prepared by conventional techniques, including chemical synthesis, genetic engineering, enzymatic techniques, or a combination thereof. For example, a nucleic acid that encodes an ancestral cellulase molecule can be obtained by screening DNA libraries or by amplification from a natural source. In one embodiment, a nucleic acid amplified from a natural source is modified by various mutagenesis methods known in the art to obtain the ancestral cellulase molecules of the invention. In some embodiments, an ancestral cellulase molecule can be "codon-optimized," as known in the art.
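Codon optimization can be illustrated with its simplest form, "one amino acid, one codon" back-translation, in which every residue is encoded by a single preferred codon for the expression host. The preferred-codon table below is hypothetical and illustrative only; a real optimization would derive codon choices from the host's codon-usage data and usually also weighs GC content, repeats, and mRNA structure. The sketch checks itself by translating the result back.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical preferred codon per residue for an unspecified host.
PREFERRED_CODON = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
    "*": "TAA",
}


def back_translate(protein: str) -> str:
    """Encode each residue ('*' = stop) with its single preferred codon."""
    return "".join(PREFERRED_CODON[aa] for aa in protein.upper())


def translate(dna: str) -> str:
    """Translate in-frame DNA; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


orf = back_translate("MAKLE*")  # hypothetical short protein plus stop
print(orf)                      # -> ATGGCGAAACTGGAATAA
print(translate(orf))           # -> MAKLE*
```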
[0056] An ancestral cellulase molecule can be a fragment of an ancestral cellulase protein. For example, the ancestral cellulase protein fragment can encompass any portion of at least about 8 consecutive amino acids of any one of SEQ ID NOS: 58-281. The fragment can comprise at least about 10, at least about 20, at least about 30, at least about 40, at least about 50, at least about 60, at least about 70, at least about 80, at least about 90, at least about 100, at least about 110, at least about 120, at least about 130, at least about 140, at least about 150, at least about 200, at least about 250, at least about 300, at least about 350, or at least about 400 consecutive amino acids of any one of SEQ ID NOS: 58-281. Fragments include all possible amino acid lengths between about 8 and about 400 amino acids, for example, lengths between about 10 and about 400 amino acids, between about 15 and about 400 amino acids, between about 20 and about 400 amino acids, between about 35 and about 400 amino acids, between about 40 and about 400 amino acids, between about 50 and about 400 amino acids, between about 70 and about 400 amino acids, between about 100 and about 400 amino acids, between about 200 and about 400 amino acids, between about 300 and about 400 amino acids, or between about 350 and about 400 amino acids.
[0057] In one embodiment, a fragment of a nucleic acid sequence that comprises an ancestral cellulase molecule can encompass any portion of at least about 8 consecutive nucleotides. In one embodiment, the fragment can comprise at least about 10 nucleotides, at least about 15 nucleotides, at least about 20 nucleotides, or at least about 30 nucleotides.
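The fragment definitions above (a run of at least about 8 consecutive residues or nucleotides) translate directly into a substring test, and the number of possible fragment positions follows from simple counting: a parent of length L has L − k + 1 windows of length k. The sketch below applies equally to protein and nucleic acid strings. The function names and sequences are hypothetical, and the count is of positions, not of distinct sequences (repeated motifs are counted once per position).

```python
from typing import Optional


def is_fragment(candidate: str, parent: str, min_len: int = 8) -> bool:
    """True if candidate is a run of at least min_len consecutive units of parent."""
    return len(candidate) >= min_len and candidate.upper() in parent.upper()


def count_fragments(parent_len: int, min_len: int = 8,
                    max_len: Optional[int] = None) -> int:
    """Count (start, length) windows with min_len <= length <= max_len."""
    max_len = parent_len if max_len is None else min(max_len, parent_len)
    # A parent of length L has L - k + 1 windows of length k.
    return sum(parent_len - k + 1 for k in range(min_len, max_len + 1))


parent_protein = "MKHAGWLGWPANE"  # hypothetical protein sequence
print(is_fragment("HAGWLGWP", parent_protein))  # -> True
print(is_fragment("HAGW", parent_protein))      # -> False (shorter than 8)
print(count_fragments(300, 8, 8))               # -> 293 windows of exactly 8
```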
Recombinant proteins
[0058] The ancestral cellulase molecules can be recombinant enzymes and can be produced in a variety of ways known in the art. One skilled in the art understands that polypeptides (e.g., a molecule comprising the amino acid sequence shown in any one of SEQ ID NOS: 58-281, and the like) can be obtained in several ways, which include, but are not limited to, expressing a nucleotide sequence encoding the protein of interest, or fragment thereof, by genetic engineering methods.
[0059] In one embodiment, the nucleic acid is expressed in an expression cassette, for example, to achieve overexpression in a cell. The nucleic acids of the invention can be an RNA, cDNA, cDNA-like, or a DNA of interest in an expressible format, such as an expression cassette, which can be expressed from a natural promoter or an entirely heterologous promoter. The nucleic acid of interest can encode a protein and may or may not include introns. Any recombinant expression system can be used, including, but not limited to, the recombinant microorganisms of the invention, as well as other bacterial, fungal, mammalian, insect, or plant cell expression systems.
[0060] Nucleic acid sequences comprising an ancestral cellulase molecule that encode a polypeptide can be synthesized, in whole or in part, using chemical methods known in the art. Alternatively, an ancestral cellulase molecule can be produced using chemical methods to synthesize its amino acid sequence, such as by direct peptide synthesis using solid-phase techniques. Protein synthesis can be performed either manually or by automation. Automated synthesis can be achieved, for example, using an Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer). Optionally, fragments of an ancestral cellulase molecule can be separately synthesized and combined using chemical methods to produce a full-length molecule.
[0061] Host cells transformed with a nucleic acid sequence encoding an ancestral cellulase molecule (e.g., a molecule comprising the amino acid sequence shown in any one of SEQ ID NOS: 58-281) can be cultured under conditions suitable for the expression and recovery of the protein from cell culture. The polypeptide produced by a transformed cell can be secreted or contained intracellularly, depending on the sequence and/or the vector used. Methods for protein production by recombinant technology in different host systems are well known in the art (Sambrook, et al., "Molecular Cloning: A Laboratory Manual" (2001); Gellissen, G., "Novel Microbial and Eukaryotic Expression Systems" (2005)). Expression vectors containing a nucleic acid sequence encoding an ancestral cellulase molecule can be designed to contain signal sequences that direct secretion of the encoded soluble polypeptide through a prokaryotic or eukaryotic cell membrane. An ancestral cellulase molecule can be produced as an extracellular enzyme that is secreted into the culture medium, from which it can easily be recovered and isolated. The spent culture medium of the production host can be used as such, or the host cells can be removed from it, and/or the medium can be concentrated, filtered, or fractionated. It can also be dried. In some embodiments, an ancestral cellulase molecule, or fragment thereof, can be modified by removal of the signal peptide, which can allow the polypeptide to be retained intracellularly.
[0062] An isolated polypeptide of the present invention includes, but is not limited to, culture medium containing the polypeptide from which cells and cell debris have been removed. Conveniently, the polypeptides can be isolated, e.g., by adding anionic and/or cationic polymers to the spent culture medium to enhance precipitation of cells, cell debris and other unwanted enzymes. The medium can then be filtered using an inorganic filtering agent and a filter to remove the precipitates formed. The filtrate can be further processed using a semi-permeable membrane to remove excess salts, sugars and metabolic products.
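The membrane step that removes salts, sugars and metabolites is often run as diafiltration. The source does not specify the operating mode, so the sketch below assumes constant-volume diafiltration of a well-mixed retentate in which the enzyme is fully retained. Under that assumption a permeable solute falls as C/C0 = exp(-S·N), where N is the buffer volume exchanged divided by the retentate volume and S is the solute's sieving coefficient (1.0 for a freely passing salt).

```python
import math


def residual_fraction(diavolumes: float, sieving_coefficient: float = 1.0) -> float:
    """Fraction of a membrane-permeable solute left after constant-volume diafiltration.

    Assumes a well-mixed retentate at constant volume: C/C0 = exp(-S * N).
    """
    if diavolumes < 0:
        raise ValueError("diavolumes must be non-negative")
    if not 0.0 < sieving_coefficient <= 1.0:
        raise ValueError("sieving_coefficient must be in (0, 1]")
    return math.exp(-sieving_coefficient * diavolumes)


def diavolumes_needed(target_fraction: float, sieving_coefficient: float = 1.0) -> float:
    """Diavolumes required to bring a permeable solute down to target_fraction of its start."""
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    if not 0.0 < sieving_coefficient <= 1.0:
        raise ValueError("sieving_coefficient must be in (0, 1]")
    return -math.log(target_fraction) / sieving_coefficient


if __name__ == "__main__":
    # Example: reduce a freely permeating salt to 1% of its starting concentration.
    print(f"{diavolumes_needed(0.01):.2f} diavolumes")
```

About 4.6 diavolumes are needed for a 100-fold salt reduction under these idealized assumptions; solutes that partially bind the membrane or the protein (S < 1) need proportionally more buffer.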
[0063] A synthetic peptide can be substantially purified via high performance liquid chromatography (HPLC). The composition of a synthetic ancestral cellulase molecule can be confirmed by amino acid analysis or sequencing. Additionally, any portion of the amino acid sequence of a protein encoded by an ancestral cellulase molecule can be altered during direct synthesis and/or combined using chemical methods with sequences from other proteins to produce a variant polypeptide or a fusion protein.
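Confirming composition by amino acid analysis means comparing measured residue percentages with those expected from the intended sequence. The sketch below assumes a conventional acid-hydrolysis analysis, in which Asn and Gln are converted to Asp and Glu (so they are reported pooled as Asx, "B", and Glx, "Z"), Trp is destroyed and Cys is poorly recovered. The peptide and "observed" values in the usage lines are hypothetical.

```python
from collections import Counter

# Acid hydrolysis converts Asn -> Asp and Gln -> Glu, so each pair is reported
# together as Asx (B) or Glx (Z); Trp is destroyed and Cys is poorly recovered.
POOLED = {"N": "B", "D": "B", "Q": "Z", "E": "Z"}
NOT_RECOVERED = {"W", "C"}


def expected_composition(sequence: str) -> dict:
    """Expected mole percent per residue type, as an acid-hydrolysis analysis would report it."""
    counts = Counter(
        POOLED.get(residue, residue)
        for residue in sequence.upper()
        if residue not in NOT_RECOVERED
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no recoverable residues")
    return {residue: 100.0 * n / total for residue, n in sorted(counts.items())}


def max_deviation(expected: dict, observed: dict) -> float:
    """Largest absolute difference, in mole percent, between two compositions."""
    residues = set(expected) | set(observed)
    return max(abs(expected.get(r, 0.0) - observed.get(r, 0.0)) for r in residues)


if __name__ == "__main__":
    expected = expected_composition("GAWDNEQKLG")  # hypothetical synthetic peptide
    observed = {"A": 11.0, "B": 23.0, "G": 22.0, "K": 11.5, "L": 11.0, "Z": 21.5}  # hypothetical
    print(expected)
    print(f"max deviation: {max_deviation(expected, observed):.1f} mol%")
```

What counts as an acceptable deviation depends on the analytical method and is not specified in the source.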
[0064] The invention further encompasses methods for using a protein or polypeptide encoded by a nucleic acid sequence of an ancestral cellulase molecule. In another embodiment, the polypeptide can be modified, such as by glycosylation and/or acetylation and/or chemical reaction or coupling, and can contain one or more non-natural or synthetic amino acids. An example of an ancestral cellulase molecule comprises the amino acid sequence shown in SEQ ID NOS: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, or 281. In certain embodiments, the invention encompasses variants of a protein encoded by an ancestral cellulase molecule.
Microorganisms of the invention
[0065] Some aspects of the present invention provide for recombinant microorganisms that express a nucleic acid encoding an ancestral cellulase enzyme (e.g., the amino acid sequence shown in SEQ ID NOS: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, or 281). Such microorganisms can include both prokaryotic and eukaryotic microorganisms, such as bacteria and yeast. In one embodiment, the microorganism is a fungus. In another embodiment, the microorganism is from the phylum Basidiomycota, from the phylum Ascomycota, from the subkingdom Dikarya, or from the class Sordariomycetes. In a further embodiment, the microorganism is a yeast. In yet another embodiment, the microorganism is a bacterium. In another embodiment, the microorganism is Escherichia sp., Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp., Microbispora sp., or Streptomyces sp. In a further embodiment, the microorganism is Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., or Penicillium sp.
[0066] Any microorganism may be used according to the present invention. In certain aspects, a microorganism is a eukaryotic or prokaryotic microorganism. In certain aspects, a microorganism is a yeast, such as Saccharomyces cerevisiae. In certain aspects, a microorganism is a bacterium, such as a gram-positive or a gram-negative bacterium.
[0067] Other microorganisms may be used according to the present invention, for example, organisms from the genera Achaetomium, Acremonium, Aspergillus, Botrytis, Chaetomium, Chrysosporium, Collybia, Fomes, Fusarium, Humicola, Hypocrea, Lentinus, Melanocarpus, Myceliophthora, Myriococcum, Neurospora, Penicillium, Phanerochaete, Phlebia, Pleurotus, Podospora, Polyporus, Pycnoporus, Rhizoctonia, Scytalidium, Thermoascus, Thielavia, Trametes and Trichoderma. Additional organisms include, but are not limited to, Acetobacter aceti, Achromobacter, Acidiphilium, Acinetobacter, Actinomadura, Actinoplanes, Aeropyrum pernix, Agrobacterium, Alcaligenes, Ananas comosus (M), Arthrobacter, Aspergillus niger, Aspergillus oryzae, Aspergillus melleus, Aspergillus pulverulentus, Aspergillus saitoi, Aspergillus sojae, Aspergillus usamii, Bacillus alcalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus lentus, Bacillus licheniformis, Bacillus macerans, Bacillus stearothermophilus, Bacillus subtilis, Bifidobacterium, Brevibacillus brevis, Burkholderia cepacia, Candida cylindracea, Candida rugosa, Carica papaya (L), Cellulosimicrobium, Cephalosporium, Chaetomium erraticum, Chaetomium gracile, Clostridium, Clostridium butyricum, Clostridium acetobutylicum, Clostridium thermocellum, Corynebacterium (glutamicum), Corynebacterium efficiens, Escherichia coli, Enterococcus, Erwinia chrysanthemi, Gluconobacter, Gluconacetobacter, Haloarcula, Humicola insolens, Kitasatospora setae, Klebsiella, Klebsiella oxytoca, Kluyveromyces, Kluyveromyces fragilis, Kluyveromyces lactis, Kocuria, Lactlactis, Lactobacillus, Lactobacillus fermentum, Lactobacillus sake, Lactococcus, Lactococcus lactis, Leuconostoc, Methylocystis, Methanolobus siciliae, Methanogenium organophilum, Methanobacterium bryantii, Microbacterium imperiale, Micrococcus lysodeikticus, Microlunatus, Mucor javanicus, Mycobacterium, Myrothecium, Nitrobacter, Nitrosomonas, Nocardia, Papaya carica, Pediococcus, Pediococcus halophilus, Penicillium, Penicillium camemberti, Penicillium citrinum, Penicillium emersonii, Penicillium roqueforti, Penicillium lilacinum, Penicillium multicolor, Paracoccus pantotrophus, Propionibacterium, Pseudomonas, Pseudomonas fluorescens, Pseudomonas denitrificans, Pyrococcus, Pyrococcus furiosus, Pyrococcus horikoshii, Rhizobium, Rhizomucor miehei, Rhizomucor pusillus Lindt, Rhizopus, Rhizopus delemar, Rhizopus japonicus, Rhizopus niveus, Rhizopus oryzae, Rhizopus oligosporus, Rhodococcus, Saccharomyces cerevisiae, Sclerotinia libertiana, Sphingobacterium multivorum, Sphingobium, Sphingomonas, Streptococcus, Streptococcus thermophilus Y-1, Streptomyces, Streptomyces griseus, Streptomyces lividans, Streptomyces murinus, Streptomyces rubiginosus, Streptomyces violaceoruber, Streptoverticillium mobaraense, Tetragenococcus, Thermus, Thiosphaera pantotropha, Trametes, Trichoderma, Trichoderma longibrachiatum, Trichoderma reesei, Trichoderma viride, Trichosporon penicillatum, Vibrio alginolyticus, Xanthomonas, yeast, Zygosaccharomyces rouxii, Zymomonas, and Zymomonas mobilis. These organisms can be used as the recombinant microorganisms provided herein and according to the various methods of the present invention.
[0068] In certain embodiments, a recombinant microorganism may be engineered to secrete an ancestral cellulase molecule into the culture medium, such as by incorporating a signal peptide or an autotransporter domain into the ancestral cellulase molecule. In some embodiments, ancestral cellulase molecules can be fused with any combination of signal peptides and/or autotransporter domains found in secreted proteins, as is known in the art. In some embodiments, ancestral cellulase molecules can be designed to maximize their secretion into the culture medium, and may also include any of many different linker sequences that fuse signal peptides, ancestral cellulase molecules, and autotransporters to improve the efficiency of secretion or of cell-surface presentation. In some embodiments, an ancestral cellulase molecule can be modified by deletion of the sequence encoding the signal peptide. In some embodiments, an ancestral cellulase molecule is purified from the culture medium. In other embodiments, an ancestral cellulase molecule is not purified from the culture medium.
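The fusion and signal-peptide-deletion designs described above can be sketched at the protein-sequence level. The helpers below are hypothetical and only concatenate or trim one-letter sequences; all sequences in the usage lines are placeholders rather than real signal peptides or SEQ ID sequences, and the GGGGS linker is simply a common flexible-linker choice for illustration. Restoring an N-terminal methionine after deletion reflects the assumption that the shortened gene still needs a start codon.

```python
def assemble_fusion(parts: list, linker: str = "") -> str:
    """Join protein parts (N- to C-terminus) with an optional linker between neighbours."""
    if not parts or any(not part for part in parts):
        raise ValueError("parts must be a non-empty list of non-empty sequences")
    return linker.join(part.upper() for part in parts)


def remove_signal_peptide(protein: str, signal_length: int) -> str:
    """Sequence expressed after the signal-peptide coding region is deleted.

    A start methionine is restored if the remaining sequence does not begin with one,
    because the shortened gene still needs a translation start codon.
    """
    if not 0 < signal_length < len(protein):
        raise ValueError("signal_length must fall strictly inside the protein")
    remainder = protein[signal_length:].upper()
    return remainder if remainder.startswith("M") else "M" + remainder


if __name__ == "__main__":
    signal = "MSKLLAAGLALLAA"   # hypothetical placeholder signal peptide
    enzyme = "QSACTLQAETHPS"    # hypothetical stand-in for an ancestral cellulase
    linker = "GGGGS"            # flexible glycine-serine linker, chosen for illustration
    secreted = assemble_fusion([signal, enzyme], linker)
    intracellular = remove_signal_peptide(secreted, len(signal) + len(linker))
    print(secreted)
    print(intracellular)
```

An autotransporter domain would be appended as a further element of `parts`; where it belongs relative to the enzyme depends on the secretion system and is not specified here.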
Expression Systems
[0069] In addition to recombinant microorganisms of the invention, as detailed above, any other recombinant expression system can be used to obtain an isolated ancestral cellulase molecule.
[0070] Bacterial Expression Systems. One skilled in the art understands that expression of desired protein products in prokaryotes is most often carried out in E. coli with vectors that contain constitutive or inducible promoters. Some non-limiting examples of bacterial cells for transformation include Escherichia sp., Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp., Microbispora sp., Streptomyces sp., and E. coli strains DH5 or MC1061/p3 (Invitrogen Corp., San Diego, Calif.), which can be transformed using standard procedures practiced in the art; colonies can then be screened for the appropriate plasmid expression. In bacterial systems, a number of expression vectors can be selected. Non-limiting examples of such vectors include multifunctional E. coli cloning and expression vectors such as BLUESCRIPT (Stratagene). Some E. coli expression vectors (also known in the art as fusion vectors) are designed to add a number of amino acid residues, usually to the N-terminus of the expressed recombinant protein. Such fusion vectors can serve three functions: 1) to increase the solubility of the desired recombinant protein; 2) to increase expression of the recombinant protein of interest; and 3) to aid in recombinant protein purification by acting as a ligand in affinity purification. In some instances, vectors that direct the expression of high levels of readily purified fusion protein products may also be used. Some non-limiting examples of fusion expression vectors include pGEX, which fuses glutathione S-transferase (GST) to the desired protein; pcDNA 3.1/V5-His A, B & C (Invitrogen Corp., Carlsbad, CA), which fuse a 6x-His tag to the recombinant proteins of interest; pMAL (New England Biolabs, MA), which fuses maltose-binding protein (MalE) to the target recombinant protein; the E. coli expression vector pUR278 (Ruther et al., (1983) EMBO J. 2:1791), in which the coding sequence may be ligated individually into the vector in frame with the lacZ coding region in order to generate a fusion protein; and pIN vectors (Inouye et al., (1985) Nucleic Acids Res. 13:3101-3109; Van Heeke et al., (1989) J. Biol. Chem. 264:5503-5509).
Fusion proteins generated by the likes of the above-mentioned vectors are generally soluble and can be purified easily from lysed cells via adsorption and binding of the fusion protein to an affinity matrix. For example, fusion proteins can be purified from lysed cells via adsorption and binding to a matrix of glutathione agarose beads subsequently followed by elution in the presence of free glutathione. For example, the pGEX vectors are designed to include thrombin or factor Xa protease cleavage sites so that the cloned target can be released from the GST moiety. In other embodiments, an ancestral cellulase molecule is not purified from the culture media.
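The release of the cloned target from the GST moiety by thrombin or factor Xa can be sketched as a sequence operation. The recognition sequences used below are the commonly cited ones (thrombin cuts LVPR|GS, leaving Gly-Ser on the target; factor Xa cuts after IEGR). Real protease specificity is more permissive and context-dependent, so exact string matching here is a simplifying assumption, and the fusion sequences are hypothetical.

```python
# Commonly cited recognition sequences (one-letter code); the integer is the number
# of motif residues left on the N-terminal (GST) side of the scissile bond.
CLEAVAGE_SITES = {
    "thrombin": ("LVPRGS", 4),  # ...LVPR | GS...  (Gly-Ser stays on the released target)
    "factor_xa": ("IEGR", 4),   # ...IEGR | X...
}


def release_target(fusion: str, protease: str) -> str:
    """Return the C-terminal product released from a tag-site-target fusion.

    Simplification: recognition is exact string matching, and a second occurrence
    of the motif (e.g. inside the target) is an error because it would also be cut.
    """
    motif, cut_offset = CLEAVAGE_SITES[protease]
    sequence = fusion.upper()
    first = sequence.find(motif)
    if first == -1:
        raise ValueError(f"no {protease} site found")
    if sequence.find(motif, first + 1) != -1:
        raise ValueError(f"more than one {protease} site; the target would be fragmented")
    return sequence[first + cut_offset:]


if __name__ == "__main__":
    # Hypothetical fusion: tag residues, thrombin site, then the start of a target.
    print(release_target("MAHHKKLVPRGSMQAGT", "thrombin"))
```

Checking a candidate target for internal copies of the site before choosing a vector is the practical point of the second check.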
[0071] Plant and Insect Expression Systems. In addition to microorganisms such as bacteria (e.g., E. coli and B. subtilis) transformed with recombinant bacteriophage DNA, plasmid DNA or cosmid DNA expression vectors containing coding sequences for an ancestral cellulase molecule, other suitable cell lines may alternatively be used to produce the molecule of interest. A non-limiting example includes plant cell systems infected with recombinant virus expression vectors (for example, tobacco mosaic virus, TMV; cauliflower mosaic virus, CaMV) or transformed with recombinant plasmid expression vectors (e.g., Ti plasmid) containing coding sequences for an ancestral cellulase molecule. If plant expression vectors are used, the expression of sequences encoding an ancestral cellulase molecule can be driven by any of a number of promoters. For example, viral promoters such as the 35S and 19S promoters of CaMV can be used alone or in combination with the omega leader sequence from TMV. Alternatively, plant promoters such as the RuBisCO small subunit promoter or heat shock promoters can be used. These constructs can be introduced into plant cells by direct DNA transformation or by pathogen-mediated transfection.
[0072] In another embodiment, an insect system can also be used to express an ancestral cellulase molecule. For example, in one such system, Autographa californica nuclear polyhedrosis virus (AcNPV) is used as a vector to express foreign genes in Spodoptera frugiperda cells or in Trichoplusia larvae. Sequences encoding an ancestral cellulase molecule can be cloned into a nonessential region of the virus, such as the polyhedrin gene, and placed under control of the polyhedrin promoter. Successful insertion of the nucleic acid sequences of an ancestral cellulase molecule will render the polyhedrin gene inactive and produce recombinant virus lacking polyhedrin (the occlusion-body protein). The recombinant viruses can then be used to infect S. frugiperda cells or Trichoplusia larvae in which an ancestral cellulase molecule can be expressed.
[0073] Fungal Expression Systems. In another embodiment, a fungal system can also be used to express an ancestral cellulase molecule. Fungi can be transformed with recombinant fungal expression vectors containing coding sequences for an ancestral cellulase molecule. Some non-limiting examples of fungi for transformation include Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., and Penicillium sp. In some embodiments, fungi from the subkingdom Dikarya, from the phylum Basidiomycota, from the phylum Ascomycota, or from the class Sordariomycetes can be transformed with recombinant fungal expression vectors containing coding sequences for an ancestral cellulase molecule.
[0074] Mammalian Expression Systems. Mammalian cells can also contain an expression vector (for example, one that harbors a nucleotide sequence encoding an ancestral cellulase molecule) for expression of a desired product. Expression vectors containing such a nucleic acid sequence linked to at least one regulatory sequence in a manner that allows expression of the nucleotide sequence in a host cell can be introduced via methods known in the art. The vector can be a recombinant DNA or RNA vector, including DNA plasmids or viral vectors. A number of viral-based expression systems can be used to express an ancestral cellulase molecule in mammalian host cells (e.g., adeno-associated virus, retrovirus, adenovirus, lentivirus or alphavirus vectors).
[0075] Expression of recombinant proteins. Regulatory sequences are well known in the art and can be selected to direct the expression of a protein or polypeptide of interest (such as an ancestral cellulase molecule) in an appropriate host cell, as described in Goeddel, Gene Expression Technology: Methods in Enzymology 185, Academic Press, San Diego, Calif. (1990). Non-limiting examples of regulatory sequences include polyadenylation signals, promoters, enhancers, and other expression control elements. Practitioners in the art understand that the design of an expression vector can depend on factors such as the choice of host cell to be transfected and/or the type and/or amount of protein to be expressed.
[0076] Enhancer regions, which are sequences found upstream or downstream of the promoter region in non-coding DNA, are also known in the art to be important in optimizing expression. If needed, origins of replication from viral sources can be employed, for example, when a prokaryotic host is used for introduction of plasmid DNA. In eukaryotic organisms, however, integration into the host chromosome is a common means of maintaining and replicating the introduced DNA.
[0077] A gene that encodes a selectable marker (for example, resistance to antibiotics or drugs, such as ampicillin, neomycin, G418, and hygromycin) can be introduced into host cells along with the gene of interest in order to identify and select clones that stably express a gene encoding a protein of interest. The gene encoding a selectable marker can be introduced into a host cell on the same plasmid as the gene of interest or can be introduced on a separate plasmid. Cells containing the gene of interest can be identified by drug selection wherein cells that have incorporated the selectable marker gene will survive in the presence of the drug. Cells that have not incorporated the gene for the selectable marker die. Surviving cells can then be screened for the production of the desired protein molecule (for example, an ancestral cellulase molecule).
[0078] A host cell strain can be chosen for its ability to modulate the expression of the inserted sequences or to process the expressed ancestral cellulase molecule in the desired fashion. Such modifications of the polypeptide include, but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation, lipidation, and acylation. Post-translational processing which cleaves a "prepro" form of the polypeptide also can be used to facilitate correct insertion, folding and/or function. Different host cells which have specific cellular machinery and characteristic mechanisms for post-translational activities can be chosen to ensure the correct modification and processing of the foreign protein.
[0079] An exogenous nucleic acid can be introduced into a cell via a variety of techniques known in the art, such as lipofection, microinjection, calcium phosphate or calcium chloride precipitation, DEAE-dextran-mediated transfection, or electroporation. Electroporation is carried out at an appropriate voltage and capacitance to result in entry of the DNA construct(s) into the cells of interest. Other methods used to transfect cells include modified calcium phosphate precipitation, polybrene precipitation, liposome fusion, and receptor-mediated gene delivery.
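Choosing an "appropriate voltage and capacitance" for electroporation usually comes down to two derived quantities: the field strength E = V/d across the cuvette gap and the time constant tau = R·C of the exponentially decaying pulse. The sketch below computes both; the 1.8 kV, 0.1 cm, 25 µF and 200 Ω inputs are commonly quoted E. coli settings used only as illustrative numbers, and the 5 kΩ sample resistance is hypothetical.

```python
def field_strength_kv_per_cm(voltage_kv: float, gap_cm: float) -> float:
    """Electric field across the cuvette: E = V / d."""
    if gap_cm <= 0:
        raise ValueError("gap_cm must be positive")
    return voltage_kv / gap_cm


def parallel_resistance_ohm(controller_ohm: float, sample_ohm: float) -> float:
    """Effective resistance of the pulse-controller resistor in parallel with the sample."""
    return 1.0 / (1.0 / controller_ohm + 1.0 / sample_ohm)


def time_constant_ms(resistance_ohm: float, capacitance_uf: float) -> float:
    """Exponential-decay time constant tau = R * C (ohm x microfarad = microsecond)."""
    return resistance_ohm * capacitance_uf / 1000.0


if __name__ == "__main__":
    # Commonly quoted E. coli settings, used here only as illustrative inputs.
    print(f"E   = {field_strength_kv_per_cm(1.8, 0.1):.1f} kV/cm")
    print(f"tau = {time_constant_ms(200, 25):.1f} ms (ignoring sample resistance)")
    r_eff = parallel_resistance_ohm(200, 5000)  # hypothetical 5 kOhm sample
    print(f"tau = {time_constant_ms(r_eff, 25):.1f} ms (with sample in parallel)")
```

Because the sample sits in parallel with the instrument's resistor, salty DNA preparations lower the effective resistance and shorten the pulse, which is one reason desalted DNA is generally preferred.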
[0080] A host cell strain that modulates the expression of the inserted sequences, or that modifies and processes the gene product in the specific fashion desired, may also be chosen. Such modifications (for example, glycosylation and other post-translational modifications) and processing (for example, cleavage) of protein products may be important for the function of the protein. Different host cell strains have characteristic and specific mechanisms for the post-translational processing and modification of proteins and gene products. As such, appropriate host systems or cell lines can be chosen to ensure the correct modification and processing of the foreign protein expressed, such as an ancestral cellulase molecule. Thus, eukaryotic host cells possessing the cellular machinery for proper processing of the primary transcript and for glycosylation and phosphorylation of the gene product may be used. Non-limiting examples of host cells include Escherichia sp., Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp., Microbispora sp., Streptomyces sp., Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., and Penicillium sp.
[0081] Various culturing parameters can be used depending on the host cell being cultured. Appropriate culture conditions for host cells are well known in the art or can be determined by the skilled artisan (see, for example, Madigan M. et al., "Brock Biology of Microorganisms", 2012). Cell culturing conditions can vary according to the type of host cell selected. Commercially available media can be used.
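One routine way to compare culture conditions for a host is by its growth rate. The sketch below assumes that both optical-density readings fall in exponential phase and that OD is proportional to biomass. It then uses the standard relations mu = ln(OD2/OD1)/(t2 - t1) and t_d = ln 2 / mu. The readings in the usage lines are hypothetical.

```python
import math


def specific_growth_rate(od_start: float, od_end: float, t_start_h: float, t_end_h: float) -> float:
    """Specific growth rate mu (per hour) between two exponential-phase readings."""
    if od_start <= 0 or od_end <= 0:
        raise ValueError("optical densities must be positive")
    if t_end_h <= t_start_h:
        raise ValueError("t_end_h must be later than t_start_h")
    return math.log(od_end / od_start) / (t_end_h - t_start_h)


def doubling_time_h(mu_per_h: float) -> float:
    """Doubling time t_d = ln(2) / mu."""
    if mu_per_h <= 0:
        raise ValueError("growth rate must be positive for a doubling time to exist")
    return math.log(2) / mu_per_h


if __name__ == "__main__":
    mu = specific_growth_rate(0.1, 0.4, 0.0, 2.0)  # hypothetical readings
    print(f"mu = {mu:.3f} per h, doubling time = {doubling_time_h(mu):.2f} h")
```

With these example readings, the culture quadruples in 2 h, which corresponds to a doubling time of 1 h. Readings taken outside exponential phase would underestimate mu.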
[0082] Cells suitable for culturing can contain introduced expression vectors, such as plasmids or viruses. The expression vector constructs can be introduced via transformation, microinjection, transfection, lipofection, electroporation, or infection. The expression vectors can contain coding sequences, or portions thereof, encoding the proteins for expression and production. Expression vectors containing sequences encoding the produced proteins and polypeptides, as well as the appropriate transcriptional and translational control elements, can be generated using methods well known to and practiced by those skilled in the art. These methods include synthetic techniques, in vitro recombinant DNA techniques, and in vivo genetic recombination, which are described in J. Sambrook et al., 2001, Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor, N.Y., and in F. M. Ausubel et al., 1989, Current Protocols in Molecular Biology, John Wiley & Sons, New York, N.Y.
Purification of recombinant proteins
[0083] An ancestral cellulase molecule (e.g., a molecule comprising the amino acid sequence shown in SEQ ID NOS: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, or 281) can be purified from any cell that expresses the polypeptide, including cells that have been transfected with expression constructs that express an ancestral cellulase molecule. A purified ancestral cellulase molecule can be separated from other compounds that normally associate with the ancestral cellulase molecule in the cell, such as certain proteins, carbohydrates, or lipids, using methods practiced in the art. For protein recovery, isolation and/or purification, the cell culture medium or cell lysate is centrifuged to remove particulate cells and cell debris. The desired polypeptide molecule (for example, an ancestral cellulase molecule) is then isolated or purified away from contaminating soluble proteins and polypeptides by suitable purification techniques.
Non-limiting purification methods for proteins include: size exclusion chromatography; affinity chromatography; ion exchange chromatography; ethanol precipitation; reverse phase HPLC; chromatography on a resin, such as silica, or on an anion exchange resin, e.g., DEAE; chromatofocusing; SDS-PAGE; ammonium sulfate precipitation; gel filtration using, e.g., Sephadex G-75 or Sepharose; and the like. Other additives, such as protease inhibitors (e.g., PMSF), can be used to inhibit proteolytic degradation during purification. Purification procedures that can select for carbohydrates can also be used, e.g., ion-exchange soft gel chromatography, or HPLC using cation- or anion-exchange resins, in which the more acidic fraction(s) is/are collected.
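For the gel filtration step (e.g., Sephadex G-75), a column is commonly calibrated with protein standards so that an elution volume can be converted to an apparent molecular mass. The sketch below uses the partition coefficient Kav = (Ve - V0)/(Vt - V0) and assumes the usual approximately linear relation between Kav and log10(MW) within the column's fractionation range. The standards in the usage lines are hypothetical. Glycosylated or elongated proteins often deviate from globular standards, so the result is an apparent mass only.

```python
import math


def k_av(ve: float, v0: float, vt: float) -> float:
    """Partition coefficient Kav = (Ve - V0) / (Vt - V0)."""
    if vt <= v0:
        raise ValueError("total volume must exceed void volume")
    return (ve - v0) / (vt - v0)


def fit_calibration(kavs: list, mws: list) -> tuple:
    """Least-squares fit of Kav = a + b * log10(MW) from calibration standards."""
    if len(kavs) != len(mws) or len(kavs) < 2:
        raise ValueError("need at least two standards with matching Kav and MW values")
    xs = [math.log10(mw) for mw in mws]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(kavs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must have different molecular masses")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, kavs))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def estimate_mw(kav: float, a: float, b: float) -> float:
    """Invert the calibration line to get an apparent molecular mass."""
    return 10 ** ((kav - a) / b)


if __name__ == "__main__":
    standards_mw = [1.35e4, 2.9e4, 4.4e4, 7.5e4]  # hypothetical standards (Da)
    standards_kav = [0.52, 0.38, 0.30, 0.18]       # hypothetical measured Kav values
    a, b = fit_calibration(standards_kav, standards_mw)
    sample_kav = k_av(ve=14.0, v0=8.0, vt=24.0)    # hypothetical elution volumes (mL)
    print(f"Kav = {sample_kav:.3f}, apparent MW ~ {estimate_mw(sample_kav, a, b):.0f} Da")
```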
Cellulases and cellulosic biofuels
[0084] Some aspects of the present invention provide for ancestral fungal cellulases. Cellulase enzymes are useful for the production of cellulosic ethanol for biofuels. In some embodiments, ancestral cellulases can be used for the hydrolysis of carbohydrate polymers that comprise cellulose. Some aspects of the present invention provide for microorganisms that express an ancestral cellulase. Such microorganisms are useful for the production of cellulosic ethanol for biofuels. In some embodiments, microorganisms can be used for the hydrolysis and/or fermentation of cellulose.
[0085] The production of cellulosic ethanol biofuels from cellulosic materials can be performed by various techniques known in the art. See, for example, Canilha L. et al., (2012) J. Biomed. Biotech., 2012:989572; U.S. Patent No. 8,318,473; U.S. Patent No. 8,409,836; U.S. Patent Application Publication No. 20110171710A1, which are incorporated by reference in their entireties. The starting material for the production of cellulosic biofuels can be cellulosic materials (i.e., any material comprising lignocellulose, cellulose, hemicellulose, or a combination thereof). A source material of cellulose can be any cellulosic material. Examples of cellulosic materials include, but are not limited to, fruits, plants, vegetables, woods, grasses, inedible parts of plants, byproducts of lawn and tree maintenance, corn stover, switchgrass (Panicum virgatum), Miscanthus grass species, wood chips, sugarcane residues, sugarcane bagasse, straw, pulp and paper residues, waste paper, textile fibers (e.g., cotton, linen, hemp, jute) and cellulosic fibers (e.g., modal, viscose, lyocell).
[0086] During the production of cellulosic ethanol, cellulosic materials can be processed by various techniques known in the art. Most of the carbohydrates in cellulosic material are in the form of lignocellulose, which can comprise cellulose, hemicellulose, pectin and/or lignin. Cellulosic material can be pretreated by physical and/or chemical means; pretreatment can make the cellulose fraction more accessible to hydrolysis. The cellulose and/or hemicellulose in the cellulosic materials can then be hydrolysed into sugars (e.g., glucose). In some embodiments, ancestral cellulases can be used for the hydrolysis of cellulose. In other embodiments, the carbohydrate polymers of cellulose are depolymerized by an ancestral cellulase. Sugars made available by hydrolysis can then be fermented to ethanol by microorganisms.
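Process results for the hydrolysis-plus-fermentation route are usually reported against the stoichiometric maximum. The sketch below works that maximum out from average molar masses. Hydrolysis adds one water per anhydroglucose unit (162.14 → 180.16 g/mol), and fermentation converts each glucose into two ethanol and two CO2. It assumes complete conversion of the cellulose glucan only. Hemicellulose sugars, yeast biomass and by-products are ignored, so real yields are lower.

```python
# Average molar masses (g/mol).
ANHYDROGLUCOSE = 162.14  # C6H10O5, the repeating unit of cellulose
GLUCOSE = 180.16         # C6H12O6
ETHANOL = 46.07          # C2H5OH


def theoretical_ethanol_g(cellulose_g: float) -> float:
    """Maximum ethanol (g) from complete hydrolysis and fermentation of cellulose_g of cellulose."""
    if cellulose_g < 0:
        raise ValueError("cellulose_g must be non-negative")
    glucose_g = cellulose_g * GLUCOSE / ANHYDROGLUCOSE  # hydrolysis adds one H2O per residue
    return glucose_g * 2 * ETHANOL / GLUCOSE            # C6H12O6 -> 2 C2H5OH + 2 CO2


def percent_of_theoretical(measured_ethanol_g: float, cellulose_g: float) -> float:
    """Measured ethanol as a percentage of the stoichiometric maximum."""
    return 100.0 * measured_ethanol_g / theoretical_ethanol_g(cellulose_g)


if __name__ == "__main__":
    print(f"{theoretical_ethanol_g(1000.0):.0f} g ethanol per kg cellulose (theoretical)")
```

The glucose terms cancel, so the ceiling is simply 2 × 46.07 / 162.14 ≈ 0.568 g ethanol per g cellulose.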
[0087] In some aspects, the present invention provides a method for the production of cellulosic ethanol from a source material of cellulose. Cellulase enzymes can be added to cellulosic materials by various techniques. In one embodiment, ancestral cellulases are added to cellulosic materials as an isolated polypeptide. In another embodiment, recombinant microorganisms that express ancestral cellulases are added to cellulosic materials. For example, microorganisms that do not express cellulase enzymes can be genetically modified to express ancestral cellulases. Microorganisms can be modified in a variety of ways, including, but not limited to, to express cellulases, to express large amounts of cellulases, to express modified cellulases, and to express ancestral cellulases. In a further embodiment, ancestral cellulases can be added to cellulosic materials both as isolated polypeptides and by addition of recombinant microorganisms that express ancestral cellulases. Isolated polypeptides and recombinant microorganisms can be added simultaneously or sequentially, in any order.
[0088] Ancestral cellulases needed for the hydrolysis of the cellulosic material according to the invention may be added in an enzymatically effective amount either simultaneously, e.g., in the form of an enzyme mixture, or sequentially, or in combination with any microorganism of the present invention, or in combination with microorganisms that mediate fermentation. Any combination of the ancestral cellulase molecules comprising an amino acid sequence having about 90% identity to SEQ ID NOS: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, or a fragment thereof, may be used together with any combination of other enzymes, for example hemicellulases, proteases, amylases, laccases, lipases, pectinases, esterases and/or peroxidases. Another enzyme treatment may be carried out before, during or after the cellulase treatment.
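The "about 90% identity" criterion depends on how identity is counted, and the source does not state the alignment method or denominator. The sketch below therefore assumes that a pairwise alignment is already available, with '-' marking gaps, and it makes the denominator explicit. Option one divides by the ungapped length of the reference SEQ ID sequence. Option two divides by the columns in which neither sequence has a gap. The sequences in the usage lines are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_reference: str, denominator: str = "reference") -> float:
    """Percent identity between two rows of a pairwise alignment ('-' marks a gap).

    denominator="reference": identical positions / ungapped length of the reference.
    denominator="aligned":   identical positions / columns where neither row has a gap.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aligned_query.upper(), aligned_reference.upper()))
    identical = sum(1 for q, r in columns if q == r and q != "-")
    if denominator == "reference":
        length = sum(1 for _, r in columns if r != "-")
    elif denominator == "aligned":
        length = sum(1 for q, r in columns if q != "-" and r != "-")
    else:
        raise ValueError("denominator must be 'reference' or 'aligned'")
    if length == 0:
        raise ValueError("no positions to compare")
    return 100.0 * identical / length


def meets_threshold(aligned_query: str, aligned_reference: str, threshold: float = 90.0) -> bool:
    """True if identity (reference-length convention) is at least the threshold."""
    return percent_identity(aligned_query, aligned_reference) >= threshold


if __name__ == "__main__":
    query, reference = "MKTAYIAKQR", "MKTAYLAKQR"  # hypothetical aligned sequences
    print(percent_identity(query, reference), meets_threshold(query, reference))
```

The two conventions diverge whenever the alignment contains gaps, so the chosen convention should be stated alongside any identity figure.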
[0089] Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Exemplary methods and materials are described below, although methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention.
[0090] All publications and other references mentioned herein are incorporated by reference in their entirety, as if each individual publication or reference were specifically and individually indicated to be incorporated by reference. Publications and references cited herein are not admitted to be prior art.
EXAMPLES
EXAMPLE 1 - Reconstruction of Ancestral Cellulase Sequences Using BEAST or MrBayes and Marginal or Joint Reconstruction
[0091] There is an increasing effort to produce highly stable and highly active cellulases that can overcome the limitations imposed by the industrial process of ethanol production, such as high temperature, low pH, and substrate pretreatment. In previous work, the inventors demonstrated that ancestral enzymes from the thioredoxin superfamily were more thermally stable, more pH-resistant, and more active than their modern relatives.
[0092] Building on this work, ancestral sequence reconstruction methods were applied to resurrect cellobiohydrolases dating back up to approximately 1200 million years (Myr). A set of 57 modern enzymes was used, drawn mainly from different fungal groups, including the phyla Basidiomycota and Ascomycota, to which most fungal cellulases used in industry belong; the set also includes one bacterial sequence, from Sorangium cellulosum. The sequences were aligned using the MUSCLE software, and further manual corrections were necessary to obtain a suitable alignment. The alignment was then used to construct phylogenies by Bayesian analysis on two software platforms, MrBayes and BEAST. Both programs use Bayesian Markov chain Monte Carlo (MCMC) analysis but differ in the procedure used to estimate posterior probabilities. Bayesian analysis combines prior information with the observed data to obtain posterior probabilities; here it was used in tandem with phylogenetic analysis. Inconsistencies were corrected using the literature and by running additional tests with the MEGA and PAUP software. The final tree was then used to reconstruct the most probable ancestral sequence for each ancestral (internal) node of the tree using the PAML software. Resurrected sequences for each tree were generated by two different statistical methods, marginal reconstruction and joint reconstruction.
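PAML's implementation is not reproduced here, but the difference between the two reconstruction methods named above can be illustrated on a toy tree. Marginal reconstruction picks, for one node at a time, the residue with the highest posterior probability while summing over the states of all other nodes (Felsenstein's pruning algorithm). Joint reconstruction picks the single combination of residues for all ancestral nodes that together has the highest probability (the dynamic-programming algorithm of Pupko et al., 2000). The sketch assumes a simple equal-rates (Poisson) amino-acid substitution model, a uniform prior at the root, and a single alignment column; the tree encoding and all function names are hypothetical and far simpler than the models used in practice.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = len(AMINO_ACIDS)


def p_sub(i: int, j: int, t: float) -> float:
    """P(state j at the end of a branch of length t | state i at its start),
    ASSUMING an equal-rates (Poisson / Jukes-Cantor-like) model with K states."""
    decay = math.exp(-K * t / (K - 1))
    if i == j:
        return 1.0 / K + (K - 1) / K * decay
    return 1.0 / K - decay / K


# Hypothetical encoding: a node is a leaf residue (one-letter string)
# or a list of (subtree, branch_length) pairs.

def conditional_likelihoods(node) -> list:
    """Felsenstein pruning: L[s] = P(residues below node | node has state s)."""
    if isinstance(node, str):
        return [1.0 if a == node else 0.0 for a in AMINO_ACIDS]
    L = [1.0] * K
    for child, t in node:
        Lc = conditional_likelihoods(child)
        for s in range(K):
            L[s] *= sum(p_sub(s, x, t) * Lc[x] for x in range(K))
    return L


def marginal_root(tree):
    """Most probable root residue and its posterior (uniform prior cancels)."""
    L = conditional_likelihoods(tree)
    best = max(range(K), key=lambda s: L[s])
    return AMINO_ACIDS[best], L[best] / sum(L)


def _joint_up(node, t):
    """For each parent state i: best subtree likelihood L[i] and best own state C[i]."""
    if isinstance(node, str):
        x = AMINO_ACIDS.index(node)
        return [p_sub(i, x, t) for i in range(K)], [x] * K, []
    kids = [_joint_up(child, tc) for child, tc in node]
    below = [math.prod(k[0][j] for k in kids) for j in range(K)]
    L, C = [], []
    for i in range(K):
        j_best = max(range(K), key=lambda j: p_sub(i, j, t) * below[j])
        L.append(p_sub(i, j_best, t) * below[j_best])
        C.append(j_best)
    return L, C, kids


def joint_reconstruction(tree):
    """Jointly most probable residues of all internal nodes, root first (preorder)."""
    kids = [_joint_up(child, tc) for child, tc in tree]
    below = [math.prod(k[0][j] for k in kids) for j in range(K)]
    root = max(range(K), key=lambda j: below[j])  # uniform prior cancels
    states = [AMINO_ACIDS[root]]

    def trace(children, parent_state):
        for _, C, grandkids in children:
            if grandkids:  # internal node: record its best state, then descend
                states.append(AMINO_ACIDS[C[parent_state]])
                trace(grandkids, C[parent_state])

    trace(kids, root)
    return states


toy = [([("A", 0.1), ("A", 0.2)], 0.05), ([("A", 0.1), ("G", 0.3)], 0.05)]
print(marginal_root(toy))
print(joint_reconstruction(toy))  # ['A', 'A', 'A']
```

In this toy case both methods agree, but in general they can assign different residues to the same node, so marginal and joint data sets need not be identical.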
[0093] The cellulases that will be resurrected correspond to:
[0094] Last common ancestor of Dikarya (1208 Myr): DiCA
[0095] Last common ancestor of Basidiomycota (966 Myr): BasCA
[0096] Last common ancestor of Ascomycota (1144 Myr): AsCA
[0097] Last common ancestor of Sordariomycetes (433 Myr): SorCA
[0098] Divergence times were obtained from published estimates compiled at http://www.timetree.org.
[0099] The sequences used in the trees and in the sequence alignment are listed as SEQ ID NOS: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, and 57.
[0100] The phylogenetic trees are shown in FIGS. 1 and 2. The resurrected sequences for each tree are listed as SEQ ID NOS: 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, and 113 (BEAST tree; marginal reconstruction); SEQ ID NOS: 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, and 169 (BEAST tree; joint reconstruction); SEQ ID NOS: 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, and 225 (MrBayes tree; marginal reconstruction); and SEQ ID NOS: 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, and 281 (MrBayes tree; joint reconstruction). These sequences represent reconstructed fungal cellulase enzymes for use in the production of bioethanol as a green fuel source.
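The numbering above follows a regular pattern: each of the four reconstructions contributes 56 consecutive SEQ ID NOS, matching the 56 ancestral nodes (58 to 113) labeled in each tree, and the leaf labels of the node-labeled trees carry the SEQ ID NOS (1 to 57) of the modern sequences. The helper below decodes a SEQ ID NO into its data set. It ASSUMES, without confirmation from the text, that within each block the SEQ ID NOS follow the node numbers in order (so that, for example, SEQ ID NO: 114 would be node 58 of the BEAST tree, joint reconstruction). The names `decode_seq_id` and `SeqIdInfo` are hypothetical.

```python
from typing import NamedTuple, Optional

N_MODERN = 57      # SEQ ID NOS: 1-57, the modern sequences (tree leaves)
FIRST_NODE = 58    # ancestral nodes are labeled 58-113 in each tree
N_ANCESTRAL = 56

# First SEQ ID NO of each block of resurrected sequences, as listed in [0100].
BLOCKS = [
    (58, "BEAST", "marginal"),
    (114, "BEAST", "joint"),
    (170, "MrBayes", "marginal"),
    (226, "MrBayes", "joint"),
]


class SeqIdInfo(NamedTuple):
    kind: str               # "modern" or "ancestral"
    tree: Optional[str]     # "BEAST" or "MrBayes" for ancestral sequences
    method: Optional[str]   # "marginal" or "joint"
    node: Optional[int]     # ASSUMED node label (58-113); see lead-in


def decode_seq_id(seq_id: int) -> SeqIdInfo:
    if 1 <= seq_id <= N_MODERN:
        return SeqIdInfo("modern", None, None, None)
    for start, tree, method in BLOCKS:
        if start <= seq_id < start + N_ANCESTRAL:
            # ASSUMPTION: SEQ ID NOS within a block follow node order 58, 59, ...
            node = FIRST_NODE + (seq_id - start)
            return SeqIdInfo("ancestral", tree, method, node)
    raise ValueError(f"SEQ ID NO: {seq_id} is outside 1-281")


print(decode_seq_id(281))
# SeqIdInfo(kind='ancestral', tree='MrBayes', method='joint', node=113)
```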
[0101] Data set for the tree shown in FIG. 1A-C (BEAST tree)
[0102] (tr_A9FHT2_Bacteria_Sorangium_cellulosum: 0.33393,
( (tr_Q9UW10_Neocallimasti_Piromyces_rhizinflatus : 0.87406,
(tr_Q96V98_Neocallimasti_Orpinomyces : 0.10445,
tr_H2BPT8_Neocallimasti_Neocallimastix_patriciarum: 0.14608) : 0.47771) : 0.14407, ( (tr_Q9Y894_Basidiomycota_Volvariella_volvacea: 0.71802,
( (tr_Q96TP4_Basidiomycota_Pleurotus_sajor-caju : 0.22225,
(tr_G4TC42_Basidiomycota_Piriformospora_indica: 0.42764,
tr_A8NEJ3_Basidiomycota_Coprinopsis_cinerea: 0.26151) : 0.08929) :
0.04523, ( (tr_Q6E5B1_Basidiomycota_Volvariella_volvacea: 0.14173, tr_E2JAJ2_Basidiomycota_Neolentinus_lepideus : 0.16178) : 0.03271,
(tr_Q96VU2_Basidiomycota_Lentinula_edodes: 0.16807,
( (tr_F8Q7V9_Basidiomycota_Serpula_lacrymans: 0.15303,
tr_C4B8I1_Basidiomycota_Coniophora_puteana: 0.30317) : 0.07425,
(tr_A8CED8_Basidiomycota_Polyporus_arcularius: 0.12840,
(tr_B2ZZ24_Basidiomycota_Irpex_lacteus : 0.11933,
tr_Q02321_Basidiomycota_Phanerochaete_chrysos : 0.12793) : 0.06898) :
0.04263) : 0.04002) : 0.04648) : 0.01257) : 0.07891) : 0.08653,
(tr_G2XV25_Ascomycota_Botryotinia_fuckeliana: 0.18270,
( ( (gi_345565889_Ascomycota_Arthrobotrys_oligospora : 0.39847, (tr_Q2U2I8_Ascomycota_Aspergillus_oryzae: 0.35740,
(sp_Q5B2E8_Ascomycota_Emericella_nidulans: 0.10307,
(tr_G7XQ80_Ascomycota_Aspergillus_kawachii : 0.04818,
sp_A2QYR9_Ascomycota_Aspergillus_niger : 0.03963) : 0.06955) : 0.03973) : 0.02516) : 0.02129, ( ( (sp_A1DJQ7_Ascomycota_Neosartorya_fischeri :
0.02138, sp_Q4WFK4_Ascomycota_Neosartorya_fumigata: 0.01554) : 0.07438, (tr_F1CHI2_Ascomycota_Penicillium_decumbens: 0.14251,
(sp_Q0CFP1_Ascomycota_Aspergillus_terreus : 0.10956,
sp_A1CCN4_Ascomycota_Aspergillus_clavatus : 0.20662) : 0.02784) : 0.03266) :
0.03400, (tr_Q8NIB5_Ascomycota_Talaromyces_emersonii : 0.19503,
(tr_B8MHF4_Ascomycota_Talaromyces_stipitatus : 0.05484,
(tr_B6QMM6_Ascomycota_Penicillium_marneffei : 0.06708,
(tr_O93837_Ascomycota_Acremonium_cellulolyticus : 0.00000,
tr_B5TMG4_Ascomycota_Penicillium_funiculosum: 0.01344) : 0.03028) :
0.01393) : 0.13530) : 0.04128) : 0.03250) : 0.07018,
( (tr_F2VRZ0_Ascomycota_Phialophora_sp: 0.22991,
( (tr_G4MM92_Ascomycota_Magnaporthe_oryzae : 0.23159,
tr_J3NZ73_Ascomycota_Gaeumannomyces_graminis: 0.17354) : 0.10087,
( ( tr_Q872J7_Ascomycota_Neurospora_crassa : 0.00170,
tr_F8MDR2_Ascomycota_Neurospora_tetrasperma: 0.00705) : 0.19993,
( (tr_G2QW39_Ascomycota_Thielavia_terrestris : 0.16937,
(tr_G2QA39_Ascomycota_Thielavia_heterothallica: 0.00000,
gi_367023495_Ascomycota_Myceliophthora_thermophila: 0.00000) : 0.12051) : 0.04277, ( (tr_Q4JQF8_Ascomycota_Chaetomium_thermophilum: 0.13113, sp_Q9C1S9_Ascomycota_Humicola_insolens : 0.15422) : 0.04828,
(tr_Q2GMP2_Ascomycota_Chaetomium_globosum: 0.09967,
tr_B2ABX7_Ascomycota_Podospora_anserina: 0.15007) : 0.04638) : 0.03466) :
0.02286) : 0.07004) : 0.06765) : 0.02353,
( ( (tr_I1RIJ1_Ascomycota_Gibberella_zeae: 0.09249,
sp_P46236_Ascomycota_Fusarium_oxysporum: 0.11952) : 0.22437,
( (gi_310790274_Ascomycota_Glomerella_graminicola: 0.00000,
tr_E3Q540_Ascomycota_Colletotrichum_graminicola: 0.00000) : 0.18306,
(tr_G2XB72_Ascomycota_Verticillium_dahliae : 0.00433,
gi_302405457_Ascomycota_Verticillium_albo-atrum: 0.00769) : 0.26743) :
0.05719) : 0.06500, (tr_G9NFV6_Ascomycota_Hypocrea_atroviridis : 0.09478,
(tr_Q66PN1_Ascomycota_Trichoderma_parceramosum : 0.01533,
( (tr_D1MGM6_Ascomycota_Trichoderma_longibrachiatum : 0.00931,
tr_H9C5T1_Ascomycota_Hypocrea_orientalis : 0.00000) : 0.01860,
(tr_Q7LSP2_Ascomycota_Trichoderma_koningii : 0.00224,
(tr_Q6UJX9_Ascomycota_Trichoderma_viride: 0.00224,
sp_P07987_Ascomycota_Hypocrea_jecorina: 0.00000) : 0.00000) : 0.00726) : 0.00486) : 0.11555) : 0.10549) : 0.04714) : 0.09634) : 0.02947) : 0.15610) : 0.32737) : 0.31661) ;
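The data set above is a tree in Newick format: nested parentheses group sister lineages, and the number after each colon is the length of the branch leading to that taxon or clade. The minimal recursive-descent parser below is a sketch, not a full Newick implementation: it does not handle quoted labels or comments, and it expects each taxon name to be a single token with no internal spaces. It reads such a string and sums branch lengths from the root to every leaf. The function names are hypothetical; established libraries such as Biopython's `Bio.Phylo` module can also read Newick files. The paragraph marker (e.g. "[0102]") must be removed before parsing.

```python
import re
from typing import Optional

PUNCTUATION = {"(", ")", ",", ":", ";"}


def tokenize(newick: str) -> list:
    """Split a Newick string into punctuation tokens and label/number tokens."""
    return re.findall(r"[(),:;]|[^(),:;\s]+", newick)


def parse(tokens: list, pos: int = 0):
    """Parse one subtree starting at tokens[pos]; return (node, next_pos).

    A node is a dict with 'name' (leaf name or internal label, may be None),
    'length' (branch length or None) and 'children' (list of nodes)."""
    children = []
    if tokens[pos] == "(":
        pos += 1
        while True:
            child, pos = parse(tokens, pos)
            children.append(child)
            if tokens[pos] == ",":
                pos += 1
            elif tokens[pos] == ")":
                pos += 1
                break
            else:
                raise ValueError(f"unexpected token {tokens[pos]!r}")
    name: Optional[str] = None
    if pos < len(tokens) and tokens[pos] not in PUNCTUATION:
        name = tokens[pos]
        pos += 1
    length: Optional[float] = None
    if pos < len(tokens) and tokens[pos] == ":":
        length = float(tokens[pos + 1])
        pos += 2
    return {"name": name, "length": length, "children": children}, pos


def root_to_tip(node, depth: float = 0.0, out: Optional[dict] = None) -> dict:
    """Map each leaf name to the summed branch length from the root."""
    out = {} if out is None else out
    depth += node["length"] or 0.0
    if not node["children"]:
        out[node["name"]] = depth
    for child in node["children"]:
        root_to_tip(child, depth, out)
    return out


tree, _ = parse(tokenize("((A : 0.1, B : 0.2) : 0.05, C : 0.3) ;"))
print(root_to_tip(tree))  # A about 0.15, B about 0.25, C 0.3
```

A recursive parser is used because Newick nesting maps directly onto recursion; for a tree of 57 leaves the recursion depth stays small.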
[0103] Tree with node labels:
[0104] (1_tr_A9FHT2_Bacteria_Sorangium_cellulosum,
((2_tr_Q9UW10_Neocallimasti_Piromyces_rhizinflatus,
(3_tr_Q96V98_Neocallimasti_Orpinomyces,
4_tr_H2BPT8_Neocallimasti_Neocallimastix_patriciarum) 61 ) 60 ,
((47_tr_Q9Y894_Basidiomycota_Volvariella_volvacea,
( ( 48_tr_Q96TP4_Basidiomycota_Pleurotus_sajor-caju,
(46_tr_G4TC42_Basidiomycota_Piriformospora_indica,
53_tr_A8NEJ3_Basidiomycota_Coprinopsis_cinerea) 66 ) 65 ,
( (56_tr_Q6E5B1_Basidiomycota_Volvariella_volvacea,
51_tr_E2JAJ2_Basidiomycota_Neolentinus_lepideus) 68 , (55_tr_Q96VU2_Basidiomycota_Lentinula_edodes ,
( (57_tr_F8Q7V9_Basidiomycota_Serpula_lacrymans ,
50_tr_C4B8I1_Basidiomycota_Coniophora_puteana) 71 ,
(52_tr_A8CED8_Basidiomycota_Polyporus_arcularius ,
(54_tr_B2ZZ24_Basidiomycota_Irpex_lacteus,
49_tr_Q02321_Basidiomycota_Phanerochaete_chrysos) 73 ) 72 ) 70 ) 69 ) 67
) 64 ) 63 , (25_tr_G2XV25_Ascomycota_Botryotinia_fuckeliana,
( ( (43_gi_345565889_Ascomycota_Arthrobotrys_oligospora,
( 5_tr_Q2U2I8_Ascomycota_Aspergillus_oryzae ,
(37_sp_Q5B2E8_Ascomycota_Emericella_nidulans ,
(27_tr_G7XQ80_Ascomycota_Aspergillus_kawachii,
28_sp_A2QYR9_Ascomycota_Aspergillus_niger) 80 ) 79 ) 78 ) 77 ,
( ( (39_sp_A1DJQ7_Ascomycota_Neosartorya_fischeri,
36_sp_Q4WFK4_Ascomycota_Neosartorya_fumigata) 83 ,
(32_tr_F1CHI2_Ascomycota_Penicillium_decumbens ,
(30_sp_Q0CFP1_Ascomycota_Aspergillus_terreus ,
33_sp_A1CCN4_Ascomycota_Aspergillus_clavatus) 85 ) 84 ) 82 ,
(24_tr_Q8NIB5_Ascomycota_Talaromyces_emersonii ,
(38_tr_B8MHF4_Ascomycota_Talaromyces_stipitatus,
(26_tr_B6QMM6_Ascomycota_Penicillium_marneffei ,
(34_tr_O93837_Ascomycota_Acremonium_cellulolyticus ,
35_tr_B5TMG4_Ascomycota_Penicillium_funiculosum) 89 ) 88 ) 87 ) 86 ) 81 ) 76 , ( (31_tr_F2VRZ0_Ascomycota_Phialophora_sp,
( ( 6_tr_G4MM92_Ascomycota_Magnaporthe_oryzae,
13_tr_J3NZ73_Ascomycota_Gaeumannomyces_graminis) 93 ,
( ( 9_tr_Q872J7_Ascomycota_Neurospora_crassa,
10_tr_F8MDR2_Ascomycota_Neurospora_tetrasperma) 95 ,
( (12_tr_G2QW39_Ascomycota_Thielavia_terrestris,
(11_tr_G2QA39_Ascomycota_Thielavia_heterothallica,
40_gi_367023495_Ascomycota_Myceliophthora_thermophila) 98 ) 97 ,
((16_tr_Q4JQF8_Ascomycota_Chaetomium_thermophilum,
17_sp_Q9C1S9_Ascomycota_Humicola_insolens) 100 ,
(7_tr_Q2GMP2_Ascomycota_Chaetomium_globosum,
8_tr_B2ABX7_Ascomycota_Podospora_anserina) 101 ) 99 ) 96 ) 94 ) 92 ) 91 , ( ( (23_tr_I1RIJ1_Ascomycota_Gibberella_zeae,
29_sp_P46236_Ascomycota_Fusarium_oxysporum) 104 ,
((41_gi_310790274_Ascomycota_Glomerella_graminicola,
15_tr_E3Q540_Ascomycota_Colletotrichum_graminicola) 106 ,
(14_tr_G2XB72_Ascomycota_Verticillium_dahliae,
42_gi_302405457_Ascomycota_Verticillium_albo-atrum) 107 ) 105 ) 103 ,
(22_tr_G9NFV6_Ascomycota_Hypocrea_atroviridis,
(45_tr_Q66PN1_Ascomycota_Trichoderma_parceramosum,
( (21_tr_D1MGM6_Ascomycota_Trichoderma_longibrachiatum,
20_tr_H9C5T1_Ascomycota_Hypocrea_orientalis) 111 ,
(19_tr_Q7LSP2_Ascomycota_Trichoderma_koningii,
( 44_tr_Q6UJX9_Ascomycota_Trichoderma_viride ,
18_sp_P07987_Ascomycota_Hypocrea_jecorina) 113 ) 112 ) 110 ) 109 ) 108 ) 102 ) 90 ) 75 ) 74 ) 62 ) 59 ) 58 ;
[0105] Nodes 58 to 113 are ancestral.
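In the node-labeled version of the tree, each leaf name begins with its SEQ ID NO (1 to 57) and each internal node carries a number right after its closing parenthesis. A fully bifurcating rooted tree with 57 leaves has exactly 56 internal nodes, so the labels 58 to 113 cover every internal node. The sketch below, with hypothetical function names, separates the two kinds of label in such a string (ignoring branch lengths if present) and checks that numbering. It assumes unquoted labels with no internal spaces.

```python
import re

PUNCTUATION = {"(", ")", ",", ":", ";"}


def node_labels(newick: str):
    """Split the labels of a Newick string into leaf names and internal-node labels.

    A label directly after ')' belongs to an internal node; a token directly
    after ':' is a branch length and is skipped; any other label is a leaf."""
    leaves, internal = [], []
    prev = None
    for tok in re.findall(r"[(),:;]|[^(),:;\s]+", newick):
        if tok not in PUNCTUATION:
            if prev == ")":
                internal.append(tok)
            elif prev != ":":
                leaves.append(tok)
        prev = tok
    return leaves, internal


def numbering_is_consistent(newick: str) -> bool:
    """Leaves numbered 1..n via a leading '<n>_' prefix, internal nodes n+1..2n-1."""
    leaves, internal = node_labels(newick)
    n = len(leaves)
    leaf_ids = sorted(int(name.split("_", 1)[0]) for name in leaves)
    node_ids = sorted(int(label) for label in internal)
    # A fully bifurcating rooted tree with n leaves has exactly n - 1 internal nodes.
    return leaf_ids == list(range(1, n + 1)) and node_ids == list(range(n + 1, 2 * n))


toy = "(1_a,(2_b,3_c)5)4;"
print(node_labels(toy))              # (['1_a', '2_b', '3_c'], ['5', '4'])
print(numbering_is_consistent(toy))  # True
```

Applied to the text of [0104] with its paragraph marker removed, the check returns True if the data set matches the statement that nodes 58 to 113 are the ancestral nodes.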
[0106] Data set for the tree shown in FIG. 2A-C (MrBayes tree)
[0107] (tr_A9FHT2_Bacteria_Sorangium_cellulosum: 0.36111,
( (tr_Q9UW10_Neocallimasti_Piromyces_rhizinflatus : 0.84512,
(tr_Q96V98_Neocallimasti_Orpinomyces : 0.10861, tr_H2BPT8_Neocallimasti_Neocallimastix_patriciarum: 0.14219) : 0.50352) : 0.14976, ( (tr_G4TC42_Basidiomycota_Piriformospora_indica: 0.42859, (tr_A8NEJ3_Basidiomycota_Coprinopsis_cinerea: 0.32709,
(tr_Q6E5B1_Basidiomycota_Volvariella_volvacea: 0.14188,
(tr_E2JAJ2_Basidiomycota_Neolentinus_lepideus: 0.17949,
( (tr_Q9Y894_Basidiomycota_Volvariella_volvacea: 0.76962,
tr_Q96TP4_Basidiomycota_Pleurotus_sajor-caju: 0.24277) : 0.02856,
(tr_Q96VU2_Basidiomycota_Lentinula_edodes: 0.16467,
( (tr_C4B8I1_Basidiomycota_Coniophora_puteana: 0.29795,
tr_F8Q7V9_Basidiomycota_Serpula_lacrymans : 0.15442) : 0.07542,
(tr_A8CED8_Basidiomycota_Polyporus_arcularius: 0.13141,
(tr_Q02321_Basidiomycota_Phanerochaete_chrys : 0.12954,
tr_B2ZZ24_Basidiomycota_Irpex_lacteus: 0.11830) : 0.06639) : 0.04137) : 0.03784) : 0.04405) : 0.02083) : 0.01904) : 0.05531) : 0.03039) : 0.11441, (tr_G2XV25_Ascomycota_Botryotinia_fuckeliana: 0.17147,
( ( (tr_B8MHF4_Ascomycota_Talaromyces_stipitatus : 0.05790,
(tr_B6QMM6_Ascomycota_Penicillium_marneffei : 0.06930 ,
(tr_O93837_Ascomycota_Acremonium_cellulolyticus : 0.00000,
tr_B5TMG4_Ascomycota_Penicillium_funiculosum: 0.01344) : 0.02901) :
0.01210) : 0.12956, (tr_Q8NIB5_Ascomycota_Talaromyces_emersonii : 0.20325, ( (gi_345565889_Ascomycota_Arthrobotrys_oligospora : 0.37978,
(sp_Q5B2E8_Ascomycota_Emericella_nidulans: 0.09092,
(tr_Q2U2I8_Ascomycota_Aspergillus_oryzae: 0.35921,
(tr_G7XQ80_Ascomycota_Aspergillus_kawachii : 0.04208,
sp_A2QYR9_Ascomycota_Aspergillus_niger : 0.04328) : 0.05164) : 0.04767) : 0.04962) : 0.03665, ( ( sp_Q4WFK4_Ascomycota_Neosartorya_fumigata : 0.01789, sp_A1DJQ7_Ascomycota_Neosartorya_fischeri : 0.01897) : 0.08071,
(sp_Q0CFP1_Ascomycota_Aspergillus_terreus: 0.11328,
(tr_F1CHI2_Ascomycota_Penicillium_decumbens: 0.12322,
sp_A1CCN4_Ascomycota_Aspergillus_clavatus : 0.20074) : 0.04367) : 0.03037) :
0.03423) : 0.02484) : 0.03059) : 0.05915,
(tr_F2VRZ0_Ascomycota_Phialophora_sp: 0.21524,
( (tr_G9NFV6_Ascomycota_Hypocrea_atroviridis: 0.09261,
(tr_Q66PN1_Ascomycota_Trichoderma_parceramosum: 0.01591,
( (tr_H9C5T1_Ascomycota_Hypocrea_orientalis : 0.00000,
tr_D1MGM6_Ascomycota_Trichoderma_longibrachiatum: 0.00931) : 0.01860, (tr_Q6UJX9_Ascomycota_Trichoderma_viride: 0.00224,
(sp_P07987_Ascomycota_Hypocrea_jecorina: 0.00000,
tr_Q7LSP2_Ascomycota_Trichoderma_koningii : 0.00224) : 0.00000) : 0.00698) : 0.00452) : 0.11781) : 0.10402, ( ( (tr_I1RIJ1_Ascomycota_Gibberella_zeae : 0.08729, sp_P46236_Ascomycota_Fusarium_oxysporum: 0.12340) : 0.22086, ( (tr_G2XB72_Ascomycota_Verticillium_dahliae: 0.00438,
gi_302405457_Ascomycota_Verticillium_albo-atrum: 0.00765) : 0.24517, (tr_E3Q540_Ascomycota_Colletotrichum_graminicola: 0.00000,
gi_310790274_Ascomycota_Glomerella_graminicola: 0.00000) : 0.20139) :
0.05867) : 0.06194, ( (tr_G4MM92_Ascomycota_Magnaporthe_oryzae : 0.23405, tr_J3NZ73_Ascomycota_Gaeumannomyces_graminis: 0.17037) : 0.08573,
( (tr_Q872J7_Ascomycota_Neurospora_crassa : 0.00213,
tr_F8MDR2_Ascomycota_Neurospora_tetrasperma: 0.00661) : 0.20095,
( (tr_G2QW39_Ascomycota_Thielavia_terrestris : 0.16957,
(tr_G2QA39_Ascomycota_Thielavia_heterothallica: 0.00000,
gi_367023495_Ascomycota_Myceliophthora_thermophila: 0.00000) : 0.11989) : 0.05001, ( (tr_Q2GMP2_Ascomycota_Chaetomium_globosum: 0.09874,
tr_B2ABX7_Ascomycota_Podospora_anserina: 0.15174) : 0.04459,
(tr_Q4JQF8_Ascomycota_Chaetomium_thermophilum: 0.13263,
sp_Q9C1S9_Ascomycota_Humicola_insolens : 0.15238) : 0.04709) : 0.03093) : 0.02190) : 0.08297) : 0.05160) : 0.06348) : 0.03415) : 0.08859) : 0.03756) : 0.20502) : 0.29004) : 0.28370);
[0108] Tree with node labels:
[0109] (1_tr_A9FHT2_Bacteria_Sorangium_cellulosum,
( (2_tr_Q9UW10_Neocallimasti_Piromyces_rhizinflatus ,
(3_tr_Q96V98_Neocallimasti_Orpinomyces ,
4_tr_H2BPT8_Neocallimasti_Neocallimastix_patriciarum) 61 ) 60 ,
((46_tr_G4TC42_Basidiomycota_Piriformospora_indica,
(53_tr_A8NEJ3_Basidiomycota_Coprinopsis_cinerea,
(56_tr_Q6E5B1_Basidiomycota_Volvariella_volvacea,
(51_tr_E2JAJ2_Basidiomycota_Neolentinus_lepideus ,
( (47_tr_Q9Y894_Basidiomycota_Volvariella_volvacea,
48_tr_Q96TP4_Basidiomycota_Pleurotus_sajor-caju) 68 ,
(55_tr_Q96VU2_Basidiomycota_Lentinula_edodes ,
( (50_tr_C4B8I1_Basidiomycota_Coniophora_puteana,
57_tr_F8Q7V9_Basidiomycota_Serpula_lacrymans ) 71 ,
(52_tr_A8CED8_Basidiomycota_Polyporus_arcularius ,
(49_tr_Q02321_Basidiomycota_Phanerochaete_chrys ,
54_tr_B2ZZ24_Basidiomycota_Irpex_lacteus) 73 ) 72 ) 70 ) 69 ) 67 ) 66 )
65 ) 64 ) 63 , (25_tr_G2XV25_Ascomycota_Botryotinia_fuckeliana,
( ( (38_tr_B8MHF4_Ascomycota_Talaromyces_stipitatus,
(26_tr_B6QMM6_Ascomycota_Penicillium_marneffei ,
(34_tr_O93837_Ascomycota_Acremonium_cellulolyticus ,
35_tr_B5TMG4_Ascomycota_Penicillium_funiculosum) 79 ) 78 ) 77 ,
(24_tr_Q8NIB5_Ascomycota_Talaromyces_emersonii ,
( ( 43_gi_345565889_Ascomycota_Arthrobotrys_oligospora,
(37_sp_Q5B2E8_Ascomycota_Emericella_nidulans ,
(5_tr_Q2U2I8_Ascomycota_Aspergillus_oryzae,
(27_tr_G7XQ80_Ascomycota_Aspergillus_kawachii ,
28_sp_A2QYR9_Ascomycota_Aspergillus_niger) 85 ) 84 ) 83 ) 82 ,
( (36_sp_Q4WFK4_Ascomycota_Neosartorya_fumigata,
39_sp_A1DJQ7_Ascomycota_Neosartorya_fischeri ) 87 ,
(30_sp_Q0CFP1_Ascomycota_Aspergillus_terreus ,
(32_tr_F1CHI2_Ascomycota_Penicillium_decumbens ,
33_sp_A1CCN4_Ascomycota_Aspergillus_clavatus) 89 ) 88 ) 86 ) 81 ) 80 ) 76 , (31_tr_F2VRZ0_Ascomycota_Phialophora_sp,
((22_tr_G9NFV6_Ascomycota_Hypocrea_atroviridis ,
(45_tr_Q66PN1_Ascomycota_Trichoderma_parceramosum,
( (20_tr_H9C5T1_Ascomycota_Hypocrea_orientalis,
21_tr_D1MGM6_Ascomycota_Trichoderma_longibrachiatum) 95 ,
(44_tr_Q6UJX9_Ascomycota_Trichoderma_viride ,
(18_sp_P07987_Ascomycota_Hypocrea_jecorina,
19_tr_Q7LSP2_Ascomycota_Trichoderma_koningii) 97 ) 96 ) 94 ) 93 ) 92 ,
( ( (23_tr_I1RIJ1_Ascomycota_Gibberella_zeae ,
29_sp_P46236_Ascomycota_Fusarium_oxysporum) 100 ,
( ( 14_tr_G2XB72_Ascomycota_Verticillium_dahliae,
42_gi_302405457_Ascomycota_Verticillium_albo-atrum) 102 ,
(15_tr_E3Q540_Ascomycota_Colletotrichum_graminicola,
41_gi_310790274_Ascomycota_Glomerella_graminicola) 103 ) 101 ) 99 ,
((6_tr_G4MM92_Ascomycota_Magnaporthe_oryzae,
13_tr_J3NZ73_Ascomycota_Gaeumannomyces_graminis) 105 ,
((9_tr_Q872J7_Ascomycota_Neurospora_crassa,
10_tr_F8MDR2_Ascomycota_Neurospora_tetrasperma) 107 , ( ( 12_tr_G2QW39_Ascomycota_Thielavia_terrestris ,
(11_tr_G2QA39_Ascomycota_Thielavia_heterothallica,
40_gi_367023495_Ascomycota_Myceliophthora_thermophila) 110 ) 109 ,
((7_tr_Q2GMP2_Ascomycota_Chaetomium_globosum,
8_tr_B2ABX7_Ascomycota_Podospora_anserina) 112 ,
(16_tr_Q4JQF8_Ascomycota_Chaetomium_thermophilum,
17_sp_Q9C1S9_Ascomycota_Humicola_insolens) 113 ) 111 ) 108 ) 106 ) 104 ) 98 ) 91 ) 90 ) 75 ) 74 ) 62 ) 59 ) 58 ;
[0110] Nodes 58 to 113 are ancestral.
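The BEAST and MrBayes trees contain the same 57 leaves but group some of them differently, and the same node number does not necessarily denote the same clade in both trees: node 66 joins leaves 46 and 53 in the BEAST tree, but not in the MrBayes tree. One simple way to list such differences is to compare the sets of clades (the groups of leaves descending from each internal node) of the two rooted trees, as sketched below. This is a rooted clade comparison chosen for illustration, not the unrooted Robinson-Foulds distance. The function names are hypothetical, and because the two data sets spell a few names differently (e.g. "chrysos" versus "chrys"), leaves are best compared on a shared key such as the leading SEQ ID NO.

```python
import re


def clades(newick: str, key=lambda name: name) -> set:
    """Set of clades (frozensets of leaf keys), one per internal node.

    Branch lengths and internal-node labels are skipped; `key` maps a raw
    leaf name to the identifier used for comparison."""
    tokens = re.findall(r"[(),:;]|[^(),:;\s]+", newick)
    stack, found, prev = [], set(), None
    for tok in tokens:
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            clade = frozenset(stack.pop())
            found.add(clade)
            if stack:
                stack[-1] |= clade  # the closed clade belongs to its parent
        elif tok not in {",", ":", ";"} and prev not in {")", ":"}:
            stack[-1].add(key(tok))  # a leaf name
        prev = tok
    return found


def compare_clades(tree_a: str, tree_b: str, key=lambda name: name):
    """Return (shared, only_in_a, only_in_b) clade sets for two rooted trees."""
    a, b = clades(tree_a, key), clades(tree_b, key)
    return a & b, a - b, b - a


def seq_id_key(name: str) -> int:
    """Hypothetical key: the leading SEQ ID NO of a node-labeled leaf name."""
    return int(name.split("_", 1)[0])


shared, only_a, only_b = compare_clades(
    "((1_a,2_b)6,(3_c,4_d)7)5;", "((1_a,3_c)6,(2_b,4_d)7)5;", key=seq_id_key
)
print(len(shared), sorted(map(sorted, only_a)), sorted(map(sorted, only_b)))
# 1 [[1, 2], [3, 4]] [[1, 3], [2, 4]]
```

For the data sets in [0104] and [0109], the paragraph markers must be removed before the strings are passed in.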
APPENDIX 1
[0111] SEQ ID NO: 1 (tr_A9FHT2_Bacteria_Sorangium_cellulosum)
ACGDGG-
GGDTSGTGGGSSGVGAPTSSGVGAGTPTSSGNVDPTTTSSGNVDPTTTSSPTTTSSGNVDPTTSAASGGGNC SPAVSGPFADAHLFVDPGYVKKVDS-
SIAQVTDTALKAKMEKVKQIQTAFWLDRIEAIKELPAYLDAALKLQNELCEP-
VTALIVVYDLPNRDCFAEASNGELHLDQNGTQRYREYIAPIKQILAAHSGQRIAAVIEPDSLP IATNLGGK RCDETTASYRDNVAHTLKELNMPHVYQYIDAAHSGWLGWPDNQKKGAKIFAEVIKAAGSPANVRGFATNVAN YTQLSYTAESYDQQDNPCFGEFDYVDAMASALSAEGLGDKHFI I DTSRNGVGNI - REDWGYWCNNKGAGMGQRPKANGGATNLDAFVWVKPPGDSDGVGQEGQPRYDLFCGKE- NADTRAPQAGQWFHEYFVECVKNANPAL
[0112] SEQ ID NO: 2 (tr_Q9UW10_Neocallimasti_Piromyces_rhizinflatus)
NDCNVDWGVLNGQEWIDKNRCNGGGYCKFESLGYP-CCNGC DVYYTDND—
GRWGVENKCNGYQQPRTTTTTRTTTRTTTTQRPSDFFEN-TLYSNFKFQGEVQS-SIQKLS-
GDMAKKAEKVKYVPTAVWL-AWEGAPRVPQYLDDAGS KTWFVLYMI PTRDCNANASVG
GSATLEKYKGYIDNIYNTFNQYPNSKIVMILEPDTIGNLVTA- NANCMNVQNLHKQGLAYAI SKFGQKNVRVYLDAAHGAWL—SSHADKTAQVIKEILNNAGS- GKLRGITTNVSNYQTVN
DEYSYQMRLNSALQNLGVRDLHYI IDTSRNGANIAQQNQSGTWCNFKGAGLGARPQANSSKPLLDAYMWIKT PGEADGSS—SGSRADPVCGRW-DSLQGAPDAGSWFHDYFVMLLQNANPPF
[0113] SEQ ID NO: 3 (tr_Q96V98_Neocallimasti_Orpinomyces)
L GYPCCSSSDYVDSDGVENGNWCGIPDPT CWAERPCCTTTTTVEYTDSD—
GKWGVENPVPTKTQGPTPTSGSDP-TPGSQLTLSGPFSGVEFFLNPYYVAEVDA- AIAQMSNSSLKAKAEKMKTYSNAIWLDTIKNMQQLETNLKGALAQ-QTGSKK-
VLTVFVVYDLPGRDCHALASNGELLANDSDAQRYKTYI DVIEEKLKYYKSQPWLI IEPDSLANLVTNLNTP ACRDSEQYYLDGHAYLIKKFGLPHVAMYLDIGHAFWLGWDDNREKAAKVYSKVI-
SSGSPGKVRGFTDNVANYTPWEDPSRGPDTEWNPCPDEKRYLEAMHKDFKAAGISSVYFVSDTSRNGHK-
TDRKHPGEWCNQTGVGIGARPQANSSMDYLDAFYWIKPLGESDGTSDTSAARYDGYCGHE-
TAMKPAPEAGQWFQKHFEQGLENANPPL
[0114] SEQ ID NO: 4 (tr_H2BPT8_Neocallimasti_Neocallimastix_patriciarum)
M GYPCCSGNEYTDDDGVENGNWCGIADPVYESCWSESCCSSPNAEVWYTDES—
GKWGVENGIPKGTPTPIDDE PYEISGPFKGVEFYINPYYVDEVDG-
AIAQMTDSSLIAKAEKMKTFSNAIWLDTIKNMQSLETNLQGAQSQHQSSGKD-
ILTVFVVYDLPGRDCHALASNGELLANDGDLARYKSYI DVIEGHLKKYNTQPWLIVEPDSLANLVTNLSTP ACADSEKYYLEGHAYLIKKFGLPHVAMYLDIGHAFWLGWDDNREKAGKVYAKVI-
SSGSPGKVRGFTDNVSNYTPWEDPSRGPETEWNPCPDEKRYLEAMHKDFKAAGIQSVYFVCDTSRNGKK-
VDRKHPGEWCNQTGVGVGARPKASSGMDYLDAFYWIKPLGESDGTSDENAVRYDGYCGHE-
TAMKPAPEAGQWFQKHFEQGIKNANPPL
[0115] SEQ ID NO: 5 (tr_Q2U2I8_Ascomycota_Aspergillus_oryzae) SASWHHLTDSSFTDRVCCI SDGDQQSLKPVTTVPSPEFQSSVNSKQALVALSP—
LLFSAATALPQASVTPSPSS-SVPSGPAPTATAGGPFEGYDLYVNPYYKSEVESLAIPSMT- GSLAEKASAAANVPSFHWLDTTDKVPQMGEFLEDIKTKNAAGANPPTAGI FVVYDLPDRDCAALASNGEFLI SDGGVEKYKAYI DS IREQVEKYSDTQI ILVIEPDSLANLVTNLNVQKCANAQDAYLECTNYALTQLNLPNVA MYLDAGHAGWLGWPANIGPAAELYASVYKNASSPAAVRGLATNVANYNAFSIDSCPSYTQGSTVCDEKTYIN NFAPQLKSAGF-
DAHFIVDTGRNGNQPTGQSQWGDWCNVKNTGFGVRPTTDTGDELVDAFVWVKPGGESDGTSDTSAERYDAHC GYA-DALTPAPEAGTWFQAYFEQLVENANPSL
[0116] SEQ ID NO: 6 (tr_G4MM92_Ascomycota_Magnaporthe_oryzae)
QACAAQWGQCGGQDYTGPTCCQSGSTCVVSNQWYSQCLPGSTTTSRTSTSSSSSTSRTSSSTSRPPTTPTSV PPTITTTTTPSGPGTTASFTGPFAGVNLFPNKFYSSEVHTLAIPSLT- GSLVAKASAVAQVPSFQWLDIAAKVETMPGALADVRAANAAGG —
YAAQLVVYDLPDRDCAAAASNGEFSIADGGWKYKAYI DAIRKQLLAYSDVRTILVIEPDSLANMVTNMGVP KCAGAKDAYLECTIYAVKQLNLPHVAMYLDGGHAGWLGWPANLQPAADLFGKLYADAGKPSQLRGMATNVAN YNAWDLTTAPSYTTPNPNFDEKKYISAFAPLLAAKGW-
SAHFIIDQGRSGKQPTGQKEWGHWCNQQGVGFGRRPSANTGSELADAFVWIKPGGECDGVSDPTAPRFDHFC GTDYGAMSDAPQAGQWFQKYFEMLLTNANPPL
[0117] SEQ ID NO: 7 (tr_Q2GMP2_Ascomycota_Chaetomium_globosum)
QNCATLWGQCGGNGWNGATCCASGSTCTKQNDWYSQCLPGGGTTTKPTSTSTSTSTSSRSTSTSQVSSSTSS PPWTNTS IPGGASSTASYTGPFSGVQMWANDYYRSEVHTLAMPSLT- GAMATKAAKVAEVPSYQWMDRNVTVDTFSGTLAQIRAANQAGASPPYAGI FVVYDLPDRDCAAAASNGEWS I ANGGAANYKAYIKRIRELI IQYSDIRMLLVIEPDSLANMVTNMGVAKCAGAASTYKELTIHALKELNLPNVA MYLDAGHAGWLGWPANIQPAADLFATLYKDAGRPAAVRGLATNVANYNAWSVSSAPAYTSPNPNYDEKHYVE AFSPLLTAAGF-
PAHFITDTGRSGKQPTGQLEWGHWCNAVGTGFGQRPSANTGHDLLDAFVWIKPGGECDGTSDTTAARYDHNC GLA-DALKPAPEAGQWFQAYFEQLLTNANPPF
[0118] SEQ ID NO: 8 (tr_B2ABX7_Ascomycota_Podospora_anserina)
QNCGSVWSQCGGQGWTGATCCASGSTCVAQNQWYSQCLPGSTTAQAPSSTRTTTSSSSRPTSSSINVPTTTT SAGASVTVPPGGASSTASYSGPFLGVQQWANSYYSSEVHTLAIPSLT-
GPMATKAAAVAKVPSFQWMDRNVTVDTFSGTLADIRAANRAGANPPYAGI FVVYDLPDRDCAAAASNGEWAI ADGGAAKYKAYI DRIRHHLVQYSDIRTILVIEPDSLANMVTNMNVPKCQGAANTYKELTVYALKQLNLPNVA MYLDAGHAGWLGWPANIGPAAELFAGIYKDAGRPTSLRGLATNVANYNGWSLSSAPSYTTPNPNFDEKRFVQ AFSPLLTAAGF-
PAHFITDTGRSGKQPTGQLEWGHWCNAIGTGFGPRPTTDTGLDIEDAFVWIKPGGECDGTSDTTAARYDHHC GFA-DALKPAPEAGQWFQAYFEQLLTNANPPF
[0119] SEQ ID NO: 9 (tr_Q872J7_Ascomycota_Neurospora_crassa)
QNCGSAWSQCGGIGWSGATCCSSGNSCVEINSYYSQCLPGASTSPTSTSKVSSTTSKVTSSSAAQPITTTTA PSVPTT-TIAGGASSTASFTGPFLGVQGWANSYYSSEIYNHAIPSMT-
GSLAAQASAVAKVPTFQWLDRNVTVDTMKSTLEEIRAANKAGANPPYAAHFVVYDLPDRDCAAAASNGEFSI ANGGVANYKTYINAIRKLLIEYSDIRTILVIEPDSLANLVTNTNVAKCANAASAYRECTNYAITQLDLPHVA QYLDAGHGGWLGWPANIQPAATLFADIYKAAGKPKSVRGLVTNVSNYNGWSLSSAPSYTTPNPNYDEKKYIE AFSPLLNAAGF-
PAQFIVDTGRSGKQPTGQIEQGDWCNAIGTGFGVRPTTNTGSSLADAFVWVKPGGESDGTSDTSATRYDYHC GLS-DALKPAPEAGQWFQAYFEQLLKNANPAF
[0120] SEQ ID NO: 10 (tr_F8MDR2_Ascomycota_Neurospora_tetrasperma)
QNCGSAWTQCGGIGWSGATCCSSGNSCVEINSYYSQCLPGASTSPTSTSKVSSTTTKVTSSSAAQPITTTTA PSVPTT-TVAGGASSTASFTGPFLGVQGWANSYYSSEIYNHAIPSMT- GSLAAQASAVAKVPTFQWLDRNVTVDTMKSTLEEIRAANKAGANPPYAAHFVVYDLPDRDCAAAASNGEFSI ANGGVANYKTYINAIRKLLIEYSDIRTILVIEPDSLANLVTNTNVAKCANAASAYKECTNYAITQLDLPHVA QYLDAGHGGWLGWPANIQPAATLFADIYKAAGKPKSVRGLVTNVSNYNGWSLSSAPSYTTPNPNYDEKKYIE AFSPLLNAAGF-
PAQFIVDTGRSGKQPTGQIEQGDWCNAIGTGFGVRPTTNTGSSLADAFVWVKPGGESDGTSDTSATRYDYHC GLS-DALKPAPEAGQWFQAYFEQLLKNANPAF
[0121] SEQ ID NO: 11 (tr_G2QA39_Ascomycota_Thielavia_heterothallica)
QNCGAVWTQCGGNGWQGPTCCASGSTCVAQNEWYSQCLPNSPSSTSTSQRSTSTSSSTTRSGSS- SSSSTTPPPVSSPTSI PGGATSTASYSGPFSGVRLFANDYYRSEVHNLAI PSMT-
GTLAAKASAVAEVPSFQWLDRNVTIDTMVQTLSQVRALNKAGANPPYAAQLVVYDLPDRDCAAAASNGEFSI ANGGAANYRSYI DAIRKHI IEYSDIRI ILVIEPDSMANMVTNMNVAKCSNAASTYHELTVYALKQLNLPNVA MYLDAGHAGWLGWPANIQPAAELFAGIYNDAGKPAAVRGLATNVANYNAWSIASAPSYTSPNPNYDEKHYIE AFSPLLNSAGF-
PARFIVDTGRNGKQPTGQQQWGDWCNVKGTGFGVRPTANTGHELVDAFVWVKPGGESDGTSDTSAARYDYHC GLS-DALQPAPEAGQWFQAYFEQLLTNANPPF
[0122] SEQ ID NO: 12 (tr_G2QW39_Ascomycota_Thielavia_terrestris)
QNCGSVWSQCGGIGWSGATCCASGNTCVELNPYYSQCLPNSSKTTSTTTRSSTTSHSSGPTSTSTTTTSSPV VTTPPSTS IPGGASSTASWSGPFSGVQMWANDYYASEVSSLAIPSMT- GAMATKAAEVAKVPSFQWLDRNVTIDTFAHTLSQIRAANQKGANPPYAGI FVVYDLPDRDCAAAASNGEFS I ANNGAANYKTYI DAIRSLVIQYSDIRI I FVIEPDSLANMVTNLNVAKCANAESTYKELTVYALQQLNLPNVA MYLDAGHAGWLGWPANIQPAANLFAEIYTSAGKPAAVRGLATNVANYNGWSLATPPSYTQGDPNYDESHYVQ ALAPLLTANGF-
PAHFITDTGRNGKQPTGQRQWGDWCNVIGTGFGVRPTTNTGLDIEDAFVWVKPGGECDGTSNTTSPRYDYHC GLS-DALQPAPEAGTWFQAYFEQLLTNANPPF
[0123] SEQ ID NO: 13 (tr_J3NZ73_Ascomycota_Gaeumannomyces_graminis)
QSCGSQWSQCGGIGWGGATCCASGSTCVRQNDYYFQCIPGSPTTTTSRSGPTTTTRAPGPSTTVTRGTTTTG GNGGTPTSGGGGAGTTASFTGPFQGVNLWVNDYYASEISTLAIPSLS- GAMATKAAAVAKVPSFEWFDIAAKVGTMPHTLNAIRAANKAGG —
FAAQFVVYDLPDRDCAAAASNGEYSIVDGGVAKYKAYI DS IRAQLVSFSDIRTILVVEPDSLANMVTNLNVP KCANAQAAYRECTLYAIKQLNLPNVAMYLDGGHAGWLGWPANLGPAADLFGKLYVDAGKPSQLRGMATNVAN YNSWNLTSAPAYTSPNPNYDERHYVEAFHPLLAAKGW-
NAHFITDQGRSGKQPTGQLEWGHWCNAMGTGFGMRPSANTGLEIQDAFVWIKPGGECDGTSDTTAARFDRFC GMA-DALKPAPEAGQWFQAYFVQLLTNANPPF
[0124] SEQ ID NO: 14 (tr_G2XB72_Ascomycota_Verticillium_dahliae)
QACASQWGQCGGQGWSGPTCCPSGTTCQLQNAWYSQCLPGAATTAASSTRPATTSSVRSTTVVNPPTTTVAP
PPGTTVAPPP GGATYTGPFAGVNQWANAYYRSEVSSLAVPSLS-
GPLATAAAKVADVPTFQWMDTTAKVPLI DGALADIRRANAAGG —
YAGI FVVYNLPDRDCAAAASNGELSIANDGINKYKAYI DS IRAVLLKYNDIRTLLVIEPDSLANMVTNMGVA KCSNAAAAYKECTKYAVQQLDLPHVAQYLDAGHAGWLGWPANIGPAATIFTDIYKEAGRPKSLRGLATNVSN YNAWNATSPAPYTSPNPNYDEKHYVDAFAPLLRQNGW-
DAKFI I DQGRSGKQPTGQQEWGHWCNALGTGFGLRPTSNTGHPDVDAFVWVKPGGEADGTSDTTAVRYDHFC GSA-SSMKPAPEAGTWFQAYFEQLLRNANPSF
[0125] SEQ ID NO: 15 (tr_E3Q540_Ascomycota_Colletotrichum_graminicola) QACASQWGQCGGQGWTGPSCCAAGSVCTVSNPFYSQCLPGSTVASSTSTVRTSSTPWSPSRTSTVTGSVST TSAGTGTTPP—PTGGATYTGPFVGVNLWANSYYASEISTLAIPSLS- PALATAAAKVAKVPTFMWMDTRSKIPLVDATLADIRKANQAGA —
YAGEFVVYNLPDRDCAAAASNGELSIADGGVAKYKQYI DDIRAMWKYSDIRI ILTIEPDSLANLVTNLNVP KCAGAQAAYLEGTNYAVTQLNLPNVAMYLDGGHAGWLGWPANLPPAAAMYAKVYKDAGKPKALRGLVTNVSN YNGYSISTAPSYTQGNANYDEKHYIEALAPLLSAEGW-
DAKFIVDQGRSGKQPTGQLAWGDWCNAIGTGFGVRPTANTGSTLVDAFVWVKPGGESDGTSDTTAARYDLNC GKA-DALKPAPEAGTWFQAYFEQLLINANPAF
[0126] SEQ ID NO: 16 (tr_Q4JQF8_Ascomycota_Chaetomium_thermophilum)
QSCSSVWGQCGGINYNGPTCCQSGSVCTYLNDWYSQCIPGQ--
GTTSTTARTTSTSTTSTSSVRPTTSNTPVTTAPPTTTI PGGASSTASYNGPFSGVQLWANTYYSSEVHTLAI PSLS-
PELAAKAAKVAEVPSFQWLDRNVTVDTFSGTLAEIRAANQRGANPPYAGI FVVYDLPDRDCAAAASNGEWS I ANNGANNYKRYI DRIRELLIQYSDIRTILVIEPDSLANMVTNMNVQKCSNAASTYKELTVYALKQLNLPHVA MYMDAGHAGWLGWPANIQPAAELFAQIYRDAGRPAAVRGLATNVANYNAWSIASPPSYTSPNPNYDEKHYIE AFAPLLRNQGF-
DAKFIVDTGRNGKQPTGQLEWGHWCNVKGTGFGVRPTANTGHELVDAFVWVKPGGESDGTSDTSAARYDYHC GLS-DALTPAPEAGQWFQAYFEQLLINANPPF
[0127] SEQ ID NO: 17 (sp_Q9C1S9_Ascomycota_Humicola_insolens)
QNCAPTWGQCGGIGFNGPTCCQSGSTCVKQNDWYSQCLPGS- TTSTTSTSSSSTTSRATSTTRTGGVTSITTAPTRTV-
TI PGGATTTASYNGPFEGVQLWANNYYRSEVHTLAI PQITDPALRAAASAVAEVPSFQWLDRNVTVDTLVET LSEIRAANQAGANPPYAAQIWYDLPDRDCAAAASNGEWAIANNGANNYKGYINRIREILISFSDVRTILVI EPDSLANMVTNMNVAKCSGAASTYRELTIYALKQLDLPHVAMYMDAGHAGWLGWPANIQPAAELFAKIYEDA GKPRAVRGLATNVANYNAWSISSPPPYTSPNPNYDEKHYIEAFRPLLEARGF-
PAQFIVDQGRSGKQPTGQKEWGHWCNAIGTGFGMRPTANTGHQYVDAFVWVKPGGECDGTSDTTAARYDYHC GLE-DALKPAPEAGQWFQAYFEQLLRNANPPF
[0128] SEQ ID NO: 18 (sp_P07987_Ascomycota_Hypocrea_jecorina)
QACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG-AASSSSSTRAASTTS—
RVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGPFVGVTPWANAYYASEVSSLAIPSLT- GAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGG —
YAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYI DTIRQIWEYSDIRTLLVIEPDSLANLVTNLGTP KCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRALRGLATNVAN YNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGI RPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALP-DALQPAPQAGAWFQAYFVQLLTNANPSF
[0129] SEQ ID NO: 19 (tr_Q7LSP2_Ascomycota_Trichoderma_koningii)
QACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG-AASSSSSTRAASTTS—
RVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGPFVGVTPWANAYYASEVSSLAIPSLT- GAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGG —
YAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYI DTIRQIWEYSDIRTLLVIEPDSLANLVTNLGTP KCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRALRGLATNVAN YNGWNITSPPSYTQGNAVYNEKLYIHAIGRLLANHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGI RPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALP-DALQPAPQAGAWFQAYFVQLLTNANPSF
[0130] SEQ ID NO: 20 (tr_H9C5T1_Ascomycota_Hypocrea_orientalis)
QACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG-AASSSSSTRASSTTA—RASSTT- SRSSATPPPGSSTTRVPPVGSGTATYSGPFVGVTPWANAYYASEVSSLAIPSLT- GAMATAAAAVAKVPSFMWLDTFDKTPLMEQTLADIRTANKNGG —
YAGQFVVYDLPDRDCAALASNGEYSIADGGVDKYKNYI DTIRQIWEYSDIRTLLVIEPDSLANLVTNLGTP KCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRALRGLATNVAN YNGWNITSPPSYTQGNAVYNEQLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGI RPSANTGDSLLDSFVWIKPGGECDGTSDSSAPRFDSHCALP-DALQPAPQAGAWFQAYFVQLLTNANPSF
[0131] SEQ ID NO: 21 (tr_D1MGM6_Ascomycota_Trichoderma_longibrachiatum)
QACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG-AASSSSSTRASSTTARASSTT
SRSSATPPPGSSTTRVPPVGSGTATYSGPFVGVTPWANAYYASEVSSLAIPSLT- GAMATAAAAVAKVPSFMWLDTFDKTPLMEQTLADIRTANKNGG —
YAGQFVVYDLPDRDCAALASNGEYSIADGGVDKYKNYI DTIRQIWEYSDIRTLLVIEPDSLANLVTNLGTP KCANAQSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRALRGLATNVAN YNGWNITSPPSYTQGNAVYNEQLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGI RPSANTGDSLLDSFVWIKPGGECDGTSDSSAPRFDSHCALP-DALQPAPQAGAWFQAYFVQLLTNANPSF
[0132] SEQ ID NO: 22 (tr_G9NFV6_Ascomycota_Hypocrea_atroviridis)
QACASVWGQCGGQGWSGATCCASGSSCVVSNPYYSQCLPGS
SSSSTLASSTRASSTTVRSSSTTPPPSSST-PPPPVGSGTATYQGPFSGINPWANSFYAQEVSSSAIPSLS- GAMATAAAAAAKVPSFMWLDTLSKTSLLSSTLSDIRAANKAGGN—
YAGQFVVYDLPDRDCAAAASNGEYSIADNGVANYKNYI DTIVGILKTYSDIRTILVIEPDSLANLVTNLSVA KCSNAQAAYLECINYAITQLNLPNVAMYLDAGHAGWLGWPANQQPAAQLFASVYKNASSPRAVRGLATNVAN YNGWNITSAPSYTQGNSVYNEQLYIHAISPLLTQQGWSNTYFITDQGRSGKQPTGQQAWGDWCNVIGTGFGI RPSANTGDSLLDAFTWIKPGGECDGTSNTSATRYDYHCGLS-DALQPAPEAGSWFQAYFVQLLTNANPSF
[0133] SEQ ID NO: 23 (tr_I1RIJ1_Ascomycota_Gibberella_zeae)
QSCSNVWSQCGGQNWSGTPCCTSGNKCVKVNDFYSQCQPGS--SPTSTIVSATTTK
ATTTGSGGSVTSP PPVATNPFSGVDLWANNYYRSEVSTLAIPKLS-
GAMATAAAKVADVPSFQWMDTYDHISFMEESLADIRKANKAGGN—
YAGQFVVYDLPDRDCAAAASNGEYSLDKDGKNKYKAYIAKIKGILQDYSDTRI ILVIEPDSLANMVTNMNVP KCANAASAYKELTIHALKELNLPNVSMYIDAGHGGWLGWPANLPPAAQLYGQLYKDAGKPSRLRGLVTNVSN YNAWKLSSKPDYTESNPNYDEQKYIHALSPLLEQEGWPGAKFIVDQGRSGKQPTGQKAWGDWCNAPGTGFGL RPSANTGDALVDAFVWVKPGGESDGTSDTSAARYDYHCGI D-GAVKPAPEAGTWFQAYFEQLLKNANPSF
[0134] SEQ ID NO: 24 (tr_Q8NIB5_Ascomycota_Talaromyces_emersonii) QSLWGQCGGSSWTGATSCAAGATCSTINPYYAQCVPAT-TTLTTTTKPTSTGG
AAPTTPPPTTTGTTTSP-WTRPASASGPFEGYQLYANPYYASEVISLAIPSLS-
SELVPKASEVAKVPSFVWLDQAAKVPSMGDYLKDIQSQNAAGADPPIAGI FVVYDLPDRDCAAAASNGEFS I ANNGVALYKQYI DS IREQLTTYSDVHTILVIEPDSLANWTNLNVPKCANAQDAYLECINYAITQLDLPNVA MYLDAGHAGWLGWQANLAPAAQLFASVYKNASSPASVRGLATNVANYNAWSI SRCPSYTQGDANCDEEDYVN ALGPLFQEQGF-
PAYFI I DTSRNGVRPTKQSQWGDWCNVIGTGFGVRPTTDTGNPLEDAFVWVKPGGESDGTSNTTSPRYDYHC GLS-DALQPAPEAGTWFQAYFEQLLTNANPLF
[0135] SEQ ID NO: 25 (tr_G2XV25_Ascomycota_Botryotinia_fuckeliana) GAAYAQCGGQGWSGATTCVSGYTCVVNNAYYSQCLPGSAVTTTATTAPTATTPTTI ITSTT- KATTTTGGSSATT TAAVAGPFSGKALYANPYYASEISASAI PSLT-
GAMATKAAAVAKVPTFYWLDTAAKVPLMGTYLANIRALNKAGANPPVAGTFVVYDLPDRDCAAAASNGEYS I ADGGLVKYKAYI DS IVALLKTYSDVSVILVIEPDSLANLVTNLSVAKCSNAQAAYLEGTEYAIAQLNLPNVA MYLDAGHAGWLGWPANIGPAAQLFGQIYKAAGSPAAVRGLATNVANYNAWTSTTCPSYTSGDSNCNEKLYIN ALAPLLTAQGF-
PAHFIMDTSRNGVQPTAQQAWGDWCNLIGTGFGVRPTTNTGDALEDAFVWIKPGGEGDGTSDTTAARYDFHC GLA-DALKPAPEAGTWFQAYFAQLLTNANPSF
[0136] SEQ ID NO: 26 (tr_B6QMM6_Ascomycota_Penicillium_marneffei) QSVWGQCGGQGYTGATSCAAGSTCSTQNPYYAQCIPA
TATSTTLVKTTSSTSVGTTSAPTTTTTKATTTKASTTAT
TAAASGPFSGYQLYANPYYSSEVHTLAI PSLT-
GTLAAAATKAAEIPSFVWLDTAAKVPTMGTYLANIEAANKAGATPPIAGI FVVYDLPDRDCAAAASNGEYTV ANNGVANYKAYI DS IVAQLKAHPDVHTILI IEPDSLANMVTNLSTAKCTEAQPAYYECVNYALI LNLPNVA MYIDAGHAGWLGWSANLSPAAQLFATVYKNASSPAALRGLVTNVANYNAWSI SSAPSYTSGDSNYDEQLYVN ALSPLLTSNGWPNAHFIMDTSRNGVQPTQQKAWGDWCNLIGTGFGVAPTTNTGDPLEDAFVWVKPGGESDGT SNSSATRYDYHCGNS-DSLQPAPEAGSWFQAYFVQLLTNANPPL
[0137] SEQ ID NO: 27 (tr_G7XQ80_Ascomycota_Aspergillus_kawachii) QTLWGQCGGQGYSGATSCVAGATCSTINEYYAQCTPAT-SATTLKTTTSTTTA
AMTTTTATSSPAASASP TTTASASGPFSGYQLYVNPYYSSEVASLAIPSLT-
GSLQAAATAAAKVPSFVWLDTADKVPTMADYLADIKSQNSAGASPPIAGQFVVYDLPDRDCAALASNGEYSI ADNGVEHYKAYIDSIREVLVQYSDVHTLLVIEPDSLANLVTNLNVAKCANAQSAYLECTNYALTQLNLPNVA MYLDAGHAGWLGWPANQQPAADLFASVYKNASSPAAVRGLATNVANYNAWTISSCPSYTQGNSVCDEQQYIN AIAPLLEAQGF-
DAHFIVDTGRNGKQPTGQQAWGDWCNVINTGFGVRPTTSTGDDLVDAFVWVKPGGESDGTSDSSATRYDAHC GYS-DALQPAPEAGTWFQAYFVQLLTNANPAF
[0138] SEQ ID NO: 28 (sp_A2QYR9_Ascomycota_Aspergillus_niger) QTLWGQCGGQGYSGATSCVAGATCATVNEYYAQCTPA-AGTSSATTLKTTTSS
TTAAVTTTTTTQSPTGSA SPTTTASASGPFSGYQLYVNPYYSSEVASLAIPSLT-
GSLQAAATAAAKVPSFVWLDTAAKVPTMGDYLADIQSQNAAGANPPIAGQFVVYDLPDRDCAALASNGEYSI ADNGVEHYKSYIDSIREILVQYSDVHTLLVIEPDSLANLVTNLNVAKCA AESAYLECTNYALTQLNLPNVA MYLDAGHAGWLGWPANQQPAADLFASVYKNASSPAAVRGLATNVANYNAWTISSCPSYTQGNSVCDEQQYIN AIAPLLQAQGF-
DAHFIVDTGRNGKQPTGQQAWGDWCNVINTGFGERPTTDTGDALVDAFVWVKPGGESDGTSDSSATRYDAHC GYS-DALQPAPEAGTWFQAYFVQLLTNANPAF
[0139] SEQ ID NO: 29 (sp_P46236_Ascomycota_Fusarium_oxysporum)
QSCSNVWAQCGGQNWSGTPCCTSGNKCVKLNDFYSQCQPG SAEPSSTAAGPSS--
TTATKTTATGGSSTTAGGSVTSAP PAASDPYAGVDLWANNYYRSEVMNLAVPKLS-
GAKATAAAKVADVPSFQWMDTYDHISLMEDTLADIRKANKAGGK--
YAGQFVVYDLPNRDCAAAASNGEYSLDKDGANKYKAYIAKIKGILQNYSDTKVILVIEPDSLANLVTNLNVD KCAKAESAYKELTVYAIKELNLPNVSMYLDAGHGGWLGWPANIGPAAKLYAQIYKDAGKPSRVRGLVTNVSN YNGWKLSTKPDYTESNPNYDEQRYINAFAPLLAQEGWSNVKFIVDQGRSGKQPTGQKAQGDWCNAKGTGFGL RPSTNTGDALADAFVWVKPGGESDGTSDTSAARYDYHCGLD-DALKPAPEAGTWFQAYFEQLLDNANPSF
[0140] SEQ ID NO: 30 (sp_Q0CFP1_Ascomycota_Aspergillus_terreus)
QTLWGQCGGIGWTGPTNCVAGAACSTQNPYYAQCLPGTSTTLTTTTRVTTTTTSTTSKSSSTGSTTTTKSTG TTTTSGS-STTITSAPSGPFSGYQLYANPYYSSEVHTLAMPSLA- SSLLPAASAAAKVPSFTWLDTAAKVPTMGTYLADIKAKNAAGANPPIAAQFVVYDLPDRDCAALASNGEYSI ANGGVANYKKYIDAIRAQLLNYPDVHTILVIEPDSLANLVTNLNVAKCANAQSAYLECVNYALIQLNLPNVA MYIDAGHAGWLGWPANIGPAAQLFAGVYKDAGAPAALRGLATNVANYNAFSISTCPSYTSGDANCDENRYIN AIAPLLKDQGW-
DAHFIVDTGRNGVQPTKQNAWGDWCNVIGTGFGVRPTTNTGNSLVDAFVWVKPGGESDGTSDSSSARYDAHC GYS-DALQPAPEAGTWFQAYFEQLLKNANPAF
[0141] SEQ ID NO: 31 (tr_F2VRZ0_Ascomycota_Phialophora_sp)
QNCASEWGQCGGTGFTGASCCASGSTCTQQNEYYSQCVPGS
TGQIASTPAATVVGSATSSPSQMTAPAASA SGTTSYSGPFEGVQMWANAYYASEVLNLAVPSLS-
GDMVAKASAVAKVPSFQWLDTAAKVPTMADTLADIAKANQAGASPAYAGLFVVYDLPDRDCAAAASNGEYSI ADNGVANYKAYIDAIKAQLVANSDTRILLVVEPDSLANLVTNMNVAKCA AHDAYLECINYAVTQLNLPNVA MYLDAGHAGWLGWSANLQPAATLFANVYSNAGKPASLRGLATNVANYNAWTIASAPSYTQGDSNYDEKLYVQ ALSPLLSSAGW-
DAHFITDQSRSGKQPTGQNAWGDWCNVIGTGFGTRPTTDTGLDIEDALVWVKPGGECDGTSNTTAARYDYHC GLS-DALQPSPEAGTWFQAYFVQLLTNANPAF
[0142] SEQ ID NO: 32 (tr_F1CHI2_Ascomycota_Penicillium_decumbens)
QTVWGQCGGIGYSGPTSCVAGSSCSTQNSYYAQCLPGSGGGAATTTTTAGQTTKTTMATTTTTSTKTSAGSG GSTTTAP PASNSGPFKGYQPYVNPYYASEVQSLAIPSLA-
ASLAPKASAVAKVPSFVWLDTAAKVPTMGTYLADIKAKNAAGANPPIAGIFVVYDLPDRDCAALASNGEYSI ANGGVANYKKYIDSIRAQLLKYPDVHTILVIEPDSLANLVTNMNVAKCSGAHDAYLECTDYALKQLNLPNVA MYLDAGHAGWLGWPANIGPAADLFASVYKNAGSPAAVRGLATNVANYNAWSISTCPSYTQGDQNCDEKRYIN ALAPLLRANGF-
DAHFIMDTSRNGVQPTKQQAWGDWCNVIGTGFGTPFTTDTGDALQDAFIWVKPGGECDGTSDTSSPRYDAHC GYS-DALKPAPEAGTWFQAYFEQLLVNANPSF
[0143] SEQ ID NO: 33 (sp_A1CCN4_Ascomycota_Aspergillus_clavatus) QTMWGQCGGAGWSGATDCVAGGVCSTQNAYYAQCLPG ATTATTLSTTSKG--
TTTTTTSSTTSTGGGSSSTTTSTSAGPTVTGSPSGPFSGYQQYANPYYSSEVHTLAIPSMT- GALAVKASAVADVPSFVWLDVAAKVPTMGTYLENIRAKNKAGANPPVAGIFVVYDLPDRDCAALASNGEYAI ADGGIAKYKAYIDAIRAQLLKYPDVHTILVIEPDSLANLITNINVAKCSGAKDAYLECINYALKQLNLPNVA MYIDAGHGGWLGWDANIGPAAEMYAKVYKDADAPAALRGLAVNVANYNAWTIDTCPSYTQGNKNCDEKRYIH ALYPLLKAAGW-
DARFIMDTGRNGVQPTKQQAQGDWCNVIGTGFGIRPSSETGDDLLDAFVWVKPGAESDGTSDTTAARYDAHC GYT-DALKPAPEAGQWFQAYFEQLLTNANPAF
[0144] SEQ ID NO: 34 (tr_O93837_Ascomycota_Acremonium_cellulolyticus) QSVWGQCGGQGWSGATSCAAGSTCSTLNPYYAQCIPG TATSTTLVKTTSS
TSVGTTSPPTTTTTKASTTATTTAAASGPFSGYQLYANPYYSSEVHTLAIPSLT-
GSLAAAATKAAEIPSFVWLDTAAKVPTMGTYLANIEAANKAGASPPIAGIFVVYDLPDRDCAAAASNGEYTV ANNGVANYKAYIDSIVAQLKAYPDVHTILIIEPDSLANMVTNLSTAKCAEAQSAYYECVNYALINLNLANVA MYIDAGHAGWLGWSANLSPAAQLFATVYKNASAPASLRGLATNVANYNAWSISSPPSYTSGDSNYDEKLYIN ALSPLLTSNGWPNAHFIMDTSRNGVQPTKQQAWGDWCNVIGTGFGVQPTTNTGDPLEDAFVWVKPGGESDGT SNSSATRYDFHCGYS-DALQPAPEAGTWFQAYFVQLLTNANPAL
[0145] SEQ ID NO: 35 (tr_B5TMG4_Ascomycota_Penicillium_funiculosum) QSVWGQCGGQGWSGATSCAAGSTCSTLNPYYAQCIPG TATSTTLVKTTSS
TSVGTTSPPTTTTTKASTTATTTAAASGPFSGYQLYANPYYSSEVHTLAIPSLT-
GSLAAAATKAAEIPSFVWLDTAAKVPTMGTYLANIEAANKAGASPPIAGIFVVYDLPDRDCAAAASNGEYTV ANNGVANYKAYIDSIVAQLKAYPDVHTILIIEPDSLANMVTNLSTAKCAEAQSAYYECVNYALIKPHLAHVA MYIDAGHAGWLGWSANLSPAAQLFATVYKNASAPASLRGLATNVANYNAWSISSPPSYTSGDSNYDEKLYIN ALSPLLTSNGWPDAHFIMDTSRNGVQPTKQQAWGDWCNVIGTGFGVQPTTNTGDPLEDAFVWVKPGGESDGT SNSSATRYDFHCGYS-GALQPAPEAGTWFQAYFVQLLTNANPAL
[0146] SEQ ID NO: 36 (sp_Q4WFK4_Ascomycota_Neosartorya_fumigata) QTVWGQCGGQGWSGPTSCVAGAACSTLNPYYAQCIPGA--STTLTTTTAATTT
SQTTTKPTTTGPTTSAP TVTASGPFSGYQLYANPYYSSEVHTLAMPSLP-
SSLQPKASAVAEVPSFVWLDVAAKVPTMGTYLADIQAKNKAGANPPIAGIFVVYDLPDRDCAALASNGEYSI ANNGVANYKAYIDAIRAQLVKYSDVHTILVIEPDSLANLVTNLNVAKCANAQSAYLECVDYALKQLNLPNVA MYLDAGHAGWLGWPANLGPAATLFAKVYTDAGSPAAVRGLATNVANYNAWSLSTCPSYTQGDPNCDEKKYIN AMAPLLKEAGF-
DAHFIMDTSRNGVQPTKQNAWGDWCNVIGTGFGVRPSTNTGDPLQDAFVWIKPGGESDGTSNSTSPRYDAHC GYS-DALQPAPEAGTWFQAYFEQLLTNANPSF
[0147] SEQ ID NO: 37 (sp_Q5B2E8_Ascomycota_Emericella_nidulans) QTLYGQCGGSGWTGATSCVAGAACSTLNQWYAQCLPA--ATTTSTTLTTTTSS
VTTTSNPGSTTTT SSVTVTATASGPFSGYQLYVNPYYSSEVQSIAIPSLT-
GSLAPAATAAAKVPSFVWLDVAAKVPTMATYLADIRSQNAAGANPPIAGQFVVYDLPDRDCAALASNGEFAI SDGGVQHYKDYIDSIREILVEYSDVHVILVIEPDSLANLVTNLNVAKCANAQSAYLECTNYAVTQLNLPNVA MYLDAGHAGWLGWPANLQPAANLYAGVYSDAGSPAALRGLATNVANYNAWAIDTCPSYTQGNSVCDEKDYIN ALAPLLRAQGF-
DAHFITDTGRNGKQPTGQQAWGDWCNVIGTGFGARPSTNTGDSLLDAFVWVKPGGESDGTSDTSAARYDAHC GYS-DALQPAPEAGTWFQAYFVQLLQNANPSF
[0148] SEQ ID NO: 38 (tr_B8MHF4_Ascomycota_Talaromyces_stipitatus) VWGQCGGQGWTGATICAAGATCSAINSYYAQCTPA AAASTTLVTKTSS
TSVGTTSPATTPTTKPST TAAASGPFSGYQLYANPYYSSEVHTLALPSLT-
GSLAAAATKAAEIPSFVWLDTAAKVPTMGTYLANIQAANKAGASPPIAGIFVVYDLPDRDCAAAASNGEYTV ANNGVANYKAYIDSIVKQLKAYPDVHTILIIEPDSLANMVTNLSTAKCSEAQAAYYECVNYALINLNLANVA MYIDAGHAGWLGWPANLSPAAQLFAQVYKNASSPASLRGLATNVANYNAWSLSSAPSYTSGDSNYDEQLYIN ALSPLLTQNGWPNAHFIMDTSRNGVQPTKQQAWGDWCNVIGTGFGVPPTTNTGDPLEDAFVWVKPGGESDGT SNSSATRYDYHCGYS-DALQPAPEAGTWFQAYFVQLLTNANPSL
[0149] SEQ ID NO: 39 (sp_A1DJQ7_Ascomycota_Neosartorya_fischeri) QTVWGQCGGQGWSGPTNCVAGAACSTLNPYYAQCIPG ATATSTTLSTTTT
TQTTTKPTTTGPTTSAP TVTASGPFSGYQLYANPYYSSEVHTLAMPSLP-
SSLQPKASAVAEVPSFVWLDVAAKVPTMGTYLADIQAKNKAGASPPIAGIFVVYDLPDRDCAALASNGEYSI ANNGVANYKAYIDAIRAQLVKYSDVHTILVIEPDSLANLVTNLNVAKCANAQSAYLECVDYALKQLNLPNVA MYLDAGHAGWLGWPANLGPAATLFAKVYTDAGSPAALRGLATNVANYNAWSLSTCPSYTQGDPNCDEKKYIN AMAPLLKNAGF-
DAHFIMDTSRNGVQPTKQSAWGDWCNVIGTGFGVRPSTNTGDPLQDAFVWIKPGGESDGTSNSSSARYDAHC GYS-DALQPAPEAGTWFQAYFEQLLTNANPSF
[0150] SEQ ID NO: 40 (gi_367023495_Ascomycota_Myceliophthora_thermophila) QNCGAVWTQCGGNGWQGPTCCASGSTCVAQNEWYSQCLPNSPSSTSTSQRSTSTSSSTTRSGSS- SSSSTTPPPVSSPTSI PGGATSTASYSGPFSGVRLFANDYYRSEVHNLAIPSMT-
GTLAAKASAVAEVPSFQWLDRNVTIDTMVQTLSQVRALNKAGANPPYAAQLVVYDLPDRDCAAAASNGEFSI ANGGAANYRSYIDAIRKHIIEYSDIRIILVIEPDSMANMVTNMNVAKCSNAASTYHELTVYALKQLNLPNVA MYLDAGHAGWLGWPANIQPAAELFAGIYNDAGKPAAVRGLATNVANYNAWSIASAPSYTSPNPNYDEKHYIE AFSPLLNSAGF-
PARFIVDTGRNGKQPTGQQQWGDWCNVKGTGFGVRPTANTGHELVDAFVWVKPGGESDGTSDTSAARYDYHC GLS-DALQPAPEAGQWFQAYFEQLLTNANPPF
[0151] SEQ ID NO: 41 (gi_310790274_Ascomycota_Glomerella_graminicola)
QACASQWGQCGGQGWTGPSCCAAGSVCTVSNPFYSQCLPGSTVASSTSTVRTSSTPWSPSRTSTVTGSVST TSAGTGTTPP--PTGGATYTGPFVGVNLWANSYYASEISTLAIPSLS- PALATAAAKVAKVPTFMWMDTRSKIPLVDATLADIRKANQAGA --
YAGEFVVYNLPDRDCAAAASNGELSIADGGVAKYKQYIDDIRAMVVKYSDIRIILTIEPDSLANLVTNLNVP KCAGAQAAYLEGTNYAVTQLNLPNVAMYLDGGHAGWLGWPANLPPAAAMYAKVYKDAGKPKALRGLVTNVSN YNGYSISTAPSYTQGNANYDEKHYIEALAPLLSAEGW-
DAKFIVDQGRSGKQPTGQLAWGDWCNAIGTGFGVRPTANTGSTLVDAFVWVKPGGESDGTSDTTAARYDLNC GKA-DALKPAPEAGTWFQAYFEQLLINANPAF
[0152] SEQ ID NO: 42 (gi_302405457_Ascomycota_Verticillium_albo-atrum)
QACASQWGQCGGQGWSGPTCCPSGTTCQLQNAWYSQCLPGAATTAASSTRPATTSSIRSTTVVNPPTTTVAP PPGTTVAPPPAPPPGGATYTGPFAGVNQWANAYYRSEVSSLAVPSLS- GPLATAAAKVADVPTFQWMDTTAKVPLIDGALADIRRANAAGGN--
YAGIFVVYNLPDRDCAAAASNGELSIANDGINKYKAYIDSIRTVLLKYNDIRTLLVIEPDSLANMVTNMGVA KCSNAAAAYKECTKYAVQKLDLPHVAQYLDAGHAGWLGWPANIGPAATIFTDIYKEAGKPKSLRGLATNVSN YNAWNASSPAPYTSPNPNYDEKHYVDAFAPLLRQNGW-
DAKFIIDQGRSGKQPTGQQEWGHWCNALGTGFGLRPTSNTGHPDVDAFVWVKPGGEADGTSDTTAVRYDHFC GSA-SSMKP
[0153] SEQ ID NO: 43 (gi_345565889_Ascomycota_Arthrobotrys_oligospora) LWGQCGGIGWTGATNCVAGAACSTLNPYYAQCLSAAATTPRTTTTPATTTR--
TTTPATTPRTTTPATTPATTTTTPGGVGNPVTINGPFAGRKQHYNAYYSSEIYNIAVPSLVAASLTAAAKAV ATVPTFVWFDTIDKLSQLEGHLNDIRAKRATGED--
TLGIFVVYDLPDRDCAALASNGELSIANNGVNIYKTYIDPMVAIFKRYPDIPLALVIEPDSVANMITNMGTA KCANAKSAYEECISYAVQKLNLPNIAMYLDAGHAGWLGWPDNLSKSGPYYANIYKNAGSPASFRGLATNVAN YNAWSISTCPPYTQGASICDEKRYINAFGPLLRQNGW-
DAHFIVDQGRSGKQPTGQGQWGDWCNAIGTGFGIRPDTATNDALLDAFVWIKPGGECDGTSDTSAVRYDSHC GSS-SSLKPAPEAGTWFQAYFEQLLVNANPSF
[0154] SEQ ID NO: 44 (tr_Q6UJX9_Ascomycota_Trichoderma_viride)
QACSSVWGQCGGQNWSGPTCCASGSTCVYSNDYYSQCLPG-AASSSSSTRAASTTS--
RVSPTTSRSSSATPPPGSTTTRVPPVGSGTATYSGPFVGVTPWANAYYASEVSSLAIPSLT- GAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGGN--
YAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTLLVIEPDSLANLVTNLGTP KCANAPSAYLECINYAVTQLNLPNVAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPRALRGLATNVAN YNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGI RPSANTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALP-DALQPAPQAGAWFQAYFVQLLTNANPSF
[0155] SEQ ID NO: 45 (tr_Q66PN1_Ascomycota_Trichoderma_parceramosum) QACSSVWGQCGGQNWSGPTCCAAGSTCVYSNDYYSQCPPG-AASSSSSTRASSTT --RVSSTT- STSSATPPPGSTTTRVPPVGSGTATYSGPFVGVTPWANAYYASEVSSLAIPSLT- GAMATAAAAVAKVPSFMWLDTLDKTPLMEQTLADIRTANKNGG --
YAGQFVVYDLPDRDCAALASNGEYSIADGGVAKYKNYIDTIRQIVVEYSDIRTILVIEPDSLANLVTNLGTP KCANAQSAYLECINYAITQLNLPNIAMYLDAGHAGWLGWPANQDPAAQLFANVYKNASSPSALRGLATNVAN YNGWNITSPPSYTQGNAVYNEKLYIHAIGPLLANHGWSNAFFITDQGRSGKQPTGQQQWGDWCNVIGTGFGI RPSSNTGDSLLDSFVWVKPGGECDGTSDSSAPRFDSHCALP-DALQPAPQAGAWFQAYFVQLLTNANPSF
[0156] SEQ ID NO: 46 (tr_G4TC42_Basidiomycota_Piriformospora_indica) AGQWGQCGGNGYTGPTQCPSGWVCTPVSPWYYQCLQGTRSSSSSSSSSRSTSS--
SSSTRSTSTSSSSTRSTSTSTATTSGSTVI PTATGPFSGKTVWLSTYYAAEVDS- AADQVSDATLKAKILKVKEIPTFTWLDTIAKVATLDDYLPAASG
KIFQLVVYDLPNRDCHANASNGELFFDQGGAAKYQGYIDGIAAAVKRNPSTTVIAVIEPDSLANLVTNLSDP
RCSAAADGYKSSTTYALKTLAAAGVYMYMDAGHAGWLGWPANISPAADLFVTMWTNGGKSPFIRGLATNVAN
YNALTAASPDPATQGNANYDETHYINALAPMLRTKGW-NAQFIVDQGRSGVQNI-
RSAWGNWCNIKGAGFGLRPTTNTGNQYIDAIVWIKPGGESDGTSNTSATRYDTMCGGP-
DAKIPAPEAGQWFQAYFVDLVNNANPAF
[0157] SEQ ID NO: 47 (tr_Q9Y894_Basidiomycota_Volvariella_volvacea) QRPWGQCGGPGWTGPTCCVTGCTCPVTND-YSQCLPG--TTTTTPGPPSTTTT
PTSGGTPPPNNAATTTA
TTAVNGPVTGWQPFLTPYYAGEVAAPLAPDIDTPALSTKAAAVANIPTFNWFDT-AKGPDLGAYLGMF
LGNQIWYDLPDRDCAALRWRRMESLASQTMGSTTTRATSI WLLRSRNTLESLQVIEPDS ATRCIWPQ
LCGVYMYLDAGHAGWLGWPANLNPAAQLFSQLYRDAGSPQYVRGLATNVANYNALCTPPRPSHTRQSQLCRS
SLHQRALTPAVQSGGF-PAHFIVDQGRSGVQNI-RQQWGDWYDQHACGYCIRPASNYYHRFIPHRRHC-
LGQTRRRVNPAAKAAGDSTCSLT-DAPQPAPQAGTWFQAYFGTLVPAANPTE
[0158] SEQ ID NO: 48 (tr_Q96TP4_Basidiomycota_Pleurotus_sajor-caju) VGEWGQCGGINYTGSTTCDAGLVCNVINDYYHQCLP
--TPDA GPYIGYQIYLSPYYADEVAA- AVSAISNPALAAKAASVANIPTFIWFDVVAKVPTLGTYLADALSIQQSTGRN- QLVQIVVYDLPDRDCAALASNGEFSIANNGLANYKNYVDQIVAQLSEYPQIRVVAVVEPDSLANMVTNLNVP KCAGAQAAYTEGVTYALQKLNTVGVYSYVDAGHAGWLGWPANLGPAAQLFANLYTNAGSPSFFRGLATNVAN YNLLNAPSPDPVTSPNANYDEIHYINALAPELSSRGF-PAHFIVDQGRSAVQGI- RGAWGDWCNVDNAGFGTRPTTSTGSSLIDAIVWVKPGGESDGTSDTSAVRYDGHCGLA- SAKKPAPEAHSSFQAYFEMLVANAVPAL
[0159] SEQ ID NO: 49 (tr_Q02321_Basidiomycota_Phanerochaete_chrys)
ASEWGQCGGIGWTGPTTCVSGTTCTVLNPYYSQCLPGS--TTSVITSHSSSVS--
SVSSHSGSSTSTSSPTGPTGTNPP PPPSANPWTGFQIFLSPYYANEVAA-
AAKQITDPTLSSKAASVANIPTFTWLDSVAKIPDLGTYLASASALGKSTGTK-
QLVQIVIYDLPDRDCAAKASNGEFSIANNGQANYENYIDQIVAQIQQFPDVRVVAVIEPDSLANLVTNLNVQ KCANAKTTYLACVNYALTNLAKVGVYMYMDAGHAGWLGWPANLSPAAQLFTQVWQNAGKSPFIKGLATNVAN YNALQAASPDPITQGNPNYDEIHYINALAPLLQQAGW-DATFIVDQGRSGVQNI- RQQWGDWCNIKGAGFGTRPTTNTGSQFIDSIVWVKPGGECDGTSNSSSPRYDSTCSLP- DAAQPAPEAGTWFQAYFQTLVSAANPPL
[0160] SEQ ID NO: 50 (tr_C4B8I1_Basidiomycota_Coniophora_puteana) VAAYGQCGGQDWTGATACASGTACTKVNDYYYQCLPG
SSGSSVSGGSGSGSTSAPSPTSTVPTSTSSTAPSSTSTSSAASSDPYTGYQIFLNPEYASEVQA- AIPSITDSAVAAKALKVAEVPVFFWLDQVAKVPDLETYLAAADKQGKSSGQK-
QLLQIVVYDLPDRDCAANASNGEFSISDDGQAKYENYIDQIVAIVKKYPDVRVVAVVEPDSMGNLVTNMDLP
KCSAAAPTYKTCINYAIAQLSSAGVYMYVDAGHAGWLGWPNNLAPAAQLFGELYETSGKSAYFRGLATNVAN
YNALNTSSPDPCTQNAPNYDEMLYINALSPLLQQQGF-SAQFIVDQGRSGVQNI-
RNAWGDWCNIKGAGFGIRPTTDTGSPLIDSIVWVKPGGECDGTSNSSAPRYDSTCSLS-
DSLQPAPEAGTWFQQYFEALVTNAVPSL
[0161] SEQ ID NO: 51 (tr_E2JAJ2_Basidiomycota_Neolentinus_lepideus) SPIYGQCGGTGWTGATTCASGSTCVFSNPYYSQCLPGA-TTTTTSPQPTTTTT
TTTTNSGGGNPTTTTSAPGSTSTPD-AGPFVGYTLYLSPYYAAEVQA- AAGITDATQKAKAASIANIPTFTWFDVIAKTSQLGTYLADASAKQKSSGQK-
YIVQIVVYDLPDRDCAAAASNGEFSIANNGLANYETYIDQLAAQIQQYPDVRVVAVIEPDSLANLVTNLNVA KCSNAQTAYKAGVTYAMQQLNKVGVYMYLDAGHAGWLGWPANLTPAAQLFASLYKSAGSPSFVRGLATNVAN YNALSAASPDPITQGNSNYDEIHYINALGPMLSSQGF-PAHFIVDQGRAGVQNI- RQQWGDWCNVAGAGFGTRPTTNTGSSLIDAVVWVKPGGECDGTSDTSAARYDYHCGLS- DALQPAPEAGTWFQAYFAALVKNANPPL
[0162] SEQ ID NO: 52 (tr_A8CED8_Basidiomycota_Polyporus_arcularius) APVYGQCGGIGWSGATTCVSGSVCTKQNDYYSQCLPG-AASSAPTSPPTTSAP
SSTPVSTPPTGTTGSAP SSTPAAGPFVGVTPFLSPYYAAEVAA- AADAITDSTLKAKAASVAKIPTFTWLDSVAKVPDLGTYLADASALQKSSGQP-
QVVQIVVYDLPDRDCAAKASNGEFSIADGGQAKYYDYIDQIVAQIKKFPDVRVIAVIEPDSLANLVTNLNVQ
KCANAQTTYKACVTYALNQLASVGVYQYMDAGHAGWLGWPANIQPAAQLFADMFKSANSSKFVRGLATNVAN
YNALSAASPDPITQGDPNYDELHYINALGPMLAQQGF-PAQFVVDQGRSGQQNL-
RQQWGDWCNIKGAGFGTRPTTNTGSSLIDAIVWVKPGGESDGTSNSSSPRFDSTCSLS-
DATQPAPEAGTWFQTYFETLVSKANPPL
[0163] SEQ ID NO: 53 (tr_A8NEJ3_Basidiomycota_Coprinopsis_cinerea) RPLYAQCGGTGWTGETTCVSGAVCEVINQWYHQCLPGS QPPVTTQPPV
WPTTSQPPVWPTNPP--GGTPVPSTGPFEGYDIYLSPYYAEEVEA- AAAMIDDPVLKAKALKVKEIPTFIWFDVVRKTPDLGRYLADATAIQQRTGRK-
QLVQIVVYDLPDRDCAAAASNGEFSLADGGMEKYKDYVDRLASEIRKYPDVRIVAVIEPDSLANMVTNMNVA KCRGAEAAYKEGVIYALRQLSALGVYSYVDAGHAGWLGWNANLAPSARLFAQIYKDAGRSAFIRGLATNVSN YNALSATTRDPVTQGNDNYDELRFINALAPLLRNEGW-DAKFIVDQGRSGVQNI- RQEWGNWCNVYGAGFGMRPTLNTPSSAIDAIVWIKPGGEADGTSDTSAPRYDTHCGKS- DSHKPAPEAGTWFQEYFVNLVKNANPPL
[0164] SEQ ID NO: 54 (tr_B2ZZ24_Basidiomycota_Irpex_lacteus) AQTWAQCGGIGFTGPTTCVAGSVCTKQNDYYSQCIPGS TTPTSAPT
SAPTSQPSQPSSTSSAPSGPSSTPTPSAPWTGYQIYLSPYYANEVAA- AAKAITDPTLAAKAASVANIPNFTWLDSVSKIADLKTYLADASALGKSSGQK- QLLQIVVYDLPDRDCAAKASNGEFSIADNGLANYQNYIDQIVAAVKQFPDVRVVAVIEPDSLANLVTNLNVQ KCANAKSTYLTAVNYALKQLSSVGVYQYMDAGHAGWLGWPANLTPAAQLFAQVYSDAGKSPFIKGLATNVAN YNALSAASPDPITQGDPNYDEIHYINALAPALQSAGF-PATFIVDQGRSGQQNH- RQQWGDWCNIKGAGFGTRPTTNTGSSLIDSIVWVKPGGESDGTSNSSSPRFDSTCSLS- DATQPAPEAGTWFQAYFETLVSKANPPL
[0165] SEQ ID NO: 55 (tr_Q96VU2_Basidiomycota_Lentinula_edodes) LYGQCGGIGWSGATTCVSGATCTWNAYYSQCLPG SASAP
PTSTSS IGTGTTTSSAPGSTGTTTPAAGPFTGYEIYLSPYYANEIAA- AVTQISDPTTAAAAAKVANIPTFIWLDQVAKVPDLGTYLADASAKQKSEGKN-
YLVQIVVYDLPDRDCAALASNGEFTIADNGEANYHDYIDQIVAQIKQYPDVHVVAVIEPDSLANLVTNLSVA KCANAQTTYLECVTYAMQQLSAVGVTMYLDAGHAGWLGWPANLSPAAQLFTSLYSNAGSPSGVRGLATNVAN
YNALVATTPDPITQGDPNYDEMLYIEALAPLL GSFPAHFIVDQGRSGVQDI -
RQQWGDWCNVLGAGFGTQPTTNTGSSLIDSIVWVKPGGECDGTSNTSSPRYDAHCGLP- DATPNAPEAGTWFQAYFETLVEKANPPL
[0166] SEQ ID NO: 56 (tr_Q6E5B1_Basidiomycota_Volvariella_volvacea) SPLYGQCGGNGWTGPKTCVSGATCTVINDWYWQCLPG NGPTS
SSPTSTPTTTTTT GGPQPTVPAAGPYTGYEIYLSPYYAAEAQA-
AAAQISDATQKAKALKVAQIPTFTWFDVIAKTSTLGDYLAEASALGKSSGKK- YLVQIVVYDLPDRDCAALASNGEFSIANNGLNNYKGYIDQLVAQIKKYPDVRVVAVIEPDSLANLVTNLNVS KCANAQTAYKAGVTYALQQLNSVGVYMYLDAGHAGWLGWPANLNPAAQLFSQLYRDAGSPQYVRGLATNVAN YNALSASSPDPVTQGNPNYDELHYINALAPALQSGGF-PAHFIVDQGRSGVQNI- RQQWGDWCNVKGAGFGQRPTLSTGSSLIDAIVWIKPGGECDGTTNTSSPRYDSHCGLS- DATPNAPEAGQWFQAYFETLVRNASPPL
[0167] SEQ ID NO: 57 (tr_F8Q7V9_Basidiomycota_Serpula_lacrymans) ASLYGQCGGVGWTGATTCDSGSSCQEINSYYSQCLPG STTVPTTPTTQPA
SASGTTTSSAPTS TATGAAGPFTGYEIYLSPYYVAEVQA-
AVA ITDSALQAKASKVA I PNFTWLDEVAKVPTLGTYLADADALAKSSGNE-
QLLQIVVYDLPDRDCAAAASNGEFSIANNGQANYFNYIDQIVAQIQKYPGVRVVAIIEPDSMANLVTNLSVA
KCANAASTYKACVQYALEQLATVGVYMYLDAGHAGWLGWPANLSPAAQLFAQTYQAAGSSPFFRGLATNVAN
YNALTTTSPDPITQGDANYDELLYIQALSPLLIEQGF-PAQFIVDQGRSGVQNI-
RSAWGDWCNVKGAGFGTQPTTNTGSSLIDAIAWIKPGGECDGTSDSSSLRYDPHCSLS-
DALQPAPEAGTWFQTYFEQLVSNANPAL
APPENDIX 2
[0168] SEQ ID NO: 58 (node #58)
QNCAAGWGAC GGGGWGGDTS CTGGGSCGVG APTYSGCGAG TPTSSGNVDP TTTSSGNVDP TTTSSPTTTS SGNVDPTTSA APGGGNCSPA VSGPFADAHL FVDPYYVKKV DSLSIAQVTD TALKAKMEKV KQIQTAFWLD RIEAIKELPA YLDAALKLQN ELCEPPVTAL IVVYDLPNRD CFAEASNGEL HLDQNGLQRY KEYIDPIKQI LKKYSGQRIV AVIEPDSLPN LVTNLGGKRC DETQASYRDG VAYTLKELNL PHVYQYLDAG HSGWLGWPDN QKKGAKIFAE VIKAAGSPAN VRGFATNVAN YTQLSYTAES YDQQDNPCFG EFDYVDAMAS ALSAEGLGDK HFIIDTSRNG VGNIGREDWG YWCNNKGAGM GQRPKANGGA TLLDAFVWVK PPGDSDGVGQ EGQPRYDLFC GKENADTRAP QAGQWFHEYF VQCVKNANPP L
[0169] SEQ ID NO: 59 (node #59)
QNCAAGWGAC GGQGWTGDTS CVGGGSCGVA NPTYSQCWAG SPTSTTTTDP PTTTSSSVGT TSTSNPTTTS TPPPTPTTTS APSGPTSTPA VSGPFSGVQL YLNPYYVAEV DALAIAQITD SALKAKAEKV KNIPTAIWLD TIENVPQLPT YLDDALALQK AGGKKPVTAV FVVYDLPDRD CHALASNGEL SIDDNGLQRY KAYIDSIKAQ LKKYSDQRIV LVIEPDSLAN LVTNLNAAKC ADAQASYKEG VAYAIKKLNL PHVYMYLDAG HAGWLGWPDN QEKAAKVFAE VIKNAGSPAK VRGFATNVAN YTALSDTSRS PDTQGNPCCD EKHYIDAMAS ALQAQGLSDA HFIIDTSRNG VQNIDRQQWG DWCNVKGAGL GARPKANSGA SLLDAFVWVK PPGESDGTSD ESAARYDSYC GREDALQPAP EAGQWFQEYF EQLLKNANPP L
[0170] SEQ ID NO: 60 (node #60)
QDCAAGWGAC GGQEWTDDNG CEGGGWCGVA NPTYSSCWAG SPTSTTTTDV SYTDSDSVGK WGVENPTTTS TQPPTPTTTS APSGPTSTPT VSGPFSGVQL YLNPYYVAEV DALAIAQLTD SALKAKAEKV KNIPTAIWLD TIENVPQLPT YLDDALALQK SGGKKPVTVV FVVYDLPDRD CHALASNGEL LINDNGLQRY KAYIDSIEEQ LKKYSNQRVV LIIEPDSLAN LVTNLNTAKC ADAQNSYKEG VAYAIKKFGL PHVYMYLDAG HAGWLGWPDN REKAAKVFAE VIKNAGSPGK VRGFTTNVAN YTPLSDTSRG PDTQGNPCCD EKHYLDAMAS ALQAQGISDV HFIIDTSRNG VKNIDRKQSG DWCNVKGAGL GARPKANSGT SLLDAFVWIK PPGESDGTSD ESAARYDSYC GREDALQPAP EAGQWFQEYF EQLLKNANPP L
[0171] SEQ ID NO: 61 (node #61)
LDCAAGYPCC SGSEYTDDDG VENGNWCGIA DPTYESCWAE SPCSTTTTEV EYTDSDSVGK WGVENPVPTG TQTPTPTSGS DPSTPGSPLT ISGPFSGVEF YLNPYYVAEV DALAIAQMTD SSLKAKAEKM KTFSNAIWLD TIKNMQQLET NLKGALAQHQ TGGKKPVLTV FVVYDLPGRD CHALASNGEL LANDSDLQRY KTYIDVIEEK LKKYNSQPVV LIIEPDSLAN LVTNLNTPAC ADSEKYYLEG HAYLIKKFGL PHVAMYLDIG HAFWLGWDDN REKAAKVYAK VIKSSGSPGK VRGFTDNVAN YTPWEDPSRG PDTEWNPCPD EKRYLEAMHK DFKAAGISSV YFVSDTSRNG HKNVDRKHPG EWCNQTGVGI GARPKANSGM DYLDAFYWIK PLGESDGTSD ESAARYDGYC GHETAMKPAP EAGQWFQKHF EQGLKNANPP L
[0172] SEQ ID NO: 62 (node #62)
QNCASVWGQC GGQGWTGATS CVSGSTCTVS NDYYSQCLPG SATTTTTTTS PTTTTSSTST TSTSTSTTTS TPPPTPTTTS APSGPTSTAA VSGPFTGYQL YLNPYYAAEV AALAIPQITD GALKAKAAKV ANIPTFIWLD TVAKVPTLGT YLADARALQK AGGKKPIAVQ FVVYDLPDRD CAALASNGEF SIADNGLAKY KAYIDSIVAQ LKKYSDVRVV LVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYAIKQLNL PNVYMYLDAG HAGWLGWPAN LGPAAQLFAQ VYKNAGSPAA VRGLATNVAN YNALSATSRP PYTQGNPNCD EKHYINALAP LLQAQGFSPA HFIVDTGRNG VQNIGRQQWG DWCNVKGAGF GVRPTTNTGS SLIDAFVWVK PGGESDGTSD TSAARYDSHC GLSDALQPAP EAGTWFQAYF EQLLKNANPP L
[0173] SEQ ID NO: 63 (node #63)
QNCAPLWGQC GGTGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTTS PTTTTSSTSS SSTSTSTTTS TPPPTPTTTS APSGSTSTTP VSGPFTGYQL YLSPYYAAEV AALAVPQITD PALKAKAAKV ANIPTFIWFD TVAKVPTLGT YLADASALQK SSGKKPIAVQ IVVYDLPDRD CAALASNGEF SIADNGLANY KNYIDSIVAQ LKKYSDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNL VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSRP PYTQGNPNCD EIHYINALAP LLQSQGFSPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGESDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVKNANPP L
[0174] SEQ ID NO: 64 (node #64)
QNCAPLWGQC GGTGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTSS PTTTTSSTSS SSTSTSTTTS SPPTTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQISD PTLKAKAAKV ANIPTFIWFD VVAKVPTLGT YLADASALQK SSGKKPQLVQ IVVYDLPDRD CAALASNGEF SIADNGLANY KNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGNPNYD EIHYINALAP LLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVKNANPP L
[0175] SEQ ID NO: 65 (node #65)
QNCAPLWGQC GGTGWTGATT CVSGSVCTVI NDYYHQCLPG SATTTTTTSS PTTTTSSVSS SSTSTSTSTS SPPPTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQISD PTLKAKAAKV ANIPTFIWFD VVAKVPTLGT YLADASAIQQ STGRKPQLVQ IVVYDLPDRD CAALASNGEF SIADNGLANY KNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN MVTNLNVAKC AGAQAAYKEG VTYALQQLNS VGVYMYVDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PVTQGNANYD EIHYINALAP LLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGESDGTSD TSAPRYDSHC GLSDAKKPAP EAGTWFQAYF ETLVKNANPP L
[0176] SEQ ID NO: 66 (node #66)
QNCAPLWGQC GGTGWTGATT CVSGSVCTVI NQWYHQCLPG SATTTTTSSS PTTTTSPVSS SSTSTSTSTS SPPPTPTTTS APSGSTTTTP ATGPFTGYQI YLSPYYAAEV EALAAAQISD PTLKAKALKV KEIPTFIWFD VVAKVPTLGT YLADASAIQQ STGRKPQLVQ IVVYDLPDRD CAAAASNGEF SIADGGLAKY KNYIDRIAAQ IKKYPDVRVV AVIEPDSLAN MVTNLNVAKC AGAQAAYKEG VTYALQQLSA VGVYMYVDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGRSSF IRGLATNVAN YNALSATSPD PVTQGNANYD EIHYINALAP LLRSQGWFDA QFIVDQGRSG VQNIGRQAWG NWCNVKGAGF GMRPTTNTGS SLIDAIVWIK PGGESDGTSD TSAPRYDTHC GLSDAKKPAP EAGTWFQAYF VNLVKNANPP L
[0177] SEQ ID NO: 67 (node #67)
QNCAPLYGQC GGTGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTSS PTTTTSSTSS SSTSTSTTTS SPPTTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQISD PTLKAKAAKV ANIPTFIWFD VVAKVPTLGT YLADASALQK SSGKKPQLVQ IVVYDLPDRD CAALASNGEF SIADNGLANY KNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGNPNYD EIHYINALAP LLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVKNANPP L
[0178] SEQ ID NO: 68 (node #68)
QNCSPLYGQC GGTGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTSS PTTTTSSTSS SSTSTSTTTS SPPTTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV QALAAAQISD ATQKAKAAKV ANIPTFTWFD VIAKTSTLGT YLADASALQK SSGKKPYLVQ IVVYDLPDRD CAALASNGEF SIANNGLANY KNYIDQLVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKAG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGNPNYD EIHYINALAP MLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVKNANPP L
[0179] SEQ ID NO: 69 (node #69)
QNCAPLYGQC GGIGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTSS PTTSASSTSS SSTSTSTSTS STPTTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQITD PTLKAKAAKV ANIPTFIWLD QVAKVPDLGT YLADASALQK SSGKKPQLVQ IVVYDLPDRD CAALASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTTYKEC VTYALQQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGDPNYD EMHYINALAP LLQQQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN TSSPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVKKANPP L
[0180] SEQ ID NO: 70 (node #70)
QNCAPLYGQC GGIGWTGATT CVSGSTCTKI NDYYSQCLPG SATTTTVTTS PTTSASSTSS SSTSASSSTS STPTTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQITD PTLKAKAAKV ANIPTFTWLD QVAKVPDLGT YLADASALQK SSGQKPQLVQ IVVYDLPDRD CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTTYKAC VTYALQQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSSPF VRGLATNVAN YNALSATSPD PITQGDPNYD EMHYINALAP LLQQQGFFPA QFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSHC SLSDATQPAP EAGTWFQAYF ETLVSKANPP L
[0181] SEQ ID NO: 71 (node #71)
QNCASLYGQC GGIGWTGATT CVSGSTCTKI NDYYSQCLPG SATSTTVTTS PTTSAAGSSS SSTSASSSTS TTPTTPTSTS APSGSTSTTA AAGPFTGYQI YLSPYYAAEV QALAVANITD SALKAKAAKV ANIPTFTWLD QVAKVPDLGT YLADADALAK SSGQKPQLLQ IVVYDLPDRD CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSMAN LVTNLNVAKC ANAATTYKAC VTYALEQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYQTAGSSPF FRGLATNVAN YNALSTTSPD PITQGDPNYD EMLYINALSP LLQQQGFFPA QFIVDQGRSG VQNIGRSAWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSHC SLSDALQPAP EAGTWFQAYF ETLVSNANPP L
[0182] SEQ ID NO: 72 (node #72)
QNCAPVYGQC GGIGWTGATT CVSGSTCTKQ NDYYSQCLPG SAASTTVTTS PTTSASSTSS SSTSASSTTS STPTTPTTTS APSGPTSTTP AAGPFTGYQI YLSPYYAAEV AALAAAQITD PTLKAKAASV ANIPTFTWLD SVAKVPDLGT YLADASALQK SSGQKPQLVQ IVVYDLPDRD CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKFPDVRVV AVIEPDSLAN LVTNLNVQKC ANAQTTYKAC VTYALKQLSS VGVYMYMDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSSPF VRGLATNVAN YNALSAASPD PITQGDPNYD EIHYINALAP LLQQQGFFPA QFIVDQGRSG VQNIGRQQWG DWCNIKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSTC SLSDATQPAP EAGTWFQAYF ETLVSKANPP L
[0183] SEQ ID NO: 73 (node #73)
QNCAPVWGQC GGIGWTGPTT CVSGSTCTKQ NDYYSQCLPG SAATTTVTTS PTSSASSTSV SSHSGSSTTS SSPTTPTTTS APSGPSSTPP AAGPWTGYQI YLSPYYANEV AALAAKQITD PTLAAKAASV ANIPTFTWLD SVAKIPDLGT YLADASALGK SSGQKPQLVQ IVVYDLPDRD CAAKASNGEF SIADNGQANY QNYIDQIVAQ IKQFPDVRVV AVIEPDSLAN LVTNLNVQKC ANAKTTYLAC VNYALKQLSS VGVYMYMDAG HAGWLGWPAN LSPAAQLFAQ VYKNAGKSPF IKGLATNVAN YNALSAASPD PITQGDPNYD EIHYINALAP LLQQAGFFPA TFIVDQGRSG VQNIGRQQWG DWCNIKGAGF GTRPTTNTGS SLIDSIVWVK PGGECDGTSN SSSPRYDSTC SLSDATQPAP EAGTWFQAYF ETLVSKANPP L
[0184] SEQ ID NO: 74 (node #74)
QNCASVWGQC GGQGWSGATS CVSGSTCVVS NDYYSQCLPG SATTTTTTTT PTTTTSSTTT TSTTTSTTTT TPPPTATTTS APSGPTTTAT VSGPFSGYQL YANPYYASEV HALAIPSLTD GAMAAKAAAV AKVPTFVWLD TAAKVPTMGT YLADIRALNK AGANPPIAGQ FVVYDLPDRD CAALASNGEY SIADNGLAKY KAYIDSIVAQ LKKYSDVRVI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEG INYAITQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAQ VYKNAGSPAA VRGLATNVAN YNAWSITSCP SYTQGNSNCD EKHYINALAP LLTAQGFSPA HFIVDTGRNG VQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALVDAFVWVK PGGESDGTSD TSAARYDSHC GLSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0185] SEQ ID NO: 75 (node #75)
QNCASVWGQC GGQGWSGATS CVSGSTCVVS NDYYSQCLPG SATTTTTTTT PTTTTSSTTT TSTTTSTTTT TPPPTATTTS APSGPTTTAT VSGPFSGYQL YANPYYASEV HTLAIPSLTD GAMAAKAAAV AKVPSFVWLD TAAKVPTMGT YLADIRALNK AGANPPIAGQ FVVYDLPDRD CAALASNGEY SIADNGVAKY KAYIDSIRAQ LKKYSDVRII LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAQ VYKNAGSPAA VRGLATNVAN YNAWSISSCP SYTQGNSNCD EKHYINALAP LLTAQGFSDA HFIVDTGRNG VQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALVDAFVWVK PGGESDGTSD TSAARYDSHC GLSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0186] SEQ ID NO: 76 (node #76)
QNCQTVWGQC GGQGWTGATS CVAGATCSTL NPYYAQCLPG TATTTTTTTT PTTTTSSTTT TSTTTTTTTT TPTTTTTTTS APSGPTTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNK AGANPPIAGI FVVYDLPDRD CAALASNGEY SIADNGVANY KAYIDSIRAQ LKKYSDVHII LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKNAGSPAA VRGLATNVAN YNAWSISSCP SYTQGNSNCD EKRYINALAP LLKAQGFSDA HFIVDTGRNG VQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALVDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0187] SEQ ID NO: 77 (node #77)
QNCQTLWGQC GGQGWTGATS CVAGATCSTL NPYYAQCLPA TATTTTTTTT PTTTTSSTTT TSATTTTTTT TPTTTTTTTS APSGPTTTAT ASGPFSGYQL YVNPYYSSEV HSLAIPSLTD GSLAAKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI FVVYDLPDRD CAALASNGEY SIADNGVANY KAYIDSIRAQ LKKYSDVHII LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKNAGSPAA VRGLATNVAN YNAWSISSCP SYTQGNSVCD EKRYINALAP LLKAQGFSDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALVDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0188] SEQ ID NO: 78 (node #78)
QNCQTLWGQC GGQGWTGATS CVAGATCSTL NPYYAQCLPA TATTTTSTTT TTTTSSSTTT TSAATATTTT TPTSTTTTTS APSGPTTTAT ASGPFSGYQL YVNPYYSSEV QSLAIPSLTD GSLAAKASAA AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI FVVYDLPDRD CAALASNGEY SIADNGVENY KAYIDSIREQ LVKYSDVHII LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKNAGSPAA VRGLATNVAN YNAWSISSCP SYTQGNSVCD EKRYINALAP LLKAQGFSDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALVDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0189] SEQ ID NO: 79 (node #79)
QNCQTLWGQC GGQGWTGATS CVAGATCSTL NQYYAQCLPA TASTTTSTTT TTTTSSSTTT TSAATATTTT TPTSTTTTTS APSSPTTTAT ASGPFSGYQL YVNPYYSSEV QSLAIPSLTD GSLAAAATAA AKVPSFVWLD TAAKVPTMGT YLADIRSQNA AGANPPIAGQ FVVYDLPDRD CAALASNGEY SIADNGVEHY KAYIDSIREI LVQYSDVHII LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LQPAADLFAS VYKNAGSPAA VRGLATNVAN YNAWSISSCP SYTQGNSVCD EKQYINALAP LLKAQGFSDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALVDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0190] SEQ ID NO: 80 (node #80)
QNCQTLWGQC GGQGYSGATS CVAGATCSTI NEYYAQCTPA TASTTTATTT TTTTSSSTTT TTAAVATTTT TPTSTPSASA SPSSPTTTAS ASGPFSGYQL YVNPYYSSEV ASLAIPSLTD GSLQAAATAA AKVPSFVWLD TAAKVPTMGD YLADIKSQNA AGANPPIAGQ FVVYDLPDRD CAALASNGEY SIADNGVEHY KAYIDSIREI LVQYSDVHTL LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN QQPAADLFAS VYKNASSPAA VRGLATNVAN YNAWTISSCP SYTQGNSVCD EQQYINAIAP LLQAQGFSDA HFIVDTGRNG KQPTGQQAWG DWCNVINTGF GVRPTTNTGD ALVDAFVWVK PGGESDGTSD SSATRYDAHC GYSDALQPAP EAGTWFQAYF VQLLTNANPA F
[0191] SEQ ID NO: 81 (node #81)
QNCQTVWGQC GGQGWTGATS CVAGATCSTL NPYYAQCLPG TATTTTTTTT PTTTTSSTTT TSTTTTSTTT TPTTTTTTTS APSGPTTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIQAKNK AGANPPIAGI FVVYDLPDRD CAALASNGEY SIANNGVANY KAYIDSIRAQ LKKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VNYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKNAGSPAA VRGLATNVAN YNAWSISSCP SYTQGDSNCD EKRYINALAP LLKAQGFPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD PLEDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0192] SEQ ID NO: 82 (node #82)
QNCQTVWGQC GGQGWSGPTS CVAGAACSTL NPYYAQCLPG TATTTTTTTT TTTTTTSTTT TSTTTTSTTT TPTTTGTTTS APSGPTTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIQAKNK AGANPPIAGI FVVYDLPDRD CAALASNGEY SIANNGVANY KAYIDAIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VNYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKDAGSPAA VRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKAQGFPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD PLQDAFVWVK PGGESDGTSD TSSARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0193] SEQ ID NO: 83 (node #83)
QNCQTVWGQC GGQGWSGPTS CVAGAACSTL NPYYAQCIPG AATATTTTTT TATTTTSTTT TSTTTTQTTT KPTTTGPTTS APSGPTTTVT ASGPFSGYQL YANPYYSSEV HTLAMPSLPD SSLQPKASAV AEVPSFVWLD VAAKVPTMGT YLADIQAKNK AGANPPIAGI FVVYDLPDRD CAALASNGEY SIANNGVANY KAYIDAIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VDYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAATLFAK VYTDAGSPAA VRGLATNVAN YNAWSLSTCP SYTQGDPNCD EKKYINAMAP LLKEAGFPDA HFIMDTSRNG VQPTKQNAWG DWCNVIGTGF GVRPSTNTGD PLQDAFVWIK PGGESDGTSN SSSARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0194] SEQ ID NO: 84 (node #84)
QNCQTVWGQC GGIGWSGPTS CVAGAACSTQ NPYYAQCLPG TATTTTTTTT TSTTTTSTTT TAGTGGTTTS APSGPTITAS ASGPFSGYQL YANPYYSSEV HTLAIPSLTD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIKAKNK AGANPPIAGI FVVYDLPDRD CAALASNGEY SIANGGVANY KKYIDAIRAQ LLKYPDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VNYALKQLNL PNVAMYLDAG HAGWLGWPAN IGPAAQLFAS VYKDAGSPAA VRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKAQGFPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD ALQDAFVWVK PGGESDGTSD TSSARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0195] SEQ ID NO: 85 (node #85)
QNCQTVWGQC GGIGWSGPTN CVAGAACSTQ NPYYAQCLPG TSTTTSSTTT TAGTGTTTTS ASSGPTITAS PSGPFSGYQL YANPYYSSEV HTLAIPSLTD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIKAKNK AGANPPIAGI FVVYDLPDRD CAALASNGEY SIANGGVANY KKYIDAIRAQ LLKYPDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VNYALKQLNL PNVAMYIDAG HAGWLGWPAN IGPAAQLFAS VYKDAGAPAA LRGLATNVAN YNAWSISTCP SYTQGDANCD EKRYINALAP LLKAQGWPDA HFIMDTGRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD SLLDAFVWVK PGGESDGTSD TSSARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPA F
[0196] SEQ ID NO: 86 (node #86)
QNCQSVWGQC GGQGWTGATS CAAGATCSTL NPYYAQCIPA TATTTTTTTT PTTTSSTTTT TSTTTTSTTT TPTTTTTTTT APSGATTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIQAKNK AGANPPIAGI FVVYDLPDRD CAAAASNGEY SIANNGVANY KAYIDSIRAQ LKTYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VNYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKNASSPAS VRGLATNVAN YNAWSISSCP SYTQGDSNCD EKRYINALAP LLKAQGFPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD PLEDAFVWVK PGGESDGTSN TSAARYDYHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0197] SEQ ID NO: 87 (node #87)
QNCQSVWGQC GGQGWTGATS CAAGATCSTL NPYYAQCIPA TATTATSTTL VTTTSSTSVG TSTATTSTTT TPTTTTTTTT ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIQAANK AGASPPIAGI FVVYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWPAN LSPAAQLFAT VYKNASSPAS LRGLATNVAN YNAWSISSAP SYTSGDSNYD EQLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDYHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS L
[0198] SEQ ID NO: 88 (node #88)
QNCQSVWGQC GGQGWTGATS CAAGSTCSTL NPYYAQCIPA TATTATSTTL VKTTSSTSVG TSSATTSTTT TPTTTTTTTT ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIEAANK AGASPPIAGI FVVYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWSAN LSPAAQLFAT VYKNASSPAS LRGLATNVAN YNAWSISSAP SYTSGDSNYD EQLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDYHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS L
[0199] SEQ ID NO: 89 (node #89)
QNCQSVWGQC GGQGWSGATS CAAGSTCSTL NPYYAQCIPG TATTATSTTL VKTTSSTSVG TSSATTSVGT TSPPTTTTTK ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIEAANK AGASPPIAGI FWYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWSAN LSPAAQLFAT VYKNASAPAS LRGLATNVAN YNAWSISSPP SYTSGDSNYD EKLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDFHC GYSDALQPAP EAGTWFQAYF VQLLTNANPA L
[0200] SEQ ID NO: 90 (node #90)
QNCASVWGQC GGQGWSGATC CASGSTCVVQ NDYYSQCLPG SATTTTSSTR STTTTSSTTT SSTTTSTTST TPPPAATTTS APSGPSGTAT YSGPFSGVQL WANSYYASEV HTLAIPSLTD GAMAAKAAAV AKVPSFQWLD TAAKVPTMAS TLADIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KAYIDSIRAQ LVKYSDIRII LVIEPDSLAN LVTNMNVAKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAN VYKNAGKPAA LRGLATNVAN YNAWSISSAP SYTQGNPNYD EKHYIHALSP LLTAQGWSDA HFIVDQGRSG KQPTGQQAWG DWCNVIGTGF GVRPTANTGD ALVDAFVWVK PGGECDGTSD TSAARYDYHC GLSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0201] SEQ ID NO: 91 (node #91)
QNCASVWGQC GGTGWSGATC CASGSTCVVQ NDYYSQCLPG SATTTTSSTR STTTTSSTTT SSTTTSTTST TTPPAATTTS APSGASGTAS YSGPFSGVQL WANSYYASEV HTLAIPSLTD GAMAAKAAAV AKVPSFQWLD TAAKVPTMAG TLADIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KAYIDAIRAQ LVKYSDIRII LVIEPDSLAN LVTNMNVAKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAN VYKNAGKPAA LRGLATNVAN YNAWSISSAP SYTQGNPNYD EKHYIQALSP LLTAAGWSDA HFIVDQGRSG KQPTGQQAWG DWCNVIGTGF GVRPTANTGL ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0202] SEQ ID NO: 92 (node #92)
QNCGSVWGQC GGIGWSGATC CASGSTCVVQ NDYYSQCLPG SSTTTTSSTR STTTTSSTTS SSTTTSTTST TAPPAATTTS APGGASSTAS YSGPFSGVQL WANDYYASEV HTLAIPSLTD GAMAAKAAAV AKVPSFQWLD TAAKVDTMAG TLADIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEF SIADGGVAKY KAYIDAIRKQ LVQYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAQSAYLEC TIYAIKQLNL PNVAMYLDAG HAGWLGWPAN LQPAAELFAK IYKDAGKPAA LRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGWSDA HFIVDQGRSG KQPTGQQEWG DWCNVIGTGF GVRPTANTGS ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0203] SEQ ID NO: 93 (node #93)
QSCGSQWGQC GGIGWSGATC CASGSTCVVQ NDYYSQCLPG SSTTTTSSTS STTTTSSTSS STTATSTTST TAPPAATTTS TPGGAGTTAS FTGPFSGVNL WANDYYASEV HTLAIPSLTD GAMAAKAAAV AKVPSFQWLD IAAKVDTMPG TLADIRAANK AGGNPPYAAQ FWYDLPDRD CAAAASNGEF SIADGGVAKY KAYIDAIRKQ LVAYSDIRTI LVIEPDSLAN MVTNMNVPKC ANAQAAYLEC TIYAIKQLNL PNVAMYLDGG HAGWLGWPAN LQPAADLFGK LYADAGKPSQ LRGMATNVAN YNAWNLTSAP SYTSPNPNYD EKHYIEAFSP LLAAKGWSNA HFIVDQGRSG KQPTGQQEWG HWCNAMGTGF GMRPSANTGS ELADAFVWIK PGGECDGTSD TTAARFDHFC GMSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0204] SEQ ID NO: 94 (node #94)
QNCGSVWGQC GGIGWSGATC CASGSTCVEQ NDYYSQCLPG SSTTTTSTTR STSTTSSTTS SSTSTSTTST TAPPVPTTTS IPGGASSTAS YSGPFSGVQL WANDYYSSEV HTLAIPSMTD GAMAAKAAAV AKVPSFQWLD RNVTVDTMAG TLAEIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEF SIANGGVANY KAYIDAIRKL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAASAYKEC TNYAIKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGKPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRSG KQPTGQQEWG DWCNVIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0205] SEQ ID NO: 95 (node #95)
QNCGSAWSQC GGIGWSGATC CSSGNSCVEI NSYYSQCLPG ASTSPTSTSK VSSTTSKVTS SSAAQPITTT TAPSVPTTTT IAGGASSTAS FTGPFLGVQG WANSYYSSEI YNHAIPSMTD GSLAAQASAV AKVPTFQWLD RNVTVDTMKS TLEEIRAANK AGANPPYAAH FWYDLPDRD CAAAASNGEF SIANGGVANY KTYINAIRKL LIEYSDIRTI LVIEPDSLAN LVTNTNVAKC ANAASAYKEC TNYAITQLDL PHVAQYLDAG HGGWLGWPAN IQPAATLFAD IYKAAGKPKS VRGLVTNVSN YNGWSLSSAP SYTTPNPNYD EKKYIEAFSP LLNAAGFSPA QFIVDTGRSG KQPTGQIEQG DWCNAIGTGF GVRPTTNTGS SLADAFVWVK PGGESDGTSD TSATRYDYHC GLSDALKPAP EAGQWFQAYF EQLLKNANPA F
[0206] SEQ ID NO: 96 (node #96)
QNCGSVWGQC GGIGWSGATC CASGSTCVEQ NDWYSQCLPG SSTTTTTTTR STSTTSSTTS SSTSTSTTST TAPPVPTTTS IPGGASSTAS YSGPFSGVQL WANDYYRSEV HTLAIPSMTD GAMAAKAAAV AKVPSFQWLD RNVTVDTMAG TLAEIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEF SIANGGAANY KAYIDAIRKL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAASTYKEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGKPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRSG KQPTGQQEWG DWCNVIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0207] SEQ ID NO: 97 (node #97)
QNCGSVWSQC GGIGWSGATC CASGSTCVEQ NDWYSQCLPN SSTTTSTTTR STSTSSSTTS SSTSTSTTST TAPPVPTTTS IPGGASSTAS YSGPFSGVQL WANDYYRSEV HTLAIPSMTD GAMAAKAAAV AKVPSFQWLD RNVTIDTMAH TLSQIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEF SIANGGAANY KAYIDAIRKL IIQYSDIRII LVIEPDSLAN MVTNMNVAKC ANAASTYKEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAE IYKDAGKPAA VRGLATNVAN YNAWSIASAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRNG KQPTGQQQWG DWCNVIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALQPAP EAGQWFQAYF EQLLTNANPP F
[0208] SEQ ID NO: 98 (node #98)
QNCGAVWTQC GGNGWQGPTC CASGSTCVAQ NEWYSQCLPN SPSSTSTSQR STSTSSSTTR SGSSTSSSST TPPPVSSPTS IPGGATSTAS YSGPFSGVRL FANDYYRSEV HNLAIPSMTD GTLAAKASAV AEVPSFQWLD RNVTIDTMVQ TLSQVRALNK AGANPPYAAQ LWYDLPDRD CAAAASNGEF SIANGGAANY RSYIDAIRKH IIEYSDIRII LVIEPDSMAN MVTNMNVAKC SNAASTYHEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAG IYNDAGKPAA VRGLATNVAN YNAWSIASAP SYTSPNPNYD EKHYIEAFSP LLNSAGFSPA RFIVDTGRNG KQPTGQQQWG DWCNVKGTGF GVRPTANTGH ELVDAFVWVK PGGESDGTSD TSAARYDYHC GLSDALQPAP EAGQWFQAYF EQLLTNANPP F
[0209] SEQ ID NO: 99 (node #99)
QNCGSVWGQC GGIGWNGATC CASGSTCVKQ NDWYSQCLPG SSTTTTSTTT STSTTSSTTS TSTSTSTTST TTPPVPTTTS IPGGASSTAS YSGPFSGVQL WANDYYRSEV HTLAIPSLTD GAMAAKAAAV AEVPSFQWLD RNVTVDTFSG TLAEIRAANQ AGANPPYAGI FWYDLPDRD CAAAASNGEW SIANGGAANY KAYIDRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAASTYKEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGRPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRSG KQPTGQLEWG HWCNVIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0210] SEQ ID NO: 100 (node #100)
QNCASVWGQC GGIGYNGPTC CQSGSTCVKQ NDWYSQCLPG SSTTTTSTTS STSTTSSTTS TSTSTSTTST TTAPAPTTTT IPGGASSTAS YNGPFSGVQL WANNYYRSEV HTLAIPSLTD PALAAKAAAV AEVPSFQWLD RNVTVDTFSG TLAEIRAANQ AGANPPYAGI FWYDLPDRD CAAAASNGEW SIANNGANNY KAYIDRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC SNAASTYKEL TVYALKQLNL PHVAMYMDAG HAGWLGWPAN IQPAAELFAK IYKDAGRPAA VRGLATNVAN YNAWSISSPP SYTSPNPNYD EKHYIEAFSP LLTAQGFSPA QFIVDTGRSG KQPTGQLEWG HWCNVIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0211] SEQ ID NO: 101 (node #101)
QNCGSVWGQC GGNGWNGATC CASGSTCVKQ NDWYSQCLPG SSTTTTPTST STSTSSSSRS TSTSTSTSST TTPPVATTTS IPGGASSTAS YSGPFSGVQL WANDYYRSEV HTLAIPSLTD GAMATKAAAV AEVPSFQWMD RNVTVDTFSG TLAEIRAANQ AGANPPYAGI FWYDLPDRD CAAAASNGEW SIANGGAANY KAYIDRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC AGAASTYKEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGRPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYVEAFSP LLTAAGFSPA HFITDTGRSG KQPTGQLEWG HWCNAIGTGF GQRPTANTGH DLVDAFVWIK PGGECDGTSD TTAARYDHHC GLADALKPAP EAGQWFQAYF EQLLTNANPP F
[0212] SEQ ID NO: 102 (node #102)
QACASVWGQC GGQGWSGATC CASGSTCVVS NDYYSQCLPG SATTSTSSTR STTTTSSTTT SSTTTSTTST TPPPGSTTTS APSGPSGTAT YSGPFSGVNL WANSYYASEV STLAIPSLSD GAMATAAAAV AKVPSFQWLD TAAKVPLMES TLADIRAANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KAYIDSIRAI LVKYSDIRII LVIEPDSLAN LVTNMNVAKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAN VYKNAGKPRA LRGLATNVAN YNGWNISSAP SYTQGNPNYD EKHYIHALSP LLTQQGWSDA HFIVDQGRSG KQPTGQQAWG DWCNVIGTGF GVRPTANTGD ALVDAFVWVK PGGECDGTSD TSAARYDYHC GLSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0213] SEQ ID NO: 103 (node #103)
QACASVWGQC GGQGWSGATC CASGSTCVVS NDFYSQCLPG SATTSTSSTR STTTTSVTTT SSTTTATTST SPPPGTTVTS PPSGPSGTAT YTGPFSGVNL WANSYYRSEV STLAIPSLSD GAMATAAAKV AKVPSFQWMD TAAKIPLMEG TLADIRKANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADDGVAKY KAYIDSIRAI LVKYSDIRII LVIEPDSLAN LVTNMNVAKC ANAQSAYLEC TNYAITQLNL PNVAMYLDAG HAGWLGWPAN LPPAAQLFAN VYKDAGKPKA LRGLVTNVSN YNGWNISSAP SYTQGNPNYD EKHYIHALAP LLTQEGWSDA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGD ALVDAFVWVK PGGESDGTSD TSAARYDYHC GLSDALKPAP EAGTWFQAYF EQLLTNANPS F
[0214] SEQ ID NO: 104 (node #104)
QSCSNVWSQC GGQNWSGTPC CTSGNKCVKL NDFYSQCQPG SATSATSSTT SATTTSVTTT ATKTTATTTS STTSGTSVTS APSGPSGPPA ATDPFSGVDL WANNYYRSEV STLAIPKLSD GAMATAAAKV ADVPSFQWMD TYDHISLMED TLADIRKANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SLDKDGANKY KAYIAKIKGI LQNYSDTRII LVIEPDSLAN LVTNMNVPKC ANAESAYKEL TIYAIKELNL PNVSMYLDAG HGGWLGWPAN LPPAAQLYAQ IYKDAGKPSR LRGLVTNVSN YNGWKLSSKP DYTESNPNYD EQRYIHALAP LLAQEGWSNA KFIVDQGRSG KQPTGQKAWG DWCNAKGTGF GLRPSANTGD ALVDAFVWVK PGGESDGTSD TSAARYDYHC GLDDALKPAP EAGTWFQAYF EQLLKNANPS F
[0215] SEQ ID NO: 105 (node #105)
QACASQWGQC GGQGWSGPTC CASGSTCVVS NAFYSQCLPG SATTSTSSTR STTTTSVTST SSTTTATTSV SPPPGTTVTS PPAGPSGGAT YTGPFAGVNL WANSYYRSEV STLAIPSLSD GALATAAAKV AKVPTFQWMD TTAKIPLMDG TLADIRKANK AGGNPPYAGQ FWYNLPDRD CAAAASNGEL SIADDGVAKY KAYIDSIRAI LVKYSDIRII LVIEPDSLAN LVTNMNVAKC ANAQAAYLEC TNYAVTQLNL PNVAMYLDAG HAGWLGWPAN LPPAAALFAN VYKDAGKPKA LRGLVTNVSN YNGWNISSAP SYTQGNPNYD EKHYIDALAP LLSQEGWSDA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGD ALVDAFVWVK PGGESDGTSD TTAARYDYHC GLADALKPAP EAGTWFQAYF EQLLTNANPS F
[0216] SEQ ID NO: 106 (node #106)
QACASQWGQC GGQGWTGPSC CAAGSVCTVS NPFYSQCLPG STVASSTSTV RTSSTPVVSP SRTSTVTGSV STTSAGTGTT PPAGPTGGAT YTGPFVGVNL WANSYYASEI STLAIPSLSD PALATAAAKV AKVPTFMWMD TRSKIPLVDA TLADIRKANQ AGANPPYAGE FWYNLPDRD CAAAASNGEL SIADGGVAKY KQYIDDIRAM VVKYSDIRII LTIEPDSLAN LVTNLNVPKC AGAQAAYLEG TNYAVTQLNL PNVAMYLDGG HAGWLGWPAN LPPAAAMYAK VYKDAGKPKA LRGLVTNVSN YNGYSISTAP SYTQGNANYD EKHYIEALAP LLSAEGWSDA KFIVDQGRSG KQPTGQLAWG DWCNAIGTGF GVRPTANTGS TLVDAFVWVK PGGESDGTSD TTAARYDLNC GKADALKPAP EAGTWFQAYF EQLLINANPA F
[0217] SEQ ID NO: 107 (node #107)
QACASQWGQC GGQGWSGPTC CPSGTTCQLQ NAWYSQCLPG AATTAASSTR PATTSSVRST TVVNPPTTTV APPPGTTVAP PPAPPPGGAT YTGPFAGVNQ WANAYYRSEV SSLAVPSLSD GPLATAAAKV ADVPTFQWMD TTAKVPLIDG ALADIRRANA AGGNPPYAGI FWYNLPDRD CAAAASNGEL SIANDGINKY KAYIDSIRAV LLKYNDIRTL LVIEPDSLAN MVTNMGVAKC SNAAAAYKEC TKYAVQQLDL PHVAQYLDAG HAGWLGWPAN IGPAATIFTD IYKEAGKPKS LRGLATNVSN YNAWNASSPA PYTSPNPNYD EKHYVDAFAP LLRQNGWSDA KFIIDQGRSG KQPTGQQEWG HWCNALGTGF GLRPTSNTGH PDVDAFVWVK PGGEADGTSD TTAVRYDHFC GSASSMKPAP EAGTWFQAYF EQLLRNANPS F
[0218] SEQ ID NO: 108 (node #108)
QACASVWGQC GGQGWSGATC CASGSTCVVS NDYYSQCLPG SAASSSSSTR SSTTTSSTRA SSTTTSSSST TPPPGSTTTP PPPVGSGTAT YSGPFSGVNP WANSYYASEV SSLAIPSLSD GAMATAAAAV AKVPSFMWLD TLSKTPLMES TLADIRAANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KNYIDTIRAI LVKYSDIRTI LVIEPDSLAN LVTNLSVAKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN QQPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSAP SYTQGNSVYN EKLYIHAISP LLTQQGWSNA YFITDQGRSG KQPTGQQAWG DWCNVIGTGF GIRPSANTGD SLLDAFVWVK PGGECDGTSD TSAARYDYHC GLSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0219] SEQ ID NO: 109 (node #109)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTSRARV
SSTTSSSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD
GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD
CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTI LVIEPDSLAN LVTNLGTPKC
ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA
LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG
KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC
ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0220] SEQ ID NO: 110 (node #110)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTSRARV
SSTTSSSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD
GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD
CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC
ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA
LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG
KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC
ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0221] SEQ ID NO: 111 (node #111)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTARARA
SSTTSSRSSA TPPPGSSTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD
GAMATAAAAV AKVPSFMWLD TFDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD
CAALASNGEY SIADGGVDKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC
ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA
LRGLATNVAN YNGWNITSPP SYTQGNAVYN EQLYIHAIGP LLANHGWSNA FFITDQGRSG
KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWIK PGGECDGTSD SSAPRFDSHC
ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0222] SEQ ID NO: 112 (node #112)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR AASTTSRARV
SPTTSRSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD
GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD
CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC
ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA
LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG
KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC
ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0223] SEQ ID NO: 113 (node #113)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR AASTTSRARV SPTTSRSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
APPENDIX 3
[0224] SEQ ID NO: 114 (node #58)
QNCAPGWGAC GGQGWGGDTS CTGGGSCGVG NPTYSGCWAG TPTSTGNVDP TTTSSGNVGP TTTSNPTTTS TPPPDPTTTA APGGPNCTPA VSGPFAGAQL YLDPYYVKKV DALAIAQLTD GALKAKMEKV KQIPTAFWLD RIENVPQLPA YLDDALKLQK ELGKKPVTVL FWYDLPNRD CFALASNGEL HLDDNGLQRY KEYIDPIKQQ LKKYSGQRIV LVIEPDSLPN LVTNLGGKKC DDAQASYKEG VAYALKKLNL PHVYQYLDAG HAGWLGWPDN QKKGAKVFAE VIKNAGSPAK VRGFATNVAN YTPLSYTARS YDQQGNPCCG EFDYVDAMAS ALQAQGLGDK HFIIDTSRNG VKNIGRQQWG YWCNNKGAGL GQRPKANGGA TLLDAFVWVK PPGESDGVGD EGQPRYDLYC GREDALQPAP QAGQWFHEYF EQLLKNANPP L
[0225] SEQ ID NO: 115 (node #59)
QNCAPGWGAC GGQGWTGDTS CTGGGSCGVG NPTYSGCWAG SPTSTTNTDS TTTSSGNVGT TSTSNPTTTS TPPPTPTTTS APSGPTSTPT VSGPFSGAQL YLNPYYVAEV DALAIAQLTD GALKAKAEKV KNIPTAIWLD TIENVPQLPT YLDDALALQK SSGKKPVTVL FWYDLPNRD CHALASNGEL HIDDNGLQRY KAYIDSIEEQ LKKYSDQRIV LVIEPDSLAN LVTNLNAAKC ADAQASYKEG VAYAIKKLNL PHVYMYLDAG HAGWLGWPDN QEKAAKVFAE VIKNAGSPAK VRGFATNVAN YTPLSDTSRS PDTQGNPCCD EFHYVDAMAS ALQAQGLSDV HFIIDTSRNG VKNIGRQQWG DWCNNKGAGL GARPKANSGA SLLDAFVWVK PPGESDGTSD ESAPRYDSYC GREDALQPAP EAGQWFHEYF EQLLKNANPP L
[0226] SEQ ID NO: 116 (node #60)
QNCAPGWGAC GGQEWTDDTS CTGGGSCGVG NPTYSGCWAG SPTSTTNTDV TYTDSDNVGK WGVENPTTTS TPPPTPTTTS APSGPTSTPT VSGPFSGAQL YLNPYYVAEV DALAIAQLTD GALKAKAEKV KNIPTAIWLD TIENVPQLPT YLDDALALQK SSGKKPVTW FWYDLPNRD CHALASNGEL HIDDNGLQRY KAYIDSIEEQ LKKYSDQRIV LIIEPDSLAN LVTNLNAAKC ADAQASYKEG VAYAIKKFGL PHVYMYLDAG HAGWLGWPDN QEKAAKVFAE VIKNAGSPGK VRGFATNVAN YTPLSDTSRS PDTQGNPCCD EFHYVDAMAS ALQAQGLSDV HFIIDTSRNG VKNIGRQQSG DWCNNKGAGL GARPKANSGA SLLDAFVWIK PPGESDGTSD ESAPRYDSYC GREDALQPAP EAGQWFHEYF EQLLKNANPP L
[0227] SEQ ID NO: 117 (node #61)
LNCAPGYPCC SGNEYTDDDG VENGNWCGIA DPTYESCWAE SPCSTTNTEV EYTDSDNVGK WGVENPIPTG TPTPTPTSGS DPSTPGSPLT ISGPFSGVEF YLNPYYVAEV DALAIAQMTD SSLKAKAEKM KTFSNAIWLD TIKNMQQLET NLKGALAQHQ SSGKKPVLTV FWYDLPGRD CHALASNGEL LANDSDLQRY KTYIDVIEEQ LKKYNSQPVV LIIEPDSLAN LVTNLNTPAC ADSEQYYLEG HAYLIKKFGL PHVAMYLDIG HAFWLGWDDN REKAAKVYAK VIKSSGSPGK VRGFTDNVAN YTPWEDPSRG PDTEWNPCPD EKRYLEAMHK DFKAAGISSV YFVSDTSRNG HKNVDRKHPG EWCNQTGVGI GARPKANSGM DYLDAFYWIK PLGESDGTSD ESAARYDGYC GHETAMKPAP EAGQWFQKHF EQGLKNANPP L
[0228] SEQ ID NO: 118 (node #62)
QNCAPVWGQC GGQGWTGATS CVSGSTCTVS NDYYSQCLPG SATTTTTTTS PTTTTSSVST TSTSTSTTTS TPPPTPTTTS APSGPTSTAT VSGPFSGYQL YLNPYYAAEV AALAIAQLTD GALKAKAAKV ANIPTFIWLD TVAKVPTLGT YLADASALQK SSGKKPVAVQ FWYDLPDRD CAALASNGEF SIADNGLAKY KAYIDSIVAQ LKKYSDVRIV LVIEPDSLAN LVTNLNVAKC ANAQAAYKEG VTYAIQQLNL PNVYMYLDAG HAGWLGWPAN LGPAAQLFAQ VYKNAGSPAA VRGLATNVAN YNALSATSRP PYTQGNPNCD ELHYINALAP LLQAQGFSDA HFIVDTGRNG VQNIGRQQWG DWCNVKGAGF GVRPTTNTGS SLIDAFVWVK PGGESDGTSD TSAPRYDSHC GLSDALQPAP EAGTWFQAYF EQLLKNANPP L
[0229] SEQ ID NO: 119 (node #63)
QNCAPLWGQC GGQGWTGATS CVSGSTCTVS NDYYSQCLPG SATTTTTTTS PTTTTSSVST SSTSTSTTTS TPPPTPTTTS APSGPTSTAT VSGPFTGYQL YLSPYYAAEV AALAVAQITD PALKAKAAKV ANIPTFIWFD TVAKVPTLGT YLADASALQK SSGKKPVAVQ IWYDLPDRD CAALASNGEF SIADNGLANY KAYIDSIVAQ LKKYSDVRVV LVIEPDSLAN LVTNLNVAKC ANAQAAYKEG VTYALQQLNL VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSRP PYTQGNPNCD ELHYINALAP LLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GVRPTTNTGS SLIDAIVWVK PGGESDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVKNANPP L
[0230] SEQ ID NO: 120 (node #64)
QNCAPLWGQC GGTGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTTS PTTTTSSVST SSTSTSTTTS SPPPTPTTTS APSGPTSTAP AAGPFTGYQI YLSPYYAAEV AALAVAQITD PTLKAKAAKV ANIPTFIWFD WAKVPTLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAALASNGEF SIADNGLANY KNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQAAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGNPNYD ELHYINALAP LLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGESDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVKNANPP L
[0231] SEQ ID NO: 121 (node #65)
QNCAPLWGQC GGTGWTGATT CVSGSVCTVI NDYYHQCLPG SATTTTTTTS PTTTTSSVST SSTSTSTTTS SPPPTPTTTS APSGPTSTAP AAGPFTGYQI YLSPYYAAEV AALAVAQISD PTLKAKAAKV ANIPTFIWFD WAKVPTLGT YLADASAIQQ STGRKPQLVQ IWYDLPDRD CAALASNGEF SIADNGLANY KNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN MVTNLNVAKC AGAQAAYKEG VTYALQQLNS VGVYMYVDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PVTQGNANYD ELHYINALAP LLQSQGFFPA HFIVDQGRSG VQNIGRQAWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGESDGTSD TSAPRYDSHC GLSDAKKPAP EAGTWFQAYF ETLVKNANPP L
[0232] SEQ ID NO: 122 (node #66)
QNCAPLWGQC GGTGWTGATT CVSGSVCTVI NQWYHQCLPG SATTTTTTTS PTTTTSSVST SSTSTSTTTS SPPPTPTTTS APSGPTSTAP ATGPFTGYQI YLSPYYAAEV EALAAAQISD PTLKAKALKV KEIPTFIWFD WAKVPTLGT YLADASAIQQ STGRKPQLVQ IWYDLPDRD CAAAASNGEF SIADGGLAKY KNYIDRIAAQ IKKYPDVRVV AVIEPDSLAN MVTNLNVAKC AGAQAAYKEG VTYALQQLSA VGVYMYVDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGRSSF IRGLATNVAN YNALSATSPD PVTQGNANYD ELHYINALAP LLRSQGWFDA QFIVDQGRSG VQNIGRQAWG NWCNVKGAGF GMRPTTNTGS SLIDAIVWIK PGGESDGTSD TSAPRYDTHC GLSDAKKPAP EAGTWFQAYF VNLVKNANPP L
[0233] SEQ ID NO: 123 (node #67)
QNCAPLYGQC GGTGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTTS PTTTTSSVST SSTSTSTTTS SPPTTPTTTS APSGPTSTAP AAGPFTGYQI YLSPYYAAEV AALAVAQITD PTLKAKAAKV ANIPTFIWFD WAKVPTLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAALASNGEF SIADNGLANY KNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGNPNYD ELHYINALAP LLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVKNANPP L
[0234] SEQ ID NO: 124 (node #68)
QNCSPLYGQC GGTGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTTS PTTTTSSVST
SSTSTSTTTS SPPTTPTTTS APSGPTSTAP AAGPFTGYQI YLSPYYAAEV QALAAAQITD
ATQKAKAAKV ANIPTFTWFD VIAKTSTLGT YLADASALQK SSGKKPYLVQ IWYDLPDRD
CAALASNGEF SIANNGLANY KNYIDQLVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC
ANAQTAYKAG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF
VRGLATNVAN YNALSATSPD PITQGNPNYD ELHYINALAP LLQSQGFFPA HFIVDQGRSG
VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSD TSAPRYDSHC
GLSDATQPAP EAGTWFQAYF ETLVKNANPP L
[0235] SEQ ID NO: 125 (node #69)
QNCAPLYGQC GGIGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTVTTS PTTSASSVST
SSTSTSTTTS STPTTPTTTS APSGPTSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQITD
PTLKAKAAKV ANIPTFIWLD QVAKVPDLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD
CAALASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC
ANAQTTYKEC VTYALQQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF
VRGLATNVAN YNALSATSPD PITQGDPNYD ELHYINALAP LLQQQGFFPA HFIVDQGRSG
VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN TSSPRYDSHC
GLSDATQPAP EAGTWFQAYF ETLVKNANPP L
[0236] SEQ ID NO: 126 (node #70)
QNCAPLYGQC GGIGWTGATT CVSGSTCTKI NDYYSQCLPG SATTTTVTTS PTTSASSVST
SSTSTSSTTS STPTTPTTTS APSGPTSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQITD
PTLKAKAAKV ANIPTFTWLD QVAKVPDLGT YLADASALQK SSGQKPQLVQ IWYDLPDRD
CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC
ANAQTTYKAC VTYALQQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSSPF
VRGLATNVAN YNALSATSPD PITQGDPNYD ELHYINALAP LLQQQGFFPA QFIVDQGRSG
VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSHC
SLSDATQPAP EAGTWFQAYF ETLVSNANPP L
[0237] SEQ ID NO: 127 (node #71)
QNCAPLYGQC GGIGWTGATT CVSGSTCTKI NDYYSQCLPG SATTTTVTTS PTTSASSVST
SSTSTSSTTS TTPTTPTSTS APSGPTSTTA AAGPFTGYQI YLSPYYAAEV QALAVANITD
SALKAKAAKV ANIPTFTWLD QVAKVPDLGT YLADADALAK SSGQKPQLLQ IWYDLPDRD
CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSMAN LVTNLNVAKC
ANAATTYKAC VTYALQQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKTAGSSPF
FRGLATNVAN YNALSTTSPD PITQGDPNYD ELLYINALSP LLQQQGFFPA QFIVDQGRSG
VQNIGRSAWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSHC
SLSDALQPAP EAGTWFQAYF ETLVSNANPP L
[0238] SEQ ID NO: 128 (node #72)
QNCAPVYGQC GGIGWTGATT CVSGSTCTKQ NDYYSQCLPG SATTTTVTTS PTTSASSVST SSTSTSSTTS STPTTPTTTS APSGPTSTTP AAGPFTGYQI YLSPYYAAEV AALAAAQITD PTLKAKAASV ANIPTFTWLD SVAKVPDLGT YLADASALQK SSGQKPQLVQ IWYDLPDRD CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKFPDVRVV AVIEPDSLAN LVTNLNVQKC ANAQTTYKAC VTYALKQLSS VGVYMYMDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSSPF VRGLATNVAN YNALSAASPD PITQGDPNYD ELHYINALAP LLQQQGFFPA QFIVDQGRSG VQNIGRQQWG DWCNIKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSTC SLSDATQPAP EAGTWFQAYF ETLVSKANPP L
[0239] SEQ ID NO: 129 (node #73)
QNCAPVWGQC GGIGWTGPTT CVSGSTCTKQ NDYYSQCLPG SATTTTVTTS PTSSASSVST SSTSTSSTTS SSPTTPTTTS APSGPTSTPP AAGPWTGYQI YLSPYYANEV AALAAKQITD PTLSAKAASV ANIPTFTWLD SVAKIPDLGT YLADASALGK SSGQKPQLVQ IWYDLPDRD CAAKASNGEF SIADNGQANY QNYIDQIVAQ IKQFPDVRVV AVIEPDSLAN LVTNLNVQKC ANAKTTYLAC VNYALKQLSS VGVYMYMDAG HAGWLGWPAN LSPAAQLFAQ VYKNAGKSPF IKGLATNVAN YNALSAASPD PITQGDPNYD EIHYINALAP LLQQAGFFPA TFIVDQGRSG VQNIGRQQWG DWCNIKGAGF GTRPTTNTGS SLIDSIVWVK PGGECDGTSN SSSPRYDSTC SLSDATQPAP EAGTWFQAYF ETLVSKANPP L
[0240] SEQ ID NO: 130 (node #74)
QNCASVWGQC GGQGWTGATS CVSGSTCVVS NDYYSQCLPG SATTTTTTTS PTTTTSSTTT TSTTTSTTTT TPPPTTTTTS APSGPTSTAT VSGPFSGYQL YANPYYASEV SALAIPSLTD GAMAAKAAAV AKVPTFVWLD TAAKVPTMGT YLADIRALNK AGANPPVAGQ FWYDLPDRD CAALASNGEY SIADNGLAKY KAYIDSIVAQ LKKYSDVRII LVIEPDSLAN LVTNLNVAKC ANAQAAYLEG TNYAITQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAQ VYKNAGSPAA VRGLATNVAN YNAWSITSCP SYTQGNSNCD EKHYINALAP LLTAQGFSDA HFIVDTGRNG VQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALVDAFVWVK PGGESDGTSD TSAARYDSHC GLSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0241] SEQ ID NO: 131 (node #75)
QNCASVWGQC GGQGWTGATS CVSGSTCVVS NDYYSQCLPG SATTTTTTTS PTTTTSSTTT TSTTTSTTTT TPPPTTTTTS APSGPTSTAT VSGPFSGYQL YANPYYASEV HTLAIPSLTD GAMAAKAAAV AKVPSFVWLD TAAKVPTMGT YLADIRALNK AGANPPVAGQ FWYDLPDRD CAALASNGEY SIADNGVAKY KAYIDSIRAQ LKKYSDVRII LVIEPDSLAN LVTNLNVAKC ANAQAAYLEC TNYAITQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAQ VYKNAGSPAA VRGLATNVAN YNAWSITSCP SYTQGNSNCD EKHYINALAP LLTAQGFSDA HFIVDTGRNG VQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALVDAFVWVK PGGESDGTSD TSAARYDSHC GLSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0242] SEQ ID NO: 132 (node #76)
QNCQTVWGQC GGQGWTGATS CVAGATCSTL NPYYAQCLPA TATTTTTTTT PTTTTSSTTT TSTTTTTTTT TPTPTTTTTS APSGPTTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNK AGANPPIAGI FWYDLPDRD CAALASNGEY SIADNGVAKY KAYIDSIRAQ LKKYSDVHII LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKNAGSPAA VRGLATNVAN YNAWSISSCP SYTQGNSNCD EKRYINALAP LLKAQGFSDA HFIVDTGRNG VQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALVDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0243] SEQ ID NO: 133 (node #77)
QNCQTLWGQC GGQGWTGATS CVAGATCSTL NPYYAQCLPA TATTTTTTTT PTTTTSSTTT TSATTTTTTT TPTPTTTTTS APSGPTTTAT ASGPFSGYQL YVNPYYSSEV HSLAIPSLTD GSLAAKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI FWYDLPDRD CAALASNGEY SIADNGVAKY KAYIDSIRAQ LKKYSDVHII LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKNAGSPAA VRGLATNVAN YNAWSISSCP SYTQGNSVCD EKRYINALAP LLKAQGFSDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALVDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0244] SEQ ID NO: 134 (node #78)
QNCQTLWGQC GGQGWTGATS CVAGATCSTL NPYYAQCLPA TASTTTSTTT PTTTSSSTTT TSAATTTTTT TPTPTTTTTS APSGPTTTAT ASGPFSGYQL YVNPYYSSEV QSLAIPSLTD GSLAAKASAA AKVPSFVWLD TAAKVPTMGT YLADIRAK A AGANPPIAGI FWYDLPDRD CAALASNGEY SIADNGVEKY KAYIDSIREQ LKKYSDVHII LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAADLFAS VYKNAGSPAA VRGLATNVAN YNAWSISSCP SYTQGNSVCD EKRYINALAP LLKAQGFSDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALVDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0245] SEQ ID NO: 135 (node #79)
QNCQTLWGQC GGQGWTGATS CVAGATCSTL NQYYAQCLPA TASTTTSTTT TTTTSSSTTT TSAATTTTTT TPTPTTTTTS APSSPTTTAT ASGPFSGYQL YVNPYYSSEV QSLAIPSLTD GSLAAAATAA AKVPSFVWLD TAAKVPTMGT YLADIRSQNA AGANPPIAGQ FWYDLPDRD CAALASNGEY SIADNGVEHY KAYIDSIREI LVQYSDVHII LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LQPAADLFAS VYKNAGSPAA VRGLATNVAN YNAWSISSCP SYTQGNSVCD EKQYINALAP LLKAQGFSDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALVDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0246] SEQ ID NO: 136 (node #80)
QNCQTLWGQC GGQGYSGATS CVAGATCSTI NEYYAQCTPA TASTTTSTTT TTTTSSSTTT TTAAVTTTTT TPTPTPSASA SPSSPTTTAS ASGPFSGYQL YVNPYYSSEV ASLAIPSLTD GSLQAAATAA AKVPSFVWLD TAAKVPTMGD YLADIKSQNA AGANPPIAGQ FWYDLPDRD CAALASNGEY SIADNGVEHY KAYIDSIREI LVQYSDVHTL LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN QQPAADLFAS VYKNASSPAA VRGLATNVAN YNAWTISSCP SYTQGNSVCD EQQYINAIAP LLQAQGFSDA HFIVDTGRNG KQPTGQQAWG DWCNVINTGF GVRPTTNTGD ALVDAFVWVK PGGESDGTSD SSATRYDAHC GYSDALQPAP EAGTWFQAYF VQLLTNANPA F
[0247] SEQ ID NO: 137 (node #81)
QNCQTVWGQC GGQGWTGATS CVAGATCSTL NPYYAQCLPA TATTTTTTTT PTTTTSSTTT TSTTTTSTTT TPTTTTTTTS APSGPTTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIQAKNK AGANPPIAGI FWYDLPDRD CAALASNGEY SIANNGVANY KAYIDSIRAQ LKKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VNYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKNAGSPAA VRGLATNVAN YNAWSISSCP SYTQGDSNCD EKRYINALAP LLKAQGFSDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD PLEDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0248] SEQ ID NO: 138 (node #82)
QNCQTVWGQC GGQGWSGPTS CVAGAACSTL NPYYAQCLPG TATTTTTTTT TSTTTTSTTT TPTTTTTTTS APSGPTITAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIQAKNK AGANPPIAGI FWYDLPDRD CAALASNGEY SIANNGVANY KAYIDAIRAQ LLKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VNYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKDAGSPAA VRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKAQGFSDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD PLQDAFVWVK PGGESDGTSD TSSARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0249] SEQ ID NO: 139 (node #83)
QNCQTVWGQC GGQGWSGPTS CVAGAACSTL NPYYAQCIPG AATSTTTTTT TATTTTSTTT TSTTTTQTTT KPTTTGPTTS APSGPTITVT ASGPFSGYQL YANPYYSSEV HTLAMPSLPD SSLQPKASAV AEVPSFVWLD VAAKVPTMGT YLADIQAKNK AGANPPIAGI FWYDLPDRD CAALASNGEY S IAN GVANY KAYIDAIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VDYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAATLFAK VYTDAGSPAA VRGLATNVAN YNAWSLSTCP SYTQGDPNCD EKKYINAMAP LLKEAGFSDA HFIMDTSRNG VQPTKQNAWG DWCNVIGTGF GVRPSTNTGD PLQDAFVWIK PGGESDGTSN SSSARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0250] SEQ ID NO: 140 (node #84)
QNCQTVWGQC GGIGWSGPTS CVAGAACSTQ NPYYAQCLPG TATTTTTTTT TSTTTTSTTT TAGTGTTTTS APSGPTITAS ASGPFSGYQL YANPYYSSEV HTLAIPSLTD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIKAKNK AGANPPIAGI FWYDLPDRD CAALASNGEY SIANGGVANY KKYIDAIRAQ LLKYPDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VNYALKQLNL PNVAMYLDAG HAGWLGWPAN IGPAAQLFAS VYKDAGSPAA VRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKAQGFSDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD SLQDAFVWVK PGGESDGTSD TSSARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0251] SEQ ID NO: 141 (node #85)
QNCQTVWGQC GGIGWSGPTN CVAGAACSTQ NPYYAQCLPG TATTTTTTTT TSTTTTSTTT TAGTGTTTTS ASSGPTITAS PSGPFSGYQL YANPYYSSEV HTLAIPSLTD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIKAKNK AGANPPIAGI FWYDLPDRD CAALASNGEY SIANGGVANY KKYIDAIRAQ LLKYPDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VNYALKQLNL PNVAMYIDAG HAGWLGWPAN IGPAAQLFAS VYKDAGAPAA LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKAQGWSDA HFIMDTGRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD SLLDAFVWVK PGGESDGTSD TSSARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPA F
[0252] SEQ ID NO: 142 (node #86)
QNCQSVWGQC GGQGWTGATS CAAGATCSTL NPYYAQCIPA TATTTTTTTT PTTTSSSTTT TSTTTTSTTT TPTTTTTTTT APSGATTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIQAKNK AGANPPIAGI FWYDLPDRD CAAAASNGEY SIANNGVANY KAYIDSIRAQ LKTYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VNYALTQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKNASSPAS VRGLATNVAN YNAWSISSCP SYTQGDSNCD EKRYINALAP LLKAQGFSDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD PLEDAFVWVK PGGESDGTSN TSAARYDYHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0253] SEQ ID NO: 143 (node #87)
QNCQSVWGQC GGQGWTGATS CAAGATCSTL NPYYAQCIPA TATTATSTTL VTTTSSTSVG TSTATTSTTT TPTTTTTTTT ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIQAANK AGASPPIAGI FWYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWPAN LSPAAQLFAT VYKNASSPAS LRGLATNVAN YNAWSISSAP SYTSGDSNYD EKLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDYHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS L
[0254] SEQ ID NO: 144 (node #88)
QNCQSVWGQC GGQGWTGATS CAAGSTCSTL NPYYAQCIPA TATTATSTTL VKTTSSTSVG TSTATTSTTT TPTTTTTTTT ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIEAANK AGASPPIAGI FWYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWSAN LSPAAQLFAT VYKNASSPAS LRGLATNVAN YNAWSISSAP SYTSGDSNYD EKLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDYHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS L
[0255] SEQ ID NO: 145 (node #89)
QNCQSVWGQC GGQGWSGATS CAAGSTCSTL NPYYAQCIPG TATTATSTTL VKTTSSTSVG TSTATTSVGT TSPPTTTTTK ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIEAANK AGASPPIAGI FWYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWSAN LSPAAQLFAT VYKNASAPAS LRGLATNVAN YNAWSISSPP SYTSGDSNYD EKLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDFHC GYSDALQPAP EAGTWFQAYF VQLLTNANPA L
[0256] SEQ ID NO: 146 (node #90)
QNCASVWGQC GGQGWSGATC CASGSTCVVS NDYYSQCLPG SATTTTSSTS STTTTSSTTT SSTTTSTTST TTPPATTTTS APSGPSSTAT YSGPFSGVQL WANSYYASEV HTLAIPSLTD GAMAAKAAAV AKVPSFQWLD TAAKVPTMAG TLADIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KAYIDSIRAQ LVKYSDIRII LVIEPDSLAN LVTNMNVAKC ANAQAAYLEC TNYAITQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAK VYKNAGKPAA LRGLATNVAN YNAWNITSAP SYTQGNSNYD EKHYIHALSP LLTAQGWSDA HFIVDQGRSG KQPTGQQAWG DWCNVIGTGF GVRPTANTGD ALVDAFVWVK PGGECDGTSD TSAARYDYHC GLSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0257] SEQ ID NO: 147 (node #91)
QNCASVWGQC GGTGWSGATC CASGSTCVVQ NDYYSQCLPG SATTTTSSTS STTTTSSTTT SSTTTSTTST TTPPATTTTS APSGPSSTAS YSGPFSGVQL WANSYYASEV HTLAIPSLTD GAMAAKAAAV AKVPSFQWLD TAAKVPTMAG TLADIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KAYIDAIRAQ LVAYSDIRII LVIEPDSLAN LVTNMNVAKC ANAQAAYLEC TNYAITQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAK VYKNAGKPAA LRGLATNVAN YNAWNITSAP SYTQGNSNYD EKHYIQALSP LLTAAGWSDA HFIVDQGRSG KQPTGQQAWG DWCNVIGTGF GVRPTANTGL ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0258] SEQ ID NO: 148 (node #92)
QNCGSVWGQC GGIGWSGATC CASGSTCVVQ NDYYSQCLPG SSTTTTSSTS STTTTSSTTS SSTTTSTTST TTPPATTTTS APGGPSSTAS YSGPFSGVQL WANNYYASEV HTLAIPSLTD GAMAAKAAAV AKVPSFQWLD TAAKVDTMAG TLADIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEF SIADGGVAKY KAYIDAIRKQ LVAYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAQAAYLEC TIYAIKQLNL PNVAMYLDAG HAGWLGWPAN LQPAAELFAK IYKDAGKPAA LRGLATNVAN YNAWNITSAP SYTSPNPNYD EKHYIEAFSP LLTAAGWSDA HFIVDQGRSG KQPTGQQEWG DWCNAIGTGF GVRPTANTGL ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0259] SEQ ID NO: 149 (node #93)
QSCGSQWGQC GGIGWSGATC CASGSTCVVQ NDYYSQCLPG SSTTTTSSTS STTTTSSTSS STTTTSTTST TTPPATTTTS APGGPGTTAS FTGPFSGVNL WANNYYASEV HTLAIPSLTD GAMAAKAAAV AKVPSFQWLD IAAKVDTMPG TLADIRAANK AGGNPPYAAQ FWYDLPDRD CAAAASNGEF SIADGGVAKY KAYIDAIRKQ LVAYSDIRTI LVIEPDSLAN MVTNMNVPKC ANAQAAYLEC TIYAIKQLNL PNVAMYLDGG HAGWLGWPAN LQPAADLFGK LYADAGKPSQ LRGMATNVAN YNAWNLTSAP SYTSPNPNYD EKHYIEAFSP LLAAKGWSNA HFIVDQGRSG KQPTGQQEWG HWCNAMGTGF GMRPSANTGL ELADAFVWIK PGGECDGTSD TTAARFDHFC GMSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0260] SEQ ID NO: 150 (node #94)
QNCGSVWGQC GGIGWSGATC CASGSTCVEQ NDYYSQCLPG SSTTTTSSTS STTTTSSTTS SSTSTSTTST TTPPVPTTTS IPGGASSTAS YSGPFSGVQL WANNYYASEV HTLAIPSMTD GAMAAKAAAV AKVPSFQWLD RNVTVDTMAG TLAEIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEF SIANGGVANY KAYIDAIRKL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAASAYKEC TIYAIKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGKPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRSG KQPTGQQEWG DWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0261] SEQ ID NO: 151 (node #95)
QNCGSAWSQC GGIGWSGATC CSSGNSCVEI NSYYSQCLPG ASTSPTSTSK VSSTTSKVTS SSAAQPITTT TAPSVPTTTT IAGGASSTAS FTGPFLGVQG WANSYYSSEI YNHAIPSMTD GSLAAQASAV AKVPTFQWLD RNVTVDTMKS TLEEIRAANK AGANPPYAAH FWYDLPDRD CAAAASNGEF SIANGGVANY KTYINAIRKL LIEYSDIRTI LVIEPDSLAN LVTNTNVAKC ANAASAYKEC TNYAITQLDL PHVAQYLDAG HGGWLGWPAN IQPAATLFAD IYKAAGKPKS VRGLVTNVSN YNGWSLSSAP SYTTPNPNYD EKKYIEAFSP LLNAAGFSPA QFIVDTGRSG KQPTGQIEQG DWCNAIGTGF GVRPTTNTGS SLADAFVWVK PGGESDGTSD TSATRYDYHC GLSDALKPAP EAGQWFQAYF EQLLKNANPA F
[0262] SEQ ID NO: 152 (node #96)
QNCGSVWGQC GGIGWSGATC CASGSTCVEQ NDWYSQCLPG SSTTTTSSTS STTTTSSTTS SSTSTSTTST TTPPVPTTTS IPGGASSTAS YSGPFSGVQL WANNYYRSEV HTLAIPSMTD GAMAAKAAAV AEVPSFQWLD RNVTVDTMAG TLAEIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEF SIANGGAANY KAYIDAIRKL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAASTYKEL TIYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGKPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRSG KQPTGQQEWG DWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0263] SEQ ID NO: 153 (node #97)
QNCGSVWSQC GGIGWSGATC CASGSTCVEQ NDWYSQCLPN SSTTTSTSTR STTTSSSTTS SSTSTSTTST TTPPVPTTTS IPGGASSTAS YSGPFSGVQL WANDYYRSEV HTLAIPSMTD GAMAAKAAAV AEVPSFQWLD RNVTIDTMAQ TLSQIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEF SIANGGAANY KAYIDAIRKL IIQYSDIRII LVIEPDSLAN MVTNMNVAKC ANAASTYKEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGKPAA VRGLATNVAN YNAWSIASAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRNG KQPTGQQQWG DWCNVIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALQPAP EAGQWFQAYF EQLLTNANPP F
[0264] SEQ ID NO: 154 (node #98)
QNCGAVWTQC GGNGWQGPTC CASGSTCVAQ NEWYSQCLPN SPSSTSTSQR STSTSSSTTR SGSSTSSSST TPPPVSSPTS IPGGATSTAS YSGPFSGVRL FANDYYRSEV HNLAIPSMTD GTLAAKASAV AEVPSFQWLD RNVTIDTMVQ TLSQVRALNK AGANPPYAAQ LWYDLPDRD CAAAASNGEF SIANGGAANY RSYIDAIRKH IIEYSDIRII LVIEPDSMAN MVTNMNVAKC SNAASTYHEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAG IYNDAGKPAA VRGLATNVAN YNAWSIASAP SYTSPNPNYD EKHYIEAFSP LLNSAGFSPA RFIVDTGRNG KQPTGQQQWG DWCNVKGTGF GVRPTANTGH ELVDAFVWVK PGGESDGTSD TSAARYDYHC GLSDALQPAP EAGQWFQAYF EQLLTNANPP F
[0265] SEQ ID NO: 155 (node #99)
QNCGSVWGQC GGIGWNGATC CASGSTCVKQ NDWYSQCLPG SSTTTTSSTS STTTTSSTTS SSTSTSTTST TTPPVPTTTS IPGGASSTAS YSGPFSGVQL WANNYYRSEV HTLAIPSLTD GAMAAKAAAV AEVPSFQWLD RNVTVDTFSG TLAEIRAANQ AGANPPYAGQ FWYDLPDRD CAAAASNGEW SIANGGAANY KAYIDRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC A AASTYKEL TIYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGRPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRSG KQPTGQLEWG HWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0266] SEQ ID NO: 156 (node #100)
QNCGSVWGQC GGIGYNGPTC CQSGSTCVKQ NDWYSQCLPG SSTTTTSSTS STTTTSSTTS SSTSTSTTST TTAPAPTTTT IPGGASSTAS YNGPFSGVQL WANNYYRSEV HTLAIPSLTD PALAAKAAAV AEVPSFQWLD RNVTVDTFSG TLAEIRAANQ AGANPPYAGQ FWYDLPDRD CAAAASNGEW SIANNGANNY KAYIDRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC SNAASTYKEL TIYALKQLNL PHVAMYMDAG HAGWLGWPAN IQPAAELFAK IYKDAGRPAA VRGLATNVAN YNAWSISSPP SYTSPNPNYD EKHYIEAFSP LLTAQGFSPA QFIVDTGRSG KQPTGQLEWG HWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0267] SEQ ID NO: 157 (node #101)
QNCGSVWGQC GGNGWNGATC CASGSTCVKQ NDWYSQCLPG SSTTTTPSST STTTSSSSRS TSTSTSTTST TTPPVATTTS IPGGASSTAS YSGPFSGVQL WANNYYRSEV HTLAIPSLTD GAMATKAAAV AEVPSFQWMD RNVTVDTFSG TLAEIRAANQ AGANPPYAGI FWYDLPDRD CAAAASNGEW SIANGGAANY KAYIDRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC AGAASTYKEL TIYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGRPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYVEAFSP LLTAAGFSPA HFITDTGRSG KQPTGQLEWG HWCNAIGTGF GQRPTANTGH DLVDAFVWIK PGGECDGTSD TTAARYDHHC GLADALKPAP EAGQWFQAYF EQLLTNANPP F
[0268] SEQ ID NO: 158 (node #102)
QACASVWGQC GGQGWSGATC CASGSTCVVS NDYYSQCLPG SATTSTSSTS STTTTSSTTT SSTTTSTTST TTPPGTTTTS APSGPSGTAT YSGPFSGVNL WANSYYASEV STLAIPSLTD GAMATAAAAV AKVPSFQWLD TAAKVPLMEG TLADIRAANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KAYIDSIRAI LVKYSDIRII LVIEPDSLAN LVTNMNVAKC ANAQAAYLEC TNYAITQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAK VYKNAGKPRA LRGLATNVAN YNGWNITSAP SYTQGNSNYD EKHYIHALSP LLTQQGWSDA HFIVDQGRSG KQPTGQQAWG DWCNVIGTGF GVRPTANTGD ALVDAFVWVK PGGECDGTSD TSAARYDYHC GLSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0269] SEQ ID NO: 159 (node #103)
QACASVWGQC GGQGWSGATC CASGSTCVVS NDFYSQCLPG SATTSTSSTV STTTTSVTTT SSTTTATTST STPPGTTVTS APSGPSGTAT YTGPFSGVNL WANSYYRSEV STLAIPSLSD GAMATAAAKV AKVPSFQWMD TAAKVPLMEG TLADIRKANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADDGVAKY KAYIDSIRAI LVKYSDIRII LVIEPDSLAN LVTNMNVAKC ANAQAAYLEC TNYAITQLNL PNVAMYLDAG HAGWLGWPAN LPPAAQLFAK VYKDAGKPRA LRGLVTNVSN YNGWNISSAP SYTQGNPNYD EKHYIHALSP LLTQEGWSDA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGD ALVDAFVWVK PGGESDGTSD TSAARYDYHC GLADALKPAP EAGTWFQAYF EQLLTNANPS F
[0270] SEQ ID NO: 160 (node #104)
QSCSNVWSQC GGQNWSGTPC CTSGNKCVKV NDFYSQCQPG SATSPTSSTV SATTTSVTTT ATKTTATTSS STTSGTSVTS APSGPSGPPA ATDPFSGVDL WANNYYRSEV STLAIPKLSD GAMATAAAKV ADVPSFQWMD TYDHISLMEE TLADIRKANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SLDKDGANKY KAYIAKIKGI LQNYSDTRII LVIEPDSLAN LVTNMNVAKC ANAESAYKEL TIYAIKELNL PNVSMYLDAG HGGWLGWPAN LPPAAQLYAQ IYKDAGKPSR LRGLVTNVSN YNGWKLSSKP DYTESNPNYD EQRYIHALSP LLAQEGWSNA KFIVDQGRSG KQPTGQKAWG DWCNAKGTGF GLRPSANTGD ALVDAFVWVK PGGESDGTSD TSAARYDYHC GLDDALKPAP EAGTWFQAYF EQLLKNANPS F
[0271] SEQ ID NO: 161 (node #105)
QACASQWGQC GGQGWSGPTC CASGSTCVVS NAFYSQCLPG SATTSTSSTV STTTTSVTST SSTTTATTSV STPPGTTVTS PPSGPSGGAT YTGPFAGVNL WANSYYRSEV STLAIPSLSD GALATAAAKV AKVPTFQWMD TAAKVPLMDG TLADIRKANK AGGNPPYAGQ FWYNLPDRD CAAAASNGEL SIADDGVAKY KAYIDSIRAI LVKYSDIRII LVIEPDSLAN LVTNMNVAKC ANAQAAYLEC TNYAVTQLNL PNVAMYLDAG HAGWLGWPAN LPPAAALFAK VYKDAGKPKA LRGLVTNVSN YNGWNISSAP SYTQGNPNYD EKHYIDALAP LLTQEGWSDA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGD ALVDAFVWVK PGGESDGTSD TTAARYDYHC GLADALKPAP EAGTWFQAYF EQLLTNANPS F
[0272] SEQ ID NO: 162 (node #106)
QACASQWGQC GGQGWTGPSC CAAGSVCTVS NPFYSQCLPG STVASSTSTV RTSSTPVVSP SRTSTVTGSV STTSAGTGTT PPSGPTGGAT YTGPFVGVNL WANSYYASEI STLAIPSLSD PALATAAAKV AKVPTFMWMD TRSKIPLVDA TLADIRKANQ AGANPPYAGE FWYNLPDRD CAAAASNGEL SIADGGVAKY KQYIDDIRAM VVKYSDIRII LTIEPDSLAN LVTNLNVPKC AGAQAAYLEG TNYAVTQLNL PNVAMYLDGG HAGWLGWPAN LPPAAAMYAK VYKDAGKPKA LRGLVTNVSN YNGYSISTAP SYTQGNANYD EKHYIEALAP LLSAEGWSDA KFIVDQGRSG KQPTGQLAWG DWCNAIGTGF GVRPTANTGS TLVDAFVWVK PGGESDGTSD TTAARYDLNC GKADALKPAP EAGTWFQAYF EQLLINANPA F
[0273] SEQ ID NO: 163 (node #107)
QACASQWGQC GGQGWSGPTC CPSGTTCQLQ NAWYSQCLPG AATTAASSTR PATTSSVRST TVVNPPTTTV APPPGTTVAP PPAPPPGGAT YTGPFAGVNQ WANAYYRSEV SSLAVPSLSD GPLATAAAKV ADVPTFQWMD TTAKVPLIDG ALADIRRANA AGGNPPYAGI FWYNLPDRD CAAAASNGEL SIANDGINKY KAYIDSIRAV LLKYNDIRTL LVIEPDSLAN MVTNMGVAKC SNAAAAYKEC TKYAVQQLDL PHVAQYLDAG HAGWLGWPAN IGPAATIFTD IYKEAGKPKS LRGLATNVSN YNAWNASSPA PYTSPNPNYD EKHYVDAFAP LLRQNGWSDA KFIIDQGRSG KQPTGQQEWG HWCNALGTGF GLRPTSNTGH PDVDAFVWVK PGGEADGTSD TTAVRYDHFC GSASSMKPAP EAGTWFQAYF EQLLRNANPS F
[0274] SEQ ID NO: 164 (node #108)
QACASVWGQC GGQGWSGATC CASGSTCVVS NDYYSQCLPG SATTSTSSTS SSTTTSSTRA SSTTTSSSST TPPPGSTTTS APPVGSGTAT YSGPFSGVNP WANSYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLAKTPLMES TLADIRAANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KNYIDTIRAI LVKYSDIRTI LVIEPDSLAN LVTNLSVAKC ANAQAAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN QQPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSAP SYTQGNSVYN EKLYIHAISP LLTQQGWSNA YFITDQGRSG KQPTGQQAWG DWCNVIGTGF GIRPSANTGD SLLDAFVWVK PGGECDGTSD TSAARYDYHC GLSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0275] SEQ ID NO: 165 (node #109)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTSRARV
SSTTSSSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD
GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD
CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTI LVIEPDSLAN LVTNLGTPKC
ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA
LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG
KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC
ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0276] SEQ ID NO: 166 (node #110)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTSRARV
SSTTSSSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD
GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD
CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC
ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA
LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG
KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC
ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0277] SEQ ID NO: 167 (node #111)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTARARA
SSTTSSRSSA TPPPGSSTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD
GAMATAAAAV AKVPSFMWLD TFDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD
CAALASNGEY SIADGGVDKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC
ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA
LRGLATNVAN YNGWNITSPP SYTQGNAVYN EQLYIHAIGP LLANHGWSNA FFITDQGRSG
KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWIK PGGECDGTSD SSAPRFDSHC
ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0278] SEQ ID NO: 168 (node #112)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR AASTTSRARV
SPTTSRSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD
GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD
CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC
ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA
LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG
KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC
ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0279] SEQ ID NO: 169 (node #113)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR AASTTSRARV SPTTSRSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
APPENDIX 4
[0280] SEQ ID NO: 170 (node #58)
QDCAAGWGAC GGGGWGGDTS CTGGGSCGVG APTYSGCGAG TPTSSGNVDP TTTSSGNVDP TTTSSPTTTS SGNVTPTTSA APGGGNCSPA VSGPFAGAQL FVDPYYVKEV DSLSIAQVTD TALKAKAEKV KQIPTAFWLD RIEAIKELPA YLDDALKLQK ELCKPPVTAL FWYDLPNRD CFAEASNGEL HLDQNGLQRY KEYIDPIKQI LKKYSGQRIV AVIEPDSLPN LVTNLNGKRC DETQASYRDG VAYTLKELNL PHVYMYLDAG HAGWLGWPDN QKKGAKIFAE VIKAAGSPAN VRGFATNVAN YTQLSYTAES YDQQGNPCFD EFDYVDAMAS ALSAEGLGDK HFIIDTSRNG VGNIDREDWG YWCNNKGAGM GQRPKANGGA TLLDAFVWVK PPGESDGTGD EGQPRYDLFC GKENADQPAP EAGQWFHEYF VQLVKNANPP
[0281] SEQ ID NO: 171 (node #59)
QDCAAGWGAC GGNGWTGDTS CAGGGSCGVA NPTYSQCWAG SPTSTTNTDP PTTTSSSVGS TSTSNPTTTS SPPPTPTTTS APSGSTSTPT VSGPFSGVQL YLNPYYVAEV DALAIAQLTD AALKAKAEKV KQIPTAIWLD TIENVPQLPT YLDDALALQK ASGKKPVLW FWYDLPDRD CHAAASNGEL SIDDNGLQKY KGYIDSIEAQ LKKYSDQRVV LVIEPDSLAN LVTNLNAAKC ADAQASYKEG VAYAIKQLNL PHVYMYLDAG HAGWLGWPDN QEKAAKVFAE VIKNAGSPAK VRGFATNVAN YTALSDTSRS PDTQGNPCYD EKHYIDAMAS ALRAQGLSDA HFIIDTSRNG VQNIDRQQWG DWCNVKGAGL GARPKANSGS SLLDAFVWIK PPGESDGTSD ESAPRYDSYC GREDALQPAP EAGQWFQEYF VQLLKNANPP L
[0282] SEQ ID NO: 172 (node #60)
QDCAAGWGAC GGNEWTDDNG CAGGGWCGVA NPTYSSCWAG SPTSTTNTDV SYTDSDSVGK WGVENPTTTS TQPPTPTTTS APSGSTSTPT VSGPFSGVQL YLNPYYVAEV DALAIAQLTD SALKAKAEKV KQIPTAIWLD TIENVPQLPT YLDDALALQK SSGKKPVLW FWYDLPDRD CHAAASNGEL LIDDNGLQKY KGYIDSIEEQ LKKYPNQRVV LIIEPDSLAN LVTNLNTAKC ADAQNSYKEG VAYAIKKFGL PHVYMYLDAG HAGWLGWPDN REKAAKVFAE VIKNAGSPGK VRGFTTNVAN YTPLSDTSRG PDTQGNPCYD EKHYLDAMAS ALKAQGISDV HFIIDTSRNG VKNIDRKQSG DWCNVKGAGL GARPKANSGT SLLDAFVWIK PPGESDGTSD ESAPRYDSYC GREDALQPAP EAGQWFQEYF VQLLKNANPP L
[0283] SEQ ID NO: 173 (node #61)
LDCAAGYPCC SGNEYTDDDG VENGNWCGIA DPTYESCWAE SPCSTTNTEV EYTDSDSVGK WGVENPVPTG TQTPTPTSGS DPSTPGSPLT ISGPFSGVEF YLNPYYVAEV DALAIAQMSD SSLKAKAEKM KTFSNAIWLD TIKNMQQLET NLKGALAQHQ TSGKKPVLTV FWYDLPGRD CHALASNGEL LANDSDLQRY KSYIDVIEEK LKKYNSQPVV LIIEPDSLAN LVTNLNTPAC ADSEKYYLEG HAYLIKKFGL PHVAMYLDIG HAFWLGWDDN REKAAKVYAK VIKSSGSPGK VRGFTDNVAN YTPWEDPSRG PDTEWNPCPD EKRYLEAMHK DFKAAGISSV YFVSDTSRNG HKNVDRKHPG EWCNQTGVGI GARPKANSGM DYLDAFYWIK PLGESDGTSD ESAARYDGYC GHETAMKPAP EAGQWFQKHF EQGLKNANPP L
[0284] SEQ ID NO: 174 (node #62)
QNCASVYGQC GGNGWTGATT CVSGSTCTVI NPWYSQCLPG SATTTTTTTS PTTTTSSTSS SSTSTSTTTS SPPTTTTTTT APSGSTTTAT ASGPFSGYQL YLNPYYAAEV DALAIAQITD ATLKAKAAKV KQIPTFIWLD TIAKVPTLGT YLADASALQK ASGKKPVLVQ FWYDLPDRD CAAAASNGEF SIADNGLAKY KGYIDSIVAQ LKKYPDVRVV LVIEPDSLAN LVTNLNVAKC ANAQAAYKEG VTYAIKQLNL PNVYMYLDAG HAGWLGWPAN LSPAAQLFAQ IYKNAGSPAS VRGLATNVAN YNALSATSPD PVTQGNPNYD EKHYINALAP LLRSQGWSDA HFIVDTSRNG VQNIGRQQWG DWCNVKGAGF GVRPTTNTGS SLIDAFVWIK PGGESDGTSD TSAPRYDSHC GLSDALQPAP EAGTWFQAYF VQLLKNANPP F
[0285] SEQ ID NO: 175 (node #63)
QNCAPLYGQC GGNGWTGATT CVSGSTCTVI NPWYSQCLPG SATTTTTTSS PTTTTSSVSS SSTSTSTSTS SPPTTTTTTT APSGSTTTAP AAGPFSGYQI YLSPYYAAEV DALAAAQISD ATLKAKALKV KEIPTFTWLD TIAKVPTLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAAAASNGEF SIADNGLAKY KGYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALKQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ IYKNAGSPAF VRGLATNVAN YNALSATSPD PVTQGNPNYD ELHYINALAP MLRSQGWSDA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GMRPTTNTGS SLIDAIVWIK PGGESDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF VNLVKNANPP L
[0286] SEQ ID NO: 176 (node #64)
QNCAPLYGQC GGNGWTGATT CVSGSTCTVI NPWYSQCLPG SATTTTTTSS PTTTTSSVSS SSTSTSTSTS SPPTTTTTTT APSGSTTTAP AAGPFTGYQI YLSPYYAAEV EALAAAQISD ATLKAKALKV KEIPTFTWFD VIAKVPTLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAAAASNGEF SIADNGLAKY KGYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALKQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ IYKNAGSPAF VRGLATNVAN YNALSATSPD PVTQGNPNYD ELHYINALAP MLRSQGWSDA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GMRPTTNTGS SLIDAIVWIK PGGESDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF VNLVKNANPP L
[0287] SEQ ID NO: 177 (node #65)
QNCAPLYGQC GGNGWTGATT CVSGSTCTVI NDWYSQCLPG SATTTTTSSS PTTTTSSVSS SSTSTSTSTS SPPTTTTTTT APSGSTSTVP AAGPFTGYQI YLSPYYAAEV QALAAAQISD ATLKAKALKV AQIPTFTWFD VIAKVPTLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAALASNGEF SIANNGLANY KGYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PVTQGNPNYD ELHYINALAP MLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWIK PGGECDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVKNANPP L
[0288] SEQ ID NO: 178 (node #66)
QNCAPLYGQC GGTGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTSSS PTTTTSSVSS SSTSTSTSTS SPPTTPTTTT APSGSTSTTP AAGPFTGYQI YLSPYYAAEV QALAAAQISD ATLKAKAAKV ANIPTFTWFD VIAKVPTLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAALASNGEF SIANNGLANY KNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGNPNYD EIHYINALAP MLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVKNANPP L
[0289] SEQ ID NO: 179 (node #67)
QNCAPLYGQC GGIGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTSS PTTTTSSVSS SSTSTSTSTS SPPTTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQISD PTLAAKAAKV ANIPTFTWFD WAKVPDLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAALASNGEF SIANNGLANY KNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGNPNYD EIHYINALAP MLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVSNANPP L
[0290] SEQ ID NO: 180 (node #68)
QNCAPLWGQC GGIGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTSS PTTTTSSVSS SSTSTSTSTS SPPTTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQISD PTLAAKAAKV ANIPTFTWFD WAKVPDLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAALASNGEF SIANNGLANY KNYIDQIVAQ LKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSAPSPD PITQGNPNYD EIHYINALAP MLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVSNANPP L
[0291] SEQ ID NO: 181 (node #69)
QNCAPLYGQC GGIGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTSS PTTSAPSVSS SSTSASTSTS STPTTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQISD PTLAAKAAKV ANIPTFTWLD QVAKVPDLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAALASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTTYKEC VTYALQQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGDPNYD EIHYINALAP LLQQQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN TSSPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVSKANPP L
[0292] SEQ ID NO: 182 (node #70)
QNCAPLYGQC GGIGWTGATT CVSGSTCTKI NDYYSQCLPG SATSTTVTSS PTTSAPGSSS SSTSASSSTS STPTTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQITD PTLAAKAAKV ANIPTFTWLD QVAKVPDLGT YLADASALQK SSGQKPQLVQ IWYDLPDRD CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTTYKAC VTYALQQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSSPF VRGLATNVAN YNALSATSPD PITQGDPNYD EIHYINALAP LLQQQGFFPA QFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSHC SLSDATQPAP EAGTWFQAYF ETLVSKANPP L
[0293] SEQ ID NO: 183 (node #71)
QNCASLYGQC GGIGWTGATT CVSGSTCTKI NDYYSQCLPG SATSTTVTSS PTTSAAGSSS SSTSASSSTS TTPTTPTSTS APSGSTSTTA AAGPFTGYQI YLSPYYAAEV QALAVANITD SALAAKAAKV ANIPTFTWLD QVAKVPDLGT YLADADALAK SSGQKPQLLQ IWYDLPDRD CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSMAN LVTNLNVAKC ANAATTYKAC VTYALEQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYQTAGSSPF FRGLATNVAN YNALSTTSPD PITQGDPNYD EMLYINALSP LLQQQGFFPA QFIVDQGRSG VQNIGRSAWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSHC SLSDALQPAP EAGTWFQAYF ETLVSNANPP L
[0294] SEQ ID NO: 184 (node #72)
QNCAPVYGQC GGIGWTGATT CVSGSTCTKQ NDYYSQCLPG SAASTTVTSS PTTSAPGSSS SSTSASSTTS STPTTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV AALAAAQITD PTLAAKAASV ANIPTFTWLD SVAKVPDLGT YLADASALQK SSGQKPQLVQ IWYDLPDRD CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKFPDVRVV AVIEPDSLAN LVTNLNVQKC ANAQTTYKAC VTYALKQLSS VGVYMYMDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSSPF VRGLATNVAN YNALSAASPD PITQGDPNYD EIHYINALAP LLQQQGFFPA QFIVDQGRSG VQNIGRQQWG DWCNIKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSTC SLSDATQPAP EAGTWFQAYF ETLVSKANPP L
[0295] SEQ ID NO: 185 (node #73)
QNCAPVWGQC GGIGWTGPTT CVSGSTCTKQ NDYYSQCLPG SAATTTVTTS PTSSASGSSV SSHSGSSTTS SSPTTPTTTS APSGPSSTPP AAGPWTGYQI YLSPYYANEV AALAAKQITD PTLAAKAASV ANIPTFTWLD SVAKIPDLGT YLADASALGK SSGQKPQLVQ IWYDLPDRD CAAKASNGEF SIADNGQANY QNYIDQIVAQ IKQFPDVRVV AVIEPDSLAN LVTNLNVQKC ANAKTTYLAC VNYALKQLSS VGVYMYMDAG HAGWLGWPAN LSPAAQLFAQ VYKNAGKSPF IKGLATNVAN YNALSAASPD PITQGDPNYD EIHYINALAP LLQQAGFFPA TFIVDQGRSG VQNIGRQQWG DWCNIKGAGF GTRPTTNTGS SLIDSIVWVK PGGECDGTSN SSSPRYDSTC SLSDATQPAP EAGTWFQAYF ETLVSKANPP L
[0296] SEQ ID NO: 186 (node #74)
QNCASVWGQC GGQGWTGATT CVSGSTCVVI NPYYSQCLPG SATTTTSTTA PTTTTSTTTT TSTTTSSTTT PPTTTTTTT APSGATTTAT ASGPFSGYQL YANPYYASEV SALAIPSLTD GAMAAKAAAV AKVPTFVWLD TAAKVPTMGT YLADIRALNK AGANPPIAGI FWYDLPDRD CAAAASNGEY SIADNGLAKY KAYIDSIVAQ LKTYSDVRVI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEG INYAITQLNL PNVAMYLDAG HAGWLGWPAN LSPAAQLFAQ VYKNAGSPAA VRGLATNVAN YNAWSSTSCP SYTQGDSNYD EKLYINALAP LLTAQGWSDA HFIMDTSRNG VQPTGQQAWG DWCNVIGTGF GVRPTTNTGD SLEDAFVWIK PGGESDGTSD TSAARYDYHC GLSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0297] SEQ ID NO: 187 (node #75)
QNCASVWGQC GGQGWTGATS CVSGSTCVVL NPYYSQCLPG SATTTTSTTS PTTTTSTTTT TSTTTSSTTT TPPTTTTTTT APSGATTTAT ASGPFSGYQL YANPYYASEV STLAIPSLTD GAMAAKAAAV AKVPSFVWLD TAAKVPTMGT YLADIRAANK AGANPPIAGI FWYDLPDRD CAAAASNGEY SIADNGVAKY KAYIDSIRAQ LKKYSDVRTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN LSPAAQLFAN VYKNAGSPAA LRGLATNVAN YNAWSITSCP SYTQGDSNYD EKLYINALAP LLTAQGWSDA HFIMDTSRNG VQPTGQQAWG DWCNVIGTGF GVRPTTNTGD SLEDAFVWVK PGGESDGTSD TSAARYDYHC GLSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0298] SEQ ID NO: 188 (node #76)
QNCQSVWGQC GGQGWTGATS CVAGSTCSTL NPYYAQCLPA TATTTTSTTT PTTTTSTTTT TSTTTSSTTT TPTTTTTTTT APSGATTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAANK AGANPPIAGI FWYDLPDRD CAAAASNGEY SIANNGVANY KAYIDSIRAQ LKKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LSPAAQLFAS VYKNAGSPAA LRGLATNVAN YNAWSISSCP SYTQGDSNYD EKLYINALAP LLTAQGWPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD PLEDAFVWVK PGGESDGTSD TSAARYDYHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0299] SEQ ID NO: 189 (node #77)
QNCQSVWGQC GGQGWTGATS CAAGSTCSTL NPYYAQCIPA TATTATSTTL VTTTSSTSVG TSTATTSTTT TPTTTTTTTT ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIQAANK AGASPPIAGI FWYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWPAN LSPAAQLFAT VYKNASSPAS LRGLATNVAN YNAWSISSAP SYTSGDSNYD EKLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDYHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS L
[0300] SEQ ID NO: 190 (node #78)
QNCQSVWGQC GGQGWTGATS CAAGSTCSTL NPYYAQCIPA TATTATSTTL VKTTSSTSVG TSSATTSTTT TPTTTTTTTT ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIEAANK AGASPPIAGI FWYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWSAN LSPAAQLFAT VYKNASSPAS LRGLATNVAN YNAWSISSAP SYTSGDSNYD EKLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDYHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS L
[0301] SEQ ID NO: 191 (node #79)
QNCQSVWGQC GGQGWSGATS CAAGSTCSTL NPYYAQCIPG TATTATSTTL VKTTSSTSVG TSSATTSVGT TSPPTTTTTK ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIEAANK AGASPPIAGI FWYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWSAN LSPAAQLFAT VYKNASAPAS LRGLATNVAN YNAWSISSPP SYTSGDSNYD EKLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDFHC GYSDALQPAP EAGTWFQAYF VQLLTNANPA L
[0302] SEQ ID NO: 192 (node #80)
QNCQSVWGQC GGQGWTGATS CVAGATCSTL NPYYAQCLPA TATTTTTTTT PTTTTSSTTT TSTTTSSTTT PTTTTTTTT APSGVTTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAQNA AGANPPIAGI FWYDLPDRD CAAAASNGEY SIANNGVANY KAYIDSIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LSPAAQLFAS VYKNAGSPAA LRGLATNVAN YNAWSISSCP SYTQGDSNCD EKRYINALAP LLKAQGFPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD PLEDAFVWVK PGGESDGTSD TSAARYDYHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0303] SEQ ID NO: 193 (node #81)
QNCQTVWGQC GGQGWTGATS CVAGAACSTL NPYYAQCLPA TATTTTTTTT TTTTTSSTTT TSTTTSSTTT TPTTTTTTTT APSSVTTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI FWYDLPDRD CAALASNGEY SIANNGVANY KAYIDSIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LSPAAQLFAS VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKAQGFPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD PLQDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0304] SEQ ID NO: 194 (node #82)
QNCQTLWGQC GGQGWTGATS CVAGAACSTL NPYYAQCLPA TATTTTTTTT TTTTTSSTTT TSATTTSTTT TPTTTTTTTT APSSVTTTAT ASGPFSGYQL YANPYYSSEV HSLAIPSLTD GSLAPAATAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI FWYDLPDRD CAALASNGEY SIANNGVANY KAYIDSIRAI LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LSPAAQLFAS VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSVCD EKRYINALAP LLKAQGFPDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALLDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0305] SEQ ID NO: 195 (node #83)
QNCQTLWGQC GGQGWTGATS CVAGAACSTL NPYYAQCLPA TATTTTSTTT TTTTSSSTTT TSAATATTTT TPTSTTTTTS APSSVTTTAT ASGPFSGYQL YVNPYYSSEV QSLAIPSLTD GSLAPAATAA AKVPSFVWLD TAAKVPTMGT YLADIRSQNA AGANPPIAGQ FWYDLPDRD CAALASNGEF SIADNGVEHY KAYIDSIREI LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LQPAANLFAS VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGNSVCD EKQYINALAP LLKAQGFPDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD SLLDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0306] SEQ ID NO: 196 (node #84)
QNCQTLWGQC GGQGYTGATS CVAGATCSTL NPYYAQCTPA TASTTTSTTT TTTTSSSTTT TSAATATTTT TATSTPSTSS APSSPTTTAT ASGPFSGYQL YVNPYYSSEV QSLAIPSLTD GSLAAAATAA AKVPSFVWLD TAAKVPTMGD YLADIKSQNA AGANPPIAGQ FWYDLPDRD CAALASNGEF SIADNGVEHY KAYIDSIREI LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LQPAADLFAS VYKNASSPAA VRGLATNVAN YNAWSISSCP SYTQGNSVCD EKQYINALAP LLKAQGFPDA HFIVDTGRNG KQPTGQQAWG DWCNVINTGF GVRPTTDTGD ALVDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0307] SEQ ID NO: 197 (node #85)
QNCQTLWGQC GGQGYSGATS CVAGATCSTI NEYYAQCTPA TASTTTATTT TTTTSSSTTT TTAAVATTTT TATSTPSASA SPSSPTTTAS ASGPFSGYQL YVNPYYSSEV ASLAIPSLTD GSLQAAATAA AKVPSFVWLD TAAKVPTMGD YLADIKSQNA AGANPPIAGQ FWYDLPDRD CAALASNGEY SIADNGVEHY KAYIDSIREI LVQYSDVHTL LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN QQPAADLFAS VYKNASSPAA VRGLATNVAN YNAWTISSCP SYTQGNSVCD EQQYINAIAP LLQAQGFPDA HFIVDTGRNG KQPTGQQAWG DWCNVINTGF GVRPTTDTGD ALVDAFVWVK PGGESDGTSD SSATRYDAHC GYSDALQPAP EAGTWFQAYF VQLLTNANPA F
[0308] SEQ ID NO: 198 (node #86)
QNCQTVWGQC GGQGWTGPTS CVAGAACSTL NPYYAQCLPG TATTTTTTTT TSTTTSSTTT TPTTTTTTTS APSSVTTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI FWYDLPDRD CAALASNGEY SIANNGVANY KAYIDAIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKDAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKAQGFPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD PLQDAFVWVK PGGESDGTSD TSSARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0309] SEQ ID NO: 199 (node #87)
QNCQTVWGQC GGQGWSGPTS CVAGAACSTL NPYYAQCIPG AATATTTTTT TATTTTSTTT TSTTTSQTTT KPTTTGPTTS APSGATTTVT ASGPFSGYQL YANPYYSSEV HTLAMPSLPD SSLQPKASAV AEVPSFVWLD VAAKVPTMGT YLADIQAKNK AGANPPIAGI FWYDLPDRD CAALASNGEY SIANNGVANY KAYIDAIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VDYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAATLFAK VYTDAGSPAA LRGLATNVAN YNAWSLSTCP SYTQGDPNCD EKKYINAMAP LLKEAGFPDA HFIMDTSRNG VQPTKQNAWG DWCNVIGTGF GVRPSTNTGD PLQDAFVWIK PGGESDGTSN SSSARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0310] SEQ ID NO: 200 (node #88)
QNCQTVWGQC GGIGWSGPTS CVAGAACSTQ NPYYAQCLPG TATTTTTTTT TSTTTSSTTT TPSTGTTTTS APSSTTITAT PSGPFSGYQL YANPYYSSEV HTLAIPSLAD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIKAKNA AGANPPIAGI FWYDLPDRD CAALASNGEY SIANGGVANY KKYIDAIRAQ LLKYPDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALKQLNL PNVAMYLDAG HAGWLGWPAN IGPAAQLFAS VYKDAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKAQGFPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD SLQDAFVWVK PGGESDGTSD TSSARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0311] SEQ ID NO: 201 (node #89)
QNCQTVWGQC GGIGWSGPTS CVAGAACSTQ NPYYAQCLPG TATATTTTTT TTTTTSSTTT SAGSGSTTTS APSGTTITAS PSGPFSGYQL YANPYYSSEV HTLAIPSLAD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIKAKNA AGANPPIAGI FWYDLPDRD CAALASNGEY SIANGGVANY KKYIDAIRAQ LLKYPDVHTI LVIEPDSLAN LVTNLNVAKC SGAQDAYLEC INYALKQLNL PNVAMYLDAG HAGWLGWPAN IGPAAELFAS VYKDAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDQNCD EKRYINALAP LLKAQGFPDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTDTGD ALQDAFVWVK PGGESDGTSD TSSARYDAHC GYSDALKPAP EAGTWFQAYF EQLLTNANPS F
[0312] SEQ ID NO: 202 (node #90)
QNCASVWGQC GGQGWTGATC CASGSTCVVQ NDYYSQCLPG SATTTTSSTS PTTTTSSTTT TSTTTSSTST TPPPATTTTT APSGASGTAT YSGPFSGVQL WANSYYASEV STLAIPSLSD GAMATKAAAV AKVPSFQWLD TAAKVPTMSS TLADIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KAYIDSIRAQ LVTYSDIRTI LVIEPDSLAN LVTNMNVAKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAN VYKNAGKPAA LRGLATNVAN YNAWNITSAP SYTQGDSNYD EKLYIHALSP LLTAQGWSDA HFITDQSRSG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD SLEDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0313] SEQ ID NO: 203 (node #91)
QACASVWGQC GGQGWSGATC CASGSTCVVQ NDYYSQCLPG SATTTTSSTR STTTTSSTTT SSTTTSSTST TPPPATTTTT APSGASGTAT YSGPFSGVNL WANSYYASEV STLAIPSLSD GAMATKAAAV AKVPSFQWLD TAAKVPTMSS TLADIRAANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KAYIDSIRAI LVTYSDIRTI LVIEPDSLAN LVTNMNVAKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAN VYKNAGKPAA LRGLATNVAN YNAWNITSAP SYTQGNSNYD EKLYIHALSP LLTAQGWSNA HFITDQGRSG KQPTGQQAWG DWCNVIGTGF GVRPTANTGD SLVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0314] SEQ ID NO: 204 (node #92)
QACASVWGQC GGQGWSGATC CASGSTCVVS NDYYSQCLPG SATTSTSSTR SSTTTSSTRA SSTTTSSSST TPPPGSTTTP APPVGSGTAT YSGPFSGVNP WANSYYASEV SSLAIPSLSD GAMATAAAAV AKVPSFMWLD TLAKTPLMSS TLADIRAANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KNYIDTIRAI LVTYSDIRTI LVIEPDSLAN LVTNLSVAKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN QQPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSAP SYTQGNSVYN EKLYIHAISP LLTQQGWSNA YFITDQGRSG KQPTGQQAWG DWCNVIGTGF GIRPSANTGD SLLDAFVWVK PGGECDGTSD TSAARYDYHC GLSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0315] SEQ ID NO: 205 (node #93)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTSRARV SSTTSSSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTI LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0316] SEQ ID NO: 206 (node #94)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTSRARV SSTTSSSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0317] SEQ ID NO: 207 (node #95)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTARARA SSTTSSRSSA TPPPGSSTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TFDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD CAALASNGEY SIADGGVDKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EQLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWIK PGGECDGTSD SSAPRFDSHC ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0318] SEQ ID NO: 208 (node #96)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR AASTTSRARV SPTTSRSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0319] SEQ ID NO: 209 (node #97)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR AASTTSRARV SPTTSRSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0320] SEQ ID NO: 210 (node #98)
QSCASVWGQC GGQGWSGATC CASGSTCVVQ NDYYSQCLPG SATTTTSSTR STTTTSSTTT SSTTTSTTST TTPPATTTTT APSGASGTAT YTGPFSGVNL WANSYYRSEV STLAIPSLSD GAMATKAAAV AKVPSFQWLD TAAKVPTMSG TLADIRAANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADGGVAKY KAYIDSIRAI LVKYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAQSAYKEC TNYAIKQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAK IYKDAGKPAA LRGLATNVAN YNAWNISSAP SYTQPNPNYD EKHYIEAFSP LLTAQGWSNA HFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGD ALVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGTWFQAYF EQLLTNANPS F
[0321] SEQ ID NO: 211 (node #99)
QSCASVWGQC GGQGWSGATC CASGSTCVVQ NDFYSQCLPG SATTTTSSTR STTTTSVTTT SSTTTATTST STPPATTVTT APSGPSGTAT YTGPFSGVNL WANSYYRSEV STLAIPSLSD GAMATAAAKV AKVPSFQWMD TAAKVPLMDG TLADIRKANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADDGVAKY KAYIDSIRAI LVKYSDIRTI LVIEPDSLAN MVTNMNVPKC ANAQSAYKEC TNYAIKQLNL PNVAMYLDAG HAGWLGWPAN LPPAAQLFAK IYKDAGKPSA LRGLATNVSN YNAWNISSAP SYTQPNPNYD EKHYIEAFSP LLTQEGWSDA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGD ALVDAFVWVK PGGESDGTSD TTAARYDYHC GLADALKPAP EAGTWFQAYF EQLLTNANPS F
[0322] SEQ ID NO: 212 (node #100)
QSCSNVWSQC GGQNWSGTPC CTSGNKCVKL NDFYSQCQPG SATSATSSTT SATTTSVTTT ATKTTATTTS STTSGTSVTS APAGPSGPPA ATDPFSGVDL WANNYYRSEV STLAIPKLSD GAMATAAAKV ADVPSFQWMD TYDHISLMEE TLADIRKANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SLDKDGANKY KAYIAKIKGI LQNYSDTRII LVIEPDSLAN MVTNMNVPKC ANAESAYKEL TIYAIKELNL PNVSMYLDAG HGGWLGWPAN LPPAAQLYAQ IYKDAGKPSR LRGLVTNVSN YNAWKLSSKP DYTESNPNYD EQRYINAFSP LLAQEGWSNA KFIVDQGRSG KQPTGQKAWG DWCNAPGTGF GLRPSANTGD ALVDAFVWVK PGGESDGTSD TSAARYDYHC GLDDALKPAP EAGTWFQAYF EQLLKNANPS F
[0323] SEQ ID NO: 213 (node #101)
QACASQWGQC GGQGWSGPTC CASGSTCVVQ NAFYSQCLPG SATTATSSTR STTTTSVTST SSTSTATTSV STPPATTVTT PPAGPSGGAT YTGPFAGVNL WANSYYRSEV STLAIPSLSD GALATAAAKV AKVPTFQWMD TAAKVPLMDG TLADIRKANK AGGNPPYAGQ FWYNLPDRD CAAAASNGEL SIADDGVAKY KAYIDSIRAI LVKYSDIRTI LVIEPDSLAN MVTNMNVPKC ANAQAAYKEC TNYAVKQLNL PNVAMYLDAG HAGWLGWPAN LPPAAALFAK IYKDAGKPKA LRGLATNVSN YNAWNISSAP SYTQPNPNYD EKHYIEAFAP LLSQEGWSDA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGH ALVDAFVWVK PGGESDGTSD TTAARYDYHC GLADALKPAP EAGTWFQAYF EQLLTNANPS F
[0324] SEQ ID NO: 214 (node #102)
QACASQWGQC GGQGWSGPTC CPSGTTCQLQ NAWYSQCLPG AATTAASSTR PATTSSVRST TVVNPPTTTV APPPGTTVAP PPAPPPGGAT YTGPFAGVNQ WANAYYRSEV SSLAVPSLSD GPLATAAAKV ADVPTFQWMD TTAKVPLIDG ALADIRRANA AGGNPPYAGI FWYNLPDRD CAAAASNGEL SIANDGINKY KAYIDSIRAV LLKYNDIRTL LVIEPDSLAN MVTNMGVAKC SNAAAAYKEC TKYAVQQLDL PHVAQYLDAG HAGWLGWPAN IGPAATIFTD IYKEAGKPKS LRGLATNVSN YNAWNASSPA PYTSPNPNYD EKHYVDAFAP LLRQNGWSDA KFIIDQGRSG KQPTGQQEWG HWCNALGTGF GLRPTSNTGH PDVDAFVWVK PGGEADGTSD TTAVRYDHFC GSASSMKPAP EAGTWFQAYF EQLLRNANPS F
[0325] SEQ ID NO: 215 (node #103)
QACASQWGQC GGQGWTGPSC CAAGSVCTVS NPFYSQCLPG STVASSTSTV RTSSTPVVSP SRTSTVTGSV STTSAGTGTT PPAGPTGGAT YTGPFVGVNL WANSYYASEI STLAIPSLSD PALATAAAKV AKVPTFMWMD TRSKIPLVDA TLADIRKANQ AGANPPYAGE FWYNLPDRD CAAAASNGEL SIADGGVAKY KQYIDDIRAM VVKYSDIRII LTIEPDSLAN LVTNLNVPKC AGAQAAYLEG TNYAVTQLNL PNVAMYLDGG HAGWLGWPAN LPPAAAMYAK VYKDAGKPKA LRGLVTNVSN YNGYSISTAP SYTQGNANYD EKHYIEALAP LLSAEGWSDA KFIVDQGRSG KQPTGQLAWG DWCNAIGTGF GVRPTANTGS TLVDAFVWVK PGGESDGTSD TTAARYDLNC GKADALKPAP EAGTWFQAYF EQLLINANPA F
[0326] SEQ ID NO: 216 (node #104)
QSCGSVWGQC GGQGWSGATC CASGSTCVVQ NDYYSQCLPG SATTTTSSTR STTTTSSTTS SSTTTSTTST TAPPATTTTT APGGASSTAS YTGPFSGVNL WANNYYRSEV HTLAIPSLTD GAMATKAAAV AKVPSFQWLD TAAKVDTMSG TLADIRAANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEF SIADGGVAKY KAYIDAIRKL LVKYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAQSAYKEC TIYAIKQLNL PNVAMYLDAG HAGWLGWPAN LQPAAELFAK IYKDAGKPAA LRGLATNVAN YNAWNISSAP SYTSPNPNYD EKHYIEAFSP LLTAQGWSNA HFIVDQGRSG KQPTGQQEWG DWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0327] SEQ ID NO: 217 (node #105)
QSCGSQWGQC GGQGWSGATC CASGSTCVVQ NDYYSQCLPG STTTTTSSTS STTTTSSTSS STTTTSTTST TAPPATTTTT APGGAGTTAS FTGPFSGVNL WANNYYASEV HTLAIPSLTD GAMATKAAAV AKVPSFQWLD IAAKVDTMPG TLADIRAANK AGGNPPYAAQ FWYDLPDRD CAAAASNGEF SIADGGVAKY KAYIDAIRKQ LVSYSDIRTI LVIEPDSLAN MVTNMNVPKC ANAQAAYREC TIYAIKQLNL PNVAMYLDGG HAGWLGWPAN LQPAADLFGK LYADAGKPSQ LRGMATNVAN YNAWNLTSAP SYTSPNPNYD EKHYIEAFSP LLAAKGWSNA HFIVDQGRSG KQPTGQQEWG HWCNAMGTGF GMRPSANTGS ELVDAFVWIK PGGECDGTSD TTAARFDHFC GMSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0328] SEQ ID NO: 218 (node #106)
QNCGSVWGQC GGIGWSGATC CASGSTCVEQ NDYYSQCLPG SSTTTTSSTR STSTTSSTTS SSTSTSTTST TAPPVPTTTT IPGGASSTAS YTGPFSGVQL WANNYYRSEV HTLAIPSMTD GAMATKAAAV AKVPSFQWLD RNVTVDTMSG TLAEIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEF SIANGGVANY KAYIDAIRKL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAASAYKEC TIYAIKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGKPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRSG KQPTGQQEWG DWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0329] SEQ ID NO: 219 (node #107)
QNCGSAWSQC GGIGWSGATC CSSGNSCVEI NSYYSQCLPG ASTSPTSTSK VSSTTSKVTS SSAAQPITTT TAPSVPTTTT IAGGASSTAS FTGPFLGVQG WANSYYSSEI YNHAIPSMTD GSLAAQASAV AKVPTFQWLD RNVTVDTMKS TLEEIRAANK AGANPPYAAH FWYDLPDRD CAAAASNGEF SIANGGVANY KTYINAIRKL LIEYSDIRTI LVIEPDSLAN LVTNTNVAKC ANAASAYKEC TNYAITQLDL PHVAQYLDAG HGGWLGWPAN IQPAATLFAD IYKAAGKPKS VRGLVTNVSN YNGWSLSSAP SYTTPNPNYD EKKYIEAFSP LLNAAGFSPA QFIVDTGRSG KQPTGQIEQG DWCNAIGTGF GVRPTTNTGS SLADAFVWVK PGGESDGTSD TSATRYDYHC GLSDALKPAP EAGQWFQAYF EQLLKNANPA F
[0330] SEQ ID NO: 220 (node #108)
QNCGSVWGQC GGIGWSGATC CASGSTCVEQ NDWYSQCLPG SSTTTTSSTR STSTTSSTTS SSTSTSTTST TTPPVPTTTT IPGGASSTAS YTGPFSGVQL WANNYYRSEV HTLAIPSMTD GAMATKAAAV AKVPSFQWLD RNVTVDTMSG TLAEIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEF SIANGGAANY KAYIDAIRKL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC A AASTYKEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGKPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRSG KQPTGQQEWG DWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0331] SEQ ID NO: 221 (node #109)
QNCGSVWSQC GGIGWSGATC CASGSTCVEQ NDWYSQCLPN SSTTTSTSTR STSTSSSTTS SSTSTSTTST TAPPVPTTTS IPGGASSTAS YSGPFSGVQL WANDYYRSEV HTLAIPSMTD GAMATKAAAV AKVPSFQWLD RNVTIDTMAQ TLSQIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEF SIANGGAANY KAYIDAIRKL IIQYSDIRII LVIEPDSLAN MVTNMNVAKC ANAASTYKEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAE IYKDAGKPAA VRGLATNVAN YNAWSIASAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRNG KQPTGQQQWG DWCNVIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALQPAP EAGQWFQAYF EQLLTNANPP F
[0332] SEQ ID NO: 222 (node #110)
QNCGAVWTQC GGNGWQGPTC CASGSTCVAQ NEWYSQCLPN SPSSTSTSQR STSTSSSTTR SGSSTSSSST TPPPVSSPTS IPGGATSTAS YSGPFSGVRL FANDYYRSEV HNLAIPSMTD GTLAAKASAV AEVPSFQWLD RNVTIDTMVQ TLSQVRALNK AGANPPYAAQ LWYDLPDRD CAAAASNGEF SIANGGAANY RSYIDAIRKH IIEYSDIRII LVIEPDSMAN MVTNMNVAKC SNAASTYHEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAG IYNDAGKPAA VRGLATNVAN YNAWSIASAP SYTSPNPNYD EKHYIEAFSP LLNSAGFSPA RFIVDTGRNG KQPTGQQQWG DWCNVKGTGF GVRPTANTGH ELVDAFVWVK PGGESDGTSD TSAARYDYHC GLSDALQPAP EAGQWFQAYF EQLLTNANPP F
[0333] SEQ ID NO: 223 (node #111)
QNCGSVWGQC GGIGWNGATC CASGSTCVKQ NDWYSQCLPG SSTTTTSSTT STSTTSSTTS TSTSTSTTST TTPPVPTTTT IPGGASSTAS YTGPFSGVQL WANNYYRSEV HTLAIPSLTD GAMATKAAAV AEVPSFQWLD RNVTVDTFSG TLAEIRAANQ AGANPPYAGI FWYDLPDRD CAAAASNGEW SIANGGAANY KAYIDRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAASTYKEL TIYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGRPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRSG KQPTGQLEWG HWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0334] SEQ ID NO: 224 (node #112)
QNCGSVWGQC GGNGWNGATC CASGSTCVKQ NDWYSQCLPG SSTTTTPTST STSTSSSSRS TSTSTSTSST TTPPVATTTS IPGGASSTAS YTGPFSGVQL WANNYYRSEV HTLAIPSLTD GAMATKAAAV AEVPSFQWMD RNVTVDTFSG TLAEIRAANQ AGANPPYAGI FWYDLPDRD CAAAASNGEW SIANGGAANY KAYIDRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC AGAASTYKEL TIYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGRPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYVEAFSP LLTAAGFSPA HFITDTGRSG KQPTGQLEWG HWCNAIGTGF GQRPTANTGH DLVDAFVWIK PGGECDGTSD TTAARYDHHC GLADALKPAP EAGQWFQAYF EQLLTNANPP F
[0335] SEQ ID NO: 225 (node #113)
QNCASVWGQC GGIGYNGPTC CQSGSTCVKQ NDWYSQCLPG SSTTTTSTTS STSTTSSTTS TSTSTSTTST TTAPAPTTTT IPGGASSTAS YNGPFSGVQL WANNYYRSEV HTLAIPSLTD PALAAKAAAV AEVPSFQWLD RNVTVDTFSG TLAEIRAANQ AGANPPYAGI FWYDLPDRD CAAAASNGEW S IAN GAN Y KAYIDRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC SNAASTYKEL TIYALKQLNL PHVAMYMDAG HAGWLGWPAN IQPAAELFAK IYKDAGRPAA VRGLATNVAN YNAWSISSPP SYTSPNPNYD EKHYIEAFSP LLTAQGFSPA QFIVDTGRSG KQPTGQLEWG HWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
APPENDIX 5
[0336] SEQ ID NO: 226 (node #58)
QACAPGWGAC GGNGWTGDTS CTGGGSCGVA NPTYSGCWAG SPTSTTNTDP TTTSSGNVGT TSTSNPTTTS SPPTTPTTTS APGGGTCTPT VSGPFSGVQL YLNPYYVAEV DALAIAQLTD AALKAKAEKV KQIPTAVWLD TIENVPQLPT YLDDALALQK SLGKKPVLVL FWYDLPNRD CHAAASNGEL HIDDNGLQRY KGYIDPIEEQ LKKYSDQRVV LVIEPDSLAN LVTNLNAAKC ADAQASYKEG VAYAIKKLNL PHVYMYLDAG HAGWLGWPDN QEKAAKVFAE VIKNAGSPAK VRGFATNVAN YTALSDTSRS PDTQGNPCFD EFHYIDAMAS ALRAQGLGDA HFIIDTSRNG VKNIGRQQWG DWCNVKGAGL GARPKANGGA SLLDAFVWIK PPGESDGTGD EGAPRYDLYC GREDALQPAP EAGQWFQEYF VQLLKNANPP L
[0337] SEQ ID NO: 227 (node #59)
QACAPGWGAC GGNGWTGDTS CTGGGSCGVA NPTYSGCWAG SPTSTTNTDS TTTSSGNVGT TSTSNPTTTS SPPTTPTTTS APSGSTSTPT VSGPFSGVQL YLNPYYVAEV DALAIAQLTD AALKAKAEKV KQIPTAVWLD TIENVPQLPT YLDDALALQK SSGKKPVLVL FWYDLPNRD CHAAASNGEL HIDDNGLQRY KGYIDSIEEQ LKKYSDQRVV LVIEPDSLAN LVTNLNAAKC ADAQASYKEG VAYAIKKLNL PHVYMYLDAG HAGWLGWPDN QEKAAKVFAE VIKNAGSPAK VRGFATNVAN YTALSDTSRS PDTQGNPCFD EFHYIDAMAS ALRAQGLSDA HFIIDTSRNG VKNIGRQQWG DWCNVKGAGL GARPKANSGA SLLDAFVWIK PPGESDGTSD ESAPRYDSYC GREDALQPAP EAGQWFQEYF VQLLKNANPP L
[0338] SEQ ID NO: 228 (node #60)
QACAPGWGAC GGNEWTDDTS CTGGGSCGVA NPTYSGCWAG SPTSTTNTDV TYTDSDNVGK WGVENPTTTS SPPTTPTTTS APSGSTSTPT VSGPFSGVQL YLNPYYVAEV DALAIAQLTD AALKAKAEKV KQIPTAVWLD TIENVPQLPT YLDDALALQK SSGKKPVLW FWYDLPNRD CHAAASNGEL HIDDNGLQRY KGYIDSIEEQ LKKYSDQRVV LIIEPDSLAN LVTNLNAAKC ADAQASYKEG VAYAIKKFGL PHVYMYLDAG HAGWLGWPDN QEKAAKVFAE VIKNAGSPGK VRGFATNVAN YTALSDTSRS PDTQGNPCFD EFHYIDAMAS ALRAQGLSDV HFIIDTSRNG VKNIGRQQSG DWCNVKGAGL GARPKANSGA SLLDAFVWIK PPGESDGTSD ESAPRYDSYC GREDALQPAP EAGQWFQEYF VQLLKNANPP L
[0339] SEQ ID NO: 229 (node #61)
LACAPGYPCC SGNEYTDDDG VENGNWCGIA DPTYESCWAE SPCSTTNTEV EYTDSDNVGK WGVENPIPTG TPTPTPTSGS DPSTPGSPLT ISGPFSGVEF YLNPYYVAEV DALAIAQMTD SSLKAKAEKM KTFSNAIWLD TIKNMQQLET NLKGALAQHQ SSGKKPVLTV FWYDLPGRD CHALASNGEL LANDSDLQRY KSYIDVIEEQ LKKYNSQPVV LIIEPDSLAN LVTNLNTPAC ADSEQYYLEG HAYLIKKFGL PHVAMYLDIG HAFWLGWDDN REKAAKVYAK VIKSSGSPGK VRGFTDNVAN YTPWEDPSRG PDTEWNPCPD EKRYLEAMHK DFKAAGISSV YFVSDTSRNG HKNVDRKHPG EWCNQTGVGI GARPKANSGM DYLDAFYWIK PLGESDGTSD ESAARYDGYC GHETAMKPAP EAGQWFQKHF EQGLKNANPP L
[0340] SEQ ID NO: 230 (node #62)
QACAPVWGQC GGNGWTGATS CVSGSTCTVI NPWYSQCLPG SATTTTTTSS PTTTTSSVST TSTSTSTTTS SPPTTPTTTS APSGSTSTPT ASGPFSGYQL YLNPYYAAEV DALAIAQLTD AALKAKAAKV KQIPTFVWLD TIAKVPTLGT YLADASALQK SSGKKPVLVQ FWYDLPDRD CAAAASNGEF SIADNGLAKY KGYIDSIVAQ LKKYSDVRVV LVIEPDSLAN LVTNLNVAKC ANAQAAYKEG VTYAIKQLNL PNVYMYLDAG HAGWLGWPAN LSPAAQLFAQ VYKNAGSPAA VRGLATNVAN YNALSATSRS PVTQGNPNYD ELHYINALAP LLRSQGWSDA HFIVDTSRNG VQNIGRQQWG DWCNVKGAGF GVRPTTNTGS SLIDAFVWIK PGGESDGTSD TSAPRYDSHC GLSDALQPAP EAGTWFQAYF VQLLKNANPP L
[0341] SEQ ID NO: 231 (node #63)
QACAPLWGQC GGNGWTGATT CVSGSTCTVI NPWYSQCLPG SATTTTTTSS PTTTTSSVST SSTSTSTSTS SPPTTPTTTS APSGSTSTPT ATGPFSGYQI YLSPYYAAEV DALAAAQISD ATLKAKALKV KQIPTFTWLD TIAKVPTLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAAAASNGEF SIADNGLAKY KGYIDSIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQAAYKEG VTYALKQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ VYKNAGSPAF VRGLATNVAN YNALSATSRD PVTQGNPNYD ELHYINALAP LLRSQGWFDA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GMRPTTNTGS SLIDAIVWIK PGGESDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF VNLVKNANPP L
[0342] SEQ ID NO: 232 (node #64)
QACAPLYGQC GGNGWTGATT CVSGSTCTVI NPWYSQCLPG SATTTTTTSS PTTTTSSVST SSTSTSTSTS SPPTTPTTTS APSGSTSTPP ATGPFSGYQI YLSPYYAAEV DALAAAQISD ATLKAKALKV KQIPTFTWFD VIAKVPTLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAAAASNGEF SIADNGLAKY KGYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQAAYKEG VTYALKQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ VYKNAGSPAF VRGLATNVAN YNALSATSRD PVTQGNPNYD ELHYINALAP LLRSQGWFDA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GMRPTTNTGS SLIDAIVWIK PGGESDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF VNLVKNANPP L
[0343] SEQ ID NO: 233 (node #65)
QACAPLYGQC GGNGWTGATT CVSGSTCTVI NDWYSQCLPG SATTTTTTSS PTTTTSSVST SSTSTSTSTS SPPTTPTTTS APSGSTSTPP AAGPFTGYQI YLSPYYAAEV QALAAAQISD ATLKAKALKV AQIPTFTWFD VIAKVPTLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAALASNGEF SIANNGLANY KGYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PVTQGNPNYD ELHYINALAP LLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWIK PGGECDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVKNANPP L
[0344] SEQ ID NO: 234 (node #66)
QACAPLYGQC GGNGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTSS PTTTTSSVST SSTSTSTSTS SPPTTPTTTS APSGSTSTPP AAGPFTGYQI YLSPYYAAEV QALAAAQISD ATLKAKAAKV ANIPTFTWFD VIAKVPTLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAALASNGEF SIANNGLANY KNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGNPNYD ELHYINALAP LLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVKNANPP L
[0345] SEQ ID NO: 235 (node #67)
QACAPLYGQC GGIGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTSS PTTTTSSVST SSTSTSTSTS SPPTTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQISD PTLAAKAAKV ANIPTFTWFD WAKVPTLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAALASNGEF SIANNGLANY KNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGNPNYD ELHYINALAP LLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVSNANPP L
[0346] SEQ ID NO: 236 (node #68)
QACAPLWGQC GGIGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTTTSS PTTTTSSVST SSTSTSTSTS SPPTTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQISD PTLAAKAAKV ANIPTFTWFD WAKVPTLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAALASNGEF SIANNGLANY KNYIDQIVAQ LKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTAYKEG VTYALQQLNS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSAPSPD PITQGNPNYD ELHYINALAP LLQSQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSD TSAPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVSNANPP L
[0347] SEQ ID NO: 237 (node #69)
QACAPLYGQC GGIGWTGATT CVSGSTCTVI NDYYSQCLPG SATTTTVTSS PTTSASSVST SSTSTSTSTS STPTTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQISD PTLAAKAAKV ANIPTFTWLD QVAKVPDLGT YLADASALQK SSGKKPQLVQ IWYDLPDRD CAALASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTTYKEC VTYALQQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSPSF VRGLATNVAN YNALSATSPD PITQGDPNYD ELHYINALAP LLQQQGFFPA HFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN TSSPRYDSHC GLSDATQPAP EAGTWFQAYF ETLVSNANPP L
[0348] SEQ ID NO: 238 (node #70)
QACAPLYGQC GGIGWTGATT CVSGSTCTKI NDYYSQCLPG SATTTTVTSS PTTSASSVST SSTSTSSSTS STPTTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV AALAVAQITD PTLAAKAAKV ANIPTFTWLD QVAKVPDLGT YLADASALQK SSGQKPQLVQ IWYDLPDRD CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSLAN LVTNLNVAKC ANAQTTYKAC VTYALQQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSSPF VRGLATNVAN YNALSATSPD PITQGDPNYD ELHYINALAP LLQQQGFFPA QFIVDQGRSG VQNIGRQQWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSHC SLSDATQPAP EAGTWFQAYF ETLVSNANPP L
[0349] SEQ ID NO: 239 (node #71)
QACAPLYGQC GGIGWTGATT CVSGSTCTKI NDYYSQCLPG SATTTTVTSS PTTSASSVST SSTSTSSSTS TTPTTPTSTS APSGSTSTTA AAGPFTGYQI YLSPYYAAEV QALAVANITD SALAAKAAKV ANIPTFTWLD QVAKVPDLGT YLADADALAK SSGQKPQLLQ IWYDLPDRD CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKYPDVRVV AVIEPDSMAN LVTNLNVAKC ANAATTYKAC VTYALQQLSS VGVYMYLDAG HAGWLGWPAN LSPAAQLFAQ LYKTAGSSPF FRGLATNVAN YNALSTTSPD PITQGDPNYD ELLYINALSP LLQQQGFFPA QFIVDQGRSG VQNIGRSAWG DWCNVKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSHC SLSDALQPAP EAGTWFQAYF ETLVSNANPP L
[0350] SEQ ID NO: 240 (node #72)
QACAPVYGQC GGIGWTGATT CVSGSTCTKQ NDYYSQCLPG SATTTTVTSS PTTSASSVST SSTSTSSTTS STPTTPTTTS APSGSTSTTP AAGPFTGYQI YLSPYYAAEV AALAAAQITD PTLAAKAASV ANIPTFTWLD SVAKVPDLGT YLADASALQK SSGQKPQLVQ IWYDLPDRD CAAKASNGEF SIADNGQANY HNYIDQIVAQ IKKFPDVRVV AVIEPDSLAN LVTNLNVQKC ANAQTTYKAC VTYALKQLSS VGVYMYMDAG HAGWLGWPAN LSPAAQLFAQ LYKNAGSSPF VRGLATNVAN YNALSAASPD PITQGDPNYD ELHYINALAP LLQQQGFFPA QFIVDQGRSG VQNIGRQQWG DWCNIKGAGF GTRPTTNTGS SLIDAIVWVK PGGECDGTSN SSSPRYDSTC SLSDATQPAP EAGTWFQAYF ETLVSKANPP L
[0351] SEQ ID NO: 241 (node #73)
QACAPVWGQC GGIGWTGPTT CVSGSTCTKQ NDYYSQCLPG SATTTTVTTS PTSSASSVST SSTSTSSTTS SSPTTPTTTS APSGSTSTPP AAGPWTGYQI YLSPYYANEV AALAAKQITD PTLAAKAASV ANIPTFTWLD SVAKIPDLGT YLADASALGK SSGQKPQLVQ IWYDLPDRD CAAKASNGEF SIADNGQANY QNYIDQIVAQ IKQFPDVRVV AVIEPDSLAN LVTNLNVQKC ANAKTTYLAC VNYALKQLSS VGVYMYMDAG HAGWLGWPAN LSPAAQLFAQ VYKNAGKSPF IKGLATNVAN YNALSAASPD PITQGDPNYD EIHYINALAP LLQQAGFFPA TFIVDQGRSG VQNIGRQQWG DWCNIKGAGF GTRPTTNTGS SLIDSIVWVK PGGECDGTSN SSSPRYDSTC SLSDATQPAP EAGTWFQAYF ETLVSKANPP L
[0352] SEQ ID NO: 242 (node #74)
QACASVWGQC GGQGWTGATS CVSGSTCVVL NPYYSQCLPG SATTTTTTTS PTTTTSSTTT TSTTTSSTTT TPPTTTTTTS APSGATSTAT ASGPFSGYQL YANPYYASEV SALAIPSLTD GAMAAKAAAV AKVPTFVWLD TAAKVPTMGT YLADIRALNK AGANPPVAGQ FWYDLPDRD CAAAASNGEY SIADNGLAKY KAYIDSIVAQ LKKYSDVRVI LVIEPDSLAN LVTNLNVAKC ANAQAAYLEG INYAITQLNL PNVAMYLDAG HAGWLGWPAN LSPAAQLFAQ VYKNAGSPAA VRGLATNVAN YNAWSITSCP SYTQGDSNYD EKLYINALAP LLTSQGWSDA HFIMDTSRNG VQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALEDAFVWIK PGGESDGTSD TSAARYDYHC GLSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0353] SEQ ID NO: 243 (node #75)
QACASVWGQC GGQGWTGATS CVSGSTCVVL NPYYSQCLPG SATTTTTTTS PTTTTSSTTT TSTTTSSTTT TPPTTTTTTS APSGATSTAT ASGPFSGYQL YANPYYASEV STLAIPSLTD GAMAAKAAAV AKVPSFVWLD TAAKVPTMGT YLADIRAANK AGANPPIAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KAYIDSIRAQ LVKYSDVRTI LVIEPDSLAN LVTNLNVAKC ANAQAAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN LSPAAQLFAQ VYKNAGSPAA LRGLATNVAN YNAWSITSCP SYTQGDSNYD EKLYINALAP LLTSQGWSDA HFIMDTSRNG VQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALEDAFVWVK PGGESDGTSD TSAARYDYHC GLSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0354] SEQ ID NO: 244 (node #76)
QACQSVWGQC GGQGWTGATS CVAGSTCSTL NPYYAQCLPA TATTTTTTTS PTTTTSSTTT TSTTTSSTTT TPTTTTTTTS APSGATTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAKATAV AKVPSFVWLD TAAKVPTMGT YLADIRAANK AGANPPIAGI FWYDLPDRD CAAAASNGEY SIANNGVANY KAYIDSIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LSPAAQLFAQ VYKNAGSPAA LRGLATNVAN YNAWSISSCP SYTQGDSNYD EKLYINALAP LLTSQGWSDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD ALEDAFVWVK PGGESDGTSD TSAARYDYHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0355] SEQ ID NO: 245 (node #77)
QACQSVWGQC GGQGWTGATS CAAGSTCSTL NPYYAQCIPA TATTATSTTL VTTTSSTSVG TSTATTSTTT PTTTTTTTT ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIQAANK AGASPPIAGI FWYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWPAN LSPAAQLFAQ VYKNASSPAS LRGLATNVAN YNAWSISSAP SYTSGDSNYD EKLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDYHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS L
[0356] SEQ ID NO: 246 (node #78)
QACQSVWGQC GGQGWTGATS CAAGSTCSTL NPYYAQCIPA TATTATSTTL VKTTSSTSVG TSTATTSTTT PTTTTTTTT ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIEAANK AGASPPIAGI FWYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWSAN LSPAAQLFAT VYKNASSPAS LRGLATNVAN YNAWSISSAP SYTSGDSNYD EKLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDYHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS L
[0357] SEQ ID NO: 247 (node #79)
QACQSVWGQC GGQGWSGATS CAAGSTCSTL NPYYAQCIPG TATTATSTTL VKTTSSTSVG TSTATTSVGT TSPPTTTTTK ASTTATTTAA ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAAAATKA AEIPSFVWLD TAAKVPTMGT YLANIEAANK AGASPPIAGI FWYDLPDRD CAAAASNGEY TVANNGVANY KAYIDSIVAQ LKAYPDVHTI LIIEPDSLAN MVTNLSTAKC AEAQSAYYEC VNYALINLNL ANVAMYIDAG HAGWLGWSAN LSPAAQLFAT VYKNASAPAS LRGLATNVAN YNAWSISSPP SYTSGDSNYD EKLYINALSP LLTSNGWPNA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVQPTTNTGD PLEDAFVWVK PGGESDGTSN SSATRYDFHC GYSDALQPAP EAGTWFQAYF VQLLTNANPA L
[0358] SEQ ID NO: 248 (node #80)
QACQSVWGQC GGQGWTGATS CVAGATCSTL NPYYAQCLPA TATTTTTTTT PTTTTSSTTT TSTTTSSTTT TPTTTTTTTS APSGVTTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAPKATAV AKVPSFVWLD TAAKVPTMGT YLADIRAQNA AGANPPIAGI FWYDLPDRD CAAAASNGEY SIANNGVANY KAYIDSIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LSPAAQLFAS VYKNAGSPAA LRGLATNVAN YNAWSISSCP SYTQGDSNCD EKRYINALAP LLKEQGFSDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD ALEDAFVWVK PGGESDGTSD TSAARYDYHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0359] SEQ ID NO: 249 (node #81)
QACQTVWGQC GGQGWTGATS CVAGAACSTL NPYYAQCLPA TATTTTTTTT TTTTTSSTTT TSTTTSSTTT TPTTTTTTTS APSGVTTTAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD GSLAPKATAV AKVPSFVWLD TAAKVPTMGT YLADIRAQNA AGANPPIAGI FWYDLPDRD CAALASNGEY SIANNGVANY KAYIDSIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LSPAAQLFAS VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKEQGFSDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD ALQDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0360] SEQ ID NO: 250 (node #82)
QACQTLWGQC GGQGWTGATS CVAGAACSTL NPYYAQCLPA TATTTTTTTT TTTTTSSTTT TSATTSSTTT TPTTTTTTTS APSGVTTTAT ASGPFSGYQL YANPYYSSEV HSLAIPSLTD GSLAPAATAV AKVPSFVWLD TAAKVPTMGT YLADIRAQNA AGANPPIAGI FWYDLPDRD CAALASNGEY SIANNGVANY KAYIDSIRAI LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALTQLNL PNVAMYLDAG HAGWLGWPAN LSPAAQLFAS VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSVCD EKRYINALAP LLKEQGFSDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALLDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0361] SEQ ID NO: 251 (node #83)
QACQTLWGQC GGQGWTGATS CVAGAACSTL NPYYAQCLPA TATTTTSTTT TTTTSSSTTT TSAATATTTT TPTSTTTTTS APSSVTTTAT ASGPFSGYQL YVNPYYSSEV QSLAIPSLTD GSLAPAATAA AKVPSFVWLD TAAKVPTMGT YLADIRSQNA AGANPPIAGQ FWYDLPDRD CAALASNGEY SIADNGVEHY KAYIDSIREI LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LQPAADLFAS VYKNAGSPAA LRGLATNVAN YNAWSISTCP SYTQGNSVCD EKQYINALAP LLKAQGFSDA HFIVDTGRNG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALLDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0362] SEQ ID NO: 252 (node #84)
QACQTLWGQC GGQGYTGATS CVAGAACSTL NPYYAQCTPA TASTTTSTTT TTTTSSSTTT TSAATATTTT TATSTPSTTS APSSPTTTAT ASGPFSGYQL YVNPYYSSEV QSLAIPSLTD GSLAAAATAA AKVPSFVWLD TAAKVPTMGD YLADIKSQNA AGANPPIAGQ FWYDLPDRD CAALASNGEY SIADNGVEHY KAYIDSIREI LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN LQPAADLFAS VYKNASSPAA VRGLATNVAN YNAWSISSCP SYTQGNSVCD EKQYINALAP LLKAQGFSDA HFIVDTGRNG KQPTGQQAWG DWCNVINTGF GVRPTTDTGD ALVDAFVWVK PGGESDGTSD TSAARYDAHC GYSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0363] SEQ ID NO: 253 (node #85)
QACQTLWGQC GGQGYSGATS CVAGATCSTI NEYYAQCTPA TASTTTSTTT TTTTSSSTTT TTAAVATTTT TATSTPSASA SPSSPTTTAS ASGPFSGYQL YVNPYYSSEV ASLAIPSLTD GSLQAAATAA AKVPSFVWLD TAAKVPTMGD YLADIKSQNA AGANPPIAGQ FWYDLPDRD CAALASNGEY SIADNGVEHY KAYIDSIREI LVQYSDVHTL LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC TNYALTQLNL PNVAMYLDAG HAGWLGWPAN QQPAADLFAS VYKNASSPAA VRGLATNVAN YNAWTISSCP SYTQGNSVCD EQQYINAIAP LLQAQGFSDA HFIVDTGRNG KQPTGQQAWG DWCNVINTGF GVRPTTDTGD ALVDAFVWVK PGGESDGTSD SSATRYDAHC GYSDALQPAP EAGTWFQAYF VQLLTNANPA F
[0364] SEQ ID NO: 254 (node #86)
QACQTVWGQC GGQGWSGPTS CVAGAACSTL NPYYAQCLPG TATTTTTTTT TSTTTSSTTT TPTTTTTTTS APSGTTITAT ASGPFSGYQL YANPYYSSEV HTLAIPSLTD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI FWYDLPDRD CAALASNGEY SIANNGVANY KAYIDAIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAAQLFAS VYKDAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKEQGFSDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD ALQDAFVWVK PGGESDGTSD TSSARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0365] SEQ ID NO: 255 (node #87)
QACQTVWGQC GGQGWSGPTS CVAGAACSTL NPYYAQCIPG AATSTTTTTT TATTTTSTTT TSTTTSQTTT KPTTTGPTTS APSGTTITVT ASGPFSGYQL YANPYYSSEV HTLAMPSLPD SSLQPKASAV AEVPSFVWLD VAAKVPTMGT YLADIQAKNK AGANPPIAGI FWYDLPDRD CAALASNGEY SIANNGVANY KAYIDAIRAQ LVKYSDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC VDYALKQLNL PNVAMYLDAG HAGWLGWPAN LGPAATLFAK VYTDAGSPAA LRGLATNVAN YNAWSLSTCP SYTQGDPNCD EKKYINAMAP LLKEAGFSDA HFIMDTSRNG VQPTKQNAWG DWCNVIGTGF GVRPSTNTGD PLQDAFVWIK PGGESDGTSN SSSARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0366] SEQ ID NO: 256 (node #88)
QACQTVWGQC GGIGWSGPTS CVAGAACSTQ NPYYAQCLPG TATTTTTTTT TSTTTSSTTT TASTGTTTTS APSGTTITAT PSGPFSGYQL YANPYYSSEV HTLAIPSLTD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI FWYDLPDRD CAALASNGEY SIANGGVANY KKYIDAIRAQ LLKYPDVHTI LVIEPDSLAN LVTNLNVAKC ANAQSAYLEC INYALKQLNL PNVAMYLDAG HAGWLGWPAN IGPAAQLFAS VYKDAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDSNCD EKRYINALAP LLKEQGFSDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTNTGD ALQDAFVWVK PGGESDGTSD TSSARYDAHC GYSDALQPAP EAGTWFQAYF EQLLTNANPS F
[0367] SEQ ID NO: 257 (node #89)
QACQTVWGQC GGIGWSGPTS CVAGAACSTQ NPYYAQCLPG TATTTTTTTT TTTTTSSTTT SAGSGTTTTS APSGTTITAS PSGPFSGYQL YANPYYSSEV HTLAIPSLTD SSLAPKASAV AKVPSFVWLD TAAKVPTMGT YLADIRAKNA AGANPPIAGI FWYDLPDRD CAALASNGEY SIANGGVANY KKYIDAIRAQ LLKYPDVHTI LVIEPDSLAN LVTNLNVAKC SGAQDAYLEC INYALKQLNL PNVAMYLDAG HAGWLGWPAN IGPAAELFAS VYKDAGSPAA LRGLATNVAN YNAWSISTCP SYTQGDQNCD EKRYINALAP LLKAQGFSDA HFIMDTSRNG VQPTKQQAWG DWCNVIGTGF GVRPTTDTGD ALQDAFVWVK PGGESDGTSD TSSARYDAHC GYSDALKPAP EAGTWFQAYF EQLLTNANPS F
[0368] SEQ ID NO: 258 (node #90)
QACASVWGQC GGQGWTGATC CASGSTCVVQ NDYYSQCLPG SATTTTSSTS PTTTTSSTTT SSTTTSSTST TPPPATTTTS APSGASGTAT YSGPFSGVQL WANSYYASEV STLAIPSLTD GAMAAKAAAV AKVPSFQWLD TAAKVPTMSS TLADIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KAYIDSIRAQ LVKYSDIRTI LVIEPDSLAN LVTNMNVAKC ANAQAAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAN VYKNAGKPAA LRGLATNVAN YNAWNITSAP SYTQGDSNYD EKLYIQALSP LLTSQGWSDA HFITDQSRSG KQPTGQQAWG DWCNVIGTGF GVRPTTNTGD ALEDAFVWVK PGGECDGTSD TSAARYDYHC GLSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0369] SEQ ID NO: 259 (node #91)
QACASVWGQC GGQGWSGATC CASGSTCVVQ NDYYSQCLPG SATTTTSSTS STTTTSSTTT SSTTTSSTST TPPPGTTTTS APSGASGTAT YSGPFSGVNL WANSYYASEV STLAIPSLTD GAMATKAAAV AKVPSFQWLD TAAKVPTMSS TLADIRAANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KAYIDSIRAI LVKYSDIRTI LVIEPDSLAN LVTNMNVAKC ANAQAAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAN VYKNAGKPAA LRGLATNVAN YNAWNITSAP SYTQGNSNYD EKLYIQALSP LLTQQGWSNA HFITDQGRSG KQPTGQQAWG DWCNVIGTGF GVRPTANTGD ALVDAFVWVK PGGECDGTSD TSAARYDYHC GLSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0370] SEQ ID NO: 260 (node #92)
QACASVWGQC GGQGWSGATC CASGSTCVVS NDYYSQCLPG SATTTTSSTS SSTTTSSTRA SSTTTSSSST TPPPGSTTTS APPVGSGTAT YSGPFSGVNP WANSYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLAKTPLMSS TLADIRAANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADNGVAKY KNYIDTIRAI LVKYSDIRTI LVIEPDSLAN LVTNLSVAKC ANAQAAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN QQPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSAP SYTQGNSVYN EKLYIHAISP LLTQQGWSNA YFITDQGRSG KQPTGQQAWG DWCNVIGTGF GIRPSANTGD SLLDAFVWVK PGGECDGTSD TSAARYDYHC GLSDALQPAP EAGTWFQAYF VQLLTNANPS F
[0371] SEQ ID NO: 261 (node #93)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTSRARV SSTTSSSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTI LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAITQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0372] SEQ ID NO: 262 (node #94)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTSRARV SSTTSSSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0373] SEQ ID NO: 263 (node #95)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR ASSTTARARA SSTTSSRSSA TPPPGSSTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TFDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD CAALASNGEY SIADGGVDKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EQLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWIK PGGECDGTSD SSAPRFDSHC ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0374] SEQ ID NO: 264 (node #96)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR AASTTSRARV SPTTSRSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0375] SEQ ID NO: 265 (node #97)
QACSSVWGQC GGQNWSGPTC CASGSTCVYS NDYYSQCLPG SAASSSSSTR AASTTSRARV SPTTSRSSSA TPPPGSTTTR VPPVGSGTAT YSGPFVGVTP WANAYYASEV SSLAIPSLTD GAMATAAAAV AKVPSFMWLD TLDKTPLMEQ TLADIRTANK NGGNPPYAGQ FWYDLPDRD CAALASNGEY SIADGGVAKY KNYIDTIRQI VVEYSDIRTL LVIEPDSLAN LVTNLGTPKC ANAQSAYLEC INYAVTQLNL PNVAMYLDAG HAGWLGWPAN QDPAAQLFAN VYKNASSPRA LRGLATNVAN YNGWNITSPP SYTQGNAVYN EKLYIHAIGP LLANHGWSNA FFITDQGRSG KQPTGQQQWG DWCNVIGTGF GIRPSANTGD SLLDSFVWVK PGGECDGTSD SSAPRFDSHC ALPDALQPAP QAGAWFQAYF VQLLTNANPS F
[0376] SEQ ID NO: 266 (node #98)
QACASVWGQC GGQGWSGATC CASGSTCVVQ NDYYSQCLPG SATTTTSSTS STTTTSSTTT SSTTTSTTST TTPPGTTTTS APGGASGTAT YTGPFSGVNL WANSYYRSEV STLAIPSLTD GAMATKAAAV AKVPSFQWLD TAAKVPTMSG TLADIRAANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADDGVAKY KAYIDSIRAI LVKYSDIRTI LVIEPDSLAN MVTNMNVPKC A AQAAYKEC TNYAIKQLNL PNVAMYLDAG HAGWLGWPAN LQPAAQLFAK IYKDAGKPAA LRGLATNVAN YNAWNITSAP SYTQPNPNYD EKHYIEAFSP LLTQEGWSNA HFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGH ALVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGTWFQAYF EQLLTNANPS F
[0377] SEQ ID NO: 267 (node #99)
QACASVWGQC GGQGWSGATC CASGSTCVVQ NDFYSQCLPG SATTATSSTV STTTTSVTTT SSTTTATTST STPPGTTVTS APGGPSGTAT YTGPFSGVNL WANSYYRSEV STLAIPSLSD GAMATAAAKV AKVPSFQWMD TAAKVPLMDG TLADIRKANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADDGVAKY KAYIDSIRAI LVKYSDIRTI LVIEPDSLAN MVTNMNVPKC ANAQAAYKEC TNYAIKQLNL PNVAMYLDAG HAGWLGWPAN LPPAAQLFAK IYKDAGKPAA LRGLATNVSN YNAWNISSAP SYTQPNPNYD EKHYIEAFSP LLTQEGWSNA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGH ALVDAFVWVK PGGESDGTSD TTAARYDYHC GLADALKPAP EAGTWFQAYF EQLLTNANPS F
[0378] SEQ ID NO: 268 (node #100)
QSCSNVWSQC GGQNWSGTPC CTSGNKCVKL NDFYSQCQPG SATSATSSTV SATTTSVTTT ATKTTATTSS STTSGTSVTS APGGPSGPPA ATDPFSGVDL WANNYYRSEV STLAIPKLSD GAMATAAAKV ADVPSFQWMD TYDHISLMEE TLADIRKANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SLDKDGANKY KAYIAKIKGI LQNYSDTRII LVIEPDSLAN MVTNMNVPKC ANAESAYKEL TIYAIKELNL PNVSMYLDAG HGGWLGWPAN LPPAAQLYAQ IYKDAGKPSR LRGLVTNVSN YNAWKLSSKP DYTESNPNYD EQRYINAFSP LLAQEGWSNA KFIVDQGRSG KQPTGQKAWG DWCNAPGTGF GLRPSANTGD ALVDAFVWVK PGGESDGTSD TSAARYDYHC GLDDALKPAP EAGTWFQAYF EQLLKNANPS F
[0379] SEQ ID NO: 269 (node #101)
QACASQWGQC GGQGWSGPTC CASGSTCVVQ NAFYSQCLPG SATTATSSTV STTTTSVTST SSTTTATTSV STPPGTTVTS PPGGPSGGAT YTGPFAGVNL WANSYYRSEV STLAIPSLSD GALATAAAKV AKVPTFQWMD TAAKVPLMDG TLADIRKANK AGGNPPYAGQ FWYNLPDRD CAAAASNGEL SIADDGVAKY KAYIDSIRAI LVKYSDIRTI LVIEPDSLAN MVTNMNVPKC ANAQAAYKEC TNYAVKQLNL PNVAMYLDAG HAGWLGWPAN LPPAAALFAK IYKDAGKPKA LRGLATNVSN YNAWNISSAP SYTQPNPNYD EKHYIEAFAP LLTQEGWSDA KFIVDQGRSG KQPTGQQAWG DWCNAIGTGF GVRPTANTGH ALVDAFVWVK PGGESDGTSD TTAARYDYHC GLADALKPAP EAGTWFQAYF EQLLTNANPS F
[0380] SEQ ID NO: 270 (node #102)
QACASQWGQC GGQGWSGPTC CPSGTTCQLQ NAWYSQCLPG AATTAASSTR PATTSSVRST TVVNPPTTTV APPPGTTVAP PPAPPPGGAT YTGPFAGVNQ WANAYYRSEV SSLAVPSLSD GPLATAAAKV ADVPTFQWMD TTAKVPLIDG ALADIRRANA AGGNPPYAGI FWYNLPDRD CAAAASNGEL SIANDGINKY KAYIDSIRAV LLKYNDIRTL LVIEPDSLAN MVTNMGVAKC SNAAAAYKEC TKYAVQQLDL PHVAQYLDAG HAGWLGWPAN IGPAATIFTD IYKEAGKPKS LRGLATNVSN YNAWNASSPA PYTSPNPNYD EKHYVDAFAP LLRQNGWSDA KFIIDQGRSG KQPTGQQEWG HWCNALGTGF GLRPTSNTGH PDVDAFVWVK PGGEADGTSD TTAVRYDHFC GSASSMKPAP EAGTWFQAYF EQLLRNANPS F
[0381] SEQ ID NO: 271 (node #103)
QACASQWGQC GGQGWTGPSC CAAGSVCTVS NPFYSQCLPG STVASSTSTV RTSSTPVVSP SRTSTVTGSV STTSAGTGTT PPGGPTGGAT YTGPFVGVNL WANSYYASEI STLAIPSLSD PALATAAAKV AKVPTFMWMD TRSKIPLVDA TLADIRKANQ AGANPPYAGE FWYNLPDRD CAAAASNGEL SIADGGVAKY KQYIDDIRAM VVKYSDIRII LTIEPDSLAN LVTNLNVPKC AGAQAAYLEG TNYAVTQLNL PNVAMYLDGG HAGWLGWPAN LPPAAAMYAK VYKDAGKPKA LRGLVTNVSN YNGYSISTAP SYTQGNANYD EKHYIEALAP LLSAEGWSDA KFIVDQGRSG KQPTGQLAWG DWCNAIGTGF GVRPTANTGS TLVDAFVWVK PGGESDGTSD TTAARYDLNC GKADALKPAP EAGTWFQAYF EQLLINANPA F
[0382] SEQ ID NO: 272 (node #104)
QACGSVWGQC GGQGWSGATC CASGSTCVVQ NDYYSQCLPG SATTTTSSTS STTTTSSTTS SSTTTSTTST TTPPGTTTTS APGGASSTAS YTGPFSGVNL WANNYYRSEV HTLAIPSLTD GAMATKAAAV AKVPSFQWLD TAAKVDTMSG TLADIRAANK AGGNPPYAGQ FWYDLPDRD CAAAASNGEY SIADGGVAKY KAYIDSIRKL LVKYSDIRTI LVIEPDSLAN MVTNMNVPKC ANAQAAYKEC TIYAIKQLNL PNVAMYLDAG HAGWLGWPAN LQPAAELFAK IYKDAGKPAA LRGLATNVAN YNAWNITSAP SYTSPNPNYD EKHYIEAFSP LLTAEGWSNA HFIVDQGRSG KQPTGQQEWG DWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0383] SEQ ID NO: 273 (node #105)
QACGSQWGQC GGQGWSGATC CASGSTCVVQ NDYYSQCLPG SATTTTSSTS STTTTSSTSS STTTTSTTST TTPPGTTTTS APGGAGTTAS FTGPFSGVNL WANNYYSSEV HTLAIPSLTD GAMATKAAAV AKVPSFQWLD IAAKVDTMPG TLADIRAANK AGGNPPYAAQ FWYDLPDRD CAAAASNGEY SIADGGVAKY KAYIDSIRKQ LVSYSDIRTI LVIEPDSLAN MVTNMNVPKC ANAQAAYREC TIYAIKQLNL PNVAMYLDGG HAGWLGWPAN LQPAADLFGK LYADAGKPSQ LRGMATNVAN YNAWNLTSAP SYTSPNPNYD EKHYIEAFSP LLAAKGWSNA HFIVDQGRSG KQPTGQQEWG HWCNAMGTGF GMRPSANTGH ELVDAFVWIK PGGECDGTSD TTAARFDHFC GMSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0384] SEQ ID NO: 274 (node #106)
QNCGSVWGQC GGIGWSGATC CASGSTCVEQ NDYYSQCLPG SSTTTTSSTS STTTTSSTTS SSTSTSTTST TTPPVPTTTS IPGGASSTAS YTGPFSGVQL WANNYYRSEV HTLAIPSMTD GAMATKAAAV AKVPSFQWLD RNVTVDTMSG TLAEIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEF SIANGGVANY KAYIDAIRKL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAASAYKEC TIYAIKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGKPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRSG KQPTGQQEWG DWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0385] SEQ ID NO: 275 (node #107)
QNCGSAWSQC GGIGWSGATC CSSGNSCVEI NSYYSQCLPG ASTSPTSTSK VSSTTSKVTS SSAAQPITTT TAPSVPTTTT IAGGASSTAS FTGPFLGVQG WANSYYSSEI YNHAIPSMTD GSLAAQASAV AKVPTFQWLD RNVTVDTMKS TLEEIRAANK AGANPPYAAH FWYDLPDRD CAAAASNGEF SIANGGVANY KTYINAIRKL LIEYSDIRTI LVIEPDSLAN LVTNTNVAKC ANAASAYKEC TNYAITQLDL PHVAQYLDAG HGGWLGWPAN IQPAATLFAD IYKAAGKPKS VRGLVTNVSN YNGWSLSSAP SYTTPNPNYD EKKYIEAFSP LLNAAGFSPA QFIVDTGRSG KQPTGQIEQG DWCNAIGTGF GVRPTTNTGS SLADAFVWVK PGGESDGTSD TSATRYDYHC GLSDALKPAP EAGQWFQAYF EQLLKNANPA F
[0386] SEQ ID NO: 276 (node #108)
QNCGSVWGQC GGIGWSGATC CASGSTCVEQ NDWYSQCLPG SSTTTTSSTS STTTTSSTTS SSTSTSTTST TTPPVPTTTS IPGGASSTAS YTGPFSGVQL WANNYYRSEV HTLAIPSMTD GAMATKAAAV AEVPSFQWLD RNVTVDTMSG TLAEIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEF SIANGGAANY KAYIDAIRKL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC A AASTYKEL TIYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGKPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRSG KQPTGQQEWG DWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0387] SEQ ID NO: 277 (node #109)
QNCGSVWSQC GGIGWSGATC CASGSTCVEQ NDWYSQCLPN SSTTTSTSTR STTTSSSTTS SSTSTSTTST TTPPVPTTTS IPGGASSTAS YSGPFSGVQL WANDYYRSEV HTLAIPSMTD GAMATKAAAV AEVPSFQWLD RNVTIDTMAQ TLSQIRAANK AGANPPYAGQ FWYDLPDRD CAAAASNGEF SIANGGAANY KAYIDAIRKL IIQYSDIRII LVIEPDSLAN MVTNMNVAKC ANAASTYKEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAE IYKDAGKPAA VRGLATNVAN YNAWSIASAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRNG KQPTGQQQWG DWCNVIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALQPAP EAGQWFQAYF EQLLTNANPP F
[0388] SEQ ID NO: 278 (node #110)
QNCGAVWTQC GGNGWQGPTC CASGSTCVAQ NEWYSQCLPN SPSSTSTSQR STSTSSSTTR SGSSTSSSST TPPPVSSPTS IPGGATSTAS YSGPFSGVRL FANDYYRSEV HNLAIPSMTD GTLAAKASAV AEVPSFQWLD RNVTIDTMVQ TLSQVRALNK AGANPPYAAQ LWYDLPDRD CAAAASNGEF SIANGGAANY RSYIDAIRKH IIEYSDIRII LVIEPDSMAN MVTNMNVAKC SNAASTYHEL TVYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAG IYNDAGKPAA VRGLATNVAN YNAWSIASAP SYTSPNPNYD EKHYIEAFSP LLNSAGFSPA RFIVDTGRNG KQPTGQQQWG DWCNVKGTGF GVRPTANTGH ELVDAFVWVK PGGESDGTSD TSAARYDYHC GLSDALQPAP EAGQWFQAYF EQLLTNANPP F
[0389] SEQ ID NO: 279 (node #111)
QNCGSVWGQC GGIGWNGATC CASGSTCVKQ NDWYSQCLPG SSTTTTSSTS STTTTSSTTS SSTSTSTTST TTPPVPTTTS IPGGASSTAS YTGPFSGVQL WANNYYRSEV HTLAIPSLTD GAMATKAAAV AEVPSFQWLD RNVTVDTFSG TLAEIRAANQ AGANPPYAGQ FWYDLPDRD CAAAASNGEW SIANGGAANY KAYIDRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC ANAASTYKEL TIYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGRPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYIEAFSP LLTAAGFSPA HFIVDTGRSG KQPTGQLEWG HWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
[0390] SEQ ID NO: 280 (node #112)
QNCGSVWGQC GGNGWNGATC CASGSTCVKQ NDWYSQCLPG SSTTTTPSST STTTSSSSRS TSTSTSTTST TTPPVATTTS IPGGASSTAS YTGPFSGVQL WANNYYRSEV HTLAIPSLTD GAMATKAAAV AEVPSFQWMD RNVTVDTFSG TLAEIRAANQ AGANPPYAGI FWYDLPDRD CAAAASNGEW SIANGGAANY KAYIDRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC AGAASTYKEL TIYALKQLNL PNVAMYLDAG HAGWLGWPAN IQPAAELFAK IYKDAGRPAA VRGLATNVAN YNAWSISSAP SYTSPNPNYD EKHYVEAFSP LLTAAGFSPA HFITDTGRSG KQPTGQLEWG HWCNAIGTGF GQRPTANTGH DLVDAFVWIK PGGECDGTSD TTAARYDHHC GLADALKPAP EAGQWFQAYF EQLLTNANPP F
[0391] SEQ ID NO: 281 (node #113)
QNCGSVWGQC GGIGYNGPTC CQSGSTCVKQ NDWYSQCLPG SSTTTTSSTS STTTTSSTTS SSTSTSTTST TTAPAPTTTT IPGGASSTAS YNGPFSGVQL WANNYYRSEV HTLAIPSLTD PALAAKAAAV AEVPSFQWLD RNVTVDTFSG TLAEIRAANQ AGANPPYAGQ FWYDLPDRD CAAAASNGEW S IAN GAN Y KAYIDRIREL LIQYSDIRTI LVIEPDSLAN MVTNMNVAKC SNAASTYKEL TIYALKQLNL PHVAMYMDAG HAGWLGWPAN IQPAAELFAK IYKDAGRPAA VRGLATNVAN YNAWSISSPP SYTSPNPNYD EKHYIEAFSP LLTAQGFSPA QFIVDTGRSG KQPTGQLEWG HWCNAIGTGF GVRPTANTGH ELVDAFVWVK PGGECDGTSD TTAARYDYHC GLSDALKPAP EAGQWFQAYF EQLLTNANPP F
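The entries above print each polypeptide in space-separated blocks of ten residues. Some printed blocks are shorter or split by stray spaces, most likely from text extraction. For example, the nine-residue block FWYDLPDRD recurs in every entry, and SEQ ID NO: 281 contains the fragment "S IAN GAN Y". The following is a minimal Python sketch that rejoins the blocks into one string and reports which blocks deviate from the nominal ten-residue size, so those positions can be checked against the official sequence listing. It assumes the 20 standard one-letter amino-acid codes and a block size of ten. The function name `parse_listing` is hypothetical.

```python
# The 20 standard one-letter amino-acid codes (assumption: no X, B, Z, U, O).
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_listing(text: str, block_size: int = 10):
    """Hypothetical helper: rejoin space-separated residue blocks.

    Returns (sequence, irregular), where `irregular` holds the 0-based
    indices of blocks whose length differs from `block_size`. The final
    block is ignored because it may legitimately be shorter.
    """
    blocks = text.split()
    irregular = [i for i, b in enumerate(blocks[:-1]) if len(b) != block_size]
    sequence = "".join(blocks).upper()
    unknown = set(sequence) - STANDARD_AA
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    return sequence, irregular


if __name__ == "__main__":
    seq, irregular = parse_listing("AGANPPYAGQ FWYDLPDRD CAAAASNGEW")
    print(len(seq), irregular)  # 29 [1]
```

A flagged block does not prove that a residue is missing. It only marks where the printed text departs from the ten-residue layout.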

Claims

What is claimed:
1. An isolated polypeptide comprising an amino acid sequence about 90% identical to the amino acid sequence of SEQ ID NO: 58, SEQ ID NO: 59, SEQ ID NO: 60, SEQ ID NO: 61, SEQ ID NO: 62, SEQ ID NO: 63, SEQ ID NO: 64, SEQ ID NO: 65, SEQ ID NO: 66, SEQ ID NO: 67, SEQ ID NO: 68, SEQ ID NO: 69, SEQ ID NO: 70, SEQ ID NO: 71, SEQ ID NO: 72, SEQ ID NO: 73, SEQ ID NO: 74, SEQ ID NO: 75, SEQ ID NO: 76, SEQ ID NO: 77, SEQ ID NO: 78, SEQ ID NO: 79, SEQ ID NO: 80, SEQ ID NO: 81, SEQ ID NO: 82, SEQ ID NO: 83, SEQ ID NO: 84, SEQ ID NO: 85, SEQ ID NO: 86, SEQ ID NO: 87, SEQ ID NO: 88, SEQ ID NO: 89, SEQ ID NO: 90, SEQ ID NO: 91, SEQ ID NO: 92, SEQ ID NO: 93, SEQ ID NO: 94, SEQ ID NO: 95, SEQ ID NO: 96, SEQ ID NO: 97, SEQ ID NO: 98, SEQ ID NO: 99, SEQ ID NO: 100, SEQ ID NO: 101, SEQ ID NO: 102, SEQ ID NO: 103, SEQ ID NO: 104, SEQ ID NO: 105, SEQ ID NO: 106, SEQ ID NO: 107, SEQ ID NO: 108, SEQ ID NO: 109, SEQ ID NO: 110, SEQ ID NO: 111, SEQ ID NO: 112, SEQ ID NO: 113, SEQ ID NO: 114, SEQ ID NO: 115, SEQ ID NO: 116, SEQ ID NO: 117, SEQ ID NO: 118, SEQ ID NO: 119, SEQ ID NO: 120, SEQ ID NO: 121, SEQ ID NO: 122, SEQ ID NO: 123, SEQ ID NO: 124, SEQ ID NO: 125, SEQ ID NO: 126, SEQ ID NO: 127, SEQ ID NO: 128, SEQ ID NO: 129, SEQ ID NO: 130, SEQ ID NO: 131, SEQ ID NO: 132, SEQ ID NO: 133, SEQ ID NO: 134, SEQ ID NO: 135, SEQ ID NO: 136, SEQ ID NO: 137, SEQ ID NO: 138, SEQ ID NO: 139, SEQ ID NO: 140, SEQ ID NO: 141, SEQ ID NO: 142, SEQ ID NO: 143, SEQ ID NO: 144, SEQ ID NO: 145, SEQ ID NO: 146, SEQ ID NO: 147, SEQ ID NO: 148, SEQ ID NO: 149, SEQ ID NO: 150, SEQ ID NO: 151, SEQ ID NO: 152, SEQ ID NO: 153, SEQ ID NO: 154, SEQ ID NO: 155, SEQ ID NO: 156, SEQ ID NO: 157, SEQ ID NO: 158, SEQ ID NO: 159, SEQ ID NO: 160, SEQ ID NO: 161, SEQ ID NO: 162, SEQ ID NO: 163, SEQ ID NO: 164, SEQ ID NO: 165, SEQ ID NO: 166, SEQ ID NO: 167, SEQ ID NO: 168, SEQ ID NO: 169, SEQ ID NO: 170, SEQ ID NO: 171, SEQ ID NO: 172, SEQ ID NO: 173, SEQ ID NO: 174, SEQ ID NO: 175, SEQ ID NO: 176, SEQ ID NO: 177, SEQ ID NO: 178, SEQ ID NO: 179, SEQ ID NO: 180, SEQ ID NO: 181, SEQ ID NO: 182, SEQ ID NO: 183, SEQ ID NO: 184, SEQ ID NO: 185, SEQ ID NO: 186, SEQ ID NO: 187, SEQ ID NO: 188, SEQ ID NO: 189, SEQ ID NO: 190, SEQ ID NO: 191, SEQ ID NO: 192, SEQ ID NO: 193, SEQ ID NO: 194, SEQ ID NO: 195, SEQ ID NO: 196, SEQ ID NO: 197, SEQ ID NO: 198, SEQ ID NO: 199, SEQ ID NO: 200, SEQ ID NO: 201, SEQ ID NO: 202, SEQ ID NO: 203, SEQ ID NO: 204, SEQ ID NO: 205, SEQ ID NO: 206, SEQ ID NO: 207, SEQ ID NO: 208, SEQ ID NO: 209, SEQ ID NO: 210, SEQ ID NO: 211, SEQ ID NO: 212, SEQ ID NO: 213, SEQ ID NO: 214, SEQ ID NO: 215, SEQ ID NO: 216, SEQ ID NO: 217, SEQ ID NO: 218, SEQ ID NO: 219, SEQ ID NO: 220, SEQ ID NO: 221, SEQ ID NO: 222, SEQ ID NO: 223, SEQ ID NO: 224, SEQ ID NO: 225, SEQ ID NO: 226, SEQ ID NO: 227, SEQ ID NO: 228, SEQ ID NO: 229, SEQ ID NO: 230, SEQ ID NO: 231, SEQ ID NO: 232, SEQ ID NO: 233, SEQ ID NO: 234, SEQ ID NO: 235, SEQ ID NO: 236, SEQ ID NO: 237, SEQ ID NO: 238, SEQ ID NO: 239, SEQ ID NO: 240, SEQ ID NO: 241, SEQ ID NO: 242, SEQ ID NO: 243, SEQ ID NO: 244, SEQ ID NO: 245, SEQ ID NO: 246, SEQ ID NO: 247, SEQ ID NO: 248, SEQ ID NO: 249, SEQ ID NO: 250, SEQ ID NO: 251, SEQ ID NO: 252, SEQ ID NO: 253, SEQ ID NO: 254, SEQ ID NO: 255, SEQ ID NO: 256, SEQ ID NO: 257, SEQ ID NO: 258, SEQ ID NO: 259, SEQ ID NO: 260, SEQ ID NO: 261, SEQ ID NO: 262, SEQ ID NO: 263, SEQ ID NO: 264, SEQ ID NO: 265, SEQ ID NO: 266, SEQ ID NO: 267, SEQ ID NO: 268, SEQ ID NO: 269, SEQ ID NO: 270, SEQ ID NO: 271, SEQ ID NO: 272, SEQ ID NO: 273, SEQ ID NO: 274, SEQ ID NO: 275, SEQ ID NO: 276, SEQ ID NO: 277, SEQ ID NO: 278, SEQ ID NO: 279, SEQ ID NO: 280, or SEQ ID NO: 281.
2. The isolated polypeptide of claim 1 wherein the signal peptide is removed.
3. A nucleic acid encoding a polypeptide of claim 1 or 2.
4. A recombinant microorganism, wherein said microorganism expresses a nucleic acid of claim 3.
5. A recombinant microorganism, wherein said microorganism expresses a nucleic acid encoding a polypeptide of claim 1, 2, or a combination thereof.
6. The recombinant microorganism of claim 4 or 5, wherein the microorganism is a fungus.
7. The recombinant microorganism of claim 6, wherein the microorganism is from the phylum Basidiomycota, from the phylum Ascomycota, from the subkingdom Dikarya, or from the class Sordariomycetes.
8. The recombinant microorganism of claim 6, wherein the microorganism is a yeast.
9. The recombinant microorganism of claim 4 or 5, wherein the microorganism is a bacterium.
10. The recombinant microorganism of claim 8, wherein the microorganism is Saccharomyces cerevisiae.
11. The recombinant microorganism of claim 6, wherein the microorganism is selected from the group consisting of Saccharomyces sp., Pichia sp., Sclerotium rolfsii, Phanerochaete chrysosporium, Trichoderma sp., Aspergillus sp., Schizophyllum sp., and Penicillium sp.
12. The recombinant microorganism of claim 9, wherein the microorganism is selected from the group consisting of Escherichia sp., Clostridium sp., Cellulomonas sp., Bacillus sp., Thermomonospora sp., Ruminococcus sp., Bacteroides sp., Erwinia sp., Acetovibrio sp., Microbispora sp., and Streptomyces sp.
13. A method for the production of cellulosic ethanol, comprising adding an isolated polypeptide of claim 1, 2, or a combination thereof, to a source material of cellulose for cellulose processing.
14. A method for the production of cellulosic ethanol, comprising adding a recombinant microorganism of claim 4, 5, 6, 7, 8, 9, 10, 11, 12, or a combination thereof, to a source material of cellulose for cellulose processing.
15. A method for cellulose processing, comprising adding a polypeptide of claim 1, 2, or a combination thereof, to a source material of cellulose.
16. A method for cellulose processing, comprising adding a recombinant microorganism of claim 4, 5, 6, 7, 8, 9, 10, 11, 12, or a combination thereof, to a source material of cellulose.
17. The method of claim 13 or 15, further comprising adding a recombinant microorganism of claim 4, 5, 6, 7, 8, 9, 10, 11, 12, or a combination thereof.
18. The method of claim 14 or 16, further comprising adding a polypeptide of claim 1, 2, or a combination thereof.
19. The method of claim 17 or 18, wherein the isolated polypeptide and recombinant microorganism are added sequentially, in any order.
20. The method of claim 17 or 18, wherein the isolated polypeptide and recombinant microorganism are added simultaneously.
21. The method of claim 13, 14, 15, or 16, wherein carbohydrate polymers are depolymerized.
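Claim 1 is framed in terms of percent sequence identity to the listed SEQ ID NOs, but the claims do not state how identity is computed. One common approach is to globally align the two sequences and divide the number of identical aligned positions by a reference length.

The following is a minimal Python sketch of that idea. It assumes:
- a Needleman-Wunsch global alignment with simple scores (match +1, mismatch -1, linear gap -1);
- the length of the reference SEQ ID sequence as the denominator.

Both choices are assumptions, not the applicants' stated method. Established tools typically use a substitution matrix such as BLOSUM62 and affine gap penalties, which can give different numbers. The names `global_align` and `percent_identity` are hypothetical.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, with '-' marking gaps.
    """
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(query: str, reference: str) -> float:
    """Identical aligned positions as a percentage of the reference length (assumed convention)."""
    if not reference:
        raise ValueError("reference sequence is empty")
    aln_q, aln_r = global_align(query, reference)
    matches = sum(1 for x, y in zip(aln_q, aln_r) if x == y and x != "-")
    return 100.0 * matches / len(reference)


if __name__ == "__main__":
    # C-terminal motifs from SEQ ID NO: 272 and SEQ ID NO: 269 differ at one position.
    print(percent_identity("EAGQWFQAYF", "EAGTWFQAYF"))  # 90.0
```

The alignment fills an (n+1) x (m+1) table, which is quick for enzymes of a few hundred residues. The threshold in claim 1 ("about 90%") is a legal question that this calculation does not settle.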
PCT/US2014/056827 (WO2015042543A2): Biofuel production enzymes and uses thereof. Priority date 2013-09-20; filing date 2014-09-22.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361880466P 2013-09-20 2013-09-20
US61/880,466 2013-09-20

Publications (2)

Publication Number    Publication Date
WO2015042543A2        2015-03-26
WO2015042543A3        2015-05-14

Family

ID=52689611

Legal Events

121 EP: The EPO has been informed by WIPO that EP was designated in this application (ref document number: 14846561; country of ref document: EP; kind code of ref document: A2)
NENP: Non-entry into the national phase (ref country code: DE)
122 EP: PCT application non-entry into the European phase (ref document number: 14846561; country of ref document: EP; kind code of ref document: A2)