EP0950064A1

EP0950064A1 - Genes encoding transcriptional regulatory proteins from Trichoderma reesei and uses thereof

Info

Publication number: EP0950064A1
Application number: EP97945899A
Authority: EP
Inventors: Anu Saloheimo; Nina Aro; Marja Ilmén; Merja Penttilä
Original assignee: AB Enzymes Oy
Current assignee: AB Enzymes Oy
Priority date: 1996-11-29
Filing date: 1997-12-01
Publication date: 1999-10-20
Also published as: WO1998023642A1; AU5123598A; EP0939825A1; WO1998023764A1; AU5123498A

Abstract

A purified nucleic acid molecule encoding a protein having the ability to transcriptionally regulate promoters. In particular the invention relates to transcriptional activators of the CBH I promoter from Trichoderma reesei, DNA encoding such proteins, and methods for their use.

Description

GENES ENCODING TRANSCRIPTIONAL REGULATORY PROTEINS FROM TRICHODERMA REESEI AND USES THEREOF

Background of the Invention

Field of the Invention

The invention is in the field of transcriptional regulation of fungal gene expression.

Related Art

Trichoderma reesei is widely used in production of hydrolytic enzymes for industrial applications. The promoters used, especially the cbh1 promoter of the gene encoding the major cellulase of this fungus, are strong and lead to high expression levels of most of the products. There is, however, a need for even more efficient expression systems for both homologous and heterologous products.

Introducing multiple copies of an expression cassette that contains a gene of interest operably linked to a desired promoter does not always lead to a linear increase in expression of the gene of interest. This has been described for expression under the control of the glucoamylase promoter in Aspergillus and also for expression under the control of the cbh1 promoter in Trichoderma. The results indicate that as few as three copies of the cassette are saturating for expression (Karhunen, T., et al., Mol. Gen. Genet. 241:515-522 (1993); Margolles-Clark, E., et al., Appl. Env. Microbiol. 62:2145-2151 (1996)).

This kind of limitation can be overcome by overexpressing an activator protein that is capable of inducing the production promoter. One example of this is the overexpression of the alcR activator of Aspergillus, which activates the alcA promoter. Another example is the S. cerevisiae GAL4 activator, or the corresponding protein LAC9 of K. lactis, which activates the GAL1 promoter of S. cerevisiae.

Only a few filamentous fungal transcriptional regulatory proteins have been characterized. Of the fungal transcriptional regulatory proteins that have been characterized, most are from the yeast Saccharomyces cerevisiae. Little is known about promoter activity for control of expression of the most abundant filamentous fungal extracellular proteins, such as, for example, glucoamylases, amylases, cellulases, hemicellulases, ligninases etc. creA of Aspergillus, and the homologue cre1 of Trichoderma, encode repressors involved in glucose repression of genes encoding both intracellular and extracellular enzymes. They participate in the coordinated expression of cellulases and hemicellulases that occurs in response to the presence of plant polysaccharides, and especially of cellulose and sometimes of sophorose, but not in the presence of glucose. The CREI of Trichoderma is the first protein shown to regulate cellulase expression (Ilmen, M., et al., Mol. Gen. Genet. 255:303-314 (1996) and Mol. Gen. Genet. 257:451-460 (1996)). From Aspergillus, XYLR, which activates expression of the xylanase genes, has been characterized (WO 97/00962). No positively acting regulatory proteins involved in the induction of cellulase promoters of any fungi are, however, known.

Another option to increase (stimulate) promoter function, which could be used in connection with overexpression of the activator, is to modify the promoter so that it contains multiple binding sites for the activator(s) thus sequestering the activator so that it binds in a larger amount to the desired promoter and further activates the promoter's expression. With this method, the fungus could be tailored to produce an optimal combination of enzymes for each industrial application.

Detailed promoter analysis and description of binding sites for activators have, however, not been described for the strongest promoters in most filamentous fungi, and especially in Trichoderma. One report describes positive regulation mediated through a CCAAT element in a T. reesei xylanase gene (Zeilinger, S., et al., J. Biol. Chem. 271:25624-25629 (1996)). The protein responsible for this regulation has not been isolated. No knowledge exists of the binding sites or of the activators of the cbh1 promoter of Trichoderma. Cloning of activator genes can be attempted by analyzing all proteins binding to previously characterized binding sites, and then making antibodies to the proteins or making nucleic acid probes based on the N-terminal amino acid sequences of the proteins, and using conventional cloning methods to obtain the activator gene. Also, if the binding site is known, the so-called one-hybrid system can be used (Clontech). Also, regulatory mutant complementation, or alternatively heterologous hybridization with a DNA encoding the desired activator from another organism, can be used and the cloned genes analyzed.

However, each of these methods relies on the availability of previous data on the regulatory protein in concern. No methods to clone specific activators have been described for cases when no mutant data, data describing the binding sites in the promoters, or data on other homologous regulatory proteins is available.

Summary of the Invention

Recognizing the need to be able to identify and clone desired transcriptional regulatory proteins when no information was available about the protein or its DNA recognition site, and cognizant of the limitations in the current methods that are available that require such information, the inventors set forth to develop new methods that overcame the limitations of the assays known in the art. These efforts culminated in the development of a method for cloning transcriptional activator proteins and in the use of this method in the identification of previously unknown transcriptional activator proteins of the cellulase genes of Trichoderma reesei.

This invention is first directed to a method for cloning transcriptional regulatory elements (binding sites) and transcriptional regulatory proteins, in particular, transcriptional activator elements and transcriptional activator proteins, in a manner that does not depend upon the availability of mutants or data therefrom, and does not require that the DNA binding site of such proteins be known in advance.

The invention is further directed to the identification of the T. reesei ace1 and ace2 transcriptional activator proteins using such method, and their cloning. The invention is further directed to nucleic acid sequences encoding T. reesei ace1 and ace2, including expression cassettes from which the proteins encoded by these genes can be expressed, including vectors providing the same.

The invention is further directed to hosts transformed with such nucleic acid sequences.

The invention is further directed to a method of stimulating gene expression in hosts transformed with sequences encoding ace1 and ace2 by providing to such host a DNA construct in which the gene of interest is operably linked to a promoter that further contains one or more binding sites for the ace1 and/or ace2 transcriptional activator proteins that are heterologous to the native promoter structure.

The invention is further directed to a method for enhancing expression of a desired gene in cells capable of expressing ACEI and/or ACEII by inserting into the promoter of the gene a binding site for ACEI and/or ACEII, or multiple copies of such sites.

Brief Description of the Figures

FIG. 1. Plasmid pAS3.

FIG. 2. Plasmid pMS95.

FIG. 3. Plasmid pAJ401.

FIG. 4. The ace1 cDNA sequence (SEQ ID NO. 5).

FIG. 5a to e. The ace1 chromosomal sequence and the deduced protein sequence of ACEI (SEQ ID NO. 6 and 7).

FIG. 6. The ace2 cDNA sequence (SEQ ID NO. 8).

FIG. 7a to c. The ace2 chromosomal sequence and the deduced protein sequence of ACEII (SEQ ID NO. 9 and 10).

FIG. 8. Plasmid p -66.

FIG. 9. Northern analysis of T. reesei QM9414 host (lane 1) and five transformants (lanes 2-6) that overproduce the ACEII protein on Solka floc cellulose or medium containing glucose.

FIG. 10. The expression vector pIvfl-69.

FIG. 11. The expression vector pARO3.

FIG. 12. Plasmid pALK1062NB.

Description of the Deposits

Plasmid pAS34 is also called VTT-F-97077 and was deposited as VTT-F-97077 at the DSMZ (Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH), Mascheroder Weg 1b, D-38124 Braunschweig, F.R.G. in E. coli on March 7, 1997 and assigned accession number DSM 11451.

Plasmid pAS33 is also called VTT-F-97078 and was deposited as VTT-F-97078 at the DSMZ in E. coli on March 7, 1997 and assigned accession number DSM 11452.

Plasmid pAS28 is also called VTT-F-97079 and was deposited as VTT-F-97079 at the DSMZ in E. coli on March 7, 1997 and assigned accession number DSM 11453.

Plasmid pAS26 is also called VTT-F-97080 and was deposited as VTT-F-97080 at the DSMZ in E. coli on March 7, 1997 and assigned accession number DSM 11454.

Detailed Description of the Preferred Embodiments

I. Method for Cloning Transcriptional Regulatory Proteins

The invention describes a method for identification and isolation of transcriptional regulatory protein-encoding genes, and especially genes encoding transcriptional activator proteins. The advantage of the method of the invention is that no information about the DNA binding site (sequence) that is recognized by the transcriptional regulatory protein is needed to practice the method because the method of the invention allows for the identification of the sites where specific activators bind. According to the invention, the method for cloning genes activating expression through a specific promoter can be based on expression of a complete cDNA library from the desired organism, in a second host, for example, in the yeast S. cerevisiae. The second host, for example, the yeast strain, is first transformed with a reporter construct in which expression of a reporter gene is under the control of (operably linked to) a desired heterologous (or homologous) promoter thought to contain a binding site of (or is at least responsive to) the transcriptional regulatory protein in question. This could be a promoter for which regulatory features (such as inducers, repressors, growth conditions that turn it on and off, etc) are known but for which the actual regulatory proteins are not known or at least the corresponding genes are not cloned. Also, this could be a promoter for which no known inducers or regulatory mechanisms have yet been identified.

The second strain, such as, for example, the yeast strain discussed above, is then transformed with a sample from a cDNA bank that is to be screened for the presence of genes capable of expressing proteins that activate the promoter that is operably linked to the reporter gene. This may be a cDNA bank that is from the same organism as that of the promoter or from a different organism. Preferably, the clones in the cDNA bank are in the form of an expression library wherein expression of proteins encoded by the clones is provided in a constitutive or inducible manner. The design of the expression library should be such that promoters operably linked to the cDNA constructs are capable of functioning in the organism.

According to the invention, when the second host described above contains both the reporter construct and a clone (from the expression library) that expresses a transcriptional activator capable of regulating the promoter that is operably linked to the reporter gene, expression of the reporter should be such that induction of the reporter's promoter occurs only when activators of the gene are present. Alternatively, the presence of the activator can be identified as an increase in the expression of the reporter gene over a base level that is found in the absence of the activator. In addition to identifying transcriptional activator proteins, the methods of the invention are also useful for identifying transcriptional repressor proteins. Essentially the same steps are involved, except that now the promoter of interest that is operably linked to the reporter gene is active in the absence of the repressor, and clones from the cDNA bank that contain DNA encoding the repressor protein are identified by identifying those hosts in which expression of the desired reporter gene is suppressed or eliminated. Therefore, although the discussion herein focuses on the identification of transcriptional activator proteins, the application of the method to the identification of transcriptional repressor proteins is immediately apparent therefrom.

As exemplified herein, a useful reporter sequence to identify transcriptional activator proteins when S. cerevisiae is the host is the HIS3 gene. S. cerevisiae host strains (his3-minus) are available in which the HIS3 gene has been deleted or otherwise inactivated, so that they cannot grow without added histidine unless the yeast has been transformed with a functional HIS3 gene. Consequently, by transforming such hosts with a HIS3 DNA construct to which a desired promoter has been operably linked and also transforming such hosts with the gene bank from which activator genes are to be identified, yeast clones harboring the desired activator gene can be found based on their ability to grow without histidine addition to the medium.

Using yeast genetic methods, the ability of the activator to activate only in the presence of the specific promoter can be confirmed. Possible leakiness of the reporter construct can be avoided, when necessary, by placing a stuffer fragment in between the upstream vector sequences and the promoter, or alternatively, for example, by using appropriate amounts of the competitive inhibitor of the HIS3 gene product, aminotriazole (for example, 1 - 100 mM), in the medium.

The TATA region on the reporter gene's promoter can be provided from the desired reporter gene, for example, the HIS3 gene, or alternatively from the promoter in question. The reporter gene can also be any other gene for which the desired result, activation or repression, can be detected in a similar manner. Furthermore, it can, for instance, encode beta-galactosidase as described for many reporter systems, or it can be, for example, CUP1 or an antibiotic resistance marker such as one conferring resistance to G418, and its activity detected based on the copper or antibiotic resistance, respectively, that it confers to the yeast harboring the activator clone. The advantage of the method of the invention is that no previous knowledge of the identity or presence of the transcriptional regulatory protein, such as the activator, or of the protein's binding site is needed. Unlike many other methods, large promoter fragments can be operably linked to the reporter gene. However, as shown herein, the method works also for smaller fragments of the promoter, and once the activator has been cloned, its binding sites in the promoter can be mapped by replacing the whole promoter by overlapping smaller fragments of the same. Furthermore, the method of the invention can also be used to test whether promoters or promoter fragments contain binding sites for certain activators. A yeast-based system is especially useful for cloning of fungal activator genes regulating genes encoding filamentous fungal extracellular enzymes since the yeast S. cerevisiae does not generally produce such enzymes or transcriptional regulatory proteins responsible for their production, S. cerevisiae being an exception amongst yeasts and filamentous fungi. Thus it is unlikely that proteins native to the yeast host would activate the reporter construct causing background.
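The mapping step mentioned above, in which the whole promoter is replaced by overlapping smaller fragments, can be pictured with a short sketch. The Python below is an illustration only: the function name `overlapping_fragments`, the toy promoter string, and the fragment size and step are hypothetical choices made for this example and are not taken from the disclosure.

```python
def overlapping_fragments(promoter, size, step):
    """Return (start, fragment) pairs that tile the promoter with overlap.

    Consecutive fragments overlap whenever step < size. The final fragment
    is anchored at the 3' end so that no part of the promoter is left out.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    last_start = max(len(promoter) - size, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # cover the 3' end of the promoter
    return [(s, promoter[s:s + size]) for s in starts]


# Hypothetical 10-nt "promoter", tiled into 4-nt fragments every 3 nt.
for start, fragment in overlapping_fragments("ACGTACGTAC", size=4, step=3):
    print(start, fragment)
```

In such a scheme each fragment would be linked to the reporter gene in turn; fragments that still respond to the cloned activator narrow down where its binding site lies.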

In a similar manner, transcriptional activator proteins that regulate the transcription of themselves or of other transcriptional activator proteins can be identified, using the same reporter system as described above. In this example, the host cell would be provided with at least three constructs: a first construct containing the reporter gene operably linked to a promoter capable of being activated by a known transcriptional activator protein; a second construct that contains the gene of the known transcriptional activator protein under its native promoter; and a third construct that is the representative of the cDNA bank that is being screened for the identification of a protein that will activate transcription of the known transcriptional activator. In the presence of the protein encoded by the cDNA bank construct, the new transcriptional activator will effectively activate transcription of the known transcriptional activator, which, in turn, activates transcription of the reporter gene. This can be achieved also by operably linking the promoter of the activator gene directly to the reporter gene.

The method of the invention is also useful to identify not just regulatory proteins that regulate the transcription of other regulatory proteins, but also, to identify those transcriptional regulatory proteins that interact in an ancillary manner with another protein required for transcription so as to alter its ability to enhance or repress transcription, but that may not bind to the promoter.

This method is not limited by the type of host and would be useful to identify any transcriptional regulatory protein for any host and in any host as long as the basic transcription machinery of such host would be expected to bind to the promoter operably linked to the reporter gene and to the transcriptional regulatory protein, as provided by the cDNA bank. For example, activator proteins in bacterial hosts could be identified by using a promoter capable of functioning in such host in the presence of the activator.

II. Identification and Cloning of the ACEI and ACEII Activator Proteins

Using the method of the invention, two transcriptional activator proteins that regulate transcription in filamentous fungi have been identified. These proteins are called ACEI and ACEII and are capable of activating the promoter of the cellulase gene cbh1 that encodes the major cellulase cellobiohydrolase I (CBHI) protein. The ace1 and ace2 genes were obtained by screening of a T. reesei gene bank that had been induced to maximally express a variety of T. reesei extracellular enzymes including cellulases as well as xylanases and other hemicellulases. The encoded proteins contain DNA binding regions but show no other obvious amino acid similarity to any other protein known, not even when compared against the data base containing the complete yeast genome.

ACEI (Example 3 and 4) has an open reading frame of 733 amino acids (SEQ ID No. 7). It contains the C₂H₂ zinc finger domain that characterizes one major class of DNA binding domains present in a variety of regulatory proteins from other species. The ace1 cDNA sequence is shown in SEQ ID No. 5 and the sequence of the ace1 gene containing three introns is shown in SEQ ID No. 6. ACEII (Example 3 and 5) has an open reading frame of 341 amino acids (SEQ ID No. 10). The ace2 cDNA sequence is shown in SEQ ID No. 8 and the ace2 gene sequence, which contains no introns, is shown in SEQ ID No. 9. The DNA binding domain of ACEII is a binuclear zinc cluster, a type of domain found in many activators and only in those of fungal origin.

As described herein, it is possible to modulate the expression of ACEI and ACEII, and overexpress them under any inducible or constitutive promoter in Trichoderma or Aspergillus, singly or together, in various repressing, neutral or induced conditions in respect to cellulase production such as on glucose containing media or on media containing sorbitol, on cellulose or its derivatives cellobiose or sophorose, on xylan, lactose, or whey. Transforming a fungal host with clones capable of expressing ACEI and/or ACEII either under their own promoters or under the control of a desired heterologous promoter enhances the levels of these proteins in the host cell and allows the maximal transcriptional expression of fungal proteins that are the natural targets for these proteins. Especially, such modulation is used to improve or modify expression of hydrolytic enzyme genes under their own, modified or heterologous promoters.

Additionally, the gene encoding any desired protein can be placed under the control of a promoter that is known to respond to the ACEI or ACEII protein, for example, the T. reesei cbh1 promoter, and expression of such protein can thereby be regulated or enhanced in a desired host cell. If such host cell naturally produces ACEI and/or ACEII then it may not be necessary to transform such a host with additional copies of the genes encoding these proteins. However, if the host cell does not naturally produce ACEI or ACEII, or if the host cell produces relatively low levels of these proteins in a manner that may be limiting to the transcriptional induction capacity, then the host cell may be transformed with additional copies of the genes encoding one or both of ACEI and ACEII as necessary and as provided according to the invention. If production is desired in conditions where the activators are not naturally produced, they can be overexpressed under a promoter functional in all conditions, e.g. the fungal glyceraldehyde-3-phosphate dehydrogenase A (gpdA) promoter or the cDNA1 promoter of T. reesei. The protein whose expression is enhanced by producing the activators can be any homologous protein of Trichoderma or Aspergillus, or it can be any heterologous protein like the β-galactosidase encoded by the lacZ gene of E. coli shown here.

Another way to enhance production of proteins is to modify the promoters in such a way that they contain additional copies of the ACEI and/or ACEII binding sites. Also promoters not normally under the regulation of the activators can be modified to contain one or more binding sites. By combining these methods the fungus can be manipulated to produce enzyme mixtures specifically tailored for each application.

III. Preparation of Coding Sequences Operably Linked to the Promoter Sequences of the Invention

The process for genetically engineering a coding sequence, for expression under an ACEI or ACEII-sensitive promoter, according to the invention, is facilitated through the isolation and partial sequencing of pure protein encoding an enzyme of interest or by the cloning of genetic sequences which are capable of encoding such protein, for example, by cDNA cloning or with polymerase chain reaction technologies; and through the expression of such genetic sequences. As used herein, the term "genetic sequences" is intended to refer to a nucleic acid molecule (preferably DNA). Genetic sequences that are capable of encoding a protein are derived from a variety of sources. These sources include genomic DNA, cDNA, synthetic DNA, and combinations thereof. The preferred source of genomic DNA is a fungal genomic bank. The preferred source of the cDNA is a cDNA bank prepared from mRNA of fungi grown in conditions known to induce expression of the desired gene to produce mRNA or protein. However, since the genetic code is universal, a coding sequence from any host, including prokaryotic (bacterial) hosts and any eukaryotic hosts such as plants, mammals, insects, yeast, and any cultured cell populations, would be expected to function (encode the desired protein).

Genomic DNA may or may not include naturally occurring introns. Moreover, such genomic DNA may be obtained in association with the 5' promoter region of the gene sequences and/or with the 3' transcriptional termination region. According to the invention, however, the native promoter region would be replaced with an ACEI and/or ACEII responsive promoter.

Such genomic DNA may also be obtained in association with the genetic sequences which encode the 5' non-translated region of the mRNA and/or with the genetic sequences which encode the 3' non-translated region. To the extent that a host cell can recognize the transcriptional and/or translational regulatory signals associated with the expression of the mRNA and protein, then the 5' and/or 3' non-transcribed regions of the native gene, and/or the 5' and/or 3' non-translated regions of the mRNA may be retained and employed for transcriptional and translational regulation.

Genomic DNA can be extracted and purified from any host cell, especially a fungal host cell, which naturally expresses the desired protein by means well known in the art. A genomic DNA sequence may be shortened by means known in the art to isolate a desired gene from a chromosomal region that otherwise would contain more information than necessary for the utilization of this gene in the hosts of the invention. For example, restriction digestion may be utilized to cleave the full-length sequence at a desired location. Alternatively, or in addition, nucleases that cleave from the 3'-end of a DNA molecule may be used to digest a certain sequence to a shortened form, the desired length then being identified and purified by gel electrophoresis and DNA sequencing. Such nucleases include, for example, Exonuclease III and Bal 31. Other nucleases are well known in the art.
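As a rough illustration of how restriction digestion cleaves a sequence at defined locations, the following Python sketch cuts a linear sequence at every occurrence of a recognition site. It is a simplified model under stated assumptions: only the top strand is considered, the input sequence is invented for the example, and the function name `digest` is hypothetical. EcoRI (recognition site GAATTC, cleaving after the first G) is used simply because it is a familiar enzyme, not because the disclosure names it.

```python
def digest(sequence, site, cut_offset):
    """Cut a linear DNA sequence at every occurrence of a recognition site.

    cut_offset is the number of bases of the site that stay on the
    upstream fragment (1 for EcoRI, which cleaves G^AATTC).
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # remainder after the last cut
    return fragments


# Hypothetical sequence containing two EcoRI sites.
example = "AAAGAATTCCCCGAATTCTT"
print(digest(example, "GAATTC", 1))
```

Joining the returned fragments in order reproduces the input, which is a convenient sanity check when choosing where to shorten a genomic clone.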

For cloning into a vector, such suitable DNA preparations (either genomic DNA or cDNA) are randomly sheared or enzymatically cleaved, respectively, and ligated into appropriate vectors to form a recombinant gene (either genomic or cDNA) bank. A DNA sequence encoding a desired protein or its functional derivatives may be inserted into a DNA vector in accordance with conventional techniques, including blunt-ending or staggered-ending termini for ligation, restriction enzyme digestion to provide appropriate termini, filling in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid undesirable joining, and ligation with appropriate ligases. Techniques for such manipulations are well known in the art.

Libraries containing sequences coding for the desired gene may be screened and the desired gene sequence identified by any means which specifically selects for a sequence coding for such gene or protein such as, for example, a) by hybridization with an appropriate nucleic acid probe(s) containing a sequence specific for the DNA of this protein, or b) by hybridization-selected translational analysis in which native mRNA which hybridizes to the clone in question is translated in vitro and the translation products are further characterized, or, c) if the cloned genetic sequences are themselves capable of expressing mRNA, by immunoprecipitation of a translated protein product produced by the host containing the clone.

Oligonucleotide probes specific for a certain protein which can be used to identify clones to this protein can be designed from the knowledge of the amino acid sequence of the protein or from the knowledge of the nucleic acid sequence of the DNA encoding such protein or a related protein. Alternatively, antibodies may be raised against purified forms of the protein and used to identify the presence of unique protein determinants in transformants that express the desired cloned protein. When an amino acid sequence is listed horizontally, unless otherwise stated, the amino terminus is intended to be on the left end and the carboxy terminus is intended to be at the right end. Similarly, unless otherwise stated or apparent from the context, a nucleic acid sequence is presented with the 5' end on the left.

Because the genetic code is degenerate, more than one codon may be used to encode a particular amino acid. Peptide fragments can be analyzed to identify sequences of amino acids that may be encoded by oligonucleotides having the lowest degree of degeneracy. This is preferably accomplished by identifying sequences that contain amino acids which are encoded by only a single codon. Although occasionally an amino acid sequence can be encoded by only a single oligonucleotide sequence, frequently the amino acid sequence is encoded by any of a set of similar oligonucleotides. Importantly, whereas all of the members of this set contain oligonucleotide sequences which are capable of encoding the same peptide fragment and, thus, potentially contain the same oligonucleotide sequence as the gene which encodes the peptide fragment, only one member of the set contains the nucleotide sequence that is identical to the exon coding sequence of the gene. Because this member is present within the set, and is capable of hybridizing to DNA even in the presence of the other members of the set, it is possible to employ the unfractionated set of oligonucleotides in the same manner in which one would employ a single oligonucleotide to clone the gene that encodes the peptide. Using the genetic code, one or more different oligonucleotides can be identified from the amino acid sequence, each of which would be capable of encoding the desired protein. The probability that a particular oligonucleotide will, in fact, constitute the actual protein encoding sequence can be estimated by considering abnormal base pairing relationships and the frequency with which a particular codon is actually used (to encode a particular amino acid) in eukaryotic cells. Using "codon usage rules," a single oligonucleotide sequence, or a set of oligonucleotide sequences, that contain a theoretical "most probable" nucleotide sequence capable of encoding the protein sequences is identified.
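The search for a stretch of amino acids with the lowest degree of degeneracy can be sketched computationally. The Python below counts, from the standard genetic code, how many distinct oligonucleotides could encode a peptide and then picks the least degenerate window of a given length. It is a minimal sketch: the function names and the example peptide are hypothetical, and codon usage frequencies, which the text also mentions, are deliberately not modelled.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons for each amino acid (e.g. M -> 1, L -> 6).
CODONS_PER_AA = {}
for aa in CODON_TABLE.values():
    CODONS_PER_AA[aa] = CODONS_PER_AA.get(aa, 0) + 1


def degeneracy(peptide):
    """Number of distinct oligonucleotides able to encode the peptide."""
    total = 1
    for aa in peptide:
        total *= CODONS_PER_AA[aa]
    return total


def least_degenerate_window(peptide, length):
    """Return (start, sub_peptide, degeneracy) for the best window."""
    windows = [
        (degeneracy(peptide[i:i + length]), i, peptide[i:i + length])
        for i in range(len(peptide) - length + 1)
    ]
    deg, start, sub = min(windows)  # ties resolve to the earliest window
    return start, sub, deg


# Hypothetical peptide fragment, one-letter amino acid codes.
print(least_degenerate_window("LSRMWKFL", 3))
```

Windows rich in methionine and tryptophan, each encoded by a single codon, come out with the smallest oligonucleotide sets, which matches the preference stated above.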

The suitable oligonucleotide, or set of oligonucleotides, which is capable of encoding a fragment of a certain gene (or which is complementary to such an oligonucleotide, or set of oligonucleotides) can be synthesized by means well known in the art (see, for example, Oligonucleotides and Analogues, A Practical Approach, F. Eckstein, ed., 1992, IRL Press, New York) and employed as a probe to identify and isolate a clone to such gene by techniques known in the art. Those members of the above-described gene bank that are found to be capable of such hybridization are then analyzed to determine the extent and nature of coding sequences which they contain.

To facilitate the detection of a desired DNA coding sequence, the above-described DNA probe is labeled with a detectable group. Such detectable group can be any material having a detectable physical or chemical property. Such materials have been well-developed in the field of nucleic acid hybridization and in general most any label useful in such methods can be applied to the present invention. Particularly useful are radioactive labels, such as ³²P, ³H, ¹⁴C, ³⁵S, ¹²⁵I, or the like. Any radioactive label may be employed which provides for an adequate signal and has a sufficient half-life. If single stranded, the oligonucleotide may be radioactively labelled using kinase reactions. Alternatively, polynucleotides are also useful as nucleic acid hybridization probes when labeled with a non-radioactive marker such as biotin, an enzyme or a fluorescent group. Thus, in summary, the elucidation of a partial protein sequence permits the identification of a theoretical "most probable" DNA sequence, or a set of such sequences, capable of encoding such a peptide. By constructing an oligonucleotide complementary to this theoretical sequence (or by constructing a set of oligonucleotides complementary to the set of "most probable" oligonucleotides), one obtains a DNA molecule (or set of DNA molecules) capable of functioning as a probe(s) for the identification and isolation of clones containing a gene.
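Constructing an oligonucleotide complementary to a theoretical coding sequence amounts to taking its reverse complement. The short Python sketch below shows that operation together with a deliberately simplified hybridization check. The assumptions are that only perfect Watson-Crick matches count and that the sequences shown are invented; the function names are hypothetical.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridizes(probe, target):
    """Simplified test: True if the probe can pair perfectly with some
    stretch of the given target strand (no mismatches are tolerated)."""
    return reverse_complement(probe) in target.upper()


# Hypothetical coding-strand segment and a probe designed against it.
coding = "ATGTGG"
probe = reverse_complement(coding)
print(probe)
print(hybridizes(probe, "GGATGTGGAAA"))
```

Real hybridization tolerates some mismatches depending on stringency, so this check is stricter than a laboratory screen would be.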

In an alternative way of cloning a gene, a bank is prepared using an expression vector, by cloning DNA or, more preferably cDNA prepared from a cell capable of expressing the protein into an expression vector. The bank is then screened for members which express the desired protein, for example, by screening the bank with antibodies to the protein.

The above discussed methods are, therefore, capable of identifying genetic sequences that are capable of encoding a protein or biologically active or antigenic fragments of this protein. The desired coding sequence may be further characterized by demonstrating its ability to encode a protein having the ability to bind antibody in a specific manner, the ability to elicit the production of antibodies which are capable of binding to the native, non-recombinant protein, the ability to provide an enzymatic activity to a cell that is a property of the protein, and the ability to provide a non-enzymatic (but specific) function to a recipient cell, among others.

In order to produce the recombinant protein, it is desirable to operably link such DNA coding sequences to the ACEI and/or ACEII regulated promoters of the invention. When the coding sequence and the operably linked promoter of the invention are introduced into a recipient eukaryotic cell (preferably a fungal host cell) as a non-replicating DNA (or RNA), or non-integrating molecule, the expression of the encoded protein may occur through the transient (nonstable) expression of the introduced sequence.

Preferably however, when utilizing a fungal host, the coding sequence is introduced as a DNA molecule, such as a closed circular or linear DNA molecule that is incapable of autonomous replication and, most preferably, a linear molecule that integrates into the host chromosome. Genetically stable fungal transformants may be constructed with vector systems, or transformation systems, whereby a desired DNA is integrated into the host chromosome. Such integration may occur de novo within the cell or be assisted by transformation with a vector which functionally inserts itself into the host chromosome.

The gene coding for the desired protein operably linked to the ACEI and/or ACEII inducible promoter of the invention may be provided along with a transformation marker gene in one plasmid construction and introduced into the host cells by transformation, or the marker gene may be on a separate construct for co-transformation with the coding sequence construct into the host cell. The nature of the vector will depend on the host organism. In the practical realization of the ACEI and ACEII regulated embodiments of the invention, the filamentous fungus Trichoderma has been employed as a model. Thus, for Trichoderma and especially for T. reesei, vectors incorporating DNA that provides for integration of the expression cassette (the coding sequence operably linked to its transcriptional and translational regulatory elements) into the host's chromosome are preferred. It is not necessary to target the chromosomal insertion to a specific site. However, targeting the integration to a specific locus may be achieved by providing specific coding or flanking sequences on the recombinant construct, in an amount sufficient to direct integration to this locus at a relevant frequency.

Cells that have stably integrated the introduced DNA into their chromosomes are selected by also introducing one or more markers that allow for selection of host cells that contain the expression vector in the chromosome; for example, the marker may provide biocide resistance, e.g., resistance to antibiotics, or heavy metals, such as copper, or the like. The selectable marker gene can either be directly linked to the DNA gene sequences to be expressed, or introduced into the same cell by co-transformation. This discussion of marker genes should not be confused with the marker gene that is used to identify transcriptional regulatory proteins according to that method of the invention (although such markers could be used in fungal hosts in that regard). Rather, the discussion here regards the use of these markers as markers for the transformation event.

A genetic marker especially suited to the transformation of the hosts of the invention is amdS, encoding acetamidase and thus enabling Trichoderma to grow on acetamide as the only nitrogen source. Selectable markers for use in transforming filamentous fungi include, for example, acetamidase (the amdS gene), benomyl resistance, oligomycin resistance, hygromycin resistance, aminoglycoside resistance, bleomycin resistance; and, with auxotrophic mutants, ornithine carbamoyltransferase (OCTase or the argB gene). The use of such markers is also reviewed in Finkelstein, D.B. in: Biotechnology of Filamentous Fungi: Technology and Products, Chapter 6, Finkelstein, D.B. et al., eds., Butterworth-Heinemann, publishers, Stoneham, MA, (1992), pp. 113-156.

To express a desired protein and/or its active derivatives, transcriptional and translational signals recognizable by an appropriate host are necessary. The cloned coding sequences, obtained through the methods described above, and preferably in a double-stranded form, may be operably linked to sequences controlling transcriptional expression in an expression vector, and introduced into a host cell, either prokaryote or eukaryote, to produce recombinant protein or a functional derivative thereof. Depending upon which strand of the coding sequence is operably linked to the sequences controlling transcriptional expression, it is also possible to express antisense RNA or a functional derivative thereof.

Expression of the protein in different hosts may result in different post-translational modifications which may alter the properties of the protein.

Preferably, the present invention encompasses the expression of the protein or a functional derivative thereof, in eukaryotic cells, and especially in fungus.

A nucleic acid molecule, such as DNA, is said to be "capable of expressing" a polypeptide if it contains expression control sequences which contain transcriptional regulatory information and such sequences are "operably linked" to the nucleotide sequence which encodes the polypeptide.

An operable linkage is a linkage in which a sequence is connected to a regulatory sequence (or sequences) in such a way as to place expression of the sequence under the influence or control of the regulatory sequence. Two DNA sequences (such as a coding sequence and a promoter region sequence linked to the 5' end of the coding sequence) are said to be operably linked if induction of promoter function results in the transcription of mRNA encoding the desired protein and if the nature of the linkage between the two DNA sequences does not (1) result in the introduction of a frame-shift mutation, (2) interfere with the ability of the expression regulatory sequences to direct the expression of the protein or antisense RNA, or (3) interfere with the ability of the DNA template to be transcribed. Thus, a promoter region would be operably linked to a DNA sequence if the promoter was capable of effecting transcription of that DNA sequence.

The precise nature of the regulatory regions needed for gene expression may vary between species or cell types, but shall in general include, as necessary, 5' non-transcribing and 5' non-translating (non-coding) sequences involved with initiation of transcription and translation respectively, such as the TATA box, capping sequence, CAAT sequence, and the like, with those elements necessary for the promoter sequence being provided by the promoters of the invention. Such transcriptional control sequences may also include enhancer sequences or upstream activator sequences, as desired.

Expression of a protein in eukaryotic hosts such as fungus requires the use of regulatory regions functional in such hosts, and preferably fungal regulatory systems. A wide variety of transcriptional and translational regulatory sequences can be employed, depending upon the nature of the host. Preferably, these regulatory signals are associated in their native state with a particular gene that is capable of a high level of expression in the host cell.

In eukaryotes, where transcription is not linked to translation, such control regions may or may not provide an initiator methionine (AUG) codon, depending on whether the cloned sequence contains such a methionine. Such regions will, in general, include a promoter region sufficient to direct the initiation of RNA synthesis in the host cell. Promoters from filamentous fungal genes which encode a mRNA product capable of translation are preferred, and especially, strong promoters can be employed provided they also function as promoters in the host cell.

As is widely known, translation of eukaryotic mRNA is initiated at the codon which encodes the first methionine. For this reason, it is preferable to ensure that the linkage between a eukaryotic promoter and a DNA sequence that encodes the desired protein, or a functional derivative thereof, does not contain any intervening codons that are capable of encoding a methionine. The presence of such codons results either in the formation of a fusion protein (if the AUG codon is in the same reading frame as the protein-coding DNA sequence) or a frame-shift mutation (if the AUG codon is not in the same reading frame as the protein-coding sequence).
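The check described above can be sketched in a few lines. The following is a minimal illustration, not a method taken from this disclosure: it assumes the junction is supplied as the DNA that lies between the promoter and the first codon of the coding sequence, and the function name and example linkers are hypothetical. It reports every intervening ATG and whether it lies in the reading frame of the downstream coding sequence.

```python
def intervening_atgs(linker):
    """List ATG triplets in the linker between promoter and coding sequence.

    `linker` is the DNA immediately upstream of the first codon of the
    coding sequence (hypothetical input format). Returns a list of
    (position, in_frame) tuples, where in_frame is True when the ATG is in
    the same reading frame as the coding sequence that follows the linker.
    """
    linker = linker.upper()
    hits = []
    pos = linker.find("ATG")
    while pos != -1:
        # Distance from this ATG to the start of the coding sequence.
        distance = len(linker) - pos
        hits.append((pos, distance % 3 == 0))
        pos = linker.find("ATG", pos + 1)
    return hits


if __name__ == "__main__":
    for linker in ("CCATGGCA", "CCATGGC", "CCTTGGCA"):  # made-up linkers
        hits = intervening_atgs(linker)
        if not hits:
            print(linker, "-> no intervening ATG")
        for pos, in_frame in hits:
            kind = "in frame (fusion protein)" if in_frame else "out of frame"
            print(linker, "-> ATG at", pos, kind)
```

The sketch deliberately ignores stop codons between a spurious ATG and the coding sequence, which would also affect the outcome in practice.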

It may be desired to construct a fusion product that contains a partial coding sequence (usually at the amino terminal end) of a protein and a second coding sequence (partial or complete) of a second protein. The first coding sequence may or may not function as a signal sequence for secretion of the protein from the host cell. For example, the sequence coding for desired protein may be linked to a signal sequence that will allow secretion of the protein from, or the compartmentalization of the protein in, a particular host. Such fusion protein sequences may be designed with or without specific protease sites such that a desired peptide sequence is amenable to subsequent removal. In a preferred embodiment, the native signal sequence of a fungal protein is used, or a functional derivative of that sequence that retains the ability to direct the secretion of the peptide that is operably linked to it. Aspergillus leader/secretion signal elements also function in Trichoderma.

If desired, the non-transcribed and/or non-translated regions 3' to the sequence coding for a desired protein can be obtained by the above-described cloning methods. The 3'-non-transcribed region may be retained for its transcriptional termination regulatory sequence elements, or for those elements that direct polyadenylation in eukaryotic cells. Where the native expression control sequence signals do not function satisfactorily in a host cell, then sequences functional in the host cell may be substituted. The vectors may further comprise other operably linked regulatory elements such as DNA elements that confer antibiotic resistance, or origins of replication for maintenance of the vector in one or more host cells.

In another embodiment, especially for maintenance of vectors in prokaryotic cells, or in yeast S. cerevisiae cells, the introduced sequence is incorporated into a plasmid or viral vector capable of autonomous replication in the recipient host. Any of a wide variety of vectors may be employed for this purpose. In Bacillus hosts, integration of the desired DNA may be necessary.

Factors of importance in selecting a particular plasmid or viral vector include: the ease with which recipient cells that contain the vector may be recognized and selected from those recipient cells that do not contain the vector; the number of copies of the vector that are desired in a particular host; and whether it is desirable to be able to "shuttle" the vector between host cells of different species.

When it is desired to use S. cerevisiae as a host or as a shuttle vector, preferred S. cerevisiae yeast plasmids include those containing the 2-micron circle, etc., or their derivatives. Such plasmids are well known in the art (Botstein, D., et al., Miami Wntr. Symp. 19:265-274 (1982); Broach, J.R., in: The Molecular Biology of the Yeast Saccharomyces: Life Cycle and Inheritance, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, p. 445-470 (1981); Broach, J.R., Cell 28:203-204 (1982); Bollon, D.P., et al., J. Clin. Hematol. Oncol. 10:39-48 (1980); Maniatis, T., In: Cell Biology: A Comprehensive Treatise, Vol. 3, Gene Expression, Academic Press, NY, pp. 563-608 (1980)), and are commercially available.

Once the vector or DNA sequence containing the construct(s) is prepared for expression, the DNA construct(s) is introduced into an appropriate host cell by any of a variety of suitable means, including transformation. After the introduction of the vector, recipient cells are grown in a selective medium, which selects for the growth of vector-containing cells. If this medium includes glucose, expression of the cloned gene sequence(s) results in the production of the desired protein, or in the production of a fragment of this protein as desired. This expression can take place in a continuous manner in the transformed cells, or in a controlled manner, for example, by induction of expression.

Fungal transformation is also carried out according to techniques known in the art, for example, using homologous recombination to stably insert a gene into the fungal host and/or to destroy the ability of the host cell to express a certain protein.

Fungi useful as recombinant hosts for the purpose of the invention include, e.g., Trichoderma, Aspergillus, Claviceps purpurea, Penicillium chrysogenum, Magnaporthe grisea, Neurospora, Mycosphaerella spp., Colletotrichum trifolii, the dimorphic fungus Histoplasma capsulatum, Nectria haematococca (Fusarium solani f. sp. phaseoli and f. sp. pisi), Ustilago violacea, Ustilago maydis, Cephalosporium acremonium, Schizophyllum commune, Podospora anserina, Sordaria macrospora, Mucor circinelloides, Humicola, Melanocarpus, Myceliophthora, Chaetomium and Colletotrichum capsici. Transformation and selection techniques for each of these fungi have been described (reviewed in Finkelstein, D.B. in: Biotechnology of Filamentous Fungi: Technology and Products, Chapter 6, Finkelstein, D.B. et al., eds., Butterworth-Heinemann, publishers, Stoneham, MA, (1992), pp. 113-156). Especially preferred are Trichoderma reesei, T. harzianum, T. longibrachiatum, T. viride, T. koningii, Aspergillus nidulans, A. niger, A. terreus, A. ficuum, A. oryzae, A. awamori and N. crassa. The hosts of the invention are meant to include all Trichoderma.

Trichoderma are classified on the basis of morphological evidence of similarity. T. reesei was formerly known as T. viride Pers. or T. koningii Oudem; sometimes it was classified as a distinct species of the T. longibrachiatum group. The entire genus Trichoderma, in general, is characterized by rapidly growing colonies bearing tufted or pustulate, repeatedly branched conidiophores with lageniform phialides and hyaline or green conidia borne in slimy heads (Bissett, J., Can. J. Bot. 62:924-931 (1984)).

The fungus called T. reesei is clearly defined as a genetic family originating from the strain QM6a, that is, a family of strains possessing a common genetic background originating from a single nucleus of the particular isolate QM6a. Only those strains are called T. reesei.

Classification by morphological means is problematic, and the first recently published molecular data from DNA-fingerprint analysis and the hybridization pattern of the cellobiohydrolase 2 (cbh2) gene in T. reesei and T. longibrachiatum clearly indicate a differentiation of these strains (Meyer, W. et al., Curr. Genet. 21:27-30 (1992); Morawetz, R. et al., Curr. Genet. 21:31-36 (1992)).

However, there is evidence of similarity between different Trichoderma species at the molecular level that is found in the conservation of nucleic acid and amino acid sequences of macromolecular entities shared by the various Trichoderma species. For example, Cheng, C., et al., Nucl. Acids Res. 18:5559 (1990), discloses the nucleotide sequence of T. viride cbh1. The gene was isolated using a probe based on the T. reesei sequence. The authors note that there is a 95% homology between the amino acid sequences of the T. viride and T. reesei genes. Goldman, G.H. et al., Nucl. Acids Res. 18:6717 (1990), discloses the nucleotide sequence of phosphoglycerate kinase from T. viride and notes that the deduced amino acid sequence is 81% homologous with the phosphoglycerate kinase gene from T. reesei. Thus, the species classified to T. viride and T. reesei must genetically be very close to each other.
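Percentage figures such as those quoted above are typically obtained by counting matching positions in a pairwise alignment. The cited papers' exact procedure is not given in this text, so the following is only a minimal sketch under that assumption; the function name and the toy sequences are hypothetical and are not T. viride or T. reesei data.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical positions between two pre-aligned sequences.

    Both inputs must have the same length; '-' marks an alignment gap.
    Columns containing a gap in either sequence are skipped (one common
    convention; others divide by the full alignment length).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy aligned peptides, invented for illustration only.
    print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
    print(percent_identity("ACD-FGHIKL", "ACDEFGHIKL"))
```

Because the choice of denominator changes the result, identity percentages from different sources are only comparable when the same convention is used.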

In addition, there is a high similarity of transformation conditions among the Trichoderma. Although practically all the industrially important species of Trichoderma can be found in the formerly discussed Trichoderma section Longibrachiatum, there are some other species of Trichoderma that are not assigned to this section. Such a species is, for example, Trichoderma harzianum, which acts as a biocontrol agent against plant pathogens. A transformation system has also been developed for this Trichoderma species (Herrera-Estrella, A. et al., Molec. Microbiol. :839-843 (1990)). Thus, even though Trichoderma harzianum is not assigned to the section Longibrachiatum, the method used by Herrera-Estrella in the preparation of spheroplasts before transformation is the same. The teachings of Herrera-Estrella show that there is not a significant diversity of Trichoderma spp. such that the transformation system of the invention would not be expected to function in all Trichoderma.

Further, there is a common functionality of fungal transcriptional control signals among fungal species. At least three A. nidulans promoter sequences, amdS, argB, and gpdA, have been shown to give rise to gene expression in T. reesei. For amdS and argB, only one or two copies of the gene are sufficient to bring about a selectable phenotype (Penttila, M., et al., Gene 61:155-164 (1987)). Gruber, F. et al., Curr. Genet. 18:71-76 (1990) also notes that fungal genes can often be successfully expressed across different species. Therefore, it is to be expected that the glucose regulated promoters identified herein would be also regulatable by glucose in other fungi.

Many species of fungi, and especially Trichoderma, are available from a wide variety of resource centers that contain fungal culture collections. In addition, Trichoderma species are catalogued in various databases. These resources and databases are summarized by O'Donnell, K. et al., in Biotechnology of Filamentous Fungi: Technology and Products, D.B. Finkelstein et al., eds., Butterworth-Heinemann, Stoneham, MA, USA, 1992, pp. 3-39.

After the introduction of the vector and selection of the transformant, recipient cells are grown in a selective medium, which selects for the growth of vector-containing cells. Expression of the cloned gene sequence(s) results in the synthesis and secretion of the desired heterologous or homologous protein, or in the production of a fragment of this protein, into the medium of the host cell.

In a preferred embodiment, the coding sequence is the sequence of an enzyme that is capable of hydrolysing lignocellulose. Examples of such sequences include a DNA sequence encoding cellobiohydrolase I (CBHI), cellobiohydrolase II (CBHII), endoglucanase I (EGI), endoglucanase II (EGII), endoglucanase III (EGIII), β-glucosidases, xylanases (including endoxylanases and β-xylosidase), side-group cleaving activities (for example, α-arabinosidase, α-D-glucuronidase, and acetyl esterase), mannanases, pectinases (for example, endo-polygalacturonase, exo-polygalacturonase, pectinesterase, or pectin and pectin acid lyase), and enzymes of lignin polymer degradation (for example, lignin peroxidase LIII from Phlebia radiata (Saloheimo et al., Gene 85:343-351 (1989)), or the gene for another ligninase, laccase or Mn peroxidase (Kirk, In: Biochemistry and Genetics of Cellulose Degradation, Aubert et al. (eds.), FEMS Symposium No. 43, Academic Press, Harcourt, Brace Jovanovitch Publishers, London, pp. 315-332 (1988))).

The cloning of the cellulolytic enzyme genes has been described and recently reviewed (Teeri, T.T. in: Biotechnology of Filamentous Fungi: Technology and Products, Chapter 14, Finkelstein, D.B. et al., eds., Butterworth-Heinemann, publishers, Stoneham, MA, (1992), pp. 417-445). The gene for the native cellobiohydrolase CBHI sequence has been cloned by Shoemaker et al. (Shoemaker, S., et al., Bio/Technology 1:691-695 (1983)) and Teeri et al. (Teeri, T., et al., Bio/Technology 1:696-699 (1983)) and the entire nucleotide sequence of the gene is known (Shoemaker, S., et al., Bio/Technology 1:691-695 (1983)). From T. reesei, the gene for the major endoglucanase (EGI) has also been cloned and characterized (Penttila, M., et al., Gene 45:253-263 (1986); Patent Application EP 137,280; Van Arsdell, J.N.V., et al., Bio/Technology 5:60-64). Other isolated cellulase genes include cbh2 (Patent Application WO 85/04672; Chen, C.M., et al., Bio/Technology 5:274-278 (1987)) and egl3 (Saloheimo, M., et al., Gene 63:11-21 (1988)). The genes for the two endo-xylanases of T. reesei (xln1 and xln2) have been cloned and described. The xylanase proteins have been purified and characterized (Tenkanen, M. et al., Proceeding of the Xylans and Xylanases Symposium, Wageningen, Holland (1991)).

The expressed protein may be isolated and purified from the medium of the host in accordance with conventional conditions, such as extraction, precipitation, chromatography, affinity chromatography, electrophoresis, or the like. For example, the cells may be collected by centrifugation, or with suitable buffers, lysed, and the protein isolated by column chromatography, for example, on DEAE-cellulose, phosphocellulose, polyribocytidylic acid-agarose, hydroxyapatite or by electrophoresis or immunoprecipitation.

The manner and method of carrying out the present invention may be more fully understood by those of skill by reference to the following examples, which examples are not intended in any manner to limit the scope of the present invention or of the claims directed thereto.

Examples

Example 1

Isolation method for genes coding for positively-acting regulatory proteins

The reporter plasmid pAS3 (Figure 1) was constructed as follows. pRS315 (Sikorski, R.S., and Hieter, P., Genetics 122:19-27 (1989)), the yeast single-copy vector containing the LEU2 marker, was digested with the restriction enzymes BamHI and SalI. The HIS3 reporter gene of S. cerevisiae was cloned from cosmid p3030 (Hohn and Hinnen, unpublished; see Penttila, M.E., et al., Mol. Gen. Genet. 194:494-499 (1984), incorporated herein by reference) by PCR using the 5' primer AAA GGA TCC TTA TAC ATT ATA TAA AGT AAT G (SEQ ID NO. 1) and the 3' primer ATA TAG TCG ACC TCG GGG ACA CCA AAT ATG G (SEQ ID NO. 2). The underlined GGATCC is a BamHI site and GTCGAC is a SalI site. The PCR fragment was digested with the above mentioned enzymes and ligated to the vector, followed by sequencing of the PCR fragment. The HIS3 gene of the resulting pAS1 plasmid contains a minimal promoter, 55 bp upstream from the ATG, which is not able to support growth in a medium lacking histidine.

In the next step, pAS1 was digested with the restriction enzymes SacI and XbaI. A 1.15 kb promoter fragment upstream from the TATA box of the T. reesei cbh1 promoter was cloned from the plasmid pMLO16 (Ilmen, M., et al., Mol. Gen. Genet. 253:303-314 (1996)) by PCR using the 5' primer ATA CCC GGG AGC TCA TTC CCG AAA AAA CTC GG (SEQ ID NO. 3) and the 3' primer ATT CCC GGG TCT AGA CAC ATT CGC TGA CTT TGC C (SEQ ID NO. 4). The underlined GAGCTC is a SacI site and TCTAGA is an XbaI site. The promoter fragment was digested with the restriction enzymes SacI and XbaI and ligated in front of the HIS3 gene in the pAS1 vector.
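The restriction sites built into the four primers can be confirmed by a simple string search. The sketch below is illustrative only: the primer sequences and recognition sequences are those given in this example, while the dictionary layout and function name are assumptions made for the illustration.

```python
# Primer sequences as printed in Example 1 (spaces mark codon-sized groups).
PRIMERS = {
    "SEQ ID NO. 1": "AAA GGA TCC TTA TAC ATT ATA TAA AGT AAT G",
    "SEQ ID NO. 2": "ATA TAG TCG ACC TCG GGG ACA CCA AAT ATG G",
    "SEQ ID NO. 3": "ATA CCC GGG AGC TCA TTC CCG AAA AAA CTC GG",
    "SEQ ID NO. 4": "ATT CCC GGG TCT AGA CAC ATT CGC TGA CTT TGC C",
}

# Recognition sequences named in the text.
SITES = {
    "BamHI": "GGATCC",
    "SalI": "GTCGAC",
    "SacI": "GAGCTC",
    "XbaI": "TCTAGA",
}


def find_sites(primer):
    """Map enzyme name -> 0-based offset of its site within the primer."""
    seq = primer.replace(" ", "").upper()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, find_sites(primer))
```

Each primer carries exactly one of the four sites near its 5' end, consistent with the cloning strategy described: BamHI and SalI for the HIS3 fragment, SacI and XbaI for the cbh1 promoter fragment.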

Plasmid pMS95 (Figure 2), used as a negative control plasmid, was constructed as follows. The pAS1 plasmid was digested with the restriction enzyme SacI. A 1.4 kb SacI fragment from a non-relevant cDNA (5' end of a glutamate receptor cDNA from rat) was ligated in front of the HIS3 gene. This plasmid was used as a negative control containing no promoter elements, since the polylinker region present in the vector pAS1 caused leakage of the HIS3 gene.

Plasmids pAS3 and pMS95 were transformed into the yeast strain DBY746 (ATCC 44773, his3-Δ1 leu2-3 leu2-112 ura3-52 trp1-289 cyhR cir+; Dr. D. Botstein, Massachusetts Institute of Technology, Cambridge, MA) by electroporation according to the manufacturer's instructions (Bio-Rad). Transformants were plated on synthetic complete plates (Sherman, F., Meth. Enzymol. 194:3-21 (1991)) supplemented with 1M sorbitol and all other amino acids except leucine (SC-Leu). Colonies were streaked on SC-Leu and SC-Leu-His plates. The reporter strain DBY746-pAS3 and the control strain DBY746-pMS95 could not grow on media lacking histidine, showing that the Trichoderma promoter or the non-relevant DNA fragment could not drive the expression of the reporter gene in yeast.

A cDNA library of Trichoderma reesei grown in hydrolase-inducing conditions (Solka floc cellulose, spent grain, locust bean gum, lactose, acetyl glucuronoxylan, arabinoxylan) was prepared into a URA3-selectable yeast multicopy expression vector pAJ401 (Figure 3, Saloheimo, A., et al., Mol. Microbiol. 13:219-228 (1994)) under the constitutive PGK promoter. Alternatively, the expression library could be prepared using an inducible yeast promoter such as GAL1 (West et al., 1984). The growth conditions and preparation of mRNA from T. reesei Rut-C30 strain (ATCC 56765, Eveleigh, D.E. and Montenecourt, B.S., Adv. in Appl. Microbiol. 25:57 -7 (1979)) have been described (Stalbrand, H., et al., Appl. Environ. Microbiol. 61:1090-1097 (1995)). cDNA, synthesized by the ZAP-cDNA synthesis kit (Stratagene), was ligated to the EcoRI-XhoI cut plasmid pAJ401.

Transformation of E. coli strain JS4 (Bio-Rad) by electroporation according to the manufacturer's instructions (Bio-Rad) yielded a library of 10⁵ independent clones. The colonies were scraped from the plates and a plasmid stock was prepared from them without further amplification.

The DBY746-pAS3 reporter yeast strain was transformed with the library plasmid stock by electroporation according to the manufacturer's instructions (Bio-Rad). Electroporation with a total of 40 μg of the plasmid pool gave a library of 10⁶ yeast cells growing on SC-Leu-Ura plates supplemented with 1M sorbitol. After 7 d of growth at 30°C the colonies were scraped from the plates. In order to screen for HIS⁺ colonies, the library was plated on SC-Leu-Ura-His plates to a density of 10⁶ cells per plate and grown at 30°C for 5 d. If, for instance, the inducible GAL1 promoter is used, the library is plated on a medium containing galactose instead of glucose as a carbon source. 0.004% of the yeast cells could grow in the selection conditions.

Plasmid DNA was isolated from 48 growing colonies and transformed to the DBY746 yeast strain with and without the reporter constructs. 75% of the plasmids supported growth of all the strains on media lacking histidine. The existence of the Trichoderma reesei his3 gene in these clones was verified by sequencing of the cDNA from the 5' end followed by homology comparison of the open reading frame (ORF) against the yeast and Neurospora crassa his3 genes. 15% of the plasmids could not support growth of either the host or the reporter strains and thus represented false positives. Five plasmids supporting growth on the SC-Leu-Ura-His plates only with the reporter plasmid pAS3 were analysed further. One of the plasmids (pAS27) contained a 1.9 kb cDNA and was named ace1 (activator of cellulase expression). The other four plasmids each contained an identical cDNA of 1.4 kb that was different from the ace1 cDNA. One of these plasmids (pAS26) was studied further and the gene was named ace2.
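The screening figures reported above can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the numbers stated in this example; it assumes that the percentages are rounded values, and the variable and function names are illustrative.

```python
def expected_colonies(cells_per_plate, percent_growing):
    """Expected number of growing colonies on one selection plate."""
    return cells_per_plate * percent_growing / 100.0


def classify_plasmids(total, pct_reporter_independent, pct_false_positive):
    """Convert the reported (rounded) percentages into plasmid counts."""
    independent = round(total * pct_reporter_independent / 100.0)
    false_pos = round(total * pct_false_positive / 100.0)
    dependent = total - independent - false_pos
    return independent, false_pos, dependent


if __name__ == "__main__":
    # 10^6 cells per plate, 0.004% able to grow without histidine.
    print(expected_colonies(10**6, 0.004), "colonies expected per plate")

    # 48 plasmids: 75% his3-complementing, 15% false positives.
    independent, false_pos, dependent = classify_plasmids(48, 75, 15)
    print(independent, "reporter-independent,", false_pos, "false positives,",
          dependent, "reporter-dependent")
```

Under these assumptions the remainder is five reporter-dependent plasmids, which agrees with the one ace1 clone and four ace2 clones described.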

Example 2

Isolation of the full-length cDNA of ace1

Sequencing of the ace1 cDNA from the pAS27 plasmid revealed a 1943 bp cDNA with an ORF of 491 amino acids (aa) starting from the first ATG codon. Northern hybridization using the cDNA as a probe gave two signals of about 3.2 and 3.0 kb in length and thus showed that the cDNA was not full-length. Therefore the full-length cDNA of the ace1 gene was isolated from a library prepared to λZAP from the same induced Trichoderma cDNA as used in the initial screening (Stalbrand, H., et al., Appl. Environ. Microbiol. 61:1090-1097 (1995)). A total of 8 x 10⁵ plaques were plated to NZY plates to a density of 5 x 10⁴ per 150 mm diameter plate. Two replicas were taken from each plate to nitrocellulose filters. A 300 bp PCR fragment from the 5' end of the original cDNA, labelled with ³²P-dCTP by using the nick translation kit (Boehringer), was used as a probe in plaque hybridization in order to minimize the amount of signals coming from the non-full-length copies. Hybridization was carried out overnight at 42°C in a solution containing 50% formamide, 5x Denhardt's, 5xSSPE, 0.1% SDS, 100 μg/ml herring sperm DNA, 1 μg/ml polyA DNA and 3 x 10⁵ cpm/ml of the probe. The filters were washed four times at room temperature in 2xSSC - 0.1% SDS followed by a wash at 68°C in 1xSSC - 0.1% SDS for 60 min and exposed to Kodak XAR-5 X-ray film at -70°C.

55 plaques giving the strongest signals were purified by plaque hybridization and the plasmids were in vivo excised according to the λZAP manual (Stratagene). The plasmids were analysed by restriction enzyme digestions, and six clones having the longest inserts were sequenced. Three plasmids (one of them, pAS28, was studied further) started from the same nucleotide and contained cDNA inserts of 3223 bp, which is in good accordance with the estimated sizes of the messenger RNAs. The ORF of 733 aa starting from the first ATG codon of the cDNA maintains the frame of the original ORF and contains 242 additional amino acids. There is a rather long (over 600 bp) non-translated 5' leader sequence in the cDNA, but the existence of three in-frame stop codons before the first ATG in the cDNA confirms that the plasmid contains at least the whole coding sequence of the ace1 gene (Figure 4; SEQ ID No. 5).
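The lengths reported in this example can be reconciled with a short calculation. The sketch below uses only the numbers stated above; the helper name is hypothetical, and it assumes one stop codon is counted with the ORF. Whether the 3223 bp figure includes any poly(A) tail is not stated, so the untranslated total is only approximate.

```python
def coding_length_bp(n_amino_acids, include_stop=True):
    """Length in bp of an ORF encoding n amino acids."""
    return 3 * n_amino_acids + (3 if include_stop else 0)


if __name__ == "__main__":
    partial_orf_aa = 491      # ORF found in the pAS27 cDNA
    additional_aa = 242       # extra residues found in the pAS28 cDNA
    full_orf_aa = partial_orf_aa + additional_aa
    print("full-length ORF:", full_orf_aa, "aa")

    cdna_bp = 3223            # insert size of the full-length clones
    coding_bp = coding_length_bp(full_orf_aa)
    print("coding region incl. stop:", coding_bp, "bp")
    print("untranslated (5' + 3'):", cdna_bp - coding_bp, "bp")

    # Plates needed for the plaque screen: 8 x 10^5 plaques at 5 x 10^4 each.
    print("plates:", (8 * 10**5) // (5 * 10**4))
```

The sum of the original and additional residues reproduces the stated 733 aa, and roughly 1 kb of untranslated sequence is compatible with a 5' leader of over 600 bp.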

Example 3

Isolation of the ace1 and ace2 genomic genes

Chromosomal DNAs from different T. reesei strains, isolated according to Raeder, U. and Broda, P., Lett. Appl. Microbiol. 1:17-20 (1985), were subjected to Southern hybridization in stringent conditions as follows. Two μg of DNA was fully digested with appropriate restriction enzymes (SacI, PvuII, AceI or HaeII) that cut the genes in two to four short fragments, separated in a 0.8% agarose gel, denatured and neutralized (according to Sambrook, J., et al., Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1989)) and capillary blotted to the Hybond N nylon filter. Hybridization conditions were as in the plaque hybridization except that the amount of the ³²P-labelled full-length cDNA probes was 10⁶ cpm/ml. The filters were washed twice at room temperature in 2xSSC - 0.1% SDS for 5 min followed by a wash at 68°C in 0.1xSSC - 0.1% SDS for 60 min. The strains used were QM9414 (ATCC 26921, Mandels, M., et al., Appl. Microbiol. 21:152-154 (1971)), VTT-D-79125 and VTT-D-80133 (cellulase overproducing strains, Bailey, M.J. and Nevalainen, K.M.H., Enzyme Microb. Technol. 3:153-157 (1981)), Rut-C30 and the cellulase negative strains VTT-D-81152, VTT-D-81153, VTT-D-81155, VTT-D-81158 and VTT-D-81168 (cel-18, cel-7, cel-1, cel-22 and cel-25 in Nevalainen, K.M.H. and Palva, E.T., Appl. Environ. Microbiol. 35:11-16 (1978), respectively). ace1 and ace2 were shown to be single-copy genes in the genome.

In order to clone the genomic copies of the ace genes, a genomic cosmid library of Trichoderma reesei VTT-D-80133 (Mantyla, A.L., et al., Curr. Genet. 21:471-477 (1992)) was screened using a PCR fragment of the coding sequence of ace1 as a probe in colony hybridization. A total of 2 x 10⁴ colonies were plated to a density of 2000 per 150 mm diameter plate, transferred to nitrocellulose filters and replicated. After denaturation, neutralization and fixation the filters were prewashed at 42°C in a solution containing 50 mM Tris pH 8, 1M NaCl, 1 mM EDTA and 0.1% SDS. Hybridization conditions and washes were the same as those used in Southern hybridization except that the amount of the probes was 5 x 10⁵ cpm/ml. Two cosmids giving strong signals were purified and found to contain the ace1 chromosomal gene. The gene was subcloned as a 7 kb HindIII fragment into the pZErO-1 vector (Invitrogen) resulting in plasmid pAS34. Sequencing of the chromosomal gene revealed three introns, one of which interrupts the long 5' non-coding sequence present in the cDNA.

A chromosomal lambda library of T. reesei strain QM9414 (Vanhanen, S., et al., Curr. Genet. 15:181-186 (1989)) was screened in plaque hybridization using PCR fragments of the coding sequence of the ace2 gene as a probe. A total of 4 x 10⁴ plaques were plated on NZY plates to a density of 10 'per 150 mm diameter plate. Two replicas were taken from each plate. The filters were denatured, neutralized and fixed. Hybridization conditions and washes were as in the screening of the cosmid library. Three plaques giving strong signals with the ace2 probe were purified. Lambda DNA was isolated from the purified clones and the chromosomal ace2 gene was subcloned as a 6 kb EcoRI-HindIII fragment into the pZErO-1 vector, resulting in plasmid pAS33. Sequencing of the pAS33 plasmid revealed no introns.

In the promoter regions of the ace genes, several sequences similar to the consensus binding sites for the glucose repressor proteins CREI of T. reesei (Ilmen, M., et al., Mol. Gen. Genet. 251:451-460 (1996)) and CREA of A. nidulans (Kulmburg, P., et al., Mol. Microbiol. 7:847-857 (1993)) can be found. This suggests that the ace genes are under glucose repression mediated by the CREI repressor protein.

Example 4

Features of the deduced ACEI protein

The ace1 cDNA sequence (SEQ ID NO. 5) is shown in Figure 4; the chromosomal sequence (SEQ ID NO. 6) and the deduced protein sequence (SEQ ID NO. 7) are shown in Figure 5. Amino acids 387-403 form a putative bipartite nuclear targeting signal RRKKNATPEDVAPKKCR (SEQ ID NO. 25) (basic residues are shown in bold) fitting well to the consensus (at least two basic residues separated by a spacer of ten residues from a block of at least three basic residues within the next five amino acids; Dingwall, C., and Laskey, R.A., Trends Biochem. Sci. 16:478-481 (1991)). After, and partially overlapping with, the nuclear targeting signal there is an area containing two perfect and, between them, one possible, degenerate zinc finger of the C₂H₂ type (underlined in FIG. 5). Therefore the ace1 gene seems to code for a DNA binding protein. The developmental activator protein coded by the Aspergillus nidulans brlA gene has two similar zinc fingers, both of which are needed for activator functions (Adams, T.H., et al., Mol. Cell. Biol. 10:1815-1817 (1990)). The deduced protein sequence was subjected to homology comparisons using the BLAST and the BLITZ programs. Only DNA binding proteins containing similar zinc finger domains gave meaningful scores. No relevant similarities were found to any sequence in the database when the protein sequence without the zinc fingers was used in the comparisons. There are, however, sequences rich in either proline or glutamine residues that show weak sequence similarities to some known regulatory proteins.

Example 5

Features of the deduced ACEII protein

Sequencing of the ace2 cDNA from the pAS26 plasmid revealed a 1373 bp cDNA with an ORF of 341 aa starting from the first ATG codon only 6 bp after the start of the cDNA. Northern hybridization using the cDNA as a probe gave 2-3 signals of about the size of the cDNA and a weaker signal of about 1.6 - 1.8 kb. In the chromosomal sequence, the last codon preceding the start of the cDNA is a stop codon in the frame of the ORF. In the promoter region upstream from the start of the cDNA, there are several ATG start codons in all three frames followed by stop codons either in the promoter or within the first 120 bp of the cDNA. These results show that the cDNA contains the whole coding region of the ace2 gene.

The ace2 cDNA sequence (SEQ ID NO. 8) is shown in Figure 6; the chromosomal sequence (SEQ ID NO. 9) and the deduced protein sequence (SEQ ID NO. 10) are shown in Figure 7. At the N-terminus of the deduced protein there is a sequence ACDRCHDKKLRCPRISGSPCCSRCAKANVAC (SEQ ID NO. 26) fitting well to the consensus of a fungal Zn₂C₆ binuclear cluster domain [G/A/S/T/V]-C-x(2)-C-[R/K/H]-x(2)-[R/K/H]-x(2)-C-x(5 to 9)-C-x(2)-C-x(6 to 8)-C (conserved residues are shown in bold; Pan, T., and Coleman, J.E., Proc. Natl. Acad. Sci. (USA) 87:2077-2081 (1990)). This DNA binding domain is found in many regulatory proteins of fungal origin. The deduced protein sequence was subjected to homology comparisons using the BLAST and the BLITZ programs. The DNA binding domains of the GAL4 and LAC9 activator proteins of galactose metabolism of S. cerevisiae and Kluyveromyces lactis, respectively, showed similarity to the DNA binding domain of ACEII. A short area rich in histidine (PHEPLNHSHEHSHSHSHNH) (SEQ ID NO. 27) following the DNA binding domain showed similarity to some homeobox-containing regulatory proteins of Drosophila melanogaster. This area in the bicoid protein is called the PRD repeat (Berleth et al., EMBO J. 7:1749-1756 (1988)). In the ACEII protein this histidine-rich area is followed by a glutamine- and proline-rich stretch (EQQQEQQQGQPQHPPPPVQ) (SEQ ID NO. 28). This area also showed similarity to some regulatory proteins of Drosophila in the search. As in the case of ACEI, the ACEII sequence showed no significant similarity to known regulatory proteins except for the short regions mentioned above, which most probably correspond to general features of regulatory proteins.
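The fit between the N-terminal ACEII sequence and the binuclear cluster consensus can be checked mechanically. The following is a minimal sketch, not part of the original analysis: it writes the consensus quoted above as a regular expression and tests the 31-residue sequence against it. The function name is hypothetical, and the first position of the consensus is assumed to allow G, A, S, T or V.

```python
import re

# Consensus from the text written as a regular expression.
# Assumption: the first position accepts any of G, A, S, T or V.
ZN2C6_CONSENSUS = re.compile(
    r"[GASTV]C.{2}C[RKH].{2}[RKH].{2}C.{5,9}C.{2}C.{6,8}C"
)

# N-terminal sequence of the deduced ACEII protein (SEQ ID NO. 26).
ACEII_N_TERMINUS = "ACDRCHDKKLRCPRISGSPCCSRCAKANVAC"


def find_zn2c6(protein: str):
    """Return (start, end, matched_text) for each consensus match (0-based, end exclusive)."""
    return [(m.start(), m.end(), m.group()) for m in ZN2C6_CONSENSUS.finditer(protein)]


if __name__ == "__main__":
    for start, end, text in find_zn2c6(ACEII_N_TERMINUS):
        print(f"match at residues {start + 1}-{end}: {text}")
```

With the variable-length spacers taken as 8 and 6 residues, the whole 31-residue sequence satisfies the pattern. The same function could be run over other deduced protein sequences, with the caveat that a regular-expression hit only indicates a candidate domain.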

Example 6

Expression studies of the ace1 and ace2 genes in T. reesei

In order to study the regulation of the ace genes at the mRNA level, Trichoderma was grown in 250 ml conical flasks in 50 ml of the minimal medium containing 2% of the carbon source indicated, 1.5% KH₂PO₄, 0.5% (NH₄)₂SO₄, 0.06% MgSO₄, 0.6% CaCl₂, 0.0005% FeSO₄·7H₂O, 0.00016% MnSO₄·H₂O, 0.00014% ZnSO₄·7H₂O and 0.00037% CoCl₂·6H₂O. Strain QM9414, a relative of the natural isolate QM6a, was chosen. Total RNA was extracted from the T. reesei mycelia as described by Chirgwin, J.M., et al., Biochemistry 18:5294-5299 (1979). RNA concentrations were measured spectrophotometrically and further controlled by running 2 μg samples in an agarose gel followed by acridine orange staining. RNAs were glyoxylated according to Sambrook, J., et al., Molecular Cloning. A Laboratory Manual. 2nd ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1989), run in a 1% agarose gel and capillary blotted to a Hybond N nylon filter. The filters were hybridized with ³²P-labelled cDNA probes overnight at 42°C in 10-15 ml of the RNA hybridization solution containing 50% formamide, 1 M NaCl, 1% SDS, 10% dextran sulphate, 100 μg/ml herring sperm DNA and 4-8 x 10⁵ cpm/ml of the probe. The filters were washed for 20 min once at room temperature in 1xSSC - 0.1% SDS and three times at 68°C in 0.2xSSC - 0.1% SDS. A smaller, differently-expressed signal of about 1.8 kb was occasionally detected with the ace1 probe. When ace1 or ace2 cDNA was used as a probe, 2 μg or 5 μg of total RNA was used, respectively. Northern hybridization showed that on media containing cellulose or xylan, or on the medium used to prepare the library, rather strong signals were detected from the activators. On sorbitol medium the activator genes were expressed, but when sophorose was added in mM amounts to the cultivation (Ilmen, M., et al., Appl. Environ. Microbiol. 63:1298-1306 (1997)), the expression of the activators was further induced to levels comparable to those in cellulose-induced conditions. On glycerol medium the expression of ace1 and ace2 differed strongly from that of the cellulases. Strong signals of ace1 and ace2, which could not be further induced by sophorose addition, were detected, whereas no signals of the cellulases could be seen.
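The minimal medium in this example is specified in percentages for a 50 ml culture. As a convenience, the sketch below converts such percentages into weighed amounts. It assumes that every percentage is weight per volume (g per 100 ml), which the text does not state explicitly; the dictionary and function names are illustrative only.

```python
# Components of the minimal medium as listed in the text.
# Assumption: all percentages are % (w/v), i.e. grams per 100 ml.
MEDIUM_PERCENT_WV = {
    "carbon source": 2.0,
    "KH2PO4": 1.5,
    "(NH4)2SO4": 0.5,
    "MgSO4": 0.06,
    "CaCl2": 0.6,
    "FeSO4.7H2O": 0.0005,
    "MnSO4.H2O": 0.00016,
    "ZnSO4.7H2O": 0.00014,
    "CoCl2.6H2O": 0.00037,
}


def grams_needed(percent_wv: float, volume_ml: float) -> float:
    """Mass in grams of a component at the given % (w/v) in the given volume."""
    return percent_wv / 100.0 * volume_ml


if __name__ == "__main__":
    volume_ml = 50.0  # culture volume per flask, as in the text
    for name, percent in MEDIUM_PERCENT_WV.items():
        print(f"{name}: {grams_needed(percent, volume_ml) * 1000:.3f} mg")
```

For the trace salts the per-flask amounts fall well below a milligram, so in practice a concentrated stock solution would be the sensible way to dose them.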

On glucose-containing medium, no cellulase transcripts can be detected due to glucose repression caused by the cre1 repressor gene. No ace2 signals but variable ace1 signals were detected, depending on other culture conditions.

The expression of ace1 was also studied in the strain Rut-C30. This strain is partially derepressed on glucose medium owing to a deletion in the glucose repressor gene cre1 (Ilmen, M., et al., Mol. Gen. Genet. 251:451-460 (1996)). A clear signal was seen from Rut-C30 cultivated on glucose medium.

Example 7

Construction of Trichoderma strains overproducing ACEI under the control of the gpdA promoter of A. nidulans

Construction of the pMI-66 Trichoderma expression vector was as follows. The coding region of the ace1 gene (working name S26) was first cloned from plasmid pAS28 by PCR using the 5' primer GCT CTA GAG CCG GAT CCA TCC TTT TCG AAC CCC CGC (SEQ ID NO. 11) and the 3' primer TCC CCC CGG GGG GAG GAT CCT TAC TCT TGA AAC CCC TGG T (SEQ ID NO. 12). The underlined GGATCC is a BamHI site. The PCR fragment was ligated to the pGEM-T vector purchased from Promega. The resulting plasmid was digested with the restriction enzyme BamHI and the isolated ace1 fragment was ligated to a similarly cut pAN52-1NotI fungal expression vector (Punt, P.J., et al., Gene 69:49-57 (1988)) under the constitutive gpdA promoter of A. nidulans, which supports high expression levels also in Trichoderma. The resulting pMI-66 plasmid (FIG. 8) was linearized and transformed into T. reesei. Three different T. reesei host strains were used. The effect of the activator on cellulase production was studied in the strain QM9414 (ATCC 26921, Mandels, M., et al., Appl. Microbiol. 21:152-154 (1971)). In order to study the effect of the activator on a single cellulase promoter and on the production of a heterologous product, two reporter strains, pMLO16-67A and pMI24-37A (Ilmen, M., et al., Mol. Gen. Genet. 253:303-314 (1996)), were used. pMLO16-67A is QM9414 in which the native cbh1 gene is replaced by an expression cassette where the E. coli lacZ gene is under the T. reesei cbh1 promoter. In pMI24-37A the cbh1 promoter has been mutated so that the functional binding sites for the glucose repressor protein CREI are inactivated. Protoplast transformation of T. reesei QM9414 was performed as cotransformation with the p3SR2 plasmid containing the acetamidase gene (amdS) of A. nidulans according to Penttila, M., et al., Gene 61:155-164 (1987), selecting transformants for growth on acetamide as the sole nitrogen source. Transformation of the T. reesei reporter strains was done using the plasmid pRLMex30 (Mach, R.L., et al., Curr. Genet. 25:567-570 (1994)) containing the hygromycin resistance marker. Transformants containing different numbers of copies (1-3) were selected by Southern hybridization. As expected, the strong and constitutive gpdA promoter directed the expression of ace1 in both repressing and inducing conditions, as studied by Northern hybridization. Thus overexpression could be achieved in conditions where the cellulases are normally not expressed, or are expressed at low levels.

Example 8

Construction of Trichoderma strains producing ACEI under the control of the cDNA1 promoter of T. reesei

Construction of the pMI-69 expression vector was as follows. Plasmid pAS28 containing the full-length ace1 cDNA in the Bluescript SK- vector was digested at the 5' end of the cDNA with the restriction enzyme SpeI. The strong cDNA1 promoter of T. reesei (Nakari-Setala, T. and Penttila, M., Appl. Environ. Microbiol. 61:3650-3655 (1995)) was first cloned by PCR using the 5' primer GAG AGA ATC GAT ACT AGT GGT CTG AAG GAC GTG GAA TG (SEQ ID NO. 13) and the 3' primer TAT ATA ATC GAT GCT AGC GTT GAG AGA AGT TGT TGG ATT G (SEQ ID NO. 14). The underlined ATCGAT is a ClaI site, the underlined ACTAGT is a SpeI site and the underlined GCTAGC is an NheI site. The PCR fragment was digested with the ClaI enzyme and ligated to the similarly digested pSP72, resulting in plasmid pMI-36. The promoter fragment was then cloned from the plasmid pMI-36 by digestion with the restriction enzymes SpeI and ClaI and ligated to the SpeI-ClaI cut pAS28 in front of the ace1 cDNA. The expression vector pMI-69 (Figure 10) is linearized with the restriction enzyme NotI and transformed into the same T. reesei strains as in Example 7. Transformants are analysed as in Example 7.
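Because the underlining of the restriction sites is easily lost in reproduction, the sites in the two cDNA1 promoter primers can be located programmatically. The sketch below is illustrative only: the recognition sequences are the ones named in this example, while the function name and the 1-based position convention are choices made here.

```python
# Recognition sequences of the enzymes named in the text (all palindromic).
SITES = {"ClaI": "ATCGAT", "SpeI": "ACTAGT", "NheI": "GCTAGC", "BamHI": "GGATCC"}

FORWARD_PRIMER = "GAG AGA ATC GAT ACT AGT GGT CTG AAG GAC GTG GAA TG"    # SEQ ID NO. 13
REVERSE_PRIMER = "TAT ATA ATC GAT GCT AGC GTT GAG AGA AGT TGT TGG ATT G"  # SEQ ID NO. 14


def find_sites(primer: str, sites=SITES):
    """Map enzyme name to the 1-based start positions of its site in the primer."""
    seq = primer.replace(" ", "").upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start + 1)
            start = seq.find(site, start + 1)  # step by one to allow overlapping sites
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    print("5' primer:", find_sites(FORWARD_PRIMER))
    print("3' primer:", find_sites(REVERSE_PRIMER))
```

Both primers carry a ClaI site followed directly by a second site (SpeI in the 5' primer, NheI in the 3' primer), which is consistent with the cloning steps described: ClaI for insertion into pSP72, then SpeI and ClaI to move the promoter in front of the cDNA.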

Example 9

Construction of Trichoderma strains producing ACEII under the control of gpdA promoter of A. nidulans

Construction of the pARO3 Trichoderma expression vector was as follows. The coding region of the ace2 gene (working name S5) was first cloned from plasmid pAS26 by PCR using the 5' primer ATC TGT CTA GAC AGG ATC CCC GGC AAG CAT GTG ATC GAT (SEQ ID NO. 15) and the 3' primer TAC GTC CCG GGC TGG ATC CTC ACT TCA GCA GTC TGG CTC (SEQ ID NO. 16). The underlined GGATCC is a BamHI site. The PCR fragment was ligated to the pGEM-T vector, resulting in plasmid pARO2. pARO2 was digested with the restriction enzyme BamHI and the isolated ace2 coding region was ligated to a similarly cut pAN52-1NotI vector. The resulting pARO3 plasmid (FIG. 11) was linearized and protoplast transformation of T. reesei QM9414 was performed as in Example 7. Cotransformation of the T. reesei reporter strains was done using the plasmid pAN8-1 containing the phleomycin resistance marker according to Nyyssonen, E., et al., Bio/Technology 11:591-595 (1993). Transformants having different copy numbers (1-4) were screened by Southern hybridization. Northern hybridization (Fig. 9) showed that all the transformants produced strong signals of the ace2 gene on Solka floc cellulose medium. Strong expression was also seen on glucose medium, where the expression of the ace2 gene is normally weak. Overproduction of the activators is therefore achieved under all conditions, and the activators can be used in hydrolase expression.

Example 10

Construction of Trichoderma strains overproducing both ACEI and ACEII

In order to study the effect of both ACEI and ACEII in the regulation of cellulase expression, the plasmid pMI-66 containing the ace1 production cassette was transformed into three different T. reesei strains earlier transformed with the ace2 cassette (Example 9). Among the transformants from each of the host strains QM9414, pMLO16-67A and pMI24-37A, the one producing the highest levels of the ace2 transcript was chosen. Linearized pMI-66 was cotransformed together with the selection plasmid pRLMex30 (Mach et al., Curr. Genet. 25:567-570 (1994)) containing the hygromycin resistance marker. The transformants were selected as in Example 7. Hydrolase production of the strains overproducing one or both of the activators was analysed on glucose-containing medium, on sorbitol-containing medium, and under inducing conditions on cellobiose, whey, Solka floc cellulose or a combination of plant materials, as described in Example 1. Expression can be detected by enzyme activity measurements or by Northern analysis as described earlier. In strains producing E. coli β-galactosidase from the cbh1 promoter, expression was studied by Northern analysis of the lacZ mRNA or by the plate assay as described in Ilmen, M., et al., Mol. Gen. Genet. 253:303-314 (1996).

Example 11

Construction of plasmids for production of activators under the gpdA promoter of A. nidulans targeted to the egl1 locus.

In order to target the activator expression cassettes to the egl1 locus in the subsequent Trichoderma transformations, the expression vector pMI-73 was first prepared. The expression cassette including the gpdA promoter, the ace2 coding region and the trpC terminator was digested from plasmid pARO3 (Figure 11) with the restriction enzymes NotI and HindIII and blunt-ended using T4 DNA polymerase. The plasmid pALK1062NB (Figure 12), containing the 5' and 3' flanking regions of the egl1 gene of T. reesei and the hph expression cassette coding for the marker for selection of the transformants (HmB phosphotransferase conferring resistance to hygromycin B), was digested with BamHI and similarly blunt-ended, and the ace2 expression cassette was ligated into the vector. The egl1 5'-flanking fragment is the 1.4 kb XhoI - SacI fragment about 2.2 kb upstream of the egl1 gene, and the 3'-flanking fragment is the 1.6 kb AvrII - SmaI fragment about 0.2 kb from the end of the egl1 gene. Both of the flanking fragments have been isolated from T. reesei VTT-D-80133 (Bailey and Nevalainen, Enzyme Microb. Technol. 3:153-157 (1981)) and they have been subcloned from a lambda clone, originally called egl3, isolated by Saloheimo et al. (Gene 63:11-21 (1988)). The hph gene is from E. coli and it is expressed from the T. reesei pki (pyruvate kinase) promoter. The T. reesei cbh2 terminator is ligated after the hph gene to terminate transcription. The hph expression cassette originates from the plasmid pRLMex30 (Mach et al., Curr. Genet. 25:567-570 (1994)).

In order to prepare the expression vector pMI-76, pMI-73 was digested with the restriction enzyme BamHI and the released ace2 coding region was replaced with the ace1 coding region similarly digested from the plasmid pMI-66. Prior to Trichoderma transformation, the plasmids pMI-73 and pMI-76 were digested with the restriction enzyme Asp718, and the production cassettes were separated from the vector parts by agarose gel electrophoresis and purified from the gel using the QIAquick gel extraction kit (Qiagen).

Example 12

The effect of ACEI overproduction on CBHI production by Trichoderma

The T. reesei strain ALKO2221 (7th generation low protease mutant from T. reesei QM9414/ATCC26921) was transformed with the expression cassette of the pMI-76 plasmid. The transformants were selected by their ability to grow on minimal medium plates containing 100 μg/ml hygromycin B. About 30 transformants were purified and grown in shake flasks on a complex medium containing whey and a complex nitrogen source derived from cereal grain (Suominen, P.L., et al., Mol. Gen. Genet. 241:523-530 (1993)) for 7 days. The total protein secreted to the medium was TCA-precipitated and analysed by the method of Lowry (Lowry et al., J. Biol. Chem. 193:265-275 (1951)). The endoglucanase activities of the culture supernatants were measured (Ghose, T.K., Pure & Appl. Chem. 59:257-268 (1987)). Lower activities of the supernatants towards hydroxyethyl cellulose (HEC) indicated replacement of the egl1 locus with the transforming DNA. This was further verified by Western blotting using monoclonal antibodies against the EG I protein (EGEE1-142B). The CBHI activity secreted to the medium was measured using methylumbelliferyl lactoside (MUL) as a substrate in the following way. The total MUL activity representing the activities of the CBHI and EGI enzymes was first measured (van Tilbeurgh et al., Methods in Enzymology 160:45-59 (1988)). In this measurement the activity of the β-glucosidase enzyme was inhibited by the presence of 100 mM glucose. The portion of the EGI activity was measured by inhibiting the action of CBHI by the presence of 5 mM cellobiose, and this value was subtracted from the total MUL activity. The production of total secreted protein and CBHI activity by the selected transformants is shown in Table 1. The values have been normalised so that the level produced by the host strain (the median of five parallel bottles) is given the value 1. The chosen transformants produced 1.1 - 1.2 times more MUL activity compared to the host strain ALKO2221.
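The CBHI value in this example is obtained by difference and then expressed relative to the host strain. The arithmetic can be sketched as follows; the function names are hypothetical, and the numbers in the usage section are invented placeholders chosen only to show the calculation, not measured values.

```python
def cbhi_mul_activity(total_mul: float, egi_mul: float) -> float:
    """CBHI share of MUL activity.

    total_mul: activity measured with 100 mM glucose (beta-glucosidase inhibited),
               representing CBHI + EGI.
    egi_mul:   activity measured with 5 mM cellobiose added (CBHI inhibited),
               representing EGI alone.
    """
    return total_mul - egi_mul


def relative_to_host(transformant_value: float, host_value: float) -> float:
    """Normalise so that the host strain level equals 1."""
    return transformant_value / host_value


if __name__ == "__main__":
    # Hypothetical readings in arbitrary units, for illustration only.
    host_cbhi = cbhi_mul_activity(total_mul=10.0, egi_mul=2.0)
    transformant_cbhi = cbhi_mul_activity(total_mul=11.6, egi_mul=2.0)
    print(f"relative CBHI activity: {relative_to_host(transformant_cbhi, host_cbhi):.2f}")
```

In a strain where the egl1 locus has been replaced, the EGI term would be expected to be close to zero, so the total MUL activity then approximates the CBHI activity directly.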

Table 1.

Example 13

The effect of ACEII overproduction on CBHI production by Trichoderma

The T. reesei strain ALKO2221 was transformed with the expression cassette of the pMI-73 plasmid, and transformants were purified, grown in shake flasks and analysed as in Example 12. The production of total secreted protein and CBHI activity by the selected transformants is shown in Table 2. The values have been normalised so that the level produced by the host strain is given the value 1. An increase of about 10% in MUL activity was observed in the culture supernatants of the chosen transformants.

Table 2.

Example 14

The effect of ACEI overproduction on the production of xylanase I under the cbh1 promoter by Trichoderma

The T. reesei strain ALKO3625 (ALKO2221/pALK807/20) was transformed with the expression cassette of the pMI-76 plasmid, and transformants were purified and grown in shake flasks as in Example 12. The ALKO3625 strain has been constructed as described in WO93/24621, Example 4. This strain contains 4-5 copies of the cbh1 promoter - xylanase I production cassette integrated at unknown positions in the genome (estimated from a Southern blot and by scanning of a dot blot filter). The production of xylanase I was measured using a standard xylanase activity measurement (Bailey, M.J., et al., J. Biotechnol. 23:257-270 (1992)) at pH 4.3. The production of total secreted protein and xylanase activity by the selected transformants is shown in Table 3. The values have been normalised so that the level produced by the host strain is given the value 1. The selected transformants showed a 1.2 - 1.4 fold increase in xylanase activity compared to the host strain.

Table 3.

Example 15

The effect of ACEII overexpression on the production of xylanase I under the cbh1 promoter by Trichoderma

The T. reesei strain ALKO3625 was transformed with the expression cassette of the pMI-73 plasmid, and transformants were purified and grown in shake flasks as in Example 12. The production of total secreted protein and xylanase activity by the selected transformants is shown in Table 4. The values have been normalised so that the level produced by the host strain is given the value 1. The xylanase activity in the culture supernatants of the chosen transformants was 1.2 - 1.6 times that of the host.

Table 4.

Example 16

Determining the binding sites of ACEI and ACEII proteins in the cbh1 promoter

Fragments of 30-200 bp of the cbh1 promoter are synthesized by PCR, cloned into the negative control reporter plasmid using the same strategy as described in the previous example, and transformed into the yeast strain DBY746. Possible leakiness of the constructs is suppressed, when necessary, by including aminotriazole (1-100 mM) in the medium. Subsequently the reporter yeast strain is transformed with plasmids pAS26 and pAS27 encoding ACEII and ACEI, respectively. Transformants are selected and subsequently screened for the HIS⁺ phenotype as described in Example 1. Transformants that are able to grow only in the presence of both plasmids are obtained. In these colonies the reporter plasmid carries a specific nucleotide sequence which is recognized and bound by either ACEI or ACEII, leading to activation of HIS3 gene expression.

Example 17

cbh1 Promoter - Sequence Homologies

The 6-bp nucleotide sequence 5'GGC(T/A)AA is repeated 15 times in the cbh1 promoter of T. reesei. One of the repeats is situated between nucleotides -161 and -146 upstream of the initiator ATG, within the 29-bp region that is sufficient for sophorose induction in T. reesei. The repeats are found in both upper and lower strands. The same sequence is also found in the cbh2 (3x), egl1 (2x), xyl1 (3x) and egl5 (3x) promoters of T. reesei. Furthermore, these sequences are found in cellulase and hemicellulase promoters in other filamentous fungi. These include Aspergillus tubingensis xlnA, Aspergillus nidulans xlnC, Aspergillus niger xynB, Aspergillus aculeatus endoglucanase (FI-CMCase), and Aspergillus kawachii xynC. The sequence is not found in the cre1 promoter of T. reesei, or in the glucoamylase (glaA) promoters of Aspergillus niger or Aspergillus oryzae.

Another sequence element of interest found within the above mentioned region is 5'CGAAT, which is also found in the glucoamylase (glaA) promoters of Aspergillus niger and Aspergillus oryzae. There this sequence was shown to be part of a region which is responsible for high expression and starch induction (reviewed in MacKenzie, D.A., et al., J. Gen. Microbiol. 139:2295-2307 (1993)).

cbh1 promoter sequence in question in bold (just upstream of the TATA box):

-161 .. TGAGCTAGTAGGCAAAGTCAGCGAATGTGTATATATAAAA...

The cbh1 promoter sequence between and including the nucleotides -184 and -1 is shown as SEQ ID NO. 18, the cbh1 promoter sequence between and including the nucleotides -161 and -1 is shown as SEQ ID NO. 19, and the cbh1 promoter sequence between and including the nucleotides -140 and -1 is shown as SEQ ID NO. 20.
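Counting such repeats on both strands is easy to get wrong by hand, so a small search routine may help. The sketch below is an illustration, not the method used in this work: it scans a sequence and its reverse complement for a motif and reports hits in upper-strand coordinates. The function names are hypothetical, and the 29-bp test sequence is the -161 to -133 stretch as it appears in the oligonucleotides of Example 18.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def motif_hits(seq: str, pattern: str):
    """1-based start positions, in upper-strand coordinates, of a motif on both strands."""
    lookahead = re.compile(f"(?=({pattern}))")  # lookahead keeps overlapping hits
    upper = [m.start() + 1 for m in lookahead.finditer(seq)]
    lower = sorted(
        len(seq) - (m.start() + len(m.group(1))) + 1
        for m in lookahead.finditer(reverse_complement(seq))
    )
    return {"upper": upper, "lower": lower}


def promoter_coordinate(position: int, first_base: int = -161) -> int:
    """Convert a 1-based position in the region to a coordinate relative to the ATG."""
    return first_base + position - 1


# The -161 to -133 region of the cbh1 promoter (29 bp).
REGION = "TGAGCTAGTAGGCAAAGTCAGCGAATGTG"

if __name__ == "__main__":
    for name, pattern in (("GGC(T/A)AA", "GGC[TA]AA"), ("CGAAT", "CGAAT")):
        hits = motif_hits(REGION, pattern)
        coords = [promoter_coordinate(p) for p in hits["upper"]]
        print(name, hits, "upper-strand start(s) relative to ATG:", coords)
```

Applied to a full promoter sequence such as SEQ ID NO. 18, the same routine would give the per-strand counts; only the 29-bp region is used here because it is the only stretch reproduced in full in this section.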

Example 18

Transcriptional activation of HIS expression through cbh1 promoter sequences at -161 to -133

The 29-bp region in the cbh1 promoter located between nucleotides -161 and -133 upstream of the protein-coding region, which was implicated in the mediation of sophorose induction of the cbh1 promoter in Trichoderma as described in Example 17, is cloned into the negative control plasmid (see Example 1) at the BamHI site just upstream of the TATA box. Complementary oligonucleotides 5' GAT CCT GAG CTA GTA GGC AAA GTC AGC GAA TGT GTG AGC TAG TAG GCA AAG TCA GCG AAT GTG TGA GCT AGT AGG CAA AGT CAG CGA ATG TGG (SEQ ID NO. 21) and 5' GAT CCC ACA TTC GCT GAC TTT GCC TAC TAG CTC ACA CAT TCG CTG ACT TTG CCT ACT AGC TCA CAC ATT CGC TGA CTT TGC CTA CTA GCT CAG (SEQ ID NO. 22), covering sequences from -161 to -133 in three copies and having BamHI compatible ends (underlined), are synthesized, annealed and ligated to the BamHI cut vector. Oligonucleotides having a random sequence of similar size and overall nucleotide composition, 5' GAT CCT GAA GAA TGG GAA GCA TTG CTA AGC GGT GTG AAG AAT GGG AAG CAT TGC TAA GCG GTG TGA AGA ATG GGA AGC ATT GCT AAG CGG TGG (SEQ ID NO. 23) and 5' GAT CCC ACC GCT TAG CAA TGC TTC CCA TTC TTC ACA CCG CTT AGC AAT GCT TCC CAT TCT TCA CAC CGC TTA GCA ATG CTT CCC ATT CTT CAG (SEQ ID NO. 24), are made as controls and cloned into the same vector. The transformants carrying the reporter plasmids do not grow on media lacking histidine. The reporter yeast is transformed with a cDNA library of T. reesei, and transformant colonies are selected on SC-LEU-URA plates and subsequently screened for the HIS⁺ phenotype as described in Example 1. Plasmids originating from the cDNA library that support growth only in the presence of the reporter plasmid, but not alone or with the negative control plasmid, are obtained. These plasmids carry genes that code for proteins which activate transcription through binding to the cbh1 promoter sequences present in the reporter construct.
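The design of the paired oligonucleotides can be verified computationally before synthesis. The following sketch rebuilds the upper-strand oligonucleotide from three copies of the 29-bp region and checks that the lower-strand oligonucleotide pairs with it while leaving 5' GATC overhangs on both ends. The helper names are hypothetical, and the construction rule (GATCC, three copies, terminal G) is inferred from the sequences given in this example.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def build_insert_oligos(core: str, copies: int = 3):
    """Build upper and lower oligos carrying `copies` of the core, each starting with GATC."""
    upper = "GATCC" + core * copies + "G"
    lower = "GATCC" + reverse_complement(core) * copies + "G"
    return upper, lower


def anneals_with_bamhi_ends(upper: str, lower: str) -> bool:
    """True if the oligos pair fully except for a 4-nt 5' GATC overhang on each strand."""
    return (
        upper.startswith("GATC")
        and lower.startswith("GATC")
        and lower[4:] == reverse_complement(upper[4:])
    )


CORE = "TGAGCTAGTAGGCAAAGTCAGCGAATGTG"  # cbh1 promoter, -161 to -133

# Upper-strand oligonucleotide as printed in the text (SEQ ID NO. 21).
SEQ_ID_21 = (
    "GAT CCT GAG CTA GTA GGC AAA GTC AGC GAA TGT GTG AGC TAG TAG GCA AAG TCA "
    "GCG AAT GTG TGA GCT AGT AGG CAA AGT CAG CGA ATG TGG"
).replace(" ", "")

if __name__ == "__main__":
    upper, lower = build_insert_oligos(CORE)
    print("matches SEQ ID NO. 21:", upper == SEQ_ID_21)
    print("anneals with GATC overhangs:", anneals_with_bamhi_ends(upper, lower))
```

The same check applied to the random-sequence control pair is a quick way to catch transcription errors in a long oligonucleotide, since a single wrong base breaks the complementarity test.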

Example 19

Production of the Activators in Aspergillus

In order to study the activators in Aspergillus, the following construct is made. The gene encoding xylanase I of Trichoderma reesei is fused to the cbhl promoter of T. reesei. The fusion can be made in such a way that the TATA box of the gpdA gene of Aspergillus nidulans is inserted between the promoter and the xylanase gene to ensure the binding of the basic transcription machinery to the construct. Alternatively, the 29 bp promoter fragment mentioned in Examples 16 and 17, or the 6 bp part of it, 5' GGC(T/A)AA, mentioned in Example 16 can be multiplied in front of the TATA box instead of the promoter. The construct is transformed to Aspergillus nidulans strains.

A construct producing the corresponding activator under the gpdA promoter of Aspergillus is transformed into the same strains and elevated levels of xylanase production are shown.

Example 20

Coordinated regulation of cellulase and hemicellulase expression

In order to study regulation of expression of genes encoding hemicellulose-degrading enzymes, T. reesei QM9414 was grown on minimal medium (described in Example 6) supplemented with different carbon sources. These were sorbitol, sorbitol+sophorose, sorbitol+mannobiose, sorbitol+xylobiose, sorbitol+cellobiose, cellobiose, glycerol, glycerol+mannobiose, glycerol+xylobiose, mannose, xylose, xylitol, arabinose, arabitol, galactose, lactose, Lenzing xylan, methylglucuronoxylan, oat spelts xylan, Solka floc cellulose, beta-glucan, glucose, glucose+sophorose, glucose+mannobiose, and glucose+xylobiose. Total RNA was isolated as described by Chirgwin, J.M., et al., Biochemistry 18:5294-5299 (1979), analysed by Northern blotting and hybridized as described in Example 6. The following genes were used as probes: cbh1, egl5, bgl1, xyl1, xyl2, bxl1, abf1, glr1, axe1, man1, agl1, agl2, agl3. The filters were washed once in 5xSSPE, twice in 1xSSPE, 0.1% SDS, and twice in 0.1xSSPE, 0.1% SDS at 42°C for 20 min each wash. Results of the hybridizations are shown in Table 5. The results show common patterns of regulation of genes encoding cellulases and hemicellulases, and regulation of enzymes attacking related substrates is even further coordinated. This suggests that common regulatory proteins regulate the expression of enzymes degrading hemicellulose and cellulose. Promoter sequences of cbh1, egl5, bgl1, xyl1 and xyl2 are available to date. A common sequence element, 5' GGC(T/A)AA, is found in the cbh1, egl5 and xyl1 promoters, raising the possibility that a common regulatory protein may bind to the sequence.

Table 5. Expression of hydrolase genes in T. reesei cultivated for 3 days** on different carbon sources.


Example 21

Screening for Activators from Gene Banks using Probes Derived from ace1 or ace2

Labelling of the DNA fragment containing the ace1 or ace2 gene or part of the gene is done by random priming using a labelling kit (Pharmacia) and ³²P-dATP according to the manufacturer's instructions. The radioactively labelled DNA is denatured by incubation for three to five minutes at 100°C and is kept single stranded by rapid chilling on ice, before addition to a hybridization buffer containing 6xSSC, 5x Denhardt's solution, 0.1% sodium pyrophosphate and 100 μg/ml heat denatured herring sperm DNA.

Screening of the genomic libraries for ace1 or ace2 related genes is performed by hybridization. The sample is first prehybridized in 6xSSC, 0.1% SDS, 0.05% sodium pyrophosphate and 100 μg/ml denatured herring sperm DNA at 60°C for 3 - 5 hours, followed by hybridization in 6xSSC, 0.1% SDS, 0.05% sodium pyrophosphate and 100 μg/ml denatured herring sperm DNA at 57°C for 44 hours, followed by two washes in 5xSSC, 0.1% SDS at 57°C and two washes in 3xSSC at 57°C. The filters are exposed for 72 hours to Kodak XAR-5 X-ray film at -70°C using Kodak X-Omatic cassettes with regular intensifying screens.

Labelling of synthetic oligonucleotides is done by using gamma-³²P ATP. The reaction mixture contains, in a final volume of 50 μl: 37 pmol oligonucleotide, 66 mM Tris-HCl pH 7.6, 1 mM ATP, 1 mM spermidine, 10 mM MgCl₂, 15 mM dithiothreitol, 200 μg/ml BSA, 34 pmol gamma-³²P ATP (New England Nuclear, 6000 Ci/mmol) and 30 units T4 polynucleotide kinase. The reaction mixture is incubated for 60 min at 37°C, after which the reaction is terminated by the addition of 4 μl 0.5 M EDTA, pH 8.0.
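The amounts in the kinase reaction are given partly in pmol and partly as concentrations, so a short calculation helps when scaling the reaction. The sketch below is illustrative only and uses hypothetical function names; it converts the oligonucleotide amount into a concentration and works out the EDTA and magnesium concentrations after the stop solution is added, using only the volumes and amounts stated above.

```python
def micromolar(amount_pmol: float, volume_ul: float) -> float:
    """Concentration in micromolar; pmol per microlitre is numerically equal to uM."""
    return amount_pmol / volume_ul


def after_addition_mM(conc_mM: float, own_volume_ul: float, other_volume_ul: float) -> float:
    """Concentration of a solute after its solution is mixed with another volume."""
    return conc_mM * own_volume_ul / (own_volume_ul + other_volume_ul)


if __name__ == "__main__":
    reaction_ul = 50.0   # final reaction volume
    stop_ul = 4.0        # volume of 0.5 M EDTA added to stop the reaction

    oligo_uM = micromolar(37.0, reaction_ul)
    edta_mM = after_addition_mM(500.0, stop_ul, reaction_ul)
    mg_mM = after_addition_mM(10.0, reaction_ul, stop_ul)

    print(f"oligonucleotide: {oligo_uM:.2f} uM")
    print(f"EDTA after stop: {edta_mM:.1f} mM, Mg2+ after stop: {mg_mM:.2f} mM")
```

The calculation shows the EDTA ending up in several-fold molar excess over the magnesium, which is what makes the addition an effective stop for a magnesium-dependent kinase.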

The genomic libraries are screened using oligonucleotide probes in the following way. The filters are wetted and washed for 60 min at room temperature in 3xSSC (Maniatis, T., et al., Molecular Cloning (A Laboratory Manual), Cold Spring Harbor Laboratory, second edition, 1988). Prehybridization is for two hours at 65°C in 6xSSC, 0.5% SDS, 10x Denhardt's, 100 μg/ml heat denatured herring sperm DNA. Hybridization is done by adding the ³²P-labelled oligonucleotide (see above) to the hybridization buffer (which is the same as the prehybridization buffer except that no herring sperm DNA is added). Hybridization is performed for 18 hours at a final temperature of 38°C, achieved by slow, controlled cooling from the initial temperature of 65°C. After hybridization, the filters are washed in 2xSSC, after which the filters are washed in prewarmed hybridization buffer at 38°C. Finally the filters are washed for 30 min at 38°C in 6xSSC and 0.05% sodium pyrophosphate. Hybridizing plaques are identified by exposure of Kodak XAR X-ray film for 72 hours at -70°C using an intensifying screen.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:

(A) NAME: Rohm Enzyme Finland Oy

(B) STREET: Tykkimaentie 15

(C) CITY: Rajamaki

(D) STATE: Nurmijarvi

(E) COUNTRY: Finland

(F) POSTAL CODE (ZIP): FIN-05200

(G) TELEPHONE: +358 3 13307

(H) TELEFAX: +358 9 133 1236

(ii) TITLE OF INVENTION: Genes encoding fungal transcription-regulating proteins and uses thereof

(iii) NUMBER OF SEQUENCES: 28

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(vi) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US 60/032,156

(B) FILING DATE: 29-NOV-1996

(vi) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US 60/032,959

(B) FILING DATE: 13-DEC-1996

(vi) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: US 60/040,140

(B) FILING DATE: 10-MAR-1997

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "PCR primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

AAAGGATCCT TATACATTAT ATAAAGTAAT G 31

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "PCR primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

ATATAGTCGA CCTCGGGGAC ACCAAATATG G 31

(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 32 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "PCR primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

ATACCCGGGA GCTCATTCCC GAAAAAACTC GG 32

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 34 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "PCR primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

ATTCCCGGGT CTAGACACAT TCGCTGACTT TGCC 34

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 3223 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Trichoderma reesei

(B) STRAIN: Rut-C-30

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

CTTCCTCGTC CAGCGGCCTC TCGAAGTGGT ACGAGAGTAC AGATACCGCA GCATCGTCGC 60

GTTTGTCGTT TGTCCTCTTC GTTTTCCCCT GCTCCTCCAC CACCGCACAG ACAGGTACTA 120

CTCCGTACTC CGTACATAGT TTCCTGTGCT GCACACGCCC CCCGGCTCTC CGCTTCTCTT 180

TTCCACTCCC TCGAGTGTCT CAGTTGCTCG TCTGGCTCTC GTCTGCCCAC CCCCCGTCGG 240

TGCCGAGTCC TTGCGTTCAT ATATCCAATT TACGCCCCGT CTTCTGTCAA GGCAGTTTTT 300

TCCCACATCT CTCCGCCTCC TTGCCATCTC CTCCTCCTCG TCAGATTCCA TCTCGAGACG 360

ATCCTCCTCC GGCACCAGCG TCGTTCGTGT AGCGTCTGCT TGCTCTGCCA CTGTCGTCTT 420

CGTCCATCGC GATACACAAT TGGTGGTCGC TTCTGTTCGA GATCGATTGT CTGGACCCTG 480

TCAGCTTTGC ATTGAGAGCC GCCACTGGTT TGACCGCCCG TCCCCGTTGA GTCAGCTAAT 540

TGGACTCGTG TGGCCATATC TCATACATCT TCCCCACCCA CGGAAGGAAA AGAACACAGA 600

TCTCGGCCGC CATGTCCTTT TCGAACCCCC GCAGAAGGAC ACCGGTGACT CGTCCCGGAA 660

CCGACTGCGA ACATGGCCTG TCTCTTAAGA CTACCATGAC CCTCCGCAAG GGTGCCACCT 720

TTCACTCTCC TACATCTCCC AGCGCTTCAT CTGCTGCCGG CGACTTCGTC CCTCCTACTC 780

TCACGAGGTC TCAATCGGCT TTTGATGATG TCGTCGACGC AAGCCGTCGT CGTATTGCCA 840

TGACTCTGAA CGATATCGAC GAGGCCCTCT CCAAAGCCTC GCTCTCCGAC AAGAGCCCTC 900

GGCCGAAGCC CCTTCGCGAC ACCAGCCTGC CCGTCCCTCG CGGCTTCCTC GAACCTCCCG 960

TCGTCGACCC CGCCATGAAC AAACAAGAGC CTGAGCGAAG GGTCCTGCGC CCTCGCTCTG 1020

TTCGACGCAC CAGAAACCAC GCCTCCGACA GCGGCATTGG CAGCTCAGTC GTCTCGACAA 1080

ACGACAAGGC TGGCGCCGCC GACTCTACAA AGAAGCCCCA GGCCTCCGCC CTGACAAGGT 1140

CGGCCGCCTC GAGCACCACC GCGATGCTTC CCAGCCTCAG CCACCGCGCT GTCAACCGCA 1200

TCCGCGAACA CACTCTCCGC CCTCTGCTGG AGAAGCCCAC GTTGAAGGAA TTCGAGCCCA 1260

TCGTCCTAGA CGTGCCCCGG CGCATCCGAT CCAAGGAAAT CATTTGCTTG CGAGATCTCG 1320

AGAAGACCCT GATCTTCATG GCACCGGAAA AGGCCAAGTC CGCCGCCTTA TACCTTGATT 1380

TCTGCCTCAC GTCCGTCCGA TGCATTCAAG CGACAGTCGA ATATCTCACC GACCGCGAAC 1440

AAGTTCGCCC CGGCGACCGG CCTTACACTA ACGGATACTT TATCGACCTG AAGGAGCAAA 1500

TCTACCAGTA CGGCAAGCAA CTGGCCGCCA TCAAGGAAAA GGGAAGCCTT GCCGACGACA 1560

TGGACATTGA CCCATCTGAC GAGGTTCGCC TCTATGGCGG CGTCGCTGAG AACGGCCGCC 1620

CTGCTGAGCT CATCCGCGTC AAGAAGGACG GCACTGCCTA CTCAATGGCC ACCGGAAAGA 1680

TTGTCGACAT GACCGAATCC CCTACGCCGC TCAAGCGCTC CCTCAGCGAG CAGCGTGAGG 1740

ACGAGGAGGA GATTATGCGG TCCATGGCCC GCCGGAAGAA GAACGCCACC CCCGAGGACG 1800

TGGCGCCCAA GAAGTGCCGC GAGCCCGGCT GCACCAAGGA GTTCAAGCGC CCTTGCGACC 1860

TCACCAAGCA CGAGAAGACT CACTCTCGTC CCTGGAAGTG CCCCATCCCC ACTTGCAAGT 1920

ACCACGAGTA CGGCTGGCCC ACCGAGAAGG AAATGGACCG CCACATCAAC GACAAGCACT 1980

CGGACGCCCC GGCCATGTAC GAATGCCTCT TCAAGCCCTG CCCGTACAAG TCGAAGCGTG 2040

AGTCGAACTG CAAGCAGCAC ATGGAAAAGG CCCACGGCTG GACCTATGTC CGCACCAAGA 2100

CCAACGGCAA GAAGGCACCG AGCCAGAATG GCTCCACTGC CCAGCAGACC CCCCCTCTCG 2160

CCAACGTGTC TACGCCTTCC TCCACGCCCA GCTACAGCGT TCCCACGCCT CCCCAAGACC 2220

AGGTCATGTC CACCGACTTC CCCATGTATC CGGCTGATGA CGATTGGCTC GCTACCTACG 2280

GCGCGCAGCC CAACACCATC GACGCCATGG ACCTGGGTCT CGAGAACCTT TCCCCTGCCT 2340

CTGCAGCTTC CTCGTACGAG CAGTACCCTC CCTACCAGAA CGGTTCCACC TTCATCATCA 2400

ACGATGAGGA CATCTACGCC GCCCATGTTC AGATTCCTGC CCAGCTGCCC ACTCCTGAGC 2460

AGGTGTACAC CAAGATGATG CCCCAGCAAA TGCCGGTCTA CCACGTCCAG CAGGAGCCAT 2520

GCACCACCGT TCCCATCCTG GGCGAGCCTC AATTCTCCCC CAATGCTCAG CAGAATGCAG 2580

TTCTGTACAC TCCGACCTCG CTGCGCGAGG TTGATGAAGG CTTTGACGAG TCGTACGCCG 2640

CAGACGGCGC CGACTTTCAG CTGTTCCCGG CGACGGTCGA CAAGACGGAT GTGTTCCAGT 2700

CATTGTTTAC CGATATGCCA AGTGCCAACC TCGGCTTCTC CCAGACCACA CAGCCCGACA 2760

TCTTCAACCA AATAGATTGG AGCAACCTCG ACTACCAGGG GTTTCAAGAG TAAAAAGATT 2820

GCGACACATA CAATGACTAC TGCACAAGAT GCTGCAAACG CTTATCCACT CGCCGCTTAC 2880

ACCACTTGTT CTTTTTAACG ATTTCATGAA GAGGTTTCCG TTGGTTCATT GAAAAAGGAT 2940

TGCCTTTGTG TATTAAAGAG TTTTGTTTTC CTCATTTTCA CCTCATTTTC TTCTCTTCTC 3000

ATCTGCACGA GAACGGATGC TTCATTTGCA TCGAAACAAG CGGATTAGTT GGCCGTATGG 3060

GCCTGGAGGA GGAACAGCAG CAGGCTTGAG TTGAGGCCAT GGGCAGCAGC CGTCTTTCAA 3120

GATACTGACA GCTTGCTGGT GGGGGTAAAG GTTTACTTTT ATTACACTCT TGAAGAATGA 3180

TAGAAGATTT ATCCCAAAAA AAAAAAAAAA AAAAAAAAAA AAA 3223

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4034 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Trichoderma reesei

(B) STRAIN: VTT-D-80133

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 242..783

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 1012..1815

(D) OTHER INFORMATION: /codon_start= 1081 /product= "ACEI" /gene= "ace1"

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 1879..2105

(D) OTHER INFORMATION: /codon_start= 1879 /product= "ACEI" /gene= "ace1"

(ix) FEATURE:

(A) NAME/KEY: exon

(B) LOCATION: 2172..3794

(D) OTHER INFORMATION: /codon_start= 2173 /product= "ACEI" /gene= "ace1"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

CGGCTGTTAG TGCTGCAGGT ACTCTTGCGG CTCTCCCTAC CGCTGCGACC TCCCCCCAGC 60

AGGGATACGC CTCCTCCGCA GCGCTTACGC CTCAGCCCAC CCCCGCGCTA TGCCGAAATG 120

GGCCGCCGGG TTAGGCAGCA GCTGCGATGG GCTGAGCGAC CTGGTTTCTA ATCAATCACG 180

CTTCGCTTCC ACTCCTCGAC ATTTCGTCTT CCTCCTCCTC GACAACTCGA TCCGTTGCCC 240

TCTTCCTCGT CCAGCGGCCT CTCGAAGTGG TACGAGAGTA CAGATACCGC AGCATCGTCG 300

CGTTTGTCGT TTGTCCTCTT CGTTTTCCCC TGCTCCTCCA CCACCGCACA GACAGGTACT 360

ACTCCGTACT CCGTACATAG TTTCCTGTGC TGCACACGCC CCCCGGCTCT CCGCTTCTCT 420

TTTCCACTCC CTCGAGTGTC TCAGTTGCTC GTCTGGCTCT CGTCTGCCCA CCCCCCGTCG 480

GTGCCGAGTC CTTGCGTTCA TATATCCAAT TTACGCCCCG TCTTCTGTCA AGGCAGTTTT 540

TTCCCACATC TCTCCGCCTC CTTGCCATCT CCTCCTCCTC GTCAGATTCC ATCTCGAGAC 600

GATCCTCCTC CGGCACCAGC GTCGTTCGTG TAGCGTCTGC TTGCTCTGCC ACTGTCGTCT 660

TCGTCCATCG CGATACACAA TTGGTGGTCG CTTCTGTTCG AGATCGATTG TCTGGACCCT 720

GTCAGCTTTG CATTGAGAGC CGCCACTGGT TTGACCGCCC GTCCCCGTTG AGTCAGCTAA 780

TTGGTATGGG TTGCGCTTCT CTTTATTCAT ATATCTGTCC TGTCCTCTTC CCTGCTGGCC 840

TCAGCGCCTC GGTTTCTGCC GTTTCCTCTT CGGTGTTTGC ATAGCCTTCG TCTATCTCGG 900

GCTTGTGACT CACACCTGTC GACCCCCCCC CCCCCCCCCC CTTCAAAGAG CCTCACTGCA 960

ACATCTCTTC TATCGACAAG GGAAGCTAAC CGCTTGCCGC TGCACTGTTA GGACTCGTGT 1020

GGCCATATCT CATACATCTT CCCCACCCAC GGAAGGAAAA GAACACAGAT CTCGGCCGCC 1080

ATGTCCTTTT CGAACCCCCG CAGAAGGACA CCGGTGACTC GTCCCGGAAC CGACTGCGAA 1140

CATGGCCTGT CTCTTAAGAC TACCATGACC CTCCGCAAGG GTGCCACCTT TCACTCTCCT 1200

ACATCTCCCA GCGCTTCATC TGCTGCCGGC GACTTCGTCC CTCCTACTCT CACGAGGTCT 1260

CAATCGGCTT TTGATGATGT CGTCGACGCA AGCCGTCGTC GTATTGCCAT GACTCTGAAC 1320

GATATCGACG AGGCCCTCTC CAAAGCCTCG CTCTCCGACA AGAGCCCTCG GCCGAAGCCC 1380

CTTCGCGACA CCAGCCTGCC CGTCCCTCGC GGCTTCCTCG AACCTCCCGT CGTCGACCCC 1440

GCCATGAACA AACAAGAGCC TGAGCGAAGG GTCCTGCGCC CTCGCTCTGT TCGACGCACC 1500

AGAAACCACG CCTCCGACAG CGGCATTGGC AGCTCAGTCG TCTCGACAAA CGACAAGGCT 1560

GGCGCCGCCG ACTCTACAAA GAAGCCCCAG GCCTCCGCCC TGACAAGGTC GGCCGCCTCG 1620

AGCACCACCG CGATGCTTCC CAGCCTCAGC CACCGCGCTG TCAACCGCAT CCGCGAACAC 1680

ACTCTCCGCC CTCTGCTGGA GAAGCCCACG TTGAAGGAAT TCGAGCCCAT CGTCCTAGAC 1740

GTGCCCCGGC GCATCCGATC CAAGGAAATC ATTTGCTTGC GAGATCTCGA GAAGACCCTG 1800

ATCTTCATGG CACCGGTAAG TCGACTTCTG ACTGATGACG GTGTTTGGGG TGATGCTTAT 1860

CGGATGTTAC GTCAAAAGGA AAAGGCCAAG TCCGCCGCCT TATACCTTGA TTTCTGCCTC 1920

ACGTCCGTCC GATGCATTCA AGCGACAGTC GAATATCTCA CCGACCGCGA ACAAGTTCGC 1980

CCCGGCGACC GGCCTTACAC TAACGGATAC TTTATCGACC TGAAGGAGCA AATCTACCAG 2040

TACGGCAAGC AACTGGCCGC CATCAAGGAA AAGGGAAGCC TTGCCGACGA CATGGACATT 2100

GACCCGTACG TACTACCATG CCTGATACCT CTCGTAGACG TCGAGACCAC CCCTACTAAC 2160

GCCCCGTCAA GATCTGACGA GGTTCGCCTC TATGGCGGCG TCGCTGAGAA CGGCCGCCCT 2220

GCTGAGCTCA TCCGCGTCAA GAAGGACGGC ACTGCCTACT CAATGGCCAC CGGAAAGATT 2280

GTCGACATGA CCGAATCCCC TACGCCGCTC AAGCGCTCCC TCAGCGAGCA GCGTGAGGAC 2340

GAGGAGGAGA TTATGCGGTC CATGGCCCGC CGGAAGAAGA ACGCCACCCC CGAGGACGTG 2400

GCGCCCAAGA AGTGCCGCGA GCCCGGCTGC ACCAAGGAGT TCAAGCGCCC TTGCGACCTC 2460

ACCAAGCACG AGAAGACTCA CTCTCGTCCC TGGAAGTGCC CCATCCCCAC TTGCAAGTAC 2520

CACGAGTACG GCTGGCCCAC CGAGAAGGAA ATGGACCGCC ACATCAACGA CAAGCACTCG 2580

GACGCCCCGG CCATGTACGA ATGCCTCTTC AAGCCCTGCC CGTACAAGTC GAAGCGTGAG 2640

TCGAACTGCA AGCAGCACAT GGAAAAGGCC CACGGCTGGA CCTATGTCCG CACCAAGACC 2700

AACGGCAAGA AGGCACCGAG CCAGAATGGC TCCACTGCCC AGCAGACCCC CCCTCTCGCC 2760

AACGTGTCTA CGCCTTCCTC CACGCCCAGC TACAGCGTTC CCACGCCTCC CCAAGACCAG 2820

GTCATGTCCA CCGACTTCCC CATGTATCCG GCTGATGACG ATTGGCTCGC TACCTACGGC 2880

GCGCAGCCCA ACACCATCGA CGCCATGGAC CTGGGTCTCG AGAACCTTTC CCCTGCCTCT 2940

GCAGCTTCCT CGTACGAGCA GTACCCTCCC TACCAGAACG GTTCCACCTT CATCATCAAC 3000

GATGAGGACA TCTACGCCGC CCATGTTCAG ATTCCTGCCC AGCTGCCCAC TCCTGAGCAG 3060

GTGTACACCA AGATGATGCC CCAGCAAATG CCGGTCTACC ACGTCCAGCA GGAGCCATGC 3120

ACCACCGTTC CCATCCTGGG CGAGCCTCAA TTCTCCCCCA ATGCTCAGCA GAATGCAGTT 3180

CTGTACACTC CGACCTCGCT GCGCGAGGTT GATGAAGGCT TTGACGAGTC GTACGCCGCA 3240

GACGGCGCCG ACTTTCAGCT GTTCCCGGCG ACGGTCGACA AGACGGATGT GTTCCAGTCA 3300

TTGTTTACCG ATATGCCAAG TGCCAACCTC GGCTTCTCCC AGACCACACA GCCCGACATC 3360

TTCAACCAAA TAGATTGGAG CAACCTCGAC TACCAGGGGT TTCAAGAGTA AAAAGATTGC 3420

GACACATACA ATGACTACTG CACAAGATGC TGCAAACGCT TATCCACTCG CCGCTTACAC 3480

CACTTGTTCT TTTTAACGAT TTCATGAAGA GGTTTCCGTT GGTTCATTGA AAAAGGATTG 3540

CCTTTGTGTA TTAAAGAGTT TTGTTTTCCT CATTTTCACC TCATTTTCTT CTCTTCTCAT 3600

CTGCACGAGA ACGGATGCTT CATTTGCATC GAAACAAGCG GATTAGTTGG CCGTATGGGC 3660

CTGGAGGAGG AACAGCAGCA GGCTTGAGTT GAGGCCATGG GCAGCAGCCG TCTTTCAAGA 3720

TACTGACAGC TTGCTGGTGG GGGTAAAGGT TTACTTTTAT TACACTCTTG AAGAATGATA 3780

GAAGATTTAT CCCACTTTCT GGTGTACCTT GTGACATGCT GTGAAAGTAA GGTAGGTAGT 3840

ACCTTGGTAC CTAGTAAGTA AGGTAAGGTA AGGTAAGGTA GGTAGGTAGT AAGTAATTAG 3900

GTCGGTCTAA TGCAAACCTG AAGGTAAAAA GAAAACCCCT TCGTTTGCCA CGCCTACACG 3960

CAAAAAGAAT GAATGTGTGC CACACTTCAA AAAGGTGAAG ACGATGTTCT CCGATACTGG 4020

GAATCGAACC CAGG 4034

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 733 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

Met Ser Phe Ser Asn Pro Arg Arg Arg Thr Pro Val Thr Arg Pro Gly 1 5 10 15

Thr Asp Cys Glu His Gly Leu Ser Leu Lys Thr Thr Met Thr Leu Arg 20 25 30

Lys Gly Ala Thr Phe His Ser Pro Thr Ser Pro Ser Ala Ser Ser Ala 35 40 45

Ala Gly Asp Phe Val Pro Pro Thr Leu Thr Arg Ser Gln Ser Ala Phe 50 55 60

Asp Asp Val Val Asp Ala Ser Arg Arg Arg Ile Ala Met Thr Leu Asn 65 70 75 80

Asp Ile Asp Glu Ala Leu Ser Lys Ala Ser Leu Ser Asp Lys Ser Pro 85 90 95

Arg Pro Lys Pro Leu Arg Asp Thr Ser Leu Pro Val Pro Arg Gly Phe 100 105 110

Leu Glu Pro Pro Val Val Asp Pro Ala Met Asn Lys Gln Glu Pro Glu 115 120 125

Arg Arg Val Leu Arg Pro Arg Ser Val Arg Arg Thr Arg Asn His Ala 130 135 140

Ser Asp Ser Gly Ile Gly Ser Ser Val Val Ser Thr Asn Asp Lys Ala 145 150 155 160

Gly Ala Ala Asp Ser Thr Lys Lys Pro Gln Ala Ser Ala Leu Thr Arg 165 170 175

Ser Ala Ala Ser Ser Thr Thr Ala Met Leu Pro Ser Leu Ser His Arg 180 185 190

Ala Val Asn Arg Ile Arg Glu His Thr Leu Arg Pro Leu Leu Glu Lys 195 200 205

Pro Thr Leu Lys Glu Phe Glu Pro Ile Val Leu Asp Val Pro Arg Arg 210 215 220

Ile Arg Ser Lys Glu Ile Ile Cys Leu Arg Asp Leu Glu Lys Thr Leu 225 230 235 240

Ile Phe Met Ala Pro Glu Lys Ala Lys Ser Ala Ala Leu Tyr Leu Asp 245 250 255

Phe Cys Leu Thr Ser Val Arg Cys Ile Gln Ala Thr Val Glu Tyr Leu 260 265 270

Thr Asp Arg Glu Gln Val Arg Pro Gly Asp Arg Pro Tyr Thr Asn Gly 275 280 285

Tyr Phe Ile Asp Leu Lys Glu Gln Ile Tyr Gln Tyr Gly Lys Gln Leu 290 295 300

Ala Ala Ile Lys Glu Lys Gly Ser Leu Ala Asp Asp Met Asp Ile Asp 305 310 315 320

Pro Ser Asp Glu Val Arg Leu Tyr Gly Gly Val Ala Glu Asn Gly Arg 325 330 335

Pro Ala Glu Leu Ile Arg Val Lys Lys Asp Gly Thr Ala Tyr Ser Met 340 345 350

Ala Thr Gly Lys Ile Val Asp Met Thr Glu Ser Pro Thr Pro Leu Lys 355 360 365

Arg Ser Leu Ser Glu Gln Arg Glu Asp Glu Glu Glu Ile Met Arg Ser 370 375 380

Met Ala Arg Arg Lys Lys Asn Ala Thr Pro Glu Asp Val Ala Pro Lys 385 390 395 400

Lys Cys Arg Glu Pro Gly Cys Thr Lys Glu Phe Lys Arg Pro Cys Asp 405 410 415

Leu Thr Lys His Glu Lys Thr His Ser Arg Pro Trp Lys Cys Pro Ile 420 425 430

Pro Thr Cys Lys Tyr His Glu Tyr Gly Trp Pro Thr Glu Lys Glu Met 435 440 445

Asp Arg His Ile Asn Asp Lys His Ser Asp Ala Pro Ala Met Tyr Glu 450 455 460

Cys Leu Phe Lys Pro Cys Pro Tyr Lys Ser Lys Arg Glu Ser Asn Cys 465 470 475 480

Lys Gln His Met Glu Lys Ala His Gly Trp Thr Tyr Val Arg Thr Lys 485 490 495

Thr Asn Gly Lys Lys Ala Pro Ser Gln Asn Gly Ser Thr Ala Gln Gln 500 505 510

Thr Pro Pro Leu Ala Asn Val Ser Thr Pro Ser Ser Thr Pro Ser Tyr 515 520 525

Ser Val Pro Thr Pro Pro Gln Asp Gln Val Met Ser Thr Asp Phe Pro 530 535 540

Met Tyr Pro Ala Asp Asp Asp Trp Leu Ala Thr Tyr Gly Ala Gln Pro 545 550 555 560

Asn Thr Ile Asp Ala Met Asp Leu Gly Leu Glu Asn Leu Ser Pro Ala 565 570 575

Ser Ala Ala Ser Ser Tyr Glu Gln Tyr Pro Pro Tyr Gln Asn Gly Ser 580 585 590

Thr Phe Ile Ile Asn Asp Glu Asp Ile Tyr Ala Ala His Val Gln Ile 595 600 605

Pro Ala Gln Leu Pro Thr Pro Glu Gln Val Tyr Thr Lys Met Met Pro 610 615 620

Gln Gln Met Pro Val Tyr His Val Gln Gln Glu Pro Cys Thr Thr Val 625 630 635 640

Pro Ile Leu Gly Glu Pro Gln Phe Ser Pro Asn Ala Gln Gln Asn Ala 645 650 655

Val Leu Tyr Thr Pro Thr Ser Leu Arg Glu Val Asp Glu Gly Phe Asp 660 665 670

Glu Ser Tyr Ala Ala Asp Gly Ala Asp Phe Gln Leu Phe Pro Ala Thr 675 680 685

Val Asp Lys Thr Asp Val Phe Gln Ser Leu Phe Thr Asp Met Pro Ser 690 695 700

Ala Asn Leu Gly Phe Ser Gln Thr Thr Gln Pro Asp Ile Phe Asn Gln 705 710 715 720

Ile Asp Trp Ser Asn Leu Asp Tyr Gln Gly Phe Gln Glu 725 730

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1373 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Trichoderma reesei

(B) STRAIN: Rut-C-30

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

TTCGCCATGG ACCTCCGGCA AGCATGTGAT CGATGCCACG ACAAGAAGCT CAGGTGTCCC 60

AGGATTTCGG GCTCGCCCTG CTGCAGCCGC TGCGCAAAGG CCAATGTCGC CTGCGTCTTT 120

AGTCCGCCAT CGAGGCCATT TCGCCCTCAC GAGCCCTTGA ACCACAGCCA TGAACACAGT 180

CATAGTCACA GTCACAATCA TAACGGGGTA GGCGTCAGCT TTGACTGGCT CGATCTCATG 240

AGCTTGGAGC AGCAGCAGGA GCAGCAACAA GGCCAGCCTC AACATCCTCC ACCACCAGTC 300

CAAACCCTCT CCGAACGGCT GGCAGCTCTT CTGTGTGCCC TGGACCGCAT GCTCCAAGCC 360

GTACCCTCAT CCCTCGACAT GCATCACGTC TCAAGGCAGC AGCTGAGAGA GTACGCCGAC 420

ACCGTGGGAA CCGGCTTCGA CCTGCAGTCC ACCCTCGACA GCCTCCTCCA CCACGCCCAG 480

GATCTCGCCT CCCTCTATTC CGAGGCCGTA CCCGCCTCGT TCAACAAGCG CACAACCGCT 540

GCCGAGGCCG ACGCCCTCTG TGCCGTTCCG GACTGCGTAC ACCAGGACCG CACCTCGTTG 600

CACACGACGC CGTTGCCCAA GCTGGACCAC GCCCTGTTGA ACCTCGTCAT GGCGTGCCAC 660

ATCCGTCTGC TCGATGTCAT GGACACTCTC GCAGAGCACG GGCGGATGTG CGCCTTCATG 720

GTGGCCACTC TGCCGCCGGA CTACGACCCC AAGTTTGCGG TTCCTGAGAT ACGGGTGGGC 780

ACTTTTGTCG CCCCTACCGA TACGGCTGCC TCAATGCTGC TCTCTGTTGT CGTGGAGCTT 840

CAAACGGTGC TGGTGGCGAG GGTCAAGGAC TTGGTGGCCA TGGTTGACCA GGTGAAGGAT 900

GATGCAAGAG CGGCGAGAGA AGCAAAGGTC GTTCGTCTGC AGTGTGGGAT TTTACTGGAA 960

CGAGCCGAGT CGACGCTTGG AGAGTGGTCC AGGTTCAAGG ACGGGCTGGT CAGTGCCAGA 1020

CTGCTGAAGT GAGTCTCTCA TATGGACGGC AACGGTGAGG GATTGGCAGT TCCCTACAGT 1080

GAACAGTGAA TGCCGGGACA TGTTCTGCAC GACTAGAACT CGGATATGAT GTCTGATCAC 1140

AGCCGGAAAC GAGGGGCGAC ACGTCTAACG CAGTACGGGG CTGATGCTCA AGGAGCAGCC 1200

GCAGCCTGCA TGTACTTCGC ACGTTACCTT ACCTACATGC AGTGATTGGC CGGGCTCTTC 1260

GATCGTCTCT GGGGCAGCAT CCTCTTCTCT AACCCTCTTG CGAAGCACTT ACAGCTCATG 1320

CTTGAGATGT GGATATCGGA ATCTGAACAG AACTTGTGCC GTCAAAAAAA AAA 1373

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2100 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Trichoderma reesei

(B) STRAIN: QM9414

(ix) FEATURE:

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

CTGGGATCGT GGGATGTGAT GAAAAAAAAA AGTCAAGGTG GCGTGGTTGT GCTTCCTTAC 60

TCCGTACATG TACAGGGGAA TTGAGATCTA TGCCTCAGTG CCGTAAGAGC ACAGGTCGAG 120

GGCAGTTCAA TGAGCTCCGG TGGCGACGTA ATTGATGGCG GGGCCCGACA GCTGGCTGGG 180

CCGTTCTAGC GG GCTTTAG CGGGCGCTCC ACTGAACCTG ACAGGCTGCA GCAGCCCCGC 240

ATCGCTAAAA CGGCGATGCA GATGCTCACA TCAGTGGAAC TGCCACTGTA CCTAGGTATT 300

ATTCTCAATA GGTGGATGTA CATACGTGGA CCGTGGAGGC GATGGAGGCG ATCACGATAA 360

ATGGGCAGAT CCAGTATTGG AGGCGAATGC CATCATGTAT CTTGCATTTC CCATCGGTCT 420

CTGGAAAGCC AGACGTCTTG CATATTCTGA GTCTCGCCCT GGCATCCCAC CGATGGAACT 480

GTTTAGGCTA GGTACGCAAG TGCAAAGAAC ATCGCCGTCC ATGGCGTGAT ATCGGTCCAT 540

GTTGCCAGCA TGGCCCTACC GTATCCGAGA CTCCCCGCGG CTGCCCTGCC TGGGGCTTGT 600

CGGACGACAT CTGCTCTCTC TCAACGCTGT GCATCATCGC GATTCCCTTC TGATTCGCCA 660

TGGACCTCCG GCAAGCATGT GATCGATGCC ACGACAAGAA GCTCAGGTGT CCCAGGATTT 720

CGGGCTCGCC CTGCTGCAGC CGCTGCGCAA AGGCCAATGT CGCCTGCGTC TTTAGTCCGC 780

CATCGAGGCC ATTTCGCCCT CACGAGCCCT TGAACCACAG CCATGAACAC AGTCATAGTC 840

ACAGTCACAA TCATAACGGG GTAGGCGTCA GCTTTGACTG GCTCGATCTC ATGAGCTTGG 900

AGCAGCAGCA GGAGCAGCAA CAAGGCCAGC CTCAACATCC TCCACCACCA GTCCAAACCC 960

TCTCCGAACG GCTGGCAGCT CTTCTGTGTG CCCTGGACCG CATGCTCCAA GCCGTACCCT 1020

CATCCCTCGA CATGCATCAC GTCTCAAGGC AGCAGCTGAG AGAGTACGCC GACACCGTGG 1080

GAACCGGCTT CGACCTGCAG TCCACCCTCG ACAGCCTCCT CCACCACGCC CAGGATCTCC 1140

CCTCCCTCTA TTCCGAGGCC GTACCCGCCT CGTTCAACAA GCGCACAACC GCTGCCGAGG 1200

CCGACGCCCT CTGTGCCGTT CCGGACTGCG TACACCAGGA CCGCACCTCG TTGCACACGA 1260

CGCCGTTGCC CAAGCTGGAC CACGCCCTGT TGAACCTCGT CATGGCGTGC CACATCCGTC 1320

TGCTCGATGT CATGGACACT CTCGCAGAGC ACGGGCGGAT GTGCGCCTTC ATGGTGGCCA 1380

CTCTGCCGCC GGACTACGAC CCCAAGTTTG CGGTTCCTGA GATACGGGTG GGCACTTTTG 1440

TCGCCCCTAC CGATACGGCT GCCTCAATGC TGCTCTCTGT TGTCGTGGAG CTTCAAACGG 1500

TGCTGGTGGC GAGGGTCAAG GACTTGGTGG CCATGGTTGA CCAGGTGAAG GATGATGCAA 1560

GAGCGGCGAG AGAAGCAAAG GTCGTTCGTC TGCAGTGTGG GATTTTACTG GAACGAGCCG 1620

AGTCGACGCT TGGAGAGTGG TCCAGGTTCA AGGACGGGCT GGTCAGTGCC AGACTGCTGA 1680

AGTGAGTCTC TCATATGGAC GGCAACGGTG AGGGATTGGC AGTTCCCTAC AGTGAACAGT 1740

GAATGCCGGG ACATGTTCTG CACGACTAGA ACTCGGATAT GATGTCTGAT CACAGCCGGA 1800

AACGAGGGGC GACACGTCTA ACGCAGTACG GGGCTGATGC TCAAGGAGCA GCCGCAGCCT 1860

GCATGTACTT CGCACGTTAC CTTACCTACA TGCAGTGATT GGCCGGGCTC TTCGATCGTC 1920

TCTGGGGCAG CATCCTCTTC TCTAACCCTC TTGCGAAGCA CTTACAGCTC ATGCTTGAGA 1980

TGTGGATATC GGAATCTGAA CAGAACTTGT GCCGTCATGG GCTCGCGAAC CCAGCCCACC 2040

TGCATGTCTT GAAGTCATCA AAAACAGACG AGCTCGACAC GCCGTCCACC CTCCCTATGC 2100

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 341 amino acids

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Asp Leu Arg Gln Ala Cys Asp Arg Cys His Asp Lys Lys Leu Arg 1 5 10 15

Cys Pro Arg Ile Ser Gly Ser Pro Cys Cys Ser Arg Cys Ala Lys Ala 20 25 30

Asn Val Ala Cys Val Phe Ser Pro Pro Ser Arg Pro Phe Arg Pro His 35 40 45

Glu Pro Leu Asn His Ser His Glu His Ser His Ser His Ser His Asn 50 55 60

His Asn Gly Val Gly Val Ser Phe Asp Trp Leu Asp Leu Met Ser Leu 65 70 75 80

Glu Gln Gln Gln Glu Gln Gln Gln Gly Gln Pro Gln His Pro Pro Pro 85 90 95

Pro Val Gln Thr Leu Ser Glu Arg Leu Ala Ala Leu Leu Cys Ala Leu 100 105 110

Asp Arg Met Leu Gln Ala Val Pro Ser Ser Leu Asp Met His His Val 115 120 125

Ser Arg Gln Gln Leu Arg Glu Tyr Ala Asp Thr Val Gly Thr Gly Phe 130 135 140

Asp Leu Gln Ser Thr Leu Asp Ser Leu Leu His His Ala Gln Asp Leu 145 150 155 160

Ala Ser Leu Tyr Ser Glu Ala Val Pro Ala Ser Phe Asn Lys Arg Thr 165 170 175

Thr Ala Ala Glu Ala Asp Ala Leu Cys Ala Val Pro Asp Cys Val His 180 185 190

Gln Asp Arg Thr Ser Leu His Thr Thr Pro Leu Pro Lys Leu Asp His 195 200 205

Ala Leu Leu Asn Leu Val Met Ala Cys His Ile Arg Leu Leu Asp Val 210 215 220

Met Asp Thr Leu Ala Glu His Gly Arg Met Cys Ala Phe Met Val Ala 225 230 235 240

Thr Leu Pro Pro Asp Tyr Asp Pro Lys Phe Ala Val Pro Glu Ile Arg 245 250 255

Val Gly Thr Phe Val Ala Pro Thr Asp Thr Ala Ala Ser Met Leu Leu 260 265 270

Ser Val Val Val Glu Leu Gln Thr Val Leu Val Ala Arg Val Lys Asp 275 280 285

Leu Val Ala Met Val Asp Gln Val Lys Asp Asp Ala Arg Ala Ala Arg 290 295 300

Glu Ala Lys Val Val Arg Leu Gln Cys Gly Ile Leu Leu Glu Arg Ala 305 310 315 320

Glu Ser Thr Leu Gly Glu Trp Ser Arg Phe Lys Asp Gly Leu Val Ser 325 330 335

Ala Arg Leu Leu Lys 340

(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "PCR primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

GCTCTAGAGC CGGATCCATC CTTTTCGAAC CCCCGC 36

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "PCR primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

TCCCCCCGGG GGGAGGATCC TTACTCTTGA AACCCCTGGT 40

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 38 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "PCR primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

GAGAGAATCG ATACTAGTGG TCTGAAGGAC GTGGAATG 38

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "PCR primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

TATATAATCG ATGCTAGCGT TGAGAGAAGT TGTTGGATTG 40

(2) INFORMATION FOR SEQ ID NO: 15:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "PCR primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

ATCTGTCTAG ACAGGATCCC CGGCAAGCAT GTGATCGAT 39

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "PCR primer"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

TACGTCCCGG GCTGGATCCT CACTTCAGCA GTCTGGCTC 39
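The PCR primers of SEQ ID NOS: 11 to 16 carry additional bases at their 5' ends. A minimal Python sketch, assuming only the well-known recognition sequences of a few common restriction enzymes, can be used to check which sites each primer contains. The enzyme table and the function name `find_sites` are illustrative choices and are not taken from this document.

```python
# Scan the primers of SEQ ID NOS: 11-16 for common restriction sites.
# The enzyme selection below is an assumption made for illustration.

RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "XbaI": "TCTAGA",
    "SmaI": "CCCGGG",
    "ClaI": "ATCGAT",
    "SpeI": "ACTAGT",
    "NheI": "GCTAGC",
}

# Primer sequences copied from the listing, keyed by SEQ ID NO.
PRIMERS = {
    11: "".join("GCTCTAGAGC CGGATCCATC CTTTTCGAAC CCCCGC".split()),
    12: "".join("TCCCCCCGGG GGGAGGATCC TTACTCTTGA AACCCCTGGT".split()),
    13: "".join("GAGAGAATCG ATACTAGTGG TCTGAAGGAC GTGGAATG".split()),
    14: "".join("TATATAATCG ATGCTAGCGT TGAGAGAAGT TGTTGGATTG".split()),
    15: "".join("ATCTGTCTAG ACAGGATCCC CGGCAAGCAT GTGATCGAT".split()),
    16: "".join("TACGTCCCGG GCTGGATCCT CACTTCAGCA GTCTGGCTC".split()),
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Return {enzyme: [0-based start positions]} for every site found."""
    hits: dict[str, list[int]] = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)  # allow overlapping matches
        if positions:
            hits[enzyme] = positions
    return hits


for seq_id, primer in PRIMERS.items():
    print(f"SEQ ID NO: {seq_id} ({len(primer)} nt): {find_sites(primer)}")
```

A site reported by such a scan is not necessarily part of an added cloning tail. For example, the ClaI site found in SEQ ID NO: 15 lies at the 3' end of the primer, in the portion that matches the coding sequence of SEQ ID NO: 8.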

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: misc_signal

(B) LOCATION: 1..29

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

TGAGCTAGTA GGCAAAGTCA GCGAATGTG 29

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 184 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: misc_signal

(B) LOCATION: 1..181

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

AAAGATAGCC TCATTAAACG GAATGAGCTA GTAGGCAAAG TCAGCGAATG TGTATATATA 60

AAGGTTCGAG GTCCGTGCCT CCCTCATGCT CTCCCCATCT ACTCATCAAC TCAGATCCTC 120

CAGGAGACTT GTACACCATC TTTTGAGGCA CAGAAACCCA ATAGTCAACC GCGGACTGCG 180

CATC 184

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 161 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: misc_signal

(B) LOCATION: 1..161

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

TGAGCTAGTA GGCAAAGTCA GCGAATGTGT ATATATAAAG GTTCGAGGTC CGTGCCTCCC 60

TCATGCTCTC CCCATCTACT CATCAACTCA GATCCTCCAG GAGACTTGTA CACCATCTTT 120

TGAGGCACAG AAACCCAATA GTCAACCGCG GACTGCGCAT C 161

(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 140 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: misc_signal

(B) LOCATION: 1..140

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

CGAATGTGTA TATATAAAGG TTCGAGGTCC GTGCCTCCCT CATGCTCTCC CCATCTACTC 60

ATCAACTCAG ATCCTCCAGG AGACTTGTAC ACCATCTTTT GAGGCACAGA AACCCAATAG 120

TCAACCGCGG ACTGCGCATC 140
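SEQ ID NOS: 18, 19 and 20 appear to share the same 3' end and to differ only in how far they extend in the 5' direction, while SEQ ID NO: 17 matches the first 29 nucleotides of SEQ ID NO: 19. The short Python sketch below checks these relationships directly on the sequences as listed; the variable names are illustrative, and the sequences are copied from the listing above.

```python
# Check how SEQ ID NOS: 17-20 nest within one another.

def joined(blocks: str) -> str:
    """Remove the spaces that separate the 10-nucleotide blocks."""
    return "".join(blocks.split())


SEQ17 = joined("TGAGCTAGTA GGCAAAGTCA GCGAATGTG")
SEQ18 = joined(
    "AAAGATAGCC TCATTAAACG GAATGAGCTA GTAGGCAAAG TCAGCGAATG TGTATATATA "
    "AAGGTTCGAG GTCCGTGCCT CCCTCATGCT CTCCCCATCT ACTCATCAAC TCAGATCCTC "
    "CAGGAGACTT GTACACCATC TTTTGAGGCA CAGAAACCCA ATAGTCAACC GCGGACTGCG "
    "CATC"
)
SEQ19 = joined(
    "TGAGCTAGTA GGCAAAGTCA GCGAATGTGT ATATATAAAG GTTCGAGGTC CGTGCCTCCC "
    "TCATGCTCTC CCCATCTACT CATCAACTCA GATCCTCCAG GAGACTTGTA CACCATCTTT "
    "TGAGGCACAG AAACCCAATA GTCAACCGCG GACTGCGCAT C"
)
SEQ20 = joined(
    "CGAATGTGTA TATATAAAGG TTCGAGGTCC GTGCCTCCCT CATGCTCTCC CCATCTACTC "
    "ATCAACTCAG ATCCTCCAGG AGACTTGTAC ACCATCTTTT GAGGCACAGA AACCCAATAG "
    "TCAACCGCGG ACTGCGCATC"
)

# Each shorter fragment should be a 3'-terminal portion of the longer one.
print("SEQ 19 is the 3' end of SEQ 18:", SEQ18.endswith(SEQ19))
print("SEQ 20 is the 3' end of SEQ 19:", SEQ19.endswith(SEQ20))
print("SEQ 17 is the 5' end of SEQ 19:", SEQ19.startswith(SEQ17))

# Number of 5' nucleotides removed at each step.
print("SEQ 18 -> SEQ 19:", len(SEQ18) - len(SEQ19), "nt shorter")
print("SEQ 19 -> SEQ 20:", len(SEQ19) - len(SEQ20), "nt shorter")

# The 29-nucleotide element of SEQ ID NO: 17 is complete in SEQ 18 and
# SEQ 19 but is no longer present in full in SEQ 20.
print("SEQ 17 present in SEQ 20:", SEQ17 in SEQ20)
```

Read this way, the three fragments form a set of progressively shorter 5' truncations, the shortest of which retains only the last 8 nucleotides of the SEQ ID NO: 17 element.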

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 93 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:

GATCCTGAGC TAGTAGGCAA AGTCAGCGAA TGTGTGAGCT AGTAGGCAAA GTCAGCGAAT 60

GTGTGAGCTA GTAGGCAAAG TCAGCGAATG TGG 93

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 93 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

GATCCCACAT TCGCTGACTT TGCCTACTAG CTCACACATT CGCTGACTTT GCCTACTAGC 60

TCACACATTC GCTGACTTTG CCTACTAGCT CAG 93
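The oligonucleotides of SEQ ID NOS: 21 and 22 look like a complementary pair built around three copies of the 29-nucleotide sequence of SEQ ID NO: 17. The following Python sketch tests that reading on the sequences as listed; the helper name `reverse_complement` and the assumed 4-nucleotide overhang are illustrative and are not stated in this listing.

```python
# Test whether SEQ ID NOS: 21 and 22 can anneal to each other.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def joined(blocks: str) -> str:
    """Remove the spaces that separate the 10-nucleotide blocks."""
    return "".join(blocks.split())


SEQ17 = joined("TGAGCTAGTA GGCAAAGTCA GCGAATGTG")
SEQ21 = joined(
    "GATCCTGAGC TAGTAGGCAA AGTCAGCGAA TGTGTGAGCT AGTAGGCAAA GTCAGCGAAT "
    "GTGTGAGCTA GTAGGCAAAG TCAGCGAATG TGG"
)
SEQ22 = joined(
    "GATCCCACAT TCGCTGACTT TGCCTACTAG CTCACACATT CGCTGACTTT GCCTACTAGC "
    "TCACACATTC GCTGACTTTG CCTACTAGCT CAG"
)

OVERHANG = 4  # assumed length of the unpaired 5' end of each strand

# Number of copies of the SEQ ID NO: 17 element on each strand.
copies_top = SEQ21.count(SEQ17)
copies_bottom = SEQ22.count(reverse_complement(SEQ17))

# Apart from the first four nucleotides, the strands should pair fully.
pairs_fully = reverse_complement(SEQ21[OVERHANG:]) == SEQ22[OVERHANG:]

print("5' ends:", SEQ21[:OVERHANG], SEQ22[:OVERHANG])
print("copies of SEQ ID NO: 17:", copies_top, copies_bottom)
print("strands complementary outside the overhang:", pairs_fully)
```

If the strands are annealed as this check suggests, the duplex would carry a single-stranded 5'-GATC end on each side, the kind of end that is compatible with fragments cut by BamHI or BglII. SEQ ID NOS: 23 and 24 have the same layout and can be checked the same way.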

(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 93 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

GATCCTGAAG AATGGGAAGC ATTGCTAAGC GGTGTGAAGA ATGGGAAGCA TTGCTAAGCG 60

GTGTGAAGAA TGGGAAGCAT TGCTAAGCGG TGG 93

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 93 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

GATCCCACCG CTTAGCAATG CTTCCCATTC TTCACACCGC TTAGCAATGC TTCCCATTCT 60

TCACACCGCT TAGCAATGCT TCCCATTCTT CAG 93

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 17 amino acids

(ii) MOLECULE TYPE: peptide

(v) FRAGMENT TYPE: internal

(ix) FEATURE:

(A) NAME/KEY: Peptide

(B) LOCATION: 1..17

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:

Arg Arg Lys Lys Asn Ala Thr Pro Glu Asp Val Ala Pro Lys Lys Cys 1 5 10 15

Arg
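The peptide of SEQ ID NO: 25 matches residues 387 to 403 of SEQ ID NO: 7. As a cross-check between the nucleotide and protein listings, the Python sketch below translates the corresponding 51-nucleotide stretch of the cDNA (nucleotides 1770 to 1820 of SEQ ID NO: 5, as read from the listing above) with the standard genetic code. The function and variable names are illustrative.

```python
# Translate the cDNA stretch that encodes the peptide of SEQ ID NO: 25.
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}

# Nucleotides 1770-1820 of SEQ ID NO: 5, copied from the listing.
FRAGMENT = "CGCCGGAAGAAGAACGCCACCCCCGAGGACGTGGCGCCCAAGAAGTGCCGC"

# SEQ ID NO: 25 as listed.
SEQ25 = ("Arg Arg Lys Lys Asn Ala Thr Pro Glu Asp Val Ala Pro Lys Lys Cys "
         "Arg")


def translate(dna: str) -> str:
    """Translate complete codons of a DNA string into one-letter code."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


peptide = translate(FRAGMENT)
peptide_three_letter = " ".join(THREE_LETTER[aa] for aa in peptide)

print(peptide_three_letter)
print("matches SEQ ID NO: 25:", peptide_three_letter == SEQ25)
```

The same function can be applied to any in-frame stretch of SEQ ID NO: 5 or SEQ ID NO: 8 to compare it with the protein listings of SEQ ID NO: 7 and SEQ ID NO: 10.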

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 31 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(v) FRAGMENT TYPE: N-terminal

(ix) FEATURE:

(A) NAME/KEY: Peptide

(B) LOCATION: 1..31

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

Ala Cys Asp Arg Cys His Asp Lys Lys Leu Arg Cys Pro Arg Ile Ser 1 5 10 15

Gly Ser Pro Cys Cys Ser Arg Cys Ala Lys Ala Asn Val Ala Cys 20 25 30

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(v) FRAGMENT TYPE: internal

(ix) FEATURE:

(A) NAME/KEY: Peptide

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

Pro His Glu Pro Leu Asn His Ser His Glu His Ser His Ser His Ser 1 5 10 15

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(v) FRAGMENT TYPE: internal

(ix) FEATURE:

(A) NAME/KEY: Peptide

(B) LOCATION: 1..19

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

Glu Gln Gln Gln Glu Gln Gln Gln Gly Gln Pro Gln His Pro Pro Pro 1 5 10 15

Pro Val Gln

INDICATIONS RELATING TO A DEPOSITED MICROORGANISM

(PCT Rule 13bis)

A. The indications made below relate to the microorganism referred to in the description on page 5, line 8

B. IDENTIFICATION OF DEPOSIT

Name of depositary institution

DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH

Address of depositary institution (including postal code and country)

Mascheroder Weg 1b, D-38124 Braunschweig, Germany

Date of deposit: 7 March 1997

Accession Number: DSM 11451

C. ADDITIONAL INDICATIONS (leave blank if not applicable)

Regarding those designations in which a European patent is sought, a sample of the deposited microorganism will be made available only by the issue of such a sample to an expert nominated by the person requesting the sample (Rule 28(4) EPC) until the publication of the mention of the grant of the European patent or until the date on which the application has been refused or is deemed to be withdrawn. This request also applies to other designated countries in which similar or corresponding provisions are in force.

D. DESIGNATED STATES FOR WHICH INDICATIONS ARE MADE (if the indications are not for all designated States)

E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)

The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession Number of Deposit")

For International Bureau use only

This sheet was received by the International Bureau on:

Authorized officer

Form PCT/RO/134 (July 1992)

INDICATIONS RELATING TO A DEPOSITED MICROORGANISM

(PCT Rule 13bis)

A. The indications made below relate to the microorganism referred to in the description on page 5, line 11

B. IDENTIFICATION OF DEPOSIT

Name of depositary institution

DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH

Address of depositary institution (including postal code and country)

Mascheroder Weg 1b, D-38124 Braunschweig, Germany

Date of deposit: 7 March 1997

Accession Number: DSM 11452

C. ADDITIONAL INDICATIONS (leave blank if not applicable)

Regarding those designations in which a European patent is sought, a sample of the deposited microorganism will be made available only by the issue of such a sample to an expert nominated by the person requesting the sample (Rule 28(4) EPC) until the publication of the mention of the grant of the European patent or until the date on which the application has been refused or is deemed to be withdrawn. This request also applies to other designated countries in which similar or corresponding provisions are in force.

E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)

The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession Number of Deposit")

For receiving Office use only

This sheet was received with the international application

Authorized officer

For International Bureau use only

This sheet was received by the International Bureau on:

Authorized officer

INDICATIONS RELATING TO A DEPOSITED MICROORGANISM

( PCT Rule 13Z>«)

A. The indications made below relate to the microorganism referred to in the description on page 5, line 14

B. IDENTIFICATION OF DEPOSIT

Further deposits are identified on an additional sheet [ ]

Name of depositary institution

DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH

Address of depositary institution (including postal code and country)

Mascheroder Weg 1b, D-38124 Braunschweig, Germany

Date of deposit: 7 March 1997

Accession Number: DSM 11453

C. ADDITIONAL INDICATIONS (leave blank if not applicable)

This information is continued on an additional sheet [ ]

Regarding those designations in which a European patent is sought, a sample of the deposited microorganism will be made available only by the issue of such a sample to an expert nominated by the person requesting the sample (Rule 28(4) EPC) until the publication of the mention of the grant of the European patent or until the date on which the application has been refused or is deemed to be withdrawn. This request also applies to other designated countries in which similar or corresponding provisions are in force.

E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)

The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession Number of Deposit")

For receiving Office use only

[X] This sheet was received with the international application

Authorized officer

For International Bureau use only

[ ] This sheet was received by the International Bureau on:

Authorized officer

INDICATIONS RELATING TO A DEPOSITED MICROORGANISM

(PCT Rule 13bis)

A. The indications made below relate to the microorganism referred to in the description on page 5, line 17

B. IDENTIFICATION OF DEPOSIT

Further deposits are identified on an additional sheet [ ]

Name of depositary institution

DSMZ-Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH

Address of depositary institution (including postal code and country)

Mascheroder Weg 1b, D-38124 Braunschweig, Germany

Date of deposit: 7 March 1997

Accession Number: DSM 11454

C. ADDITIONAL INDICATIONS (leave blank if not applicable)

This information is continued on an additional sheet [ ]

Regarding those designations in which a European patent is sought, a sample of the deposited microorganism will be made available only by the issue of such a sample to an expert nominated by the person requesting the sample (Rule 28(4) EPC) until the publication of the mention of the grant of the European patent or until the date on which the application has been refused or is deemed to be withdrawn. This request also applies to other designated countries in which similar or corresponding provisions are in force.

E. SEPARATE FURNISHING OF INDICATIONS (leave blank if not applicable)

The indications listed below will be submitted to the International Bureau later (specify the general nature of the indications, e.g., "Accession Number of Deposit")

For receiving Office use only

[X] This sheet was received with the international application

For International Bureau use only

[ ] This sheet was received by the International Bureau on:

Claims

What is Claimed is:

1. A purified nucleic acid molecule encoding a polypeptide having the ability to transcriptionally regulate promoters, wherein said nucleic acid molecule is selected from the group consisting of:

a. a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence of ace1 as depicted in SEQ ID NO. 7;

b. a nucleic acid molecule comprising the coding sequence of the nucleotide sequence of ace1 as depicted in SEQ ID NO. 5 and 6;

c. a nucleic acid molecule encoding a polypeptide comprising a coding sequence that differs from the coding sequence of (a) or (b) due to degeneracy of the genetic code; and

d. a nucleic acid molecule that hybridizes to (b) and that encodes a polypeptide having transcriptional regulator activity and an amino acid sequence that has at least 80% identity to the amino acid sequence of ace1 as depicted in SEQ ID NO. 7.

2. The purified nucleic acid molecule of claim 1, wherein said nucleic acid molecule is a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence of ace1 as depicted in SEQ ID NO. 7.

3. The purified nucleic acid molecule of claim 1, wherein said nucleic acid molecule is a nucleic acid molecule comprising the coding sequence of the nucleotide sequence of ace1 as depicted in SEQ ID NO. 5 and 6.

4. The purified nucleic acid molecule of claim 1, wherein said polypeptide is a transcriptional activator.

5. The purified nucleic acid molecule of claim 1, wherein said promoter is the promoter of a gene encoding a hydrolytic enzyme.

6. The purified nucleic acid molecule of claim 5, wherein said hydrolytic enzyme is an enzyme that is capable of hydrolysing lignocellulose.

7. The purified nucleic acid of claim 6, wherein said enzyme is selected from the group consisting of cellobiohydrolase I, cellobiohydrolase II, endoglucanase I, endoglucanase II, endoglucanase III, endoglucanase V, β-glucosidase, xylanase, α-arabinosidase, α-D-glucuronidase, acetyl esterase, mannanase, pectinase, pectinesterase, and pectin acid lyase.

8. A purified nucleic acid molecule encoding a polypeptide having the ability to transcriptionally regulate promoters, wherein said nucleic acid molecule is selected from the group consisting of:

a. a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence of ace2 as depicted in SEQ ID NO. 10;

b. a nucleic acid molecule comprising the coding sequence of the nucleotide sequence of ace2 as depicted in SEQ ID NO. 8 and 9;

c. a nucleic acid molecule encoding a polypeptide comprising a coding sequence that differs from the coding sequence of (a) or (b) due to degeneracy of the genetic code; and

d. a nucleic acid molecule that hybridizes to (b) and that encodes a polypeptide having transcriptional regulator activity and an amino acid sequence that has at least 80% identity to the amino acid sequence depicted in SEQ ID NO. 10.

9. The purified nucleic acid molecule of claim 8, wherein said nucleic acid molecule is a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence of ace2 as depicted in SEQ ID NO. 10.

10. The purified nucleic acid molecule of claim 8, wherein said nucleic acid molecule is a nucleic acid molecule comprising the coding sequence of the nucleotide sequence of ace2 as depicted in SEQ ID NO. 8 and 9.

11. The purified nucleic acid molecule of claim 8, wherein said polypeptide is a transcriptional activator.

12. The purified nucleic acid molecule of claim 8, wherein said promoter is the promoter of a gene encoding a hydrolytic enzyme.

13. The purified nucleic acid molecule of claim 12, wherein said gene encodes an enzyme that is capable of hydrolysing lignocellulose.

14. The purified nucleic acid of claim 13, wherein said enzyme is selected from the group consisting of cellobiohydrolase I, cellobiohydrolase II, endoglucanase I, endoglucanase II, endoglucanase III, endoglucanase V, β-glucosidase, xylanase, α-arabinosidase, α-D-glucuronidase, acetyl esterase, mannanase, pectinase, pectinesterase, and pectin acid lyase.

15. The purified nucleic acid molecule of any one of claims 1 or 8, wherein said molecule is RNA.

16. The purified nucleic acid molecule of any one of claims 1 or 8, wherein said molecule is DNA.

17. A vector comprising a sequence of a nucleic acid molecule that encodes a polypeptide, wherein said polypeptide has the ability to transcriptionally regulate promoters, wherein said nucleic acid molecule is selected from the group consisting of:

a. a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence of ace1 as depicted in SEQ ID NO. 7;

b. a nucleic acid molecule comprising the coding sequence of the nucleotide sequence of ace1 as depicted in SEQ ID NO. 5 and 6;

c. a nucleic acid molecule encoding a polypeptide comprising a coding sequence that differs from the coding sequence of (a) or (b) due to degeneracy of the genetic code; and

d. a nucleic acid molecule that hybridizes to (b) and that encodes a polypeptide having transcriptional activator activity and an amino acid sequence that has at least 80% identity to the amino acid sequence of ace1 as depicted in SEQ ID NO. 7.

18. A vector comprising a sequence of a nucleic acid molecule that encodes a polypeptide, wherein said polypeptide has the ability to transcriptionally regulate promoters, wherein said nucleic acid molecule is selected from the group consisting of:

a. a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence of ace2 as depicted in SEQ ID NO. 10;

b. a nucleic acid molecule comprising the coding sequence of the nucleotide sequence of ace2 as depicted in SEQ ID NO. 8 and 9;

c. a nucleic acid molecule encoding a polypeptide comprising a coding sequence that differs from the coding sequence of (a) or (b) due to degeneracy of the genetic code; and

d. a nucleic acid molecule that hybridizes to (b) and that encodes a polypeptide having transcriptional activator activity and an amino acid sequence that has at least 80% identity to the amino acid sequence depicted in SEQ ID NO. 10.

19. The vector of any one of claims 17 or 18, wherein said nucleic acid molecule is capable of being expressed.

20. The vector of claim 19, wherein expression of said nucleic acid molecule is operably linked to a promoter that is not repressed when a host into which said vector has been inserted is grown on glucose.

21. A host cell transformed with at least one member selected from the group consisting of:

a. the nucleic acid molecule of claim 1;

b. the nucleic acid molecule of claim 8;

c. the vector of claim 18;

d. the vector of claim 19; and

e. a nucleic acid molecule that comprises a promoter that has been modified so that it is capable of being regulated by ACEI and/or ACEII to a greater degree than the unmodified promoter.

22. The host cell of claim 21, wherein said host is a member of the Trichoderma or Aspergillus genus.

23. The host cell of claim 22, wherein said member of said Trichoderma genus is T. reesei.

24. A method for producing ACEI, said method comprising providing the nucleic acid molecule of claim 1, or the vector of claim 17 to a host cell and expressing the ACEI protein that is encoded by said molecule or said vector.

25. A method for producing ACEII, said method comprising providing the nucleic acid molecule of claim 8, or the vector of claim 18 to a host cell and expressing the ACEII protein that is encoded by said molecule or said vector.

26. A purified polypeptide having the activity of ACEI, said polypeptide having an amino acid sequence encoded by:

a. a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence of ace1 as depicted in SEQ ID NO. 7;

b. a nucleic acid molecule comprising the coding sequence of the nucleotide sequence of ace1 as depicted in SEQ ID NO. 5 and 6;

c. a nucleic acid molecule encoding a polypeptide comprising a coding sequence that differs from the coding sequence of (a) or (b) due to degeneracy of the genetic code; and

d. a nucleic acid molecule that hybridizes to (b) and that encodes a polypeptide having transcriptional activator activity and an amino acid sequence that has at least 80% identity to the amino acid sequence of ace1 as depicted in SEQ ID NO. 7.

27. The purified polypeptide of claim 26, wherein said polypeptide comprises the amino acid sequence of the coding region as depicted in SEQ ID NO. 7.

28. A purified polypeptide having the activity of ACEII, said polypeptide being encoded by:

a. a nucleic acid molecule encoding a polypeptide comprising an amino acid sequence of ace2 as depicted in SEQ ID NO. 10;

b. a nucleic acid molecule comprising the coding sequence of the nucleotide sequence of ace2 as depicted in SEQ ID NO. 8 and 9;

c. a nucleic acid molecule encoding a polypeptide comprising a coding sequence that differs from the coding sequence of (a) or (b) due to degeneracy of the genetic code; and

d. a nucleic acid molecule that hybridizes to (b) and that encodes a polypeptide having transcriptional activator activity and an amino acid sequence that has at least 80% identity to the amino acid sequence depicted in SEQ ID NO. 10.

29. The purified polypeptide of claim 28, wherein said polypeptide comprises the amino acid sequence of the coding region as depicted in SEQ ID NO. 10.

30. A method of making a desired protein, said method comprising expressing said protein in the host cell of any one of claims 21-23, wherein the expression of said protein is regulated by ACEI and/or ACEII.

31. The method of claim 30, wherein the copy number of said nucleic acid molecule encoding said ACEI and/or said ACEII is raised in said host when compared to the copy number of said nucleic acid molecule in the native host.

32. The method of claim 30, wherein said promoter that is operably linked to said desired protein has a greater affinity for said ACEI and/or said ACEII than the unmodified promoter in the native host.

33. The method of claim 30, wherein the protein encoded by said nucleic acid molecule enhances transcription of said desired sequence.

34. The method of claim 30, wherein said host cell is a yeast cell or a filamentous fungus cell.

35. The method of claim 34, wherein said host cell is a yeast cell.

36. The method of claim 35, wherein said yeast is Saccharomyces.

37. The method of claim 36, wherein said Saccharomyces is S. cerevisiae.

38. The method of claim 34, wherein said host cell is a filamentous fungus cell.

39. The method of claim 38, wherein said filamentous fungus is selected from a member of the group consisting of Trichoderma, Aspergillus, Claviceps purpurea, Penicillium chrysogenum, Magnaporthe grisea, Neurospora, Mycosphaerella spp., Colletotrichum trifolii, the dimorphic fungus Histoplasma capsulatum, Nectria haematococca (solani f. sp. phaseoli and f. sp. pisi), Ustilago violacea, Ustilago maydis, Cephalosporium acremonium, Schizophyllum commune, Podospora anserina, Sordaria macrospora, Mucor circinelloides, Humicola, Melanocarpus, Myceliophthora, Chaetomium and Colletotrichum capsici.

40. The method of claim 39, wherein said filamentous fungus is Aspergillus.

41. The method of claim 39, wherein said filamentous fungus is Trichoderma.

42. The method of claim 41, wherein said Trichoderma is selected from the group consisting of T. reesei, T. harzianum, T. longibrachiatum, T. viride, and T. koningii.

43. The method of claim 42, wherein said Trichoderma is T. reesei.

44. The method of claim 30, wherein said host cell is transformed with a DNA construct encoding ACEI.

45. The method of claim 30, wherein said host cell is transformed with a DNA construct encoding ACEII.

46. A protein produced by the method of claim 30.

47. A purified nucleic acid sequence encoding a fungal transcriptional regulator, said sequence being capable of hybridizing to primers or probes of at least 20 nucleotides in length that are derived from the nucleic acid sequence of ace1 or ace2 when hybridization is performed in 6xSSC, 0.1% SDS, 0.05% sodium pyrophosphate and 100 μg/ml denatured herring sperm DNA at 57°C for 44 hours, followed by two washes in 5xSSC, 0.1% SDS at 57°C and two washes in 3xSSC at 57°C.

48. A purified gene encoding a protein, wherein said protein regulates transcription of genetic sequences through the presence of a sequence contained in the region located between and including the nucleotides -184 and -1 (SEQ ID NO. 18), -161 and -1 (SEQ ID NO. 19), -140 and -1 (SEQ ID NO. 20) or -161 and -133 (SEQ ID NO. 17) upstream of the protein-coding region of the T. reesei cbh1 gene.