WO2005010176A1 - Methods for making polypeptide variants by combinatorial fragment exchange - Google Patents

Methods for making polypeptide variants by combinatorial fragment exchange

Info

Publication number
WO2005010176A1
Authority
WO
WIPO (PCT)
Prior art keywords
enzyme
polypeptide
structurally related
parent
region
Application number
PCT/DK2004/000505
Other languages
English (en)
Inventor
Dafydd Jones
Original Assignee
Novozymes A/S
Application filed by Novozymes A/S
Publication of WO2005010176A1

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/10: Processes for the isolation, preparation or purification of DNA or RNA
    • C12N15/1034: Isolating an individual clone by screening libraries
    • C12N15/1089: Design, preparation, screening or analysis of libraries using computer algorithms
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00: Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/14: Hydrolases (3)
    • C12N9/48: Hydrolases (3) acting on peptide bonds (3.4)
    • C12N9/50: Proteinases, e.g. Endopeptidases (3.4.21-3.4.25)
    • C12N9/52: Proteinases, e.g. Endopeptidases (3.4.21-3.4.25) derived from bacteria or Archaea
    • C12N9/54: Proteinases, e.g. Endopeptidases (3.4.21-3.4.25) derived from bacteria or Archaea bacteria being Bacillus
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20: Protein or domain folding
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00: ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment

Definitions

  • TITLE Method for making polypeptide variants by combinatorial fragment exchange.
  • The present invention relates to methods for identifying one or more interchangeable regions in a parent polypeptide and in structurally related polypeptides, by super-imposing three-dimensional structure-models of the polypeptides and then identifying regions wherein the start- and end-points of the corresponding regions overlap sufficiently in the super-imposed models for the regions to be interchanged; to methods of producing the combinatorial variant polypeptides resulting from interchanging said identified corresponding region(s) in a parent polypeptide with region(s) of structurally related polypeptides; and to the resulting combinatorial variant(s).
  • WO 95/22625 discloses a method for shuffling of homologous DNA sequences, wherein the homologous double-stranded template polynucleotide is cleaved into random fragments of a desired size followed by homologously reassembling of the fragments into full-length genes.
  • Such reassembly has also been done with PCR-based methods, wherein conserved DNA regions of the homologous sequences are utilized as recombination cross-over points; see e.g. WO 98/41623 and WO 98/41622 (Novozymes A/S, DK).
  • Other methods focus on more rational protein design. Instead of randomly shuffling the encoding sequences, three-dimensional structure-models of related proteins are used to determine which specific amino acid(s) might be altered to achieve the desired property changes.
  • Two structurally related polypeptide enzymes with a distant common ancestor may have no more than 50% amino acid sequence identity today, yet their three-dimensional structures are still conserved in some regions, while other regions, typically the loops, have diverged considerably. Two such enzymes may have completely different substrate specificities and/or other different properties. It is therefore of interest to identify interchangeable, potentially variable regions in structurally related polypeptides, typically loop structures, so that they may be used to create combinatorial variants and/or a combinatorial variant library based on a parent polypeptide or "backbone polypeptide", in which such regions have been introduced in a combinatorial manner to alter the properties of said parent.
  • A problem to be solved by the present invention is how to identify those regions in structurally related polypeptides that may be interchanged with a high probability of maintaining some biological activity after the exchange(s). This makes it possible to produce a combinatorial variant library of a parent polypeptide that comprises one or more such regions derived from structurally related polypeptides, from which a combinatorial variant of the parent polypeptide with one or more altered properties can in turn be isolated.
  • the invention relates to a method for identifying one or more interchangeable region in a parent polypeptide and in at least one structurally related polypeptide, wherein each region comprises at least 2 contiguous amino acids, said method comprising the steps of: (a) super-imposing a three-dimensional structure-model of the parent polypeptide and a three-dimensional structure-model of the at least one structurally related polypeptide; and (b) identifying one or more region within the parent polypeptide and one or more corresponding region within the super-imposed at least one structurally related polypeptide, wherein the start- and end-points of said region and of said corresponding region respectively are located at spatially overlapping positions within the super-imposed three-dimensional structure-models.
  • The term "structurally related" means that the structurally related polypeptides have three-dimensional structures that are at least partially super-imposable.
  • Two polypeptides are partially super-imposable, and thus structurally related, if they comprise one or more region of 50 contiguous amino acids or less, preferably 30, preferably 20, more preferably 15, even more preferably 10, and most preferably 5 contiguous amino acids or less, that are super-imposable with a root mean square deviation value of said region(s) of 4 Angstrom or less, preferably 2 Angstrom or less, more preferably 1.5 Angstrom or less, or most preferably 1 Angstrom or less, calculated as outlined below.
  • The term "three-dimensional structure-model" of a polypeptide in the present invention comprises an actual structure which was solved using structural biology methods, e.g. X-ray crystallography or nuclear magnetic resonance spectroscopy. Such a structure is represented by a set of structural coordinates for the placement of the atoms (which may or may not include hydrogen atoms) constituting the three-dimensional structure.
  • The structural coordinates are usually given in the standard PDB format (Protein Data Bank, Brookhaven National Laboratory, Brookhaven, CT, USA).
  • The term "three-dimensional structure-model" also comprises a computationally built model, which may have been modeled or built on the basis of the amino acid sequence of the polypeptide and the actual solved structure of another highly related polypeptide.
  • The one or more region to be selected within the parent polypeptide is said to have one or more "corresponding region" within the super-imposed at least one structurally related polypeptide when the start- and end-points of said corresponding regions in each polypeptide are located at spatially overlapping positions within the super-imposed structure-models, i.e. the regions themselves are not required to be spatially overlapping.
  • The one or more region in each polypeptide is located between the "start- and end-points" of said one or more region.
  • Each start-point and each end-point comprises at least 1, preferably at least 2, more preferably at least 4, 6, 8, 10, 15, 20, 30, or 40 contiguous amino acids.
  • the start- and end-points of a region in a polypeptide are said to be located at spatially overlapping positions with the start- and end-points of a corresponding region in a super-imposed structurally related polypeptide, if the equivalent backbone atoms of the start- and end-points of said corresponding regions have a root mean squared deviation (RMSD) value of 4 Angstrom or less; preferably of 2 Angstrom or less; and most preferably of 1 Angstrom or less.
  • RMSD root mean squared deviation
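  • The overlap criterion for start- and end-points can be sketched in a few lines. This is only an illustration under stated assumptions: the structures are taken to be already super-imposed, the coordinates are lists of equivalent backbone atoms in Angstrom, the start-point and end-point are checked separately, and the names `rmsd`, `overlap_tier` and `region_is_interchangeable` are hypothetical helpers, not part of the described method.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between equivalent atoms of two already super-imposed structures."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need equal, non-empty lists of equivalent atoms")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

# Thresholds named in the text: 4 A or less, preferably 2 A, most preferably 1 A.
TIERS = ((1.0, "most preferred"), (2.0, "preferred"), (4.0, "acceptable"))

def overlap_tier(value):
    """Return the tightest tier that the RMSD value satisfies, or None."""
    for limit, label in TIERS:
        if value <= limit:
            return label
    return None

def region_is_interchangeable(start_a, start_b, end_a, end_b, limit=4.0):
    """Assumption: both the start-point and the end-point must meet the limit."""
    return rmsd(start_a, start_b) <= limit and rmsd(end_a, end_b) <= limit
```

  • For instance, `overlap_tier(0.88)` falls in the tightest tier, while a value above 4 Angstrom returns `None`, meaning the points would not count as spatially overlapping under this reading.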
  • In a second aspect, the invention relates to a method for producing at least one combinatorial variant of a parent polypeptide, said at least one variant comprising one or more region from at least one structurally related polypeptide, wherein each region comprises at least 2 contiguous amino acids, said method comprising the steps of: (a) super-imposing a three-dimensional structure-model of the parent polypeptide and a three-dimensional structure-model of the at least one structurally related polypeptide; (b) identifying one or more region within the parent polypeptide and one or more corresponding region within the super-imposed at least one structurally related polypeptide, wherein the start- and end-points of said region and of said corresponding region respectively are located at spatially overlapping positions within the super-imposed three-dimensional structure-models; and (c) expressing a polynucleotide segment encoding the parent polypeptide in a recombinant host cell, wherein the selected one or more region in the parent polypeptide have been substituted by the corresponding one or more
  • the invention relates to a combinatorial variant of a parent polypeptide, said variant comprising one or more region from at least one structurally related polypeptide, wherein each region comprises at least 2 contiguous amino acids, and (a) wherein the start- and end-points of said one or more region of the variant are located in spatially overlapping positions with the corresponding start- and end-points of said one or more region of the parent polypeptide when a three-dimensional structure-model of the variant is super-imposed with a three-dimensional structure-model of the parent polypeptide; and/or (b) wherein the start- and end-points of said one or more region of the variant are located in spatially overlapping positions with the corresponding start- and end-points respectively of said one or more region of the at least one structurally related polypeptide when a three-dimensional structure-model of the variant is super-imposed with a three-dimensional structure-model of the at least one structurally related polypeptide.
  • FIGURES
  • Figure 1 illustrates the general principle of the invention.
  • Protein A, B, C, D and E represent proteins with known or potential tertiary structures in which the whole or part of the protein is structurally homologous to the core protein.
  • Each fragment represents a region that corresponds to the equivalent tertiary position in the core protein with flanking sequences corresponding to the core protein.
  • the method allows the creation of new recombinant genes with various replacements at the designated regions.
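  • The combinatorial idea behind Figure 1 can be sketched as a Cartesian product over the designated regions. The region names (R1 to R3) and the assignment of donor proteins A to E to particular regions are hypothetical and chosen only for illustration; "core" stands for keeping the parent's own sequence at that region.

```python
from itertools import product

def enumerate_variants(region_options):
    """Yield every combination of fragment sources, one source per region.

    region_options maps a region name to the list of possible sources
    (the core protein itself plus any structurally related donors).
    """
    names = sorted(region_options)
    for combo in product(*(region_options[name] for name in names)):
        yield dict(zip(names, combo))

# Hypothetical example: three designated regions with different donor sets.
options = {
    "R1": ["core", "A", "B"],
    "R2": ["core", "C"],
    "R3": ["core", "D", "E"],
}
library = list(enumerate_variants(options))
print(len(library))  # 3 * 2 * 3 = 18 combinations, including the unmodified core
```

  • The library size grows multiplicatively with the number of regions and donors, which is why restricting exchanges to regions with well-overlapping start- and end-points matters in practice.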
  • Figure 2 shows the general scheme for the fragmentation of the core protein gene, Savinase™, and the role played by region replacement oligonucleotides in determining assembly.
  • the invention relates to a method for identifying one or more interchangeable region in a parent polypeptide and in at least one structurally related polypeptide, wherein each region comprises at least 2 contiguous amino acids, said method comprising the steps of: (a) super-imposing a three-dimensional structure-model of the parent polypeptide and a three-dimensional structure-model of the at least one structurally related polypeptide; and (b) identifying one or more region within the parent polypeptide and one or more corresponding region within the super-imposed at least one structurally related polypeptide, wherein the start- and end-points of said region and of said corresponding region respectively are located at spatially overlapping positions within the super-imposed three-dimensional structure-models.
  • The invention relates to where the parent polypeptide and the at least one structurally related polypeptide are both enzymes, preferably subtilisins; preferably the parent enzyme and the structurally related enzyme are from different enzyme classes, according to the Enzyme Commission (EC) classification; or preferably the parent enzyme and the structurally related enzyme are from the same enzyme class, according to the Enzyme Commission (EC) classification; even more preferably the parent enzyme and the structurally related enzyme are from different enzyme sub-classes, according to the Enzyme Commission (EC) classification, and still more preferably the parent enzyme and the structurally related enzyme both are from the same enzyme sub-class, according to the Enzyme Commission (EC) classification; or preferably the parent enzyme and the structurally related enzyme both are from the same sub-sub-class, according to the Enzyme Commission (EC) classification; or preferably the parent enzyme and the structurally related enzyme both are from the
  • each region comprises at least 5 contiguous amino acids, preferably at least 10, 15, 20, 30, 40, or at least 50 contiguous amino acids.
  • the parent polypeptide and the structurally related polypeptide share an amino acid sequence identity of 90% or less; preferably 95% or less; 80%, 85%, 70%, 65%, or most preferably 60% or less.
  • Amino acid sequence alignment should be performed using the ClustalV algorithm with the default settings: a gap penalty of 10 and a gap length penalty of 10.
  • The PAM250 weight table should be used.
  • This version of ClustalV is embedded in the program MegAlign 4.03 (DNASTAR Inc.).
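  • Once two sequences have been aligned, a percent identity can be read off the alignment. The sketch below is not the MegAlign calculation; it assumes the two aligned strings have equal length, use "-" for gaps, and that identity is counted over columns where neither sequence has a gap. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns without a gap in either sequence (an assumption;
    # other tools may normalise by a different length).
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)
```

  • Different programs normalise by different lengths, so values from this sketch should not be compared directly with thresholds computed by another tool.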
  • A preferred embodiment relates to where the super-imposing of the three-dimensional structure-models is done by using a computer equipped with a suitable program.
  • The computer programs within the free and publicly available Swiss-PdbViewer (DeepView) version 3.7 are used to super-impose the three-dimensional structure models of the invention (Guex, N. and Peitsch, M.C.
  • Swiss-PdbViewer is tightly linked to Swiss-Model, an automated homology modeling server developed within the Swiss Institute of Bioinformatics (SIB) in collaboration between GlaxoSmithKline R&D and the Structural Bioinformatics Group at the Biozentrum in Basel. Working with these two programs greatly reduces the amount of work necessary to generate models, as it is possible to thread a protein primary sequence onto a 3D template and get immediate feedback on how well the threaded protein will be accepted by the reference structure before submitting a request to build missing loops and refine sidechain packing. Swiss-PdbViewer can also read electron density maps, and provides various tools to build into the density. In addition, various modeling tools are integrated, and command files for popular energy minimization packages can be generated. The basics of some of the computer programs or tools available in Swiss-PdbViewer are presented briefly here:
  • Fit molecules: After having selected pairs of amino acids that are equivalent in two layers, i.e. in two super-imposed three-dimensional structures, you can use this tool to match the two layers as closely as possible. This is also called 3D-match, and it is a more precise method than the "three corresponding atoms" method that is accessible from the main window tools.
  • Magic Fit: Automatically chooses the best equivalent amino acids in two structures and feeds them into the 3D-match procedure used in the Fit Molecule (auto) tool. Note that this tool will fail when the proteins are too divergent in primary amino acid sequence.
  • Iterative Magic Fit: Same as Magic Fit, but the fit will be even better, and the structural alignment will be automatically updated. Depending on the option chosen in the dialog, carbon alpha (CA) or backbone, the root mean square deviation (RMSD) value will be minimized for those atoms.
  • This tool is equivalent to several rounds of "Improve Fit", and it will generate a structural alignment for the fitted molecule, but it will also disrupt the structural alignment of other layers, and they might have to be regenerated subsequently.
  • This tool will find those amino-acids of the current (selected) layer, that are spatially close to the equivalent ones in the reference layer (the first pdb file to have been loaded), and then put appropriate gaps in the multiple sequence alignment displayed in the Align window. Note, that a best fit of the proteins to be super-imposed should be done before using this tool.
  • RMS between two molecules (two layers): RMS means the square root of the arithmetic mean of the squares of the deviations from the mean. Only selected groups are taken into account. It must be specified which two layers should be calculated and which of the following atoms should be used in calculations: Carbon Alpha only (CA); Backbone only (N, CA, C); Sidechain only (all atoms except N, C, CA and O); or All atoms. Note: hydrogen atoms are never used for calculations. Note: HETATM should not be included in RMS calculations unless their atoms appear in the same order in the two PDB files to be super-imposed.
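  • The atom-selection options described for the RMS tool can be mimicked in a small sketch. This is not Swiss-PdbViewer code; it assumes each atom is given as a hypothetical tuple `(atom_name, element, (x, y, z))`, that the two layers are already super-imposed, and that atoms appear in the same order in both layers.

```python
import math

BACKBONE = {"N", "CA", "C"}
NOT_SIDECHAIN = BACKBONE | {"O"}

def select(atoms, mode):
    """Filter atoms by selection mode; hydrogen atoms are never used."""
    heavy = [a for a in atoms if a[1] != "H"]
    if mode == "CA":
        return [a for a in heavy if a[0] == "CA"]
    if mode == "backbone":
        return [a for a in heavy if a[0] in BACKBONE]
    if mode == "sidechain":
        return [a for a in heavy if a[0] not in NOT_SIDECHAIN]
    if mode == "all":
        return heavy
    raise ValueError(f"unknown selection mode: {mode}")

def rms(atoms_a, atoms_b, mode="backbone"):
    """RMS deviation over the selected atoms of two super-imposed layers."""
    sel_a, sel_b = select(atoms_a, mode), select(atoms_b, mode)
    if not sel_a or [a[0] for a in sel_a] != [b[0] for b in sel_b]:
        raise ValueError("selected atoms must be non-empty and in the same order")
    total = sum(math.dist(a[2], b[2]) ** 2 for a, b in zip(sel_a, sel_b))
    return math.sqrt(total / len(sel_a))
```

  • The same-order check reflects the note about HETATM records: the calculation pairs atoms by position, so mismatched ordering would silently compare the wrong atoms.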
  • Computer programs for super-imposition, such as Swiss-PdbViewer, output the root mean squared deviation (RMSD) value for each three-dimensionally aligned atom.
  • This value is essentially the distance between the two corresponding atoms in the superimposed structures or models. Consequently, the lower the value is for a given atom, or for a given number of contiguous atoms, the more structurally related the structures are.
  • The RMSD value for an entire polypeptide backbone is defined as the average of the values for each atom in the backbone. Some regions may have a very low RMSD, e.g. conserved structural features, while other regions may have a higher RMSD value, e.g. non-conserved structures, typically loops.
  • the RMSD value is usually calculated between equivalent or corresponding atoms in the structures. If there is no equivalent atom, e.g. if there is an insertion or deletion in the sequence, then no RMSD can be calculated at that position. For example, in the examples below, no RMSD can be calculated for the region R1 between Savinase and BPN' due to an insertion in that region; but in the R3 region of the same example, the RMSD value is approx. 0.88 Angstrom.
  • the RMSD value of equivalent atoms and regions can be calculated automatically by a suitable computer program as listed above using the alignment information, or manually in which case the user defines the aligned region or atom. The latter is useful, when homology is restricted to only certain points of the protein.
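  • The handling of positions without an equivalent atom can be sketched as follows. This is an illustrative assumption about one way to represent the situation, not the program's own behaviour: `None` marks a position where an insertion or deletion leaves no equivalent atom, the function names are hypothetical, and the region value is computed as a root mean square of the per-position distances.

```python
import math

def per_position_deviation(coords_a, coords_b):
    """Distance between equivalent atoms; None where no equivalent atom exists."""
    return [
        None if p is None or q is None else math.dist(p, q)
        for p, q in zip(coords_a, coords_b)
    ]

def region_rmsd(coords_a, coords_b):
    """RMSD over a region, or None if any position lacks an equivalent atom."""
    devs = per_position_deviation(coords_a, coords_b)
    if not devs or any(d is None for d in devs):
        return None  # undefined, as described for a region containing an insertion
    return math.sqrt(sum(d * d for d in devs) / len(devs))
```

  • Returning `None` rather than skipping gapped positions keeps an undefined region clearly distinguishable from a region that merely superimposes well.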
  • the start- and end-points of said corresponding regions in each polypeptide are located at spatially overlapping positions, so that the equivalent backbone atoms of the start- and end-points of said corresponding regions in each polypeptide have a root mean squared deviation (RMSD) of 4 Angstrom or less; preferably of 2 Angstrom or less; and most preferably of 1 Angstrom or less.
  • In a second aspect, the invention relates to a method for producing at least one combinatorial variant of a parent polypeptide, said at least one variant comprising one or more region from at least one structurally related polypeptide, wherein each region comprises at least 2 contiguous amino acids, said method comprising the steps of: (a) super-imposing a three-dimensional structure-model of the parent polypeptide and a three-dimensional structure-model of the at least one structurally related polypeptide; (b) identifying one or more region within the parent polypeptide and one or more corresponding region within the super-imposed at least one structurally related polypeptide, wherein the start- and end-points of said region and of said corresponding region respectively are located at spatially overlapping positions within the super-imposed three-dimensional structure-models; and (c) expressing a polynucleotide segment encoding the parent polypeptide in a recombinant host cell, wherein the selected one or more region in the parent polypeptide have been substituted by the corresponding one or more
  • a preferred embodiment relates to where more than one combinatorial variant is produced in step (c) of the method of the second aspect, and a subsequent step of selecting one or more produced combinatorial variant is carried out.
  • Once the one or more interchangeable region has been identified in a parent polypeptide and in at least one structurally related polypeptide, at least one combinatorial variant of the parent polypeptide can be produced which comprises one or more of the regions of the at least one structurally related polypeptide. This may be done by constructing at least one polynucleotide encoding said combinatorial variant(s) and expressing said polynucleotide(s) in a host cell.
  • There are many ways of artificially constructing said polynucleotide(s), as outlined below or as exemplified herein.
  • a preferred embodiment relates to where the polynucleotide segment encoding the at least one variant of the parent polypeptide is constructed artificially; preferably by a PCR method using primers or oligonucleotides designed to anneal with DNA encoding the parent polypeptide and/or with DNA encoding the at least one structurally related polypeptide under appropriate conditions.
  • The term "nucleic acid construct" or "polynucleotide segment" means a nucleic acid molecule, either single- or double-stranded, which is isolated from a naturally occurring gene or which has been modified to contain segments of nucleic acids in a manner that would not otherwise exist in nature.
  • The term "nucleic acid construct" is synonymous with the term "expression cassette" when the nucleic acid construct contains the control sequences required for expression of a coding sequence of the present invention.
  • The term "control sequences" is defined herein to include all components which are necessary or advantageous for the expression of a polypeptide of the present invention. Each control sequence may be native or foreign to the nucleotide sequence encoding the polypeptide. Such control sequences include, but are not limited to, a leader, polyadenylation sequence, propeptide sequence, promoter, signal peptide sequence, and transcription terminator. At a minimum, the control sequences include a promoter, and transcriptional and translational stop signals.
  • The control sequences may be provided with linkers for the purpose of introducing specific restriction sites facilitating ligation of the control sequences with the coding region of the nucleotide sequence encoding a polypeptide.
  • The term "operably linked" is defined herein as a configuration in which a control sequence is appropriately placed at a position relative to the coding sequence of the DNA sequence such that the control sequence directs the expression of a polypeptide.
  • The term "coding sequence" is intended to cover a nucleotide sequence which directly specifies the amino acid sequence of its protein product. The boundaries of the coding sequence are generally determined by an open reading frame, which usually begins with the ATG start codon.
  • the coding sequence typically includes DNA, cDNA, and recombinant nucleotide sequences.
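  • The notion of an open reading frame beginning at an ATG start codon can be illustrated with a small sketch. It assumes a plain DNA string, the standard stop codons TAA, TAG and TGA, and forward-strand reading only; the helper name `first_orf` is hypothetical and the sketch is far simpler than real gene-finding.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def first_orf(dna):
    """Return the first ATG-to-stop open reading frame on the given strand, or None."""
    seq = dna.upper()
    start = seq.find("ATG")
    while start != -1:
        # Read codon by codon in the frame set by this ATG.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        # No in-frame stop codon found; try the next ATG.
        start = seq.find("ATG", start + 1)
    return None
```

  • Calling `first_orf("ccATGGCTTAAgg")` reads ATG, GCT, TAA in frame and stops at the first stop codon; a sequence with no in-frame stop yields `None`.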
  • The term "expression" includes any step involved in the production of the polypeptide including, but not limited to, transcription, post-transcriptional modification, translation, post-translational modification, and secretion.
  • The term "expression vector" covers a DNA molecule, linear or circular, that comprises a segment encoding a polypeptide of the invention, and which is operably linked to additional segments that provide for its transcription.
  • The present invention also relates to nucleic acid constructs comprising a nucleotide sequence of the present invention operably linked to one or more control sequences that direct the expression of the coding sequence in a suitable host cell under conditions compatible with the control sequences.
  • A nucleotide sequence encoding a polypeptide of the present invention may be manipulated in a variety of ways to provide for expression of the polypeptide. Manipulation of the nucleotide sequence prior to its insertion into a vector may be desirable or necessary depending on the expression vector.
  • The control sequence may be an appropriate promoter sequence, a nucleotide sequence which is recognized by a host cell for expression of the nucleotide sequence.
  • The promoter sequence contains transcriptional control sequences, which mediate the expression of the polypeptide.
  • The promoter may be any nucleotide sequence which shows transcriptional activity in the host cell of choice, including mutant, truncated, and hybrid promoters, and may be obtained from genes encoding extracellular or intracellular polypeptides either homologous or heterologous to the host cell.
  • Suitable promoters for directing the transcription of the nucleic acid constructs of the present invention are the promoters obtained from the E. coli lac operon, Streptomyces coelicolor agarase gene (dagA), Bacillus subtilis levansucrase gene (sacB), Bacillus licheniformis alpha-amylase gene (amyL), Bacillus stearothermophilus maltogenic amylase gene (amyM), Bacillus amyloliquefaciens alpha-amylase gene (amyQ), Bacillus licheniformis penicillinase gene (penP), Bacillus subtilis xylA and xylB genes, and prokaryotic beta-lactamase gene (Villa-Kamaroff et al., 1978, Proceedings of the National Academy of Sciences USA 75: 3727-3731), as well as the tac promoter (DeBoer et al., 1983
  • promoters obtained from the genes for Aspergillus oryzae TAKA amylase, Rhizomucor miehei aspartic proteinase, Aspergillus niger neutral alpha-amylase, Aspergillus niger acid stable alpha-amylase, Aspergillus niger or Aspergillus awamori glucoamylase (glaA), Rhizomucor miehei lipase, Aspergillus oryzae alkaline protease, Aspergillus oryzae triose phosphate isomerase, Aspergillus nidulans acetamidase, and
  • useful promoters are obtained from the genes for Saccharomyces cerevisiae enolase (ENO-1), Saccharomyces cerevisiae galactokinase (GAL1), Saccharomyces cerevisiae alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase (ADH2/GAP), and Saccharomyces cerevisiae 3-phosphoglycerate kinase.
  • Other useful promoters for yeast host cells are described by Romanos et al., 1992, Yeast 8: 423
  • The terminator sequence is operably linked to the 3' terminus of the nucleotide sequence encoding the polypeptide.
  • Any terminator which is functional in the host cell of choice may be used in the present invention.
  • Preferred terminators for filamentous fungal host cells are obtained from the genes for Aspergillus oryzae TAKA amylase, Aspergillus niger glucoamylase, Aspergillus nidulans anthranilate synthase, Aspergillus niger alpha-glucosidase, and Fusarium oxysporum trypsin-like protease.
  • Preferred terminators for yeast host cells are obtained from the genes for
  • The control sequence may also be a suitable leader sequence, a nontranslated region of an mRNA which is important for translation by the host cell.
  • The leader sequence is operably linked to the 5' terminus of the nucleotide sequence encoding the polypeptide.
  • Any leader sequence that is functional in the host cell of choice may be used in the present invention.
  • Preferred leaders for filamentous fungal host cells are obtained from the genes for Aspergillus oryzae TAKA amylase and Aspergillus nidulans triose phosphate isomerase.
  • Suitable leaders for yeast host cells are obtained from the genes for Saccharomyces cerevisiae enolase (ENO-1), Saccharomyces cerevisiae 3-phosphoglycerate kinase, Saccharomyces cerevisiae alpha-factor, and Saccharomyces cerevisiae alcohol dehydrogenase/glyceraldehyde-3-phosphate dehydrogenase (ADH2/GAP).
  • The control sequence may also be a polyadenylation sequence, a sequence operably linked to the 3' terminus of the nucleotide sequence and which, when transcribed, is recognized by the host cell as a signal to add polyadenosine residues to transcribed mRNA. Any polyadenylation sequence which is functional in the host cell of choice may be used in the present invention.
  • Preferred polyadenylation sequences for filamentous fungal host cells are obtained from the genes for Aspergillus oryzae TAKA amylase, Aspergillus niger glucoamylase, Aspergillus nidulans anthranilate synthase, Fusarium oxysporum trypsin-like protease, and Aspergillus niger alpha-glucosidase.
  • Useful polyadenylation sequences for yeast host cells are described by Guo and Sherman, 1995, Molecular Cellular Biology 15: 5983-5990.
  • the control sequence may also be a signal peptide coding region that codes for an amino acid sequence linked to the amino terminus of a polypeptide and directs the encoded polypeptide into the cell's secretory pathway.
  • the 5' end of the coding sequence of the nucleotide sequence may inherently contain a signal peptide coding region naturally linked in translation reading frame with the segment of the coding region which encodes the secreted polypeptide.
  • the 5' end of the coding sequence may contain a signal peptide coding region which is foreign to the coding sequence.
  • the foreign signal peptide coding region may be required where the coding sequence does not naturally contain a signal peptide coding region.
  • the foreign signal peptide coding region may simply replace the natural signal peptide coding region in order to enhance secretion of the polypeptide.
  • any signal peptide coding region which directs the expressed polypeptide into the secretory pathway of a host cell of choice may be used in the present invention.
  • Effective signal peptide coding regions for bacterial host cells are the signal peptide coding regions obtained from the genes for Bacillus NCIB 11837 maltogenic amylase, Bacillus stearothermophilus alpha-amylase, Bacillus licheniformis subtilisin, Bacillus licheniformis alpha-amylase, Bacillus stearothermophilus neutral proteases (nprT, nprS, nprM), and Bacillus subtilis prsA. Further signal peptides are described by Simonen and Palva, 1993, Microbiological Reviews 57: 109-137.
  • Effective signal peptide coding regions for filamentous fungal host cells are the signal peptide coding regions obtained from the genes for Aspergillus oryzae TAKA amylase, Aspergillus niger neutral amylase, Aspergillus niger glucoamylase, Rhizomucor miehei aspartic proteinase, Humicola insolens cellulase, and Humicola lanuginosa lipase.
  • Useful signal peptides for yeast host cells are obtained from the genes for Saccharomyces cerevisiae alpha-factor and Saccharomyces cerevisiae invertase. Other useful signal peptide coding regions are described by Romanos et al., 1992, supra.
  • the control sequence may also be a propeptide coding region that codes for an amino acid sequence positioned at the amino terminus of a polypeptide.
  • the resultant polypeptide is known as a proenzyme or propolypeptide (or a zymogen in some cases).
  • a propolypeptide is generally inactive and can be converted to a mature active polypeptide by catalytic or autocatalytic cleavage of the propeptide from the propolypeptide.
  • the propeptide coding region may be obtained from the genes for Bacillus subtilis alkaline protease (aprE), Bacillus subtilis neutral protease (nprT), Saccharomyces cerevisiae alpha-factor, Rhizomucor miehei aspartic proteinase, and Myceliophthora thermophila laccase (WO 95/33836). Where both signal peptide and propeptide regions are present at the amino terminus of a polypeptide, the propeptide region is positioned next to the amino terminus of a polypeptide and the signal peptide region is positioned next to the amino terminus of the propeptide region.
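The ordering described above (signal peptide at the extreme amino terminus, then the propeptide, then the mature polypeptide) can be illustrated with a minimal sketch. The function name and the short placeholder sequences are hypothetical and are not taken from any polypeptide of the invention.

```python
def assemble_precursor(mature, signal_peptide="", propeptide=""):
    """Return the precursor sequence written N-terminus first.

    The propeptide sits directly at the amino terminus of the mature
    polypeptide, and the signal peptide sits at the amino terminus of
    the propeptide, as stated in the text above.
    """
    return signal_peptide + propeptide + mature


# Placeholder sequences, for illustration only (hypothetical).
precursor = assemble_precursor(
    mature="AQSVPYG", signal_peptide="MKKL", propeptide="AGKS"
)
print(precursor)
```

Either optional region may be left empty, which corresponds to a coding sequence lacking a signal peptide or a propeptide.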
  • It may also be desirable to add regulatory sequences which allow the regulation of the expression of the polypeptide relative to the growth of the host cell.
  • Examples of regulatory systems are those which cause the expression of the gene to be turned on or off in response to a chemical or physical stimulus, including the presence of a regulatory compound.
  • Regulatory systems in prokaryotic systems include the lac, tac, and trp operator systems.
  • In yeast, the ADH2 system or GAL1 system may be used.
  • In filamentous fungi, the TAKA alpha-amylase promoter, Aspergillus niger glucoamylase promoter, and Aspergillus oryzae glucoamylase promoter may be used as regulatory sequences.
  • Other examples of regulatory sequences are those which allow for gene amplification.
  • the nucleotide sequence encoding the polypeptide would be operably linked with the regulatory sequence.
  • the present invention also relates to recombinant expression vectors comprising the nucleic acid construct of the invention.
  • the various nucleotide and control sequences described above may be joined together to produce a recombinant expression vector which may include one or more convenient restriction sites to allow for insertion or substitution of the nucleotide sequence encoding the polypeptide at such sites.
  • the nucleotide sequence of the present invention may be expressed by inserting the nucleotide sequence or a nucleic acid construct comprising the sequence into an appropriate vector for expression.
  • the coding sequence is located in the vector so that the coding sequence is operably linked with the appropriate control sequences for expression.
  • the recombinant expression vector may be any vector (e.g., a plasmid or virus) which can be conveniently subjected to recombinant DNA procedures and can bring about the expression of the nucleotide sequence.
  • the choice of the vector will typically depend on the compatibility of the vector with the host cell into which the vector is to be introduced.
  • the vectors may be linear or closed circular plasmids.
  • the vector may be an autonomously replicating vector, i.e., a vector which exists as an extrachromosomal entity, the replication of which is independent of chromosomal replication, e.g., a plasmid, an extrachromosomal element, a minichromosome, or an artificial chromosome.
  • the vector may contain any means for assuring self-replication.
  • the vector may be one which, when introduced into the host cell, is integrated into the genome and replicated together with the chromosome(s) into which it has been integrated.
  • a single vector or plasmid or two or more vectors or plasmids which together contain the total DNA to be introduced into the genome of the host cell, or a transposon may be used.
  • the vectors of the present invention preferably contain one or more selectable markers which permit easy selection of transformed cells.
  • a selectable marker is a gene the product of which provides for biocide or viral resistance, resistance to heavy metals, prototrophy to auxotrophs, and the like.
  • Examples of bacterial selectable markers are the dal genes from Bacillus subtilis or Bacillus licheniformis, or markers which confer antibiotic resistance such as ampicillin, kanamycin, chloramphenicol or tetracycline resistance.
  • Suitable markers for yeast host cells are ADE2, HIS3, LEU2, LYS2, MET3, TRP1, and URA3.
  • Selectable markers for use in a filamentous fungal host cell include, but are not limited to, amdS (acetamidase), argB (ornithine carbamoyltransferase), bar (phosphinothricin acetyltransferase), hygB (hygromycin phosphotransferase), niaD (nitrate reductase), pyrG (orotidine-5'-phosphate decarboxylase), sC (sulfate adenyltransferase), trpC (anthranilate synthase), as well as equivalents thereof.
  • the vectors of the present invention preferably contain an element(s) that permits stable integration of the vector into the host cell's genome or autonomous replication of the vector in the cell independent of the genome.
  • the vector may rely on the nucleotide sequence encoding the polypeptide or any other element of the vector for stable integration of the vector into the genome by homologous or nonhomologous recombination.
  • the vector may contain additional nucleotide sequences for directing integration by homologous recombination into the genome of the host cell.
  • the additional nucleotide sequences enable the vector to be integrated into the host cell genome at a precise location(s) in the chromosome(s).
  • The integrational elements should preferably contain a sufficient number of nucleotides, such as 100 to 1,500 base pairs, preferably 400 to 1,500 base pairs, and most preferably 800 to 1,500 base pairs, which are highly homologous with the corresponding target sequence to enhance the probability of homologous recombination.
  • the integrational elements may be any sequence that is homologous with the target sequence in the genome of the host cell.
  • the integrational elements may be non-encoding or encoding nucleotide sequences.
  • the vector may be integrated into the genome of the host cell by non-homologous recombination.
  • the vector may further comprise an origin of replication enabling the vector to replicate autonomously in the host cell in question.
  • Examples of bacterial origins of replication are the origins of replication of plasmids pBR322, pUC19, pACYC177, and pACYC184 permitting replication in E. coli, and pUB110, pE194, pTA1060, and pAMβ1 permitting replication in Bacillus.
  • Examples of origins of replication for use in a yeast host cell are the 2 micron origin of replication, ARS1, ARS4, the combination of ARS1 and CEN3, and the combination of ARS4 and CEN6.
  • the origin of replication may be one having a mutation which makes its functioning temperature-sensitive in the host cell (see, e.g., Ehrlich, 1978, Proceedings of the National Academy of Sciences USA 75: 1433). More than one copy of a nucleotide sequence of the present invention may be inserted into the host cell to increase production of the gene product.
  • An increase in the copy number of the nucleotide sequence can be obtained by integrating at least one additional copy of the sequence into the host cell genome or by including an amplifiable selectable marker gene with the nucleotide sequence where cells containing amplified copies of the selectable marker gene, and thereby additional copies of the nucleotide sequence, can be selected for by cultivating the cells in the presence of the appropriate selectable agent.
  • The procedures used to ligate the elements described above to construct the recombinant expression vectors of the present invention are well known to one skilled in the art (see, e.g., Sambrook et al., 1989, supra).
  • the term "host cell”, as used herein, includes any cell type which is susceptible to transformation with a nucleic acid construct.
  • the recombinant host cell is a microorganism; preferably the microorganism is prokaryotic or eukaryotic; or preferably the microorganism is prokaryotic and Gram-positive; and preferably then the Gram positive microorganism is of the genus Bacillus.
  • The present invention also relates to a recombinant host cell comprising the nucleic acid construct of the invention, which is advantageously used in the recombinant production of the polypeptides.
  • a vector comprising a nucleotide sequence of the present invention is introduced into a host cell so that the vector is maintained as a chromosomal integrant or as a self-replicating extra-chromosomal vector as described earlier.
  • The host cell may be a unicellular microorganism, e.g., a prokaryote, or a non-unicellular microorganism, e.g., a eukaryote.
  • Useful unicellular cells are bacterial cells such as gram positive bacteria including, but not limited to, a Bacillus cell, e.g., Bacillus alkalophilus, Bacillus amyloliquefaciens, Bacillus brevis, Bacillus circulans, Bacillus clausii, Bacillus coagulans, Bacillus lautus, Bacillus lentus, Bacillus licheniformis, Bacillus megaterium, Bacillus stearothermophilus, Bacillus subtilis, and Bacillus thuringiensis; or a Streptomyces cell, e.g., Streptomyces lividans or Streptomyces murinus, or gram negative bacteria such as E.
  • the bacterial host cell is a Bacillus lentus, Bacillus licheniformis, Bacillus stearothermophilus, or Bacillus subtilis cell.
  • the Bacillus cell is an alkalophilic Bacillus.
  • The introduction of a vector into a bacterial host cell may, for instance, be effected by protoplast transformation (see, e.g., Chang and Cohen, 1979, Molecular General Genetics 168: 111-115), using competent cells (see, e.g., Young and Spizizen, 1961, Journal of Bacteriology 81: 823-829, or Dubnau and Davidoff-Abelson, 1971, Journal of Molecular Biology 56: 209-221), electroporation (see, e.g., Shigekawa and Dower, 1988, Biotechniques 6: 742-751), or conjugation (see, e.g., Koehler and Thorne, 1987, Journal of Bacteriology 169: 5271-5278).
  • the host cell may be a eukaryote, such as a mammalian, insect, plant, or fungal cell.
  • the host cell is a fungal cell.
  • "Fungi” as used herein includes the phyla Ascomycota, Basidiomycota, Chytridiomycota, and Zygomycota (as defined by Hawksworth et al., In, Ainsworth and Bisby's Dictionary of The Fungi, 8th edition, 1995, CAB International, University Press, Cambridge, UK) as well as the Oomycota (as cited in Hawksworth et al., 1995, supra, page 171) and all mitosporic fungi (Hawksworth et al., 1995, supra).
  • the fungal host cell is a yeast cell.
  • "Yeast" as used herein includes ascosporogenous yeast (Endomycetales), basidiosporogenous yeast, and yeast belonging to the Fungi Imperfecti (Blastomycetes). Since the classification of yeast may change in the future, for the purposes of this invention, yeast shall be defined as described in Biology and Activities of Yeast (Skinner, F.A., Passmore, S.M., and Davenport, R.R., eds, Soc. App. Bacteriol. Symposium Series No. 9, 1980).
  • the yeast host cell is a Candida, Hansenula, Kluyveromyces, Pichia, Saccharomyces, Schizosaccharomyces, or Yarrowia cell.
  • the yeast host cell is a Saccharomyces carlsbergensis, Saccharomyces cerevisiae, Saccharomyces diastaticus, Saccharomyces douglasii, Saccharomyces kluyveri, Saccharomyces norbensis or Saccharomyces oviformis cell.
  • the yeast host cell is a Kluyveromyces lactis cell.
  • the yeast host cell is a Yarrowia lipolytica cell.
  • the fungal host cell is a filamentous fungal cell.
  • "Filamentous fungi" include all filamentous forms of the subdivision Eumycota and Oomycota (as defined by Hawksworth et al., 1995, supra). The filamentous fungi are characterized by a mycelial wall composed of chitin, cellulose, glucan, chitosan, mannan, and other complex polysaccharides. Vegetative growth is by hyphal elongation and carbon catabolism is obligately aerobic.
  • the filamentous fungal host cell is a cell of a species of, but not limited to, Acremonium, Aspergillus, Fusarium, Humicola, Mucor, Myceliophthora, Neurospora, Penicillium, Thielavia, Tolypocladium, or Trichoderma.
  • the filamentous fungal host cell is an Aspergillus awamori, Aspergillus foetidus, Aspergillus japonicus, Aspergillus nidulans, Aspergillus niger or Aspergillus oryzae cell.
  • the filamentous fungal host cell is a Fusarium bactridioides, Fusarium cerealis, Fusarium crookwellense, Fusarium culmorum, Fusarium graminearum, Fusarium graminum, Fusarium heterosporum, Fusarium negundi, Fusarium oxysporum, Fusarium reticulatum, Fusarium roseum, Fusarium sambucinum, Fusarium sarcochroum, Fusarium sporotrichioides, Fusarium sulphureum, Fusarium torulosum, Fusarium trichothecioides, or Fusarium venenatum cell.
  • the filamentous fungal parent cell is a Fusarium venenatum (Nirenberg sp. nov.) cell.
  • the filamentous fungal host cell is a Humicola insolens, Humicola lanuginosa, Mucor miehei, Myceliophthora thermophila, Neurospora crassa, Penicillium purpurogenum, Thielavia terrestris, Trichoderma harzianum, Trichoderma koningii, Trichoderma longibrachiatum, Trichoderma reesei, or Trichoderma viride cell.
  • Fungal cells may be transformed by a process involving protoplast formation, transformation of the protoplasts, and regeneration of the cell wall in a manner known per se. Suitable procedures for transformation of Aspergillus host cells are described in EP 238 023 and Yelton et al., 1984, Proceedings of the National Academy of Sciences USA 81: 1470-1474. Suitable methods for transforming Fusarium species are described by Malardier et al., 1989, Gene 78: 147-156 and WO 96/00787. Yeast may be transformed using the procedures described by Becker and Guarente, In Abelson, J.N.
  • the present invention also relates to methods for producing a polypeptide of the present invention comprising (a) cultivating a host cell under conditions conducive for production of the polypeptide; and (b) recovering the polypeptide.
  • the cells are cultivated in a nutrient medium suitable for production of the polypeptide using methods known in the art.
  • the cell may be cultivated by shake flask cultivation, small-scale or large-scale fermentation (including continuous, batch, fed-batch, or solid state fermentations) in laboratory or industrial fermentors performed in a suitable medium and under conditions allowing the polypeptide to be expressed and/or isolated.
  • the cultivation takes place in a suitable nutrient medium comprising carbon and nitrogen sources and inorganic salts, using procedures known in the art. Suitable media are available from commercial suppliers or may be prepared according to published compositions (e.g., in catalogues of the American Type Culture Collection). If the polypeptide is secreted into the nutrient medium, the polypeptide can be recovered directly from the medium. If the polypeptide is not secreted, it can be recovered from cell lysates.
  • the polypeptides may be detected using methods known in the art that are specific for the polypeptides. These detection methods may include use of specific antibodies, formation of an enzyme product, or disappearance of an enzyme substrate. For example, an enzyme assay may be used to determine the activity of the polypeptide as described herein.
  • the resulting polypeptide may be recovered by methods known in the art. For example, the polypeptide may be recovered from the nutrient medium by conventional procedures including, but not limited to, centrifugation, filtration, extraction, spray-drying, evaporation, or precipitation.
  • polypeptides of the present invention may be purified by a variety of procedures known in the art including, but not limited to, chromatography (e.g., ion exchange, affinity, hydrophobic, chromatofocusing, and size exclusion), electrophoretic procedures (e.g., preparative isoelectric focusing), differential solubility (e.g., ammonium sulfate precipitation), SDS-PAGE, or extraction (see, e.g., Protein Purification, J.-C. Janson and Lars Ryden, editors, VCH Publishers, New York, 1989).
  • the invention relates to a combinatorial variant of a parent polypeptide, said variant comprising one or more region from at least one structurally related polypeptide, wherein each region comprises at least 2 contiguous amino acids, and (a) wherein the start- and end-points of said one or more region of the variant are located in spatially overlapping positions with the corresponding start- and end-points of said one or more region of the parent polypeptide when a three-dimensional structure-model of the variant is super-imposed with a three-dimensional structure-model of the parent polypeptide; and/or (b) wherein the start- and end-points of said one or more region of the variant are located in spatially overlapping positions with the corresponding start- and end-points respectively of said one or more region of the at least one structurally related polypeptide when a three-dimensional structure-model of the variant is super-imposed with a three-dimensional structure-model of the at least one structurally related polypeptide.
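The boundary condition in this definition can be sketched in code: a region is exchanged only if its start- and end-points occupy spatially overlapping positions once the two structure models are superimposed. The sketch below is illustrative only. It assumes that the structures have already been superimposed, that the C-alpha coordinates are indexed by structurally equivalent positions, that the exchanged region has the same length in both polypeptides, and that "overlapping" means lying within a distance cutoff. The cutoff of 1.5 Å, the function names, and the toy sequences and coordinates are all hypothetical.

```python
import math


def boundaries_overlap(parent_ca, donor_ca, start, end, cutoff=1.5):
    """Check that both region boundaries coincide after superposition.

    parent_ca / donor_ca map a position index to (x, y, z) C-alpha
    coordinates in the common, superimposed frame. The cutoff (in
    angstroms) is an assumed value, not one given in the text.
    """
    if end - start + 1 < 2:
        raise ValueError("a region must span at least 2 contiguous residues")
    return (
        math.dist(parent_ca[start], donor_ca[start]) <= cutoff
        and math.dist(parent_ca[end], donor_ca[end]) <= cutoff
    )


def swap_region(parent_seq, donor_seq, start, end):
    """Replace parent residues start..end (0-based, inclusive) with the donor's."""
    return parent_seq[:start] + donor_seq[start:end + 1] + parent_seq[end + 1:]


# Toy data (hypothetical): one candidate region spanning positions 2..4.
parent_seq = "AAAAAAAA"
donor_seq = "CCCCCCCC"
parent_ca = {2: (0.0, 0.0, 0.0), 4: (3.0, 0.0, 0.0)}
donor_ca = {2: (0.5, 0.0, 0.0), 4: (3.0, 1.0, 0.0)}

if boundaries_overlap(parent_ca, donor_ca, 2, 4):
    variant = swap_region(parent_seq, donor_seq, 2, 4)
else:
    variant = parent_seq
print(variant)
```

A real implementation would also have to handle regions of different lengths in the parent and the structurally related polypeptide, which this simplified sketch does not attempt.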
  • The term "wash performance" is used herein to mean an enzyme's ability to remove proteinaceous or organic stains present on the object to be cleaned during, e.g., wash or hard surface cleaning. See also the wash performance test in Example 4 herein.
  • the enzyme of the invention may be added to and thus become a component of a detergent composition.
  • cleaning and detergent compositions are well described in the art and reference is made to WO 96/34946; WO 97/07202; WO 95/30011 for further description of suitable cleaning and detergent compositions.
  • the detergent composition of the invention may for example be formulated as a hand or machine laundry detergent composition including a laundry additive composition suitable for pre-treatment of stained fabrics and a rinse added fabric softener composition, or be formulated as a detergent composition for use in general household hard surface cleaning operations, or be formulated for hand or machine dishwashing operations.
  • the invention provides a detergent additive comprising the enzyme of the invention.
  • the detergent additive as well as the detergent composition may comprise one or more other enzymes such as a protease, a lipase, a cutinase, an amylase, a carbohydrase, a cellulase, a pectinase, a mannanase, an arabinase, a galactanase, a xylanase, an oxidase, e.g., a laccase, and/or a peroxidase.
  • the properties of the chosen enzyme(s) should be compatible with the selected detergent, (i.e. pH-optimum, compatibility with other enzymatic and non-enzymatic ingredients, etc.), and the enzyme(s) should be present in effective amounts.
  • Suitable proteases include those of animal, vegetable or microbial origin. Microbial origin is preferred. Chemically modified or protein engineered mutants are included.
  • The protease may be a serine protease or a metallo protease, preferably an alkaline microbial protease or a trypsin-like protease.
  • Examples of alkaline proteases are subtilisins, especially those derived from Bacillus, e.g., subtilisin Novo, subtilisin Carlsberg, subtilisin 309, subtilisin 147 and subtilisin 168 (described in WO 89/06279).
  • trypsin-like proteases are trypsin (e.g.
  • Examples of useful proteases are the variants described in WO 92/19729, WO 98/20115, WO 98/20116, and WO 98/34946, especially the variants with substitutions in one or more of the following positions: 27, 36, 57, 76, 87, 97, 101, 104, 120, 123, 167, 170, 194, 206, 218, 222, 224, 235 and 274.
  • Preferred commercially available protease enzymes include Durazym®, Relase®, Alcalase®, Savinase®, Primase®, Duralase®, Esperase®, Ovozyme® and Kannase® (Novozymes A/S), Maxatase™, Maxacal™, Maxapem™, Properase™, Purafect™, Purafect OxP™, FN2™, FN3™ and FN4™ (Genencor International, Inc.).
  • Suitable lipases include those of bacterial or fungal origin. Chemically modified or protein engineered mutants are included. Examples of useful lipases include lipases from Humicola (synonym Thermomyces), e.g. from H. lanuginosa (T. lanuginosus) as described in EP 258 068 and EP 305 216 or from H. insolens as described in WO 96/13580, a Pseudomonas lipase, e.g. from P. alcaligenes or P. pseudoalcaligenes (EP 218 272), P. cepacia (EP 331 376), P. stutzeri (GB 1,372,034), P.
  • Other examples are lipase variants such as those described in WO 92/05249, WO 94/01541, EP 407 225, EP 260 105, WO 95/35381, WO 96/00292, WO 95/30744, WO 94/25578, WO 95/14783, WO 95/22615, WO 97/04079 and WO 97/07202.
  • Preferred commercially available lipase enzymes include Lipex®, Lipolase® and Lipolase Ultra® (Novozymes A/S).
  • Amylases include those of bacterial or fungal origin. Chemically modified or protein engineered mutants are included. Amylases include, for example, α-amylases obtained from Bacillus, e.g. a special strain of B. licheniformis, described in more detail in GB 1,296,839.
  • Examples of useful amylases are the variants described in WO 94/02597, WO 94/18314, WO 96/23873, and WO 97/43424, especially the variants with substitutions in one or more of the following positions: 15, 23, 105, 106, 124, 128, 133, 154, 156, 181, 188, 190, 197, 202, 208, 209, 243, 264, 304, 305, 391, 408, and 444.
  • Commercially available amylases are Duramyl®, Termamyl®, Fungamyl® and BAN® (Novozymes A/S), Rapidase™ and Purastar™ (from Genencor International Inc.).
  • Suitable cellulases include those of bacterial or fungal origin. Chemically modified or protein engineered mutants are included. Suitable cellulases include cellulases from the genera Bacillus, Pseudomonas, Humicola, Fusarium, Thielavia, Acremonium, e.g. the fungal cellulases produced from Humicola insolens, Myceliophthora thermophila and Fusarium oxysporum disclosed in US 4,435,307, US 5,648,263, US 5,691,178, US 5,776,757 and WO 89/09259. Especially suitable cellulases are the alkaline or neutral cellulases having colour care benefits.
  • Examples of such cellulases are those described in EP 0 495 257, EP 0 531 372, WO 96/11262, WO 96/29397, WO 98/08940.
  • Other examples are cellulase variants such as those described in WO 94/07998, EP 0 531 315, US 5,457,046, US 5,686,593, US 5,763,254, WO 95/24471, WO 98/12307 and PCT/DK98/00299.
  • Commercially available cellulases include Celluzyme® and Carezyme® (Novozymes A/S), Clazinase™ and Puradax HA™ (Genencor International Inc.), and KAC-500(B)™ (Kao Corporation).
  • Peroxidases/Oxidases include those of plant, bacterial or fungal origin. Chemically modified or protein engineered mutants are included. Examples of useful peroxidases include peroxidases from Coprinus, e.g. from C. cinereus, and variants thereof as those described in WO 93/24618, WO 95/10602, and WO 98/15257. Commercially available peroxidases include Guardzyme® (Novozymes A/S).
  • the detergent enzyme(s) may be included in a detergent composition by adding separate additives containing one or more enzymes, or by adding a combined additive comprising all of these enzymes.
  • a detergent additive of the invention i.e. a separate additive or a combined additive, can be formulated e.g. as a granulate, a liquid, a slurry, etc.
  • Preferred detergent additive formulations are granulates, in particular non-dusting granulates, liquids, in particular stabilized liquids, or slurries.
  • Non-dusting granulates may be produced, e.g., as disclosed in US 4,106,991 and 4,661,452 and may optionally be coated by methods known in the art.
  • Examples of waxy coating materials are poly(ethylene oxide) products (polyethyleneglycol, PEG) with mean molar weights of 1000 to 20000; ethoxylated nonylphenols having from 16 to 50 ethylene oxide units; ethoxylated fatty alcohols in which the alcohol contains from 12 to 20 carbon atoms and in which there are 15 to 80 ethylene oxide units; fatty alcohols; fatty acids; and mono- and di- and triglycerides of fatty acids.
  • Liquid enzyme preparations may, for instance, be stabilized by adding a polyol such as propylene glycol, a sugar or sugar alcohol, lactic acid or boric acid according to established methods.
  • Protected enzymes may be prepared according to the method disclosed in EP 238,216.
  • the detergent composition of the invention may be in any convenient form, e.g., a bar, a tablet, a powder, a granule, a paste or a liquid.
  • a liquid detergent may be aqueous, typically containing up to 70 % water and 0-30 % organic solvent, or non-aqueous.
  • The detergent composition comprises one or more surfactants, which may be non-ionic including semi-polar and/or anionic and/or cationic and/or zwitterionic.
  • The surfactants are typically present at a level of from 0.1% to 60% by weight.
  • the detergent will usually contain from about 1% to about 40% of an anionic surfactant such as linear alkylbenzenesulfonate, alpha-olefinsulfonate, alkyl sulfate (fatty alcohol sulfate), alcohol ethoxysulfate, secondary alkanesulfonate, alpha-sulfo fatty acid methyl ester, alkyl- or alkenylsuccinic acid or soap.
  • When included therein, the detergent will usually contain from about 0.2% to about 40% of a non-ionic surfactant such as alcohol ethoxylate, nonylphenol ethoxylate, alkylpolyglycoside, alkyldimethylamineoxide, ethoxylated fatty acid monoethanolamide, fatty acid monoethanolamide, polyhydroxy alkyl fatty acid amide, or N-acyl N-alkyl derivatives of glucosamine ("glucamides").
  • The detergent may contain 0-65% of a detergent builder or complexing agent such as zeolite, diphosphate, triphosphate, phosphonate, carbonate, citrate, nitrilotriacetic acid, ethylenediaminetetraacetic acid, diethylenetriaminepentaacetic acid, alkyl- or alkenylsuccinic acid, soluble silicates or layered silicates (e.g. SKS-6 from Hoechst).
  • the detergent may comprise one or more polymers.
  • Examples are carboxymethylcellulose, poly(vinylpyrrolidone), poly(ethylene glycol), poly(vinyl alcohol), poly(vinylpyridine-N-oxide), poly(vinylimidazole), polycarboxylates such as polyacrylates, maleic/acrylic acid copolymers and lauryl methacrylate/acrylic acid copolymers.
  • The detergent may contain a bleaching system which may comprise a H2O2 source such as perborate or percarbonate which may be combined with a peracid-forming bleach activator such as tetraacetylethylenediamine or nonanoyloxybenzenesulfonate.
  • the bleaching system may comprise peroxyacids of e.g.
  • The detergent may also contain other conventional detergent ingredients such as e.g. fabric conditioners including clays, foam boosters, suds suppressors, anti-corrosion agents, soil-suspending agents, anti-soil redeposition agents, dyes, bactericides, optical brighteners, hydrotropes, tarnish inhibitors, or perfumes. Variations in local and regional conditions, such as water hardness and wash temperature, call for regional detergent compositions. Detergent Examples 1 and 2 tabulated below provide ranges for the composition of a typical Latin American detergent and a typical European powder detergent respectively.
  • the enzyme(s) of the detergent composition of the invention may be stabilized using conventional stabilizing agents, e.g., a polyol such as propylene glycol or glycerol, a sugar or sugar alcohol, lactic acid, boric acid, or a boric acid derivative, e.g., an aromatic borate ester, or a phenyl boronic acid derivative such as 4-formylphenyl boronic acid, and the composition may be formulated as described in e.g. WO 92/19709 and WO 92/19708.
  • Any single enzyme, in particular the enzyme of the invention, may be added in an amount corresponding to 0.01-200 mg of enzyme protein per liter of wash liquor, preferably 0.05-50 mg of enzyme protein per liter of wash liquor, in particular 0.1-10 mg of enzyme protein per liter of wash liquor.
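The dosage ranges above translate directly into an enzyme mass for a given volume of wash liquor. The following is a minimal arithmetic sketch; the tier labels, the function name, and the 15 liter wash volume are assumptions chosen for illustration and do not come from the text.

```python
# Dosage ranges in mg enzyme protein per liter of wash liquor, as stated above.
# The tier labels are illustrative names only.
DOSE_RANGES_MG_PER_L = {
    "broad": (0.01, 200.0),
    "preferred": (0.05, 50.0),
    "particular": (0.1, 10.0),
}


def enzyme_mass_mg(volume_l, tier="broad"):
    """Return the (minimum, maximum) enzyme protein mass in mg for a wash volume."""
    low, high = DOSE_RANGES_MG_PER_L[tier]
    return (low * volume_l, high * volume_l)


# Hypothetical wash volume of 15 liters.
low_mg, high_mg = enzyme_mass_mg(15, tier="particular")
print(f"{low_mg:.2f} mg to {high_mg:.2f} mg of enzyme protein")
```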
  • the enzyme of the invention may additionally be incorporated in the detergent formulations disclosed in WO 97/07202 which is hereby incorporated as reference.
  • Detergent Example 1 Typical Latin American detergent composition.
  • Detergent Example 2 Typical European powder detergent composition.
  • Group (content): Subname (content)
  • Surfactants (0-30%): Sulphonates 0-20%; Sulphates 0-15%; Soaps 0-10%; Non-ionics 0-10%; Cationics 0-10%; Other 0-10%
  • Bleach (0-30%): SPT / SPM 0-30%; NOBS + TAED 0-10%
  • Builders (0-60%): Phosphates 0-40%; Zeolite 0-40%; Na2OSiO2 0-20%; Na2CO3 0-20%
  • Fillers (0-40%): Na2SO4 0-40%; NaCl 0-40%
  • Others (up to 100%): Polymers; Enzymes; Foam regulators; Water; Hydrotropes; Others
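As an illustration of how the tabulated group ranges might be used, the sketch below checks a candidate formulation against the group-level limits listed above. The function name and the example formulation are hypothetical; only the numeric limits are taken from the table.

```python
# Group-level content limits in percent, taken from the table above.
GROUP_LIMITS = {
    "Surfactants": (0, 30),
    "Bleach": (0, 30),
    "Builders": (0, 60),
    "Fillers": (0, 40),
}


def out_of_range(formulation):
    """Return the groups whose percentage falls outside the tabulated range."""
    return [
        group
        for group, pct in formulation.items()
        if group in GROUP_LIMITS
        and not (GROUP_LIMITS[group][0] <= pct <= GROUP_LIMITS[group][1])
    ]


# Hypothetical formulation, for illustration only.
candidate = {"Surfactants": 20, "Bleach": 10, "Builders": 65, "Fillers": 5}
print(out_of_range(candidate))
```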
  • Bacillus subtilis DN1885: Diderichsen, B., Wedsted, U., Hedegaard, L., Jensen, B. R., Sjøholm, C. (1990). Cloning of aldB, which encodes acetolactate decarboxylase, an exoenzyme from Bacillus brevis. Journal of Bacteriology 172, 4315-4321.
  • Bacillus subtilis PL1801: B. subtilis DN1885 with disrupted apr and npr genes.
  • Proteases comprising the amino acid sequence of the following SwissProt entries were utilized: P29600 (Savinase™), P00782 (BPN'), P00780 (Alcalase™), P04189 (Subtilisin E), P29140, Q45670 (Bacillus AK1), and P04072 (Thermitase).
  • the Bacillus TY145 protease was also used (WO 92/17577 and US 6511371).
  • Example 1 This example illustrates how to create a small library of variants by the method of the invention based on regions R3 and R4 using Savinase™ as the core protein. Seven of the eight proteases had defined structures. Atomic coordinate files for six of these structures are publicly available from the Protein Data Bank (PDB), and the coordinates used in this example were from: 1SVN (Savinase™); 2ST1 (BPN'); 1SCJ (Subtilisin E); 1SBC (Alcalase™); 1DBI (Bacillus AK1); and 1THM (Thermitase). The structure of TY145 was not in the public domain but was obtained in-house and is proprietary (WO 92/17577 and US 6511371).
  • PDB Protein Data Bank
  • There was no known tertiary structure available for subtilisin P29140, so a model was created in order to generate a representation of the tertiary structure.
  • the first stage was to search the PDB based on the amino acid sequence of P29140 using the BLASTP2 algorithm.
  • the SIM algorithm (SwissModel) was used to select all templates with sequence identities above 25% and projected model size larger than 20 amino acid residues.
  • The initial structure was generated using the ProModII program (SwissModel) followed by energy minimisation using the GROMOS 96 force field.
  • The core protein, in this case Savinase™, was not used in the modelling process, but PDB files 1YJA, 1YJB, 1YJC, 1SBH (all mutants of BPN') and 1BFK (Alcalase™) were used as the templates on which the model was built.
  • the sequence alignment was further modified manually to create a better alignment, especially in regions where the sequence length varied and/or the homology was low.
  • A new model based on the modified structure/sequence alignment was used as the starting template for the ProModII program and the resulting structure was minimised using the GROMOS 96 force field. Further rounds of energy minimisation using a version of the GROMOS 43B1 force field were performed in order to further optimise the geometry of the protein.
  • the structural quality of the resulting model was assessed using the WhatCheck program.
  • the structures of the selected subtilisins were then aligned using the core protein, SavinaseTM, as the reference structure using the MagicFit option in the Swiss-PDBViewer (ref. Guex, N. and Peitsch, M.C. (1997). Electrophoresis 18, 2714-2723) program with only backbone atoms selected.
  • The quality of the alignment was assessed by comparing the geometries of four highly conserved catalytic residues, D32, H64, N155 and S221 (numbering from PDB coordinates 1SVN). Appropriate adjustments were made when problems with alignments at these positions occurred.
  • the general quality of the alignment was also inspected manually using other conserved residues.
  • a sequence alignment based totally on the structural alignment was generated using Swiss-PDBViewer that allowed direct comparison between the amino acid sequence and the tertiary placement of the amino acid residues.
  • the structural regions R3 and R4 were selected based on the following criteria: (a) involved in substrate binding and/or, (b) have a ligand binding site (e.g. calcium) and/or, (c) involved in catalysis and/or, (d) directly interact with any of the above regions and/or, (e) mutations in these regions have shown beneficial effects, or (f) all of the above.
  • Two regions, R3 and R4, were selected for the creation of a small library.
  • R3 represented residues L96 to V104 and R4 represented residues S125 to S132 in SavinaseTM (numbering according to the 1SVN PDB file). These regions were known to be important in binding and specifying substrate, and beneficial mutations have been reported in these regions in the art.
  • The equivalent sequences in BPN', Alcalase™, Subtilisin E, P29140, Bacillus AK1, TY145, and Thermitase were determined using the structural alignment and are shown in Table 1.
  • Single stranded DNA in the form of chemically synthesized oligonucleotides (oligo's) was designed to encode the regions from each protease mentioned in Table 1.
  • Each oligonucleotide had a sequence that represented the appropriate amino acid sequence representing each protease at each region and was flanked by a DNA sequence that corresponded to the core protein, SavinaseTM, amino acid sequence immediately before and immediately after R3 and R4.
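The flanking design described above can be sketched in code. This is only a minimal illustration under stated assumptions: the function name `design_region_oligo` is hypothetical, and the DNA strings are placeholders rather than sequences from Tables 2 and 3 or from the Savinase gene.

```python
VALID_BASES = set("ACGT")


def design_region_oligo(upstream_flank: str, region_dna: str, downstream_flank: str) -> str:
    """Join core-gene flanks around a parent-derived region sequence."""
    parts = [s.upper() for s in (upstream_flank, region_dna, downstream_flank)]
    for part in parts:
        if not part or set(part) - VALID_BASES:
            raise ValueError("sequences must be non-empty and contain only A, C, G, T")
    # The exchanged region must consist of whole codons so that the
    # downstream flank stays in frame relative to the upstream flank.
    if len(parts[1]) % 3 != 0:
        raise ValueError("region length must be a multiple of 3")
    return "".join(parts)


# Hypothetical placeholder sequences, NOT real Savinase or parent sequences.
oligo = design_region_oligo("GGTACTCAT", "CTTGGCGCT", "GCTCAATCA")
print(oligo, len(oligo))
```

The flanks are what allow each oligonucleotide to anneal to the core protein gene fragments during reassembly; their length is not stated in the text and is left to the caller here.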
  • the oligo's are shown in tables 2 and 3 immediately below:
  • the gene encoding SavinaseTM was amplified from the plasmid pSX222 using a combination of primers listed below in a standard PCR reaction using the proof-reading, thermostable Pwo polymerase (Roche).
  • DDJ317 (SEQ ID NO:17) was used in conjunction with the antisense primer DDJIsll (SEQ ID NO:18) to generate a 284 bp product equivalent to the region encompassing Met-1 to Leu96 (1SVN numbering), and primers DDJIsl4 (SEQ ID NO:19) and DDJr12 (SEQ ID NO:20) were used to generate a 317 bp product equivalent to the region encompassing Ala133 to Ser242.
  • DDJIsl13 SEQ ID NO:21
  • All PCR generated fragments were analysed by agarose gel electrophoresis and subsequently purified from the gel using the Qiagen QIAquick ® Gel Extraction kit.
  • The following primers were used:
  • Primer DDJ317: SEQ ID NO:17.
  • Primer DDJIsll: SEQ ID NO:18.
  • Primer DDJIsl4: SEQ ID NO:19.
  • Primer DDJr12: SEQ ID NO:20.
  • Primer DDJIsl13: SEQ ID NO:21.
  • Primer DDJr12sen: SEQ ID NO:22.
  • Primer DDJ318: SEQ ID NO:23.
  • R4 oligonucleotides by PCR thereby regenerating full length core protein genes with variation in the R3 and R4 regions.
  • the PCR was performed using the Pwo polymerase under the manufacturer's buffer conditions.
  • The reaction mixture contained 5 nM of each of the core protein gene fragments generated by the preceding PCRs, 5 nM of DDJIsl13 (SEQ ID NO:21) together with 0.8 µM of the terminal primers DDJ317 (SEQ ID NO:17) and DDJr12 (SEQ ID NO:20), and 2.5 units of Pwo polymerase.
  • a pool of either R3 or R4 encoding oligonucleotides was created in which all the oligonucleotides were present in equal amounts.
  • the R3 pool and R4 pool of oligonucleotides were each added individually to the reaction mixture to a final concentration of 5 nM.
  • The following modified PCR thermal profile was used: (1) 94°C for 2 min; (2) 94°C for 15 seconds to denature the DNA strands; (3) cooling to 45°C for 30 seconds to allow annealing; (4) heating the sample to 72°C for 1 minute. Steps 2 to 4 were repeated a further 10 times, followed by a further 20 cycles but with 5 seconds added at step 4.
  • the resulting circa 715 bp core protein R3-R4 library fragments were gel-purified.
  • the plasmid pSX222 was amplified using the Expand LongTM Template PCR system
  • The reaction mixture was composed of the manufacturer's buffer, 10 ng of circular pSX222, 0.6 µM each of primer DDJ318 (SEQ ID NO:23) and primer DDJr12sen (SEQ ID NO:22), 0.8 mM dNTP mix and 2.5 units of DNA polymerase mix.
  • a fragment corresponding to 3890 bp was purified from the gel using the Qiagen QIAquick ® Gel Extraction kit. The gel purified core protein fragments were ligated into the linearised plasmid pSX222.
  • the Expand Long Template PCR system (Roche) was used for a multimerisation process, in the presence of 0.5 pmol of R3-R4 core protein library, 0.03 pmol of linear pSX222, 1.6 mM dNTP, 2.5 units of DNA polymerase mix and the Buffer 3 component of the Expand Long Template PCR system.
  • The PCR reaction was carried out as follows: 94°C for 1 min, followed by:
  • Cycle 1 (x 10 times): 94°C for 10 sec, 55°C for 30 sec, 68°C for 5 min.
  • Cycle 2 (x 15 to 35 times): 94°C for 10 sec, 55°C for 30 sec, 68°C for 10 min.
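As a rough planning aid, the programmed hold time of the multimerisation profile above can be totalled. The sketch below is an illustration only; the function name is hypothetical, and ramp times between temperatures are ignored, so a real run on a thermal cycler would take longer.

```python
def programmed_seconds(second_stage_cycles: int) -> int:
    """Sum the hold times of the two-stage profile, ignoring ramp times."""
    if not 15 <= second_stage_cycles <= 35:
        raise ValueError("the text gives 15 to 35 repeats for Cycle 2")
    initial_denaturation = 60                      # 94 C for 1 min
    cycle_1 = 10 * (10 + 30 + 5 * 60)              # 10 x (10 s, 30 s, 5 min)
    cycle_2 = second_stage_cycles * (10 + 30 + 10 * 60)  # n x (10 s, 30 s, 10 min)
    return initial_denaturation + cycle_1 + cycle_2


for n in (15, 35):
    total = programmed_seconds(n)
    print(f"{n} repeats of Cycle 2: {total} s ({total / 3600:.1f} h)")
```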
  • the reaction product was analysed by agarose gel electrophoresis to confirm the presence of multimerised plasmid.
  • This library was designated LIBR34.
  • The LIBR34 library was used to transform Bacillus PL1801. The transformants were screened for their ability to survive on LB plates containing 6 mg/litre chloramphenicol and produce clearing zones (or halos) by digestion of the media-embedded casein (skim-milk).
  • Transformants capable of producing clearing zones were deemed to have protease activity due to the presence of a LIBR34 library member possessing protease activity.
  • transformants with protease activity were isolated and their sequences determined so as to establish the character of R3 and R4. A selection of sequenced variants is shown below in Table 4.
  • Table 4 The diversity of some clones exhibiting protease activity on LB plates containing casein.
  • a hyphenated region ID corresponds to a hybrid sequence in which the left hand term represents the parent that contributes to the N-terminal of the region and the right hand term represents the parent that contributes the C-terminal of the region.
  • Bacillus transformants exhibiting clearing zones were picked and grown in 96 well microtitre plates containing 2TY liquid media with 6 mg/l chloramphenicol. The cells were grown for between 36 to 48 hrs.
  • The library was subjected to a screen using various p-nitroanilide (pNA) peptide substrates supplied by Bachem AG.
  • The pNA assay was performed in 96 well microtiter plates in 100 mM Tris-HCl, pH 8.6, 0.0225% Brij-35 with various pNA substrates at a final concentration of 0.5 mg/ml and at various dilutions of cell culture.
  • 132 clones active on skimmed milk plates were screened from LIBR34.
  • The rate was determined by monitoring the increase in absorption at 405 nm. To monitor substrate specificity, a ratio was calculated in which the rate a culture exhibited on one substrate was compared directly to the rate with a second substrate. These ratios were compared to that of Savinase™ to see if any change in substrate specificity had occurred. In this case, the ratio of Suc-Phe-Ala-Ala-Phe-pNA (FAAF) to Suc-Ala-Ala-Pro-Phe-pNA (AAPF) was determined. The absolute rate was calculated for Suc-Ala-Ala-Pro-Ala-pNA and Suc-Tyr-Val-Ala-Asp-pNA, as these were relatively poor substrates. Some of the results of the screening process are shown below in Table 5.
  • a FAAF:AAPF ratio of 1 signifies equal preference for either substrate.
  • a FAAF:AAPF ratio <1 indicates a preference for AAPF.
  • a FAAF:AAPF ratio >1 indicates a preference for FAAF.
  • the intact core protein, Savinase™, has a FAAF:AAPF ratio of 7.3.
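The specificity ratio and its interpretation can be expressed as a short calculation. The following is a minimal sketch, assuming the two rates were measured under the same conditions; the function names are hypothetical and the only value taken from the text is the Savinase ratio of 7.3.

```python
SAVINASE_FAAF_AAPF_RATIO = 7.3  # value stated in the text for the core protein


def specificity_ratio(rate_faaf: float, rate_aapf: float) -> float:
    """Rate on FAAF divided by rate on AAPF."""
    if rate_aapf <= 0:
        raise ValueError("AAPF rate must be positive")
    return rate_faaf / rate_aapf


def preferred_substrate(ratio: float) -> str:
    """Interpret the ratio as in the table legend."""
    if ratio > 1:
        return "FAAF"
    if ratio < 1:
        return "AAPF"
    return "equal"


def fold_change_vs_savinase(ratio: float) -> float:
    """How far a variant's ratio has moved relative to the core protein."""
    return ratio / SAVINASE_FAAF_AAPF_RATIO


# Hypothetical rates for illustration only.
ratio = specificity_ratio(rate_faaf=2.0, rate_aapf=4.0)
print(ratio, preferred_substrate(ratio), round(fold_change_vs_savinase(ratio), 3))
```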
  • the ability of variants to utilize YVAD and/or AAPA is also shown.
  • a significant rate relates to data with an initial rate of >4 mA405/min showing standard linear kinetics. The cell cultures were diluted 20-fold when the assay was performed with these substrates.
  • a hyphenated region ID corresponds to a hybrid sequence in which the left hand term represents the parent that contributes to the N-terminal of the region and the right hand term represents the parent that contributes the C-terminal of the region.
  • The following abbreviations are used: Sav: P29600 (Savinase™); BPN': P00782 (BPN'); Alca: P00780 (Alcalase™); SubE: P04189 (Subtilisin E); P29140: P29140; AK1: Q45670 (Bacillus AK1); and Ther: P04072 (Thermitase).
  • The asterisk '*' indicates an insert of 19 amino acids between positions -1 and 1.
  • Example 2 The example illustrates how to create a large library of variants, based on regions R1, R2, R3, R4, R5 and R6 (LIBRall), using Savinase™ as the core protein.
  • the core protein is subjected to combinatorial fragment exchange to create new combinations of amino acid at designated positions from highly diverse parent sequences.
  • a structural alignment was generated using the same procedure as outlined in Example 1.
  • the structural regions R1 , R2, R3, R4, R5, and R6 were selected in each protease, based on the following selection criteria: (a) involved in substrate binding and/or, (b) have a ligand binding site (e.g.
  • R1-R6 were selected to enable the creation of a larger library (lowest potential diversity of 229376 different combinations) than the library in Example 1, but still incorporating the regions identified in Example 1.
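The lower bound on library diversity is the product of the number of distinct sequences available at each region. The text does not give the per-region counts, so the sketch below uses an assumed breakdown (eight distinct sequences at five regions and seven at one region) that happens to reproduce the stated figure; which region has seven, and whether this is the breakdown actually used, is not stated in the source.

```python
from math import prod


def library_size(distinct_sequences_per_region: dict) -> int:
    """Number of combinations when one sequence is chosen per region."""
    return prod(distinct_sequences_per_region.values())


# Assumed counts for illustration; the assignment of 7 to R6 is arbitrary.
assumed_counts = {"R1": 8, "R2": 8, "R3": 8, "R4": 8, "R5": 8, "R6": 7}
print(library_size(assumed_counts))
```

Hybrid regions formed by crossovers within a region would add further combinations, which is consistent with the figure being described as the lowest potential diversity.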
  • In Savinase™, R1 represents residues T33 to T38
  • R2 represents residues V51 to G63
  • R3 represents residues L96 to V104
  • R4 represents residues S125 to S132
  • R5 represents residues G154 to A174
  • R6 represents residues S188 to D197.
  • Sequence identities between each region of each parent protein and the corresponding region in Savinase™ are shown in Table 6 below. The sequence alignment was performed using the Clustal method with the default options in the MegAlign program, provided as part of the DNA Star suite of programs by DNAStar Inc.; the gap penalty was set to 10, the gap length penalty was set to 10, and the PAM250 weighted table was used.
  • the definition of the R1-R6 regions was based on the numbering in the 1SVN PDB file.
  • the residue numbers of this file were modified to correspond to the numbering observed for BPN'. This resulted in a renumbering which omits number 36 in R1 to compensate for an insertion in this region in the BPN' sequence, and which omits number 58 in R2 to compensate for an insertion in this region in the BPN' sequence, and which omits numbers 158 and 159 in R5 to compensate for an insertion in this region in the BPN' sequence. Consequently, the numbering of the amino acids is shifted according to these omissions.
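The renumbering described above can be illustrated by assigning labels to consecutive residues while skipping the omitted numbers. This is a sketch under stated assumptions: the function name is hypothetical, and only the omissions named in the text (36, 58, 158 and 159) are modelled, so it should not be read as a complete Savinase-to-BPN' numbering map.

```python
def bpn_style_labels(n_residues: int, omitted: set, start: int = 1) -> list:
    """Label consecutive residues, skipping numbers reserved for insertions."""
    labels = []
    current = start
    while len(labels) < n_residues:
        if current not in omitted:
            labels.append(current)
        current += 1
    return labels


OMITTED = {36, 58, 158, 159}  # omissions named in the text
labels = bpn_style_labels(160, OMITTED)

# The residue after label 35 is labelled 37, and so on.
print(labels[34:37])    # residues 35-37 in sequential order
print(labels[154:157])  # residues around the R5 omission
```

Every residue downstream of an omission is shifted by the number of omissions that precede it, which is why the region boundaries quoted for R1 to R6 follow BPN'-style numbers rather than a plain sequential count.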
  • The Savinase™ regions highlighted with a "+" represent regions in which the residue numbering in the PDB file has been altered so that active site residues are numbered corresponding to the numbering in the BPN' PDB file.
  • Regions labelled with a "*" have an additional mutation outside the specified region that corresponds to a S49D mutation in the core protein scaffold, Savinase™.
  • Regions labelled with a "#" have an additional mutation outside the specified region that corresponds to a G47C mutation in the core protein Savinase™.
  • Single stranded DNA in the form of chemically synthesised oligonucleotides was designed to encode the regions R1, R2, R3, R4, R5 and R6 from each protease mentioned in Table 7 above.
  • Each oligonucleotide had a sequence that represented the appropriate amino acid sequence representing each protease at each region and was flanked by a DNA sequence at either the 5' and/or 3' end that corresponds to the core protein, Savinase™, amino acid sequence immediately before and/or immediately after each region.
  • The oligonucleotides representing R3 and R4 are shown in Example 1; those for R1, R2, R5, and R6 are shown below in Tables 8-11.
  • this region was encoded by two separate oligonucleotides that have the ability to complement each other in order to create a full length version of this region. Only one end of the oligonucleotide had the ability to anneal to the core protein gene, SavinaseTM.
  • the R6 of TY145 was also too long to be encoded by one oligonucleotide and the same oligonucleotide design principle was utilised as for R5.
  • Fragment F1 was created using DDJIsl14 (SEQ ID NO:65) as the sense primer and DDJIsl16 (SEQ ID NO:66) as the anti-sense primer, and using pSX222 plasmid as the template in a standard PCR reaction with Pwo polymerase (Roche).
  • the 139 bp fragment was purified by the Qiagen QIAquick ® Gel Extraction kit after agarose gel electrophoresis.
  • Fragment F2 was created using 0.4 µM DDJIsl14 as the sense primer and an equimolar concentration of each R1 oligonucleotide (total concentration 0.4 µM) as the anti-sense primers, with 0.5 nM of fragment F1 acting as the template in a standard PCR reaction using Pwo polymerase.
  • the circa 180 bp fragment was purified by the Qiagen QIAquick ® Gel Extraction kit after agarose gel electrophoresis.
  • DDJIsl14 SEQ ID NO:65
  • DDJIsl16 SEQ ID NO:66
  • Fragment F3 was created using DDJIsl6 (SEQ ID NO:67) as the sense primer and DDJIsll (SEQ ID NO: 18) as the anti-sense primer using pSX222 plasmid as the template in a standard PCR reaction using Pwo polymerase (Roche).
  • the 99 bp fragment was purified by the Qiagen QIAquick ® Gel Extraction kit after agarose gel electrophoresis.
  • Fragment F4 was created using 0.4 µM DDJIsl10 (SEQ ID NO:68), DDJIsl11 (SEQ ID NO:69) and DDJIsl12 (SEQ ID NO:70) as the sense primers at a molar ratio of 5:2:1, respectively, and 0.4 µM DDJIsll (SEQ ID NO:18) as the anti-sense primer, with 5 nM of fragment F3 and an equimolar concentration of each R2 oligonucleotide (total concentration 5 nM) acting as the templates in a standard PCR reaction using Pwo polymerase (Roche).
  • DDJIsl6 SEQ ID NO:67
  • DDJIsl10 SEQ ID NO:68
  • DDJIsl11 SEQ ID NO:69
  • DDJIsl12 SEQ ID NO:70
  • Fragment F5 was created using DDJIsl4 (SEQ ID NO:19) as the sense primer and DDJIsl7 (SEQ ID NO:71) as the anti-sense primer, with pSX222 plasmid as the template in a standard PCR reaction using Pwo polymerase (Roche).
  • the 63 bp fragment was purified by the Qiagen QIAquick ® Gel Extraction kit after agarose gel electrophoresis.
  • Fragment F6 was created using DDJIs9 (SEQ ID NO:72) as the sense primer and DDJr12 (SEQ ID NO:20) as the anti-sense primer, with pSX222 plasmid as the template in a standard PCR reaction using Pwo polymerase (Roche).
  • the 134 bp fragment was purified by the Qiagen QIAquick ® Gel Extraction kit after agarose gel electrophoresis.
  • DDJIsl7 SEQ ID NO:71
  • DDJIs9 SEQ ID NO:72
  • DDJIsl ⁇ SEQ ID NO: 73.
  • fragments F2, F4, F5 and F6, together with oligonucleotides DDJIsl ⁇ (SEQ ID NO:73) and DDJIsl13 (SEQ ID NO:21) were spliced together in the presence of R3, R4, R5 and R6 oligonucleotides by PCR to generate a full length core protein gene with variation at the R-regions.
  • The diversity at the R1 and R2 positions was generated during the production of fragments F2 and F4.
  • The PCR was performed using the Pwo polymerase (Roche) under the manufacturer's buffer conditions.
  • The reaction mixture contained 5 nM of each of fragments F3, F4, F5 and F6, 5 nM of DDJIsl ⁇ (SEQ ID NO:73) and DDJIsl13 (SEQ ID NO:21), 0.8 µM of terminal primers DDJ317 (SEQ ID NO:17) and DDJr12 (SEQ ID NO:20), and 2.5 units of Pwo polymerase.
  • a pool of either R3, R4, R5 or R6 encoding oligonucleotides was created in which all the oligonucleotides representing that region were present in equal amounts.
  • the R3, R4, R5 and R6 pools were added individually to the reaction mixture to a final concentration of 5 nM.
  • the following modified PCR thermal profile was used: 1.
  • The vector pSX222 was linearized for insertion of LIBRall and subsequent ligation by multimerisation, followed by transformation into a host cell, according to the procedures described in Example 1.
  • the resulting transformants were screened as described in Example 1.
  • the transformants capable of producing clearing zones by digestion of the media-embedded casein were deemed to have protease activity due to the presence of a LIBRall library member having protease activity.
  • Several of the transformants with protease activity were isolated and their sequences determined so as to establish the character of R1, R2, R3, R4, R5 and R6. A selection of sequenced variants is shown below in Table 12.
  • a FAAF:AAPF ratio of 1 relates to equal preference for either substrate.
  • a FAAF:AAPF ratio of <1 indicates a preference for AAPF.
  • a FAAF:AAPF ratio >1 indicates a preference for FAAF.
  • the intact core protein, Savinase™, has a FAAF:AAPF ratio of 7.3.
  • the ability of variants to utilise YVAD and/or AAPA is also shown.
  • a significant rate relates to data with an initial rate of >4 mA405/min showing standard linear kinetics. The cell cultures were diluted 20-fold when the assay was performed with the AAPA and YVAD substrates.
  • Regions labelled with a "#" indicate regions where an insertion (increase in the number of amino acids that constitute that region) has taken place relative to the core protein, Savinase™.
  • the number that follows "#" is the number of extra residues present in that region as compared to Savinase™.
  • Example 3 The example illustrates how to rationally construct variants that in turn can serve as starting points in the creation of new combinatorial libraries, using the method of the invention.
  • Variants in which regions R1, R2, R3, R4, R5 and R6 were replaced by fragments derived from one protease parent only were created in a rational fashion.
  • all six R1-R6 regions of SavinaseTM were replaced by the corresponding regions solely derived from BPN'.
  • the structures were selected, the creation of molecular models of proteins with no determined structure was performed, and alignment of the structures was done as described in example 1.
  • Each variant was designed so that every region was replaced by fragments originating from just one parent.
  • Single stranded oligonucleotides were designed to encode each of the regions.
  • The oligonucleotide sequences used are shown in the table below; their sequences have all been described in Examples 1 and 2.
  • The R1-2 linker column refers to the chemically synthesized oligonucleotide, based on the Savinase™ gene sequence in that area, that was used to link the oligonucleotides encoding regions R1 and R2 for that specific variant.
  • R5 required two separate oligonucleotides to encode the whole of the region and the flanking sequences compatible with the SavinaseTM gene. The sequence of every oligonucleotide was described previously in Examples 1 and 2.
  • The gene encoding Savinase™ was fragmented as illustrated in Figure 2. Four of the fragments were generated by PCR using Pwo polymerase (Roche). The generation of fragments F1, F3, F4, and F5 was described above. The remaining sections of the gene were encoded by chemically synthesised oligonucleotides DDJIsl10 (SEQ ID NO:68), DDJIsl11 (SEQ ID NO:69), DDJIsl12 (SEQ ID NO:70), DDJIsl13 (SEQ ID NO:21) and DDJIsl ⁇ (SEQ ID NO:73).
  • Each of the oligonucleotides encoding the regions to be exchanged was used to link the fragments as illustrated in Figure 2 and outlined below:
  • R1 - links fragment F1 and either DDJIsl10, DDJIsl11 or DDJIsl12
  • R2 - links DDJIsl10, DDJIsl11 or DDJIsl12 and fragment F3
  • R3 - links fragment F3 and DDJIsl13
  • R5 - links fragment F5 and DDJIsl ⁇
  • R6 - links DDJIsl ⁇ and fragment F6
  • the gene fragments corresponding to SavinaseTM were spliced together in the presence of specific oligonucleotides combinations as prescribed in Table 14 to generate each variant.
  • the PCR was performed using the Pwo polymerase under the manufacturer's buffer conditions in the presence of 2.5 units of the polymerase.
  • The reaction mixture contained 5 nM of fragments F1, F3, F5 and F6, 5 nM of either DDJIsl10, DDJIsl11 or DDJIsl12, depending on which variant was constructed, and 5 nM of DDJIsl13 and DDJIsl ⁇.
  • the 7 (or 8 for the 7-104 variant) oligonucleotides encoding each of the six regions R1-R6 for each variant were added to a final concentration of 5 nM.
  • Terminal primers DDJ317 (SEQ ID NO:17) and DDJr12 (SEQ ID NO:20) were added to a final concentration of 400 nM.
  • The total reaction volume was 50 µl.
  • the reassembly PCR was performed as described in Example 2 step C4. The reaction was analysed by agarose gel electrophoresis and the DNA band corresponding to the expected size for the new variant gene was purified by the Qiagen QIAquick ® Gel Extraction kit.
  • The vector pSX222 was linearized for insertion of each rational variant and ligation by multimerisation, and subsequent transformation into Bacillus PL1801 host cells, as described in Example 1.
  • The transformants were screened for their ability to survive on LB plates containing 6 mg/litre chloramphenicol and produce clearing zones (or halos) by digestion of the media-embedded casein. Up to 10 different transformants from each variant were selected at random and the sequence of the variant genes determined. Only genes with the correct DNA sequence for that particular variant were selected. Transformants with the correct gene sequence corresponding to the appropriate variant were replated on LB plates containing 6 mg/litre chloramphenicol and media-embedded casein. Variants 1-104, 2-104, 3-104, 4-104, 6-104 and 8-104 were capable of producing clearing zones and were thus deemed to be active. Variants 5-104 and 7-104 produced no clearing zones and were thus deemed inactive.
  • the resulting rationally constructed recombinant variants may then be shuffled or recombined using any available technique in the art, e.g. methods as outlined in US patent 6,153,410 (Maxygen Inc.).
  • The staggered extension process (StEP; Zhao H, Giver L, Shao Z, Affholter JA, Arnold FH. Molecular evolution by staggered extension process (StEP) in vitro recombination. Nat Biotechnol 1998 Mar;16(3):258-61) was used to shuffle the designed variants in this example.
  • The reaction mixture contained AmpliTaq ® buffer (Roche) together with 300 nM of primers DDJ317 and DDJr12, 0.2 mM dNTPs, 1.5 mM MgCl2 and 2.5 units of AmpliTaq ® DNA polymerase (Roche).
  • A gene mix was created in which each variant gene was present at 12.5 nM, and 2 µl of this gene pool mix was added to the StEP reaction mixture.
  • The reaction mixture was subjected to the following thermal cycling scheme: (1) 95°C for 5 min; (2) 94°C for 15 seconds; (3) 55°C for 5 seconds. Steps 2 and 3 were repeated a total of 40, 50, 60, 70, 60 or 100 times.
  • The LIBRstep library was screened for active proteases as described in Example 1. Active clones were identified as transformants capable of producing clearing zones by digestion of media-embedded casein. The transformants capable of producing clearing zones were deemed to have protease activity due to the presence of a LIBRstep library member possessing protease activity.
  • Example 4 Variants isolated from libraries created as described in Example 2 above were isolated as Bacillus transformants exhibiting clearing zones around colonies when plated on LB plates containing 6 mg/litre chloramphenicol and casein. The transformants were then grown in 96 well microtitre plates containing 2TY liquid media with 6 mg/l chloramphenicol for 72 h at 37°C and purified as described in Example 2. Wash performance of Savinase variants: In order to assess the wash performance of selected variants in a commercial detergent base composition, washing experiments were performed. The enzyme variants were tested using the Automatic Mechanical Stress Assay (AMSA). With the AMSA test the wash performance of a large quantity of small volume enzyme-detergent solutions can be examined.
  • AMSA Automatic Mechanical Stress Assay
  • the AMSA plate has a number of slots for test solutions and a lid firmly squeezing the textile swatch to be washed against all the slot openings. During the washing time, the plate, test solutions, textile and lid are vigorously shaken to bring the test solution in contact with the textile and apply mechanical stress.
  • WO 02/42740 especially the paragraph "Special method embodiments" at page 23-24. Two assays were conducted under the experimental conditions specified below:
  • The performance of the enzyme variant is measured as the brightness of the colour of the textile samples washed with that specific enzyme variant. Brightness can also be expressed as the intensity of the light reflected from the textile sample when illuminated with white light. When the textile is stained, the intensity of the reflected light is lower than that of a clean textile. Therefore the intensity of the reflected light can be used to measure the wash performance of an enzyme variant.
  • Colour measurements are made with a professional flatbed scanner (PFU DL2400pro), which is used to capture an image of the washed textile samples. The scans are made with a resolution of 200 dpi and with an output colour depth of 24 bits. In order to get accurate results, the scanner is frequently calibrated with a Kodak reflective IT8 target.
  • A specially designed software application (Novozymes Color Vector Analyzer) is used.
  • the program retrieves the 24 bit pixel values from the image and converts them into values for red, green and blue (RGB).
  • The intensity value (Int) is calculated by adding the RGB values together as vectors and then taking the length of the resulting vector: $Int = \sqrt{r^2 + g^2 + b^2}$
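The intensity calculation, i.e. the length of the (r, g, b) vector, can be sketched as follows. This is an illustration rather than the Novozymes Color Vector Analyzer itself; the function names are hypothetical, and the assumption that a 24 bit pixel is packed as 0xRRGGBB is not stated in the text.

```python
import math


def unpack_rgb(pixel: int) -> tuple:
    """Split a 24 bit pixel value into 8 bit channels (assumes 0xRRGGBB packing)."""
    if not 0 <= pixel <= 0xFFFFFF:
        raise ValueError("expected a 24 bit value")
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


def intensity(r: int, g: int, b: int) -> float:
    """Length of the vector (r, g, b)."""
    return math.sqrt(r * r + g * g + b * b)


r, g, b = unpack_rgb(0xFF8000)
print((r, g, b), round(intensity(r, g, b), 2))
```

A clean white swatch gives the largest value (all channels at 255), and a stained swatch gives a lower value, which is what makes the intensity usable as a wash performance measure.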
  • The results presented in the table below are Performance Scores (S) summing up the performances (P) of the tested enzyme variants: S (2) indicates that the variant performs better than the reference at all three concentrations (5, 10 and 30 nM); S (1) indicates that the variant performs better than the reference at one or two concentrations; and S (0) indicates that the variant performs as the reference at all three concentrations (5, 10 and 30 nM).
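The scoring rule can be written out as a small function. This is a sketch under stated assumptions: the function name is hypothetical, "better" is taken to mean a strictly higher measured performance, and a variant that is never better than the reference is given a score of 0, although the text only defines S (0) for a variant that performs as the reference.

```python
CONCENTRATIONS_NM = (5, 10, 30)


def performance_score(variant: dict, reference: dict) -> int:
    """Return 2, 1 or 0 from performance values keyed by concentration in nM."""
    wins = sum(1 for c in CONCENTRATIONS_NM if variant[c] > reference[c])
    if wins == len(CONCENTRATIONS_NM):
        return 2  # better at all three concentrations
    if wins >= 1:
        return 1  # better at one or two concentrations
    return 0      # not better at any concentration (assumption, see above)


# Hypothetical intensity values for illustration only.
reference = {5: 400.0, 10: 410.0, 30: 420.0}
variant = {5: 405.0, 10: 410.0, 30: 425.0}
print(performance_score(variant, reference))
```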
  • Table 16 Wash performance as compared with SavinaseTM of purified variants isolated from libraries created as described in Example 2 with the indicated regions, detailed in table 7.

Abstract

The invention relates to a method for identifying one or more interchangeable regions in a parent polypeptide by superimposing three-dimensional structural models and identifying the regions whose start and end points overlap in the superimposed models. The invention also relates to a method for producing combinatorial variant polypeptides resulting from the exchange of said interchangeable regions in a parent polypeptide with those of structurally related polypeptides, as well as to the combinatorial variants thus obtained.
PCT/DK2004/000505 2003-07-30 2004-07-15 Methods for producing polypeptide variants by combinatorial fragment exchange WO2005010176A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DKPA200301113 2003-07-30
DKPA200301113 2003-07-30

Publications (1)

Publication Number Publication Date
WO2005010176A1 true WO2005010176A1 (fr) 2005-02-03

Family

ID=34089562

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/DK2004/000505 WO2005010176A1 (fr) 2003-07-30 2004-07-15 Methods for producing polypeptide variants by combinatorial fragment exchange

Country Status (1)

Country Link
WO (1) WO2005010176A1 (fr)

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
BIOTECHNOLOGY (READING, MASS.) 1992, vol. 22, 1992, pages 189 - 217, ISSN: 0740-7378 *
DATABASE MEDLINE [online] US NATIONAL LIBRARY OF MEDICINE (NLM), BETHESDA, MD, US; 1992, JARNAGIN A S ET AL: "Extracellular enzymes: gene regulation and structure function relationship studies.", XP002305827, Database accession no. NLM1504587 *
ESCHENBURG SUSANNE ET AL: "Crystal structure of subtilisin DY, a random mutant of subtilisin Carlsberg", EUROPEAN JOURNAL OF BIOCHEMISTRY, vol. 257, no. 2, October 1998 (1998-10-01), pages 309 - 318, XP002305824, ISSN: 0014-2956 *
GREER J: "COMPARATIVE MODELING METHODS APPLICATION TO THE FAMILY OF THE MAMMALIAN SERINE PROTEASES", PROTEINS: STRUCTURE, FUNCTION AND GENETICS, ALAN R. LISS, US, vol. 7, no. 4, 1990, pages 317 - 334, XP002188828, ISSN: 0887-3585 *
HOPFNER K P ET AL: "New enzyme lineages by subdomain shuffling", PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF USA, NATIONAL ACADEMY OF SCIENCE. WASHINGTON, US, vol. 95, no. 17, August 1998 (1998-08-01), pages 9813 - 9818, XP002103693, ISSN: 0027-8424 *
LUTZ S & BENKOVIC S.J: "Engineering Protein Evolution", 2002, WILEY-VCH VERLAG GMBH & CO. KGAA, ISBN: 3-527-30423-1, XP002305826 *
OHDAN K & KURIKI, T: "An approach for introducing a different Function to an Industrial Enzyme", TRENDS IN GLYCOSCIENCE AND GLYCOTECHNOLOGY, vol. 12, no. 68, November 2000 (2000-11-01), pages 403 - 410, XP002305825 *

Similar Documents

Publication Publication Date Title
EP1678296B1 (fr) Protease with improved stability in detergents
US11859221B2 (en) Enzyme variants and polynucleotides encoding the same
CN1942584B (zh) Protease variants
EP2013339B1 (fr) Savinase variants having improved wash performance on egg stains
US7705137B2 (en) Microbial trypsin mutants having chymotrypsin activity and nucleic acids encoding same
US8008057B2 (en) Subtilases
JP2012045000A (ja) Subtilase
US6777218B1 (en) Subtilase enzymes having an improved wash performance on egg stains
WO2015144932A1 (fr) Enzyme variants and polynucleotides encoding these variants
AU2001239214A1 (en) Novel subtilase enzymes having an improved wash performance on egg stains
WO2005010176A1 (fr) Methods for producing polypeptide variants by combinatorial fragment exchange
US20070010416A1 (en) Protease with improved stability in detergents

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase