CA2412226A1 - Compositions, methods and systems for discovery of lipopeptides - Google Patents

Compositions, methods and systems for discovery of lipopeptides

Info

Publication number
CA2412226A1
Authority
CA
Canada
Prior art keywords
ala
leu
arg
gly
val
Legal status
Abandoned
Application number
CA002412226A
Other languages
French (fr)
Inventor
Chris M. Farnet
Alfredo Staffa
Emmanuel Zazopoulos
Current Assignee
Thallion Pharmaceuticals Inc
Original Assignee
Ecopia Biosciences Inc
Application filed by Ecopia Biosciences Inc

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/52 Genes encoding for enzymes or proenzymes
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C07K14/36 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria from Actinomyces; from Streptomyces (G)

Abstract

The invention relates to isolated polypeptides involved in lipopeptide biosynthesis and to polynucleotides encoding such polypeptides. In particular, the isolated polypeptide may be an acyl-specific C-domain, an adenylating enzyme, or an acyl carrier protein. The invention also relates to methods for detecting a polypeptide involved in lipopeptide biosynthesis or a polynucleotide encoding such a polypeptide, as well as to related computer readable media and computer systems.

Description

TITLE OF INVENTION: Compositions, methods and systems for discovery of lipopeptides
RELATED APPLICATIONS:
This application claims the benefit of U.S. Provisional Application No. 60/342,133, filed on December 26, 2001, and U.S. Provisional Application No. 60/372,789, filed on April 17, 2002. The application is a continuation-in-part of U.S. Application No. 09/976,059, filed October 15, 2001, and of U.S. Application No. 10/232,370, filed September 3, 2002, which is a continuation-in-part of U.S. Application No. 09/910,813.
The teachings of the above applications are incorporated herein by reference in their entirety.
FIELD OF INVENTION:
The invention relates to genes and proteins involved in the biosynthesis of lipopeptides and related compounds, and to methods, systems and compositions for the discovery and engineering of new lipopeptide biosynthetic loci and new lipopeptides.
BACKGROUND:
Lipopeptides are natural products that exhibit potent, broad-spectrum antibiotic activity and have high potential for biotechnological and pharmaceutical applications as antimicrobial, antifungal, or antiviral agents. Examples include lichenysin, fengycin, surfactin, syringomycin, serrawettin, ramoplanin, daptomycin, A54145, the "calcium-dependent antibiotic" of Streptomyces coelicolor, echinocandin, pneumocandin and aculeacin. Even within a group of relatively closely related actinomycete lipopeptide producers, lipopeptide natural products may differ in structure and can be classified into distinct sub-groups based on their chemical features. Lipoglycopeptides are lipopeptide natural products that are glycosylated, for example, ramoplanin. Acidic lipopeptides are lipopeptide natural products characterized by acidic amino acid residues incorporated in the peptide chain portion of the lipopeptide, for example, daptomycin, A54145 and the calcium-dependent antibiotic of Streptomyces coelicolor.

A single microorganism may produce a mixture of related lipopeptides that differ in the lipid moiety attached to the peptide core via a free amine, usually the N-terminal amine of the peptide core. The lipid moiety can have a major influence on the biological properties of lipopeptide natural products. For example, the lipopeptide antibiotic A21978C complex produced by S. roseosporus comprises at least six related, microbiologically active factors: C0, C1, C2, C3, C4 and C5. All factors of the A21978C complex bear an identical 13-amino acid cyclic, acidic polypeptide core. They differ from one another in the identity of the fatty acyl group at the terminal amino group. The biological properties of the different A21978C factors vary, e.g. antibacterial efficacy, toxicity and solubility. One of the six factors identified as part of the A21978C complex, the A21978C factor C0, is also known as daptomycin. Likewise, the A54145 antibiotics produced by S. fradiae are a group of lipopeptides related to the A21978C complex. Like the A21978C complex, the A54145 antibiotics comprise at least eight microbiologically active, related factors: A, A1, B, B1, C, D, E and F. Each A54145 factor bears a cyclic 13-amino acid, acidic polypeptide core and a fatty acyl group attached to the N-terminal amine. The eight A54145 factors differ in the identity of the amino acid residues at positions 12 and 13 of the peptide core. They also differ in the identity of the fatty acyl group attached to the terminal amino group of the amino acid residue at position 1. There is a continuing need for compositions, methods and systems useful in the discovery of lipopeptide natural products and related compounds.
Methods for natural product discovery have faced many challenges.
Discovery efforts that focus on microbial-derived natural products are hampered by difficulties in cultivating the microbes; indeed, most microbes have yet to be cultivated in vitro. In addition, many cultivated microorganisms are not amenable to fermentation.
Furthermore, many secondary metabolites are not expressed to detectable levels under in vitro conditions. Natural products that are produced in vitro often vary according to the growth conditions, e.g. the nutrients provided, and may not be representative of the full biosynthetic potential of the microorganism.
Genomics-based compositions, methods and systems for discovering lipopeptides would obviate or mitigate one or more of these disadvantages.
Lipopeptides produced by microorganisms are synthesized nonribosomally on large multifunctional proteins termed nonribosomal peptide synthetases (NRPSs) (Doekel and Marahiel, 2001, Metabolic Engineering, Vol. 3, pp. 64-77). NRPSs are modular proteins that consist of one or more polyfunctional polypeptides, each of which is made up of modules. The amino-terminal to carboxy-terminal order and the specificities of the individual modules correspond to the sequential order and identity of the amino acid residues of the peptide product. Each NRPS module recognizes a specific amino acid substrate and catalyzes a stepwise condensation to form the growing peptide chain. The identity of the amino acid recognized by a particular unit can be determined by comparison with other units of known specificity (Challis and Ravel, 2000, FEMS Microbiology Letters, Vol. 187, pp. 111-114). In many peptide synthetases, there is a strict correlation between the order of repeated units in a peptide synthetase and the order in which the respective amino acids appear in the peptide product. This makes it possible to correlate peptides of known structure with putative genes encoding their synthesis. An example is the identification of the mycobactin biosynthetic gene cluster from the genome of Mycobacterium tuberculosis (Quadri et al., 1998, Chem. Biol. Vol. 5, pp. 631-645).
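The colinearity rule described above can be illustrated with a short Python sketch. It makes three assumptions:

* The A-domain specificity of every module has already been predicted, for example by comparison with units of known specificity.
* The synthetase follows strict colinearity. As the text notes, this holds for many but not all peptide synthetases.
* The subunits are listed in their assembly order.

The function name `predict_peptide` and the `Xaa` placeholder for an unpredicted residue are illustrative choices, not part of the patent.

```python
from typing import List, Optional, Sequence


def predict_peptide(polypeptides: Sequence[Sequence[Optional[str]]]) -> str:
    """Predict a peptide from ordered NRPS module specificities.

    `polypeptides` lists the NRPS subunits in assembly order; each subunit
    lists its modules from N- to C-terminus as three-letter residue codes.
    None marks a module whose specificity could not be predicted.
    Assumes strict colinearity between module order and residue order.
    """
    residues: List[str] = []
    for modules in polypeptides:
        for spec in modules:
            residues.append(spec if spec else "Xaa")
    return "-".join(residues)


# Hypothetical two-subunit NRPS with five modules; the fourth module's
# specificity is unknown.
example = [["Asp", "Gly"], ["Ala", None, "Leu"]]
print(predict_peptide(example))  # Asp-Gly-Ala-Xaa-Leu
```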
The modules of a peptide synthetase are composed of smaller units, or "domains", each of which carries out a specific role in the recognition, activation, modification and joining of amino acid precursors to form the peptide product. One type of domain, the adenylation (A) domain, is responsible for selectively recognizing and activating the amino acid that is to be incorporated by a particular unit of the peptide synthetase. This activation step is ATP-dependent and involves the transient formation of an aminoacyl-adenylate. The activated amino acid is covalently attached to the peptide synthetase through another type of domain, the thiolation (T) domain, which is generally located adjacent to the A domain. The T domain is post-translationally modified by the covalent attachment of a phosphopantetheinyl prosthetic arm to a conserved serine residue. The activated amino acid substrates are tethered onto the nonribosomal peptide synthetase via a thioester bond to the phosphopantetheinyl prosthetic arm of the respective T domains. Amino acids joined to successive units of the peptide synthetase are subsequently linked together covalently by the formation of amide bonds, catalyzed by another type of domain, the condensation (C) domain.
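The grouping of C, A and T domains into modules can be sketched in Python as follows. This minimal illustration makes these assumptions:

* The input is a linear, already-annotated domain string using the labels C, A, T, E and TE.
* A module starts at each C domain, or at an A domain when the current module already holds an A domain.

This splitting convention and the function name `split_modules` are hypothetical simplifications, not a procedure described in the patent.

```python
from typing import List


def split_modules(architecture: str) -> List[List[str]]:
    """Split a dash-separated NRPS domain string into modules.

    A new module starts at a C domain, or at an A domain when the current
    module already contains an A domain (e.g. an A-T initiation unit
    followed directly by another A-T unit). All other domains (T, E, TE,
    ...) are appended to the current module.
    """
    modules: List[List[str]] = []
    current: List[str] = []
    for domain in architecture.split("-"):
        starts_new = domain == "C" or (domain == "A" and "A" in current)
        if starts_new and current:
            modules.append(current)
            current = []
        current.append(domain)
    if current:
        modules.append(current)
    return modules


arch = "C-A-T-C-A-T-E-C-A-T-TE"
for number, module in enumerate(split_modules(arch), start=1):
    print(number, "-".join(module))
# 1 C-A-T
# 2 C-A-T-E
# 3 C-A-T-TE

# Each complete module should carry an A domain (substrate selection and
# activation) and a T domain (thioester tethering).
incomplete = [m for m in split_modules(arch) if "A" not in m or "T" not in m]
print(incomplete)  # []
```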
Little is known about the mechanism by which lipid moieties are attached to the peptide core. The literature is sparse regarding the enzymatic mechanism and timing of addition of the acyl group to lipopeptide natural products. In particular, the enzymes involved in N-acylation of peptide natural products have not been identified. It also remains unknown whether acylation occurs before, concomitantly with, or after formation of the peptide core. Doekel and Marahiel (2001, Metabolic Engineering, 3, 64-77) review catalytic domains in peptide synthetases. They note that condensation domain sequences vary according to the domain arrangements of NRPSs, and refer to three arrangements:

* condensation domains located C-terminal to epimerization domains;
* condensation domains located C-terminal to thiolation domains; and
* condensation domains involved in the initiation of acyl transfer during assembly of lipopeptides.

Understanding the mechanism by which lipid moieties are covalently attached to the peptide core would make two things possible with recombinant DNA technologies: introducing alternative fatty acyl moieties onto a given peptide core, and increasing the yield of product(s) containing the desired fatty acyl moiety or moieties.
Selective feeding experiments indicate that growth nutrients can affect the relative amounts of lipopeptide products. Growth conditions that favor the synthesis of a given lipid precursor preferentially lead to the synthesis of the corresponding lipopeptide containing that lipid moiety. For example, daptomycin is normally produced by S. roseosporus in trace amounts, and a great deal of effort is required to generate adequate amounts of biologically pure daptomycin. Continuous feeding of fermentation cultures with caproic acid or decanoic acid mixed 1:1 (v:v) in methyl oleate has been shown to increase the yield of daptomycin (R. H. Baltz, Lipopeptide Antibiotics Produced by Streptomyces roseosporus and Streptomyces fradiae, in: Biotechnology of Antibiotics, Second Edition, pp. 415-435, edited by W. R. Strohl).
Alternatively, a chemical process has been developed that involves four steps:

1. enzymatic deacylation of the A21978C factors;
2. protection of a reactive side chain in the peptide portion of the compound;
3. synthetic addition of the fatty acyl group; and
4. deprotection to yield the desired daptomycin product.

However, these methods are compound-specific, laborious and inefficient. They highlight the need for improved methods of producing lipopeptides and derivatives thereof.

SUMMARY OF THE INVENTION:
In one aspect, the invention provides an isolated polynucleotide encoding an acyl-specific C-domain, wherein said isolated polynucleotide encodes a polypeptide which comprises at least 45% sequence identity to at least one sequence selected from SEQ ID NOS: 1 and 2. Certain embodiments expressly exclude one or more sequences. These include, in particular, the nucleotide sequence corresponding to the C-domain of the NRPS protein of GenBank accession no. CAB 38518 (i.e. coordinates 195135 to 217526 of GenBank nucleotide accession AL939115), and SEQ ID NO: 21. Other embodiments exclude nucleic acid sequences originating from an organism other than an organism of the actinomycetes taxon. Other sequences can be excluded without departing from the scope of the invention. In a related aspect, the invention provides an isolated polynucleotide comprising a sequence selected from the group consisting of:
(a) a sequence selected from the group consisting of SEQ ID NOS: 5, 7, 9, 11, 13, 15, 17 and 19; (b) a sequence that is complementary to (a); (c) a sequence which hybridizes to said sequence of (a) or (b) under conditions of high stringency; and (d) a sequence which has at least 70% homology to said sequence of (a), (b), or (c). Certain embodiments expressly exclude one or more sequences. These include, in particular, the nucleotide sequence corresponding to the C-domain of the NRPS protein of GenBank accession no. CAB 38518 (i.e. coordinates 195135 to 217526 of GenBank nucleotide accession AL939115), and SEQ ID NO: 21. Other embodiments exclude nucleic acid sequences originating from an organism other than an organism of the actinomycetes taxon. Other sequences can be excluded without departing from the scope of the invention. In one embodiment of the invention, the acyl-specific C-domain encoded by the isolated polynucleotide is involved in lipopeptide acyl-capping. In one embodiment, the acyl-specific C-domains reside in cosmids 008CH, 184CM and 024CK having accession numbers IDAC 190901-2, IDAC 260202-1 and IDAC 260202-5, respectively.
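The identity thresholds used throughout the claims (45%, 50%, 55% and 70%) presuppose some way of scoring two sequences against each other, which the text does not specify. The Python sketch below shows one way percent identity could be computed. It assumes a Needleman-Wunsch global alignment with a simple match/mismatch/linear-gap scoring scheme, with identity measured over the full alignment length (gaps included). Routine analyses would more typically use BLAST or Clustal with a substitution matrix and affine gap penalties. The reported identity depends on those choices and on the denominator used. The function names are illustrative.

```python
from typing import List, Tuple


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> Tuple[str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions divided by alignment length, times 100."""
    aln_a, aln_b = global_align(a, b)
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * identical / len(aln_a) if aln_a else 0.0


# Toy peptides, not sequences from the patent.
print(percent_identity("MKTAYIAKQR", "MKSAYIAKQR"))  # 90.0
```

Under these assumptions, a candidate would meet the 45% criterion of the first aspect when `percent_identity(candidate, reference) >= 45.0`.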
In a further embodiment, the isolated polynucleotide encoding an acyl-specific C-domain resides in a gene locus selected from the group consisting of:

* the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076;
* the biosynthetic locus for A21978C from Streptomyces roseosporus NRRL 11379;
* the biosynthetic locus for A54145 from Streptomyces fradiae ATCC 18158;
* the biosynthetic locus for the calcium-dependent antibiotic from Streptomyces coelicolor A3(2);
* the biosynthetic locus for a lipopeptide natural product from Streptomyces ghanaensis NRRL B-12104;
* the biosynthetic locus for a lipopeptide natural product from Streptomyces refuineus NRRL 3143;
* the biosynthetic locus for a lipopeptide natural product from Streptomyces aizunensis NRRL B-11277;
* the biosynthetic locus for a lipopeptide natural product from Actinoplanes nipponensis FD 24834 ATCC 31145; and
* the biosynthetic locus for a lipopeptide natural product from a Streptomyces sp. organism.
In another embodiment, the isolated polynucleotide encoding an acyl-specific C-domain does not reside in the biosynthetic locus for the calcium-dependent antibiotic from Streptomyces coelicolor A3(2) (CADA).
The invention provides two or more isolated polynucleotides. The first polynucleotide encodes a polypeptide which comprises at least 45% sequence identity to at least one sequence selected from SEQ ID NOS: 1 and 2. The second polynucleotide encodes a polypeptide selected from the group consisting of a polypeptide having at least 55% sequence identity to SEQ ID NO: 3 and a polypeptide having at least 50% sequence identity to SEQ ID NO: 4. In a related aspect, the invention provides two or more isolated polynucleotides, wherein the first polynucleotide encodes an acyl-specific C-domain and the second polynucleotide encodes an adenylating enzyme, an acyl carrier protein, or a fusion of an adenylating enzyme and an acyl carrier protein.
The invention also provides an isolated polynucleotide comprising a sequence selected from the group consisting of: (a) a sequence selected from the group consisting of SEQ ID NOs. 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45 and 47; (b) a sequence that is complementary to (a); (c) a sequence which hybridizes to said sequence of (a) or (b) under conditions of high stringency; and (d) a sequence which has at least 70% homology to said sequence of (a), (b), or (c). In one embodiment, the polynucleotide encodes a polypeptide having at least 55% sequence identity to SEQ ID NO: 3. In another embodiment, the polynucleotide encodes a polypeptide having at least 50% sequence identity to SEQ ID NO: 4.
In one embodiment, the polynucleotide encodes an adenylating enzyme. In another embodiment, the polynucleotide encodes an acyl carrier protein. In a further embodiment, the polynucleotide encodes a fusion of an adenylating enzyme and an acyl carrier protein. In another embodiment, the polynucleotide encoding an adenylating enzyme, an acyl carrier protein or a fusion of the two is derived from a biosynthetic locus selected from the group consisting of:

* the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076;
* the biosynthetic locus for A21978C from Streptomyces roseosporus NRRL 11379;
* the biosynthetic locus for A54145 from Streptomyces fradiae ATCC 18158;
* the biosynthetic locus for a lipopeptide natural product from Streptomyces ghanaensis NRRL B-12104;
* the biosynthetic locus for a lipopeptide natural product from Streptomyces refuineus NRRL 3143;
* the biosynthetic locus for a lipopeptide natural product from Streptomyces aizunensis NRRL B-11277;
* the biosynthetic locus for a lipopeptide natural product from Actinoplanes nipponensis FD 24834 ATCC 31145; and
* the biosynthetic locus for a lipopeptide natural product from a Streptomyces sp. organism.

In one embodiment, the adenylating enzyme is from cosmids 00800 and 0240K having accession numbers IDAC 190901-2 and IDAC 260202-5, respectively.
In another embodiment, the acyl carrier protein is from cosmids 0080H and 0240K having accession numbers IDAC 190901-3 and IDAC 260202-5, respectively. In one embodiment, the fusion protein containing an adenylating enzyme and an acyl carrier protein is from cosmid 184011s1 having accession number IDAC 260202-1.
The invention also provides an isolated acyl-specific C-domain comprising at least 45% sequence homology to at least one sequence selected from SEQ ID NO. 1 and SEQ ID NO. 2. Certain embodiments expressly exclude one or more sequences, in particular the polypeptide sequence corresponding to the C-domain of the NRPS protein of GenBank accession no. CAB 38518, and SEQ ID NO: 22. Other embodiments exclude polypeptide sequences originating from an organism other than an organism of the actinomycetes taxon. Other sequences can be excluded without departing from the scope of the invention. In a related aspect, the invention provides an isolated acyl-specific C-domain comprising a polypeptide sequence selected from the group consisting of: (a) a sequence selected from the group consisting of SEQ ID NOs. 6, 8, 10, 12, 14, 16, 18, 20 and 22; and (b) a sequence which has at least 70% homology to said sequence of (a). Certain embodiments expressly exclude one or more sequences, in particular the polypeptide sequence corresponding to the C-domain of the NRPS protein of GenBank accession no. CAB 38518, and SEQ ID NO: 22. Other embodiments exclude polypeptide sequences originating from an organism other than an organism of the actinomycetes taxon. Other sequences can be excluded without departing from the scope of the invention.
The invention further provides two or more isolated polypeptides. The first isolated polypeptide is an acyl-specific C-domain comprising at least 45% sequence homology to at least one sequence selected from SEQ ID NO. 1 and SEQ ID NO. 2. The second isolated polypeptide is selected from the group consisting of a polypeptide having at least 55% identity to SEQ ID NO. 3 and a polypeptide having at least 50% identity to SEQ ID NO. 4. In still a further aspect, the invention provides an N-acyl-capping cassette comprising at least one acyl-specific C-domain polypeptide and another polypeptide selected from the group consisting of an adenylating protein and an acyl carrier protein.
In one embodiment, the isolated acyl-specific C-domain is not derived from the biosynthetic locus for the calcium-dependent antibiotic from Streptomyces coelicolor A3(2) (CADA).
The invention provides an isolated polypeptide comprising a polypeptide selected from the group consisting of: (a) SEQ ID NOs. 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46 and 48; and (b) a sequence which has at least 70% homology to said sequence of (a). In one embodiment, such isolated polypeptide is not derived from the biosynthetic locus for the calcium-dependent antibiotic from Streptomyces coelicolor A3(2) (CADA).
The invention further provides a computer readable medium comprising a computer program and data, comprising:

(a) a computer program stored on said medium containing instructions sufficient to implement a process for effecting the identification, analysis, or modeling of a representation of a polynucleotide or polypeptide sequence;

(b) data stored on said medium representing a sequence of a polynucleotide selected from the group consisting of:
    (i) a polynucleotide encoding an acyl-specific C-domain, said polynucleotide encoding a polypeptide having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2;
    (ii) a polynucleotide encoding a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and
    (iii) a polynucleotide encoding a polypeptide having at least 50% sequence identity with SEQ ID NO: 4; and

(c) a data structure reflecting the underlying organization and structure of said data to facilitate said computer program access to data elements corresponding to logical sub-components of the sequence, said data structure being inherent in said program and in the way in which said computer program organizes and accesses said data.

In a related aspect, the invention provides a computer readable medium comprising a computer program and data, comprising:

(a) a computer program stored on said medium containing instructions sufficient to implement a process for effecting the identification, analysis, or modeling of a representation of a polypeptide sequence;

(b) data stored on said medium representing a sequence of a polypeptide selected from the group consisting of:
    (i) a polypeptide representing an acyl-specific C-domain and having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2;
    (ii) a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and
    (iii) a polypeptide having at least 50% sequence identity with SEQ ID NO: 4; and

(c) a data structure reflecting the underlying organization and structure of said data to facilitate said computer program access to data elements corresponding to logical sub-components of the sequence, said data structure being inherent in said program and in the way in which said computer program organizes and accesses said data.
The invention also provides a memory for storing data that can be accessed by a computer programmed to implement a process for effecting the identification, analysis, or modeling of a sequence of a polynucleotide or a polypeptide. Said memory comprises data representing a polynucleotide selected from the group consisting of: (a) a polynucleotide encoding an acyl-specific C-domain, said polynucleotide encoding a polypeptide having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2; (b) a polynucleotide encoding a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and (c) a polynucleotide encoding a polypeptide having at least 50% sequence identity with SEQ ID NO: 4. In a related aspect, the invention provides a memory for storing data that can be accessed by a computer programmed to implement a process for effecting the identification, analysis, or modeling of a sequence of a polypeptide. Said memory comprises data representing a polypeptide selected from the group consisting of: (a) a polypeptide having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2; (b) a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and (c) a polypeptide having at least 50% sequence identity with SEQ ID NO: 4.

The invention provides a method for detecting a polypeptide involved in lipopeptide biosynthesis, or a polynucleotide encoding such a polypeptide. The method comprises the step of identifying (a) a polypeptide having at least 45% sequence identity to SEQ ID NO: 1 or SEQ ID NO: 2, or (b) a polynucleotide encoding a polypeptide having at least 45% sequence identity to SEQ ID NO: 1 or SEQ ID NO: 2, wherein said at least 45% sequence identity indicates a polypeptide involved in lipopeptide biosynthesis.

In one embodiment, the method comprises the steps of:
(a) providing a reference polynucleotide or polypeptide sequence selected from the group consisting of polynucleotide or polypeptide sequences representing an acyl-specific domain;
(b) comparing said reference sequence to one or more candidate polynucleotide or polypeptide sequences stored on a computer readable medium;
(c) determining the level of homology between said reference sequence and said one or more candidate sequences; and
(d) identifying a candidate sequence which shares at least 70% homology with the reference sequence.

In one embodiment, the method further comprises the step of identifying, in proximity to the polypeptide of (a) or the polynucleotide of (b), at least:
(c) one polypeptide having at least 55% sequence identity to SEQ ID NO: 3, or one polynucleotide sequence encoding a polypeptide having at least 55% sequence identity to SEQ ID NO: 3; or
(d) one polypeptide having at least 50% sequence identity to SEQ ID NO: 4, or one polynucleotide sequence encoding a polypeptide having at least 50% sequence identity to SEQ ID NO: 4.

In another embodiment of the method, the polypeptide of (c) is a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40, or a polypeptide having at least 70% sequence identity to a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40. Alternatively, the nucleotide of (d) is a nucleotide encoding a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40, or a nucleotide encoding a polypeptide having at least 70% sequence identity to a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40.
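The detection method, including the optional proximity step, can be sketched in Python as a locus-level filter. This sketch makes two assumptions:

* Each gene's best percent identity to the C-domain, ADLE and ACPH reference classes has already been computed, for example with a BLAST search.
* "In proximity" means within a 20 kb intergenic window. The patent does not define this distance.

The names `Gene`, `hits`, `gap_bp` and `screen_locus`, and the identity keys, are hypothetical. The default thresholds mirror the 45%, 55% and 50% figures in the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Gene:
    name: str
    start: int  # bp coordinate on the contig; start <= end
    end: int
    # Best % identity to each reference class, computed beforehand.
    # The keys "C", "ADLE" and "ACPH" are hypothetical labels.
    identity: Dict[str, float] = field(default_factory=dict)


def hits(genes: List[Gene], ref: str, threshold: float) -> List[Gene]:
    """Genes whose identity to reference class `ref` meets the threshold."""
    return [g for g in genes if g.identity.get(ref, 0.0) >= threshold]


def gap_bp(a: Gene, b: Gene) -> int:
    """Intergenic distance in bp (0 when the two genes overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def screen_locus(genes: List[Gene], c_min: float = 45.0,
                 adle_min: float = 55.0, acph_min: float = 50.0,
                 window_bp: int = 20_000
                 ) -> List[Tuple[str, Dict[str, List[str]]]]:
    """Report each C-domain hit with the ADLE/ACPH hits found near it."""
    results = []
    for c in hits(genes, "C", c_min):
        nearby = {
            "ADLE": [g.name for g in hits(genes, "ADLE", adle_min)
                     if gap_bp(c, g) <= window_bp],
            "ACPH": [g.name for g in hits(genes, "ACPH", acph_min)
                     if gap_bp(c, g) <= window_bp],
        }
        results.append((c.name, nearby))
    return results


# Toy locus with invented coordinates and identity values.
locus = [
    Gene("orf1", 1_000, 3_000, {"ADLE": 62.0}),
    Gene("orf2", 3_100, 3_350, {"ACPH": 55.0}),
    Gene("orf3", 3_500, 20_000, {"C": 48.0}),
    Gene("orf9", 90_000, 91_500, {"ADLE": 70.0}),
]
print(screen_locus(locus))
# [('orf3', {'ADLE': ['orf1'], 'ACPH': ['orf2']})]
```

Here orf9 passes the ADLE identity threshold but lies 70 kb from the C-domain hit, so the proximity step drops it.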
The invention provides a computer system comprising:
(a) a database of reference sequences, wherein the reference sequences encode proteins involved in lipid biosynthesis, and wherein the reference sequences include one or more of:
    (i) a polypeptide sequence representing an acyl-specific C-domain, or a polynucleotide encoding an acyl-specific C-domain; and
(b) a user interface capable of:
    (ii) receiving a test sequence for comparison against each of the reference sequences in the database; and
    (iii) displaying the results of the comparison.

In one embodiment, the reference sequences of the computer system further include one or more of:
    (iv) a polypeptide sequence representing an adenylating enzyme, or a polynucleotide encoding an adenylating enzyme; and
    (v) a polypeptide sequence representing an acyl carrier protein, or a polynucleotide encoding an acyl carrier protein.

In another embodiment:
* the reference sequence of (i) is selected from SEQ ID NOS: 1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 and 22;
* the reference sequence of (iv) is selected from SEQ ID NOS: 3, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33 and 34; and
* the reference sequence of (v) is selected from SEQ ID NO: 4, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47 and 48.
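The computer system's receive-compare-display loop might look like the following minimal Python sketch. To stay self-contained, it uses the Python standard library's `difflib.SequenceMatcher.ratio()` as a crude similarity score. This is a swapped-in stand-in, not the sequence identity defined in the claims. The category names follow the text, but the reference strings are arbitrary toy placeholders, not SEQ ID NOS: 1 to 4. The names `REFERENCE_DB`, `compare` and `display` are hypothetical.

```python
import difflib
from typing import Dict, List, Tuple

# Placeholder reference database: category -> {reference id: sequence}.
# The strings are arbitrary toy values, NOT the patent's SEQ ID NOS.
REFERENCE_DB: Dict[str, Dict[str, str]] = {
    "acyl-specific C-domain": {"toy_C": "HHILVDGWSLGV"},
    "adenylating enzyme": {"toy_ADLE": "SGSTGKPKGVML"},
    "acyl carrier protein": {"toy_ACP": "DSLDLVELIMAL"},
}


def compare(test_seq: str,
            db: Dict[str, Dict[str, str]]) -> List[Tuple[str, str, float]]:
    """Score the test sequence against every reference, best match first."""
    rows = []
    for category, refs in db.items():
        for ref_id, ref_seq in refs.items():
            # ratio() = 2*M/T over matching blocks; a crude stand-in only.
            ratio = difflib.SequenceMatcher(None, test_seq, ref_seq).ratio()
            rows.append((category, ref_id, round(100 * ratio, 1)))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


def display(rows: List[Tuple[str, str, float]]) -> None:
    """Print the comparison results as a simple text table."""
    print(f"{'category':<26}{'reference':<12}{'similarity %':>13}")
    for category, ref_id, score in rows:
        print(f"{category:<26}{ref_id:<12}{score:>13.1f}")


display(compare("HHILVDGWSLGV", REFERENCE_DB))
```

To obtain a percentage comparable to the claims, the ratio would be replaced with an alignment-based identity score.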
BRIEF DESCRIPTION OF THE DRAWINGS:
Figures 1a, 1b, 1c, 1d and 1e represent schematic views of the following biosynthetic loci:

* (1a) ramoplanin from Actinoplanes sp. ATCC 33076 (RAMO), and A21978C from Streptomyces roseosporus NRRL 11379 (DAPT);
* (1b) A54145 from Streptomyces fradiae ATCC 18158 (A541), and the lipopeptide from Streptomyces ghanaensis NRRL B-12104 (009H);
* (1c) a lipopeptide from Streptomyces refuineus NRRL 3143 (024A), and a lipopeptide from Streptomyces aizunensis NRRL B-11277 (023C);
* (1d) a lipopeptide from Actinoplanes nipponensis FD 24834 ATCC 31145 (A410), and a putative lipopeptide natural product (070B) from organism 070 in Ecopia's private culture collection; and
* (1e) the calcium-dependent antibiotic from Streptomyces coelicolor A3(2) (CADA).

Each view shows a scale in base pairs and the relative position and orientation of the open reading frames (ORFs) encoding representative acyl-specific C-domains, adenylating enzymes and acyl carrier proteins of the invention. Deposited cosmids containing genes of the invention are also indicated for RAMO, A541 and 024A.
Figure 2 represents a dendrogram showing the evolutionary relatedness of C-domains from various lipopeptide NRPSs. A clearly branching cluster of representative C-domains of the invention that are involved in N-acylation is highlighted in gray.
Figures 3a and 3b represent an amino acid alignment of representative acyl-specific C-domains of the invention as found in each of the RAMO, DAPT, A541, CADA, 009H, 024A, 023C, A410 and 070B lipopeptide biosynthetic loci. Conserved motifs are highlighted. In each of the Clustal alignments, a line above the alignment marks strongly conserved positions using three characters:

* "*" (asterisk) indicates positions which have a single, fully conserved residue;
* ":" (colon) indicates that one of the following strong groups is fully conserved: STA, NEQK, NHQK, NDEQ, QHRK, MILV, MILF, HY and FYW; and
* "." (period) indicates that one of the following weaker groups is fully conserved: CSA, ATV, SAG, STNK, STPA, SGND, SNDEQK, NDEQHK, NEQHRK, FVLIM and HFY.
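The conservation-symbol rules above are mechanical enough to express directly in Python. The sketch below assigns a symbol to each column of a set of equal-length aligned sequences, using the strong and weak groups listed above. Two points are assumptions: gapped columns receive no symbol, and the function names are illustrative. The sketch is not Clustal's own implementation.

```python
from typing import Sequence

STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK",
        "NEQHRK", "FVLIM", "HFY"]


def column_symbol(column: str) -> str:
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "  # assumption: columns containing a gap get no symbol
    if len(residues) == 1:
        return "*"  # a single, fully conserved residue
    if any(residues <= set(group) for group in STRONG):
        return ":"  # all residues fall within one strong group
    if any(residues <= set(group) for group in WEAK):
        return "."  # all residues fall within one weaker group
    return " "


def conservation_line(aligned: Sequence[str]) -> str:
    """Build the conservation line for equal-length aligned sequences."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must be non-empty and equal length")
    return "".join(column_symbol("".join(col)) for col in zip(*aligned))


# Toy alignment: columns HHH, HHH, AST, VIL, SCA.
print(repr(conservation_line(["HHAVS", "HHSIC", "HHTLA"])))  # '**::.'
```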
Figures 4a, 4b and 4c represent an amino acid alignment of representative ADLE proteins of the invention as found in each of the RAMO, DAPT, 009H, 024A, 023C and A410 loci, together with the ADLE portion of the ADLF fusion protein from the A541 locus. Conserved motifs of acyl CoA ligases are highlighted.
Figure 5 is an amino acid alignment of representative ACPH proteins of the invention from the RAMO, DAPT, 009H, 024A, 023C and A410 loci together with the corresponding portion of the ADLF fusion protein from the A541 locus. The conserved serine residue of the thiolation domain, to which a phosphopantetheine group is covalently attached post-translationally, is highlighted.
Figure 6a is a dendrogram showing the evolutionary relatedness of the representative NRPS C-domains of the invention. Figure 6b is a dendrogram showing the evolutionary relatedness of the representative ADLE proteins of the invention.
Figure 6c is a dendrogram showing the evolutionary relatedness of the representative ACPH proteins of the invention.
Figures 7a and 7b illustrate a general biosynthetic scheme for formation of the N-acyl peptide linkage in lipopeptides using the acyl-specific C-domain, ADLE protein and ACPH protein of the invention.
Figure 8 illustrates the biosynthetic scheme of Figures 7a and 7b as applied to formation of the N-acyl peptide linkage in ramoplanin and A54145.
Figures 9a and 9b are photographs of plates generated in the bioassay of anionic lipopeptide isolation experiments, illustrating an enrichment of activity based on IRA67 anion exchange chromatography of lipopeptides from Streptomyces refuineus subsp. thermotolerans and Streptomyces fradiae.
Figures 10a and 10b illustrate the use of the NRPS biosynthetic machinery of a non-lipopeptide natural product, complestatin, to produce an N-acylated analogue of complestatin. Figure 10a illustrates the biosynthesis of complestatin. Figure 10b illustrates a rationally designed recombinant NRPS system that gives rise to N-acylated complestatin analogue(s).
Figure 11 is a block diagram of a computer system according to one embodiment of the invention.
Figure 12 is a flow chart representing a process performed by the computer system to compare candidate sequences with one or more reference sequences according to one embodiment of the invention.
Figure 13 is a flow chart representing a process performed by the computer system to compare candidate sequences with one or more reference sequences and the display of comparison results according to one embodiment of the invention.
DETAILED DESCRIPTION:
The invention provides compositions, methods and systems useful in the discovery and engineering of lipopeptides and related compounds. The compositions can be used in identifying lipopeptide natural products, lipopeptide genes, lipopeptide gene clusters and lipopeptide-producing organisms.
Lipopeptide biosynthetic loci from a variety of organisms were discovered and analyzed. For convenience, the lipopeptide biosynthetic loci and the organisms in which they are found are sometimes indicated by a source designation, wherein "RAMO" refers to the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076, "DAPT" refers to the biosynthetic locus for A21978C from Streptomyces roseosporus NRRL 11379, "A541" refers to the biosynthetic locus for A54145 from Streptomyces fradiae ATCC 18158, "CADA" refers to the biosynthetic locus for the calcium-dependent antibiotic from Streptomyces coelicolor A3(2) (Bentley et al., 2002, Nature, vol. 417, pp. 141-147), "009H" refers to the biosynthetic locus for a lipopeptide natural product from Streptomyces ghanaensis NRRL B-12104, "024A" refers to the biosynthetic locus for a lipopeptide natural product from Streptomyces refuineus NRRL 3143, "023C" refers to a biosynthetic locus for a lipopeptide natural product from Streptomyces aizunensis NRRL B-11277, "A410" refers to the biosynthetic locus for a lipopeptide natural product from Actinoplanes nipponensis FD 24834 ATCC 31145, and "070B" refers to the biosynthetic locus for a lipopeptide natural product from a Streptomyces sp. organism in Ecopia's private culture collection.
Surprisingly, a conserved gene domain and conserved genes common to lipopeptide biosynthetic loci have been discovered. The conserved domain is referred to as an "acyl-specific C-domain" (unusual C-domain), which means a condensation domain (C-domain) involved in N-acyl capping for lipopeptide biosynthesis. The acyl-specific C-domain is required for the N-acyl peptide linkage found in lipopeptides between the lipid moiety and the first amino acid residue of the peptide core.
Representative examples of the acyl-specific C-domains of the invention include the acyl-specific C-domain residing in the ramoplanin biosynthetic locus from Actinoplanes sp. ATCC 33076 (SEQ ID NO: 6), the acyl-specific C-domain residing in the A21978C locus in Streptomyces roseosporus NRRL 11379 (SEQ ID NO: 8), the acyl-specific C-domain residing in the A54145 locus in Streptomyces fradiae ATCC 18158 (SEQ ID NO: 10), the acyl-specific C-domain residing in a lipopeptide biosynthetic locus in Streptomyces ghanaensis NRRL B-12104 (SEQ ID NO: 12), the acyl-specific C-domain residing in a lipopeptide biosynthetic locus in Streptomyces refuineus NRRL 3143 (SEQ ID NO: 14), the acyl-specific C-domain residing in a lipopeptide biosynthetic locus in Streptomyces aizunensis NRRL B-11277 (SEQ ID NO: 16), the acyl-specific C-domain residing in the A41,012 lipopeptide biosynthetic locus in Actinoplanes nipponensis FD 24834 ATCC 31145 (SEQ ID NO: 18), the acyl-specific C-domain residing in a putative lipopeptide biosynthetic locus from the Streptomyces sp. organism 070 in Ecopia's private culture collection (SEQ ID NO: 20) and the acyl-specific C-domain residing in the biosynthetic locus for the calcium-dependent antibiotic from Streptomyces coelicolor A3(2) (SEQ ID NO: 22). Certain embodiments expressly exclude the acyl-specific C-domain residing in the calcium-dependent antibiotic biosynthetic locus from Streptomyces coelicolor A3(2) (SEQ ID NO: 22 and the polypeptide sequence corresponding to the C-domain of the NRPS protein of GenBank accession no. CAB38518). Other embodiments exclude polypeptide sequences originating from an organism other than an organism of the actinomycetes taxon.
An "acyl-specific C-domain" of the present invention is defined structurally as a polypeptide sequence that produces an alignment with at least 45% identity to one of the two following consensus sequences using the BLASTP 2.0.10 algorithm (with the filter option -F set to false, the gap opening penalty -G set to 11, the gap extension penalty -E set to 1, and all remaining options set to default values):
>Consensus sequence 1
GglReLmAgQLAvWhAqQLaPenPvYnvGEYveidGevDIdLLvaAvrrv meEadaaRLRfrevDgvPRQYfaedeDypveViDvSaeaDPrAAAeSIMa aDLrRprDIrdgeLytqkiykvgedIvfWYqRahHiiIDGrSaGIVa:>Rv AaVYsALaaGgdveegALPsssVLmdAedeYraSeefeIDReYWreaLAg IPeevslganePsrlprepvRheedvsdaaAaeLraaARRLgTsIAqlai AAAAIYqHrITGqrDVvvgVPVaGRsktaeIdiPGMTaNVvPvRIAVaPk ttVaeLvrqvaRGVrdGLRHQRYrYediIdDIkLvgrdgLypIIVNvISf DydLrFGdAvsvahgLSagpvddvsldvYdrsSdGsmkvvvdvNPDltdr sdadEvarkFIaIIrWLaesdAeepVaridLlded
>Consensus sequence 2
svRhgvtaAQrgvWvAQQLrpdsrIYnCGIyLeIdgaIDpavLsrAvRrt IaeTEALRsrFeedddGaIIqrvIapaPdeqtrIleDGvPYtPvLLRHiD IsgddDPeaAArrWMDadIAePvdLdragtsrHaLItLGgdRhLlh'IgYH HiaLDGfGaaLYIdRIAaVYrALrtGrePppcpFgpLdrlvaeeaaYrdS aRhrrDrayWtgrfadIpEPvgLagraAaAapapLRrtvrLpperTae~La aaAeatGsrWpavviAAVAAFIrRIagaeeWvgLPVTARvTrAAIrTPG MLaNvIPLRLeVrqgasfAaLIeetsraIsaILRHC~RFRGEdLgF;eLGIa GerAgIapttVNVMaFapvIdFGdcrAvvHqLSsGPVeDLaInIyGTPgt GdelrvtvaANPalYtaddVasIqeRLvRfLaaIgaDPaapvGrvrLLdpa
where consensus sequence 1 is based on the sequences of the acyl-specific condensation domains from the calcium-dependent antibiotic (CADA) locus in Streptomyces coelicolor A3(2) (GenBank accession numbers CAB38517, CAB38518, CAB38516 and CAB38876), the A21978C (DAPT) locus in Streptomyces roseosporus (NRRL 11379), the A54145 locus in Streptomyces fradiae (ATCC 18158), the A410 locus from Actinoplanes nipponensis, the 009H locus from Streptomyces ghanaensis (NRRL B-12104), and the 024A locus in Streptomyces refuineus (NRRL 3143); and where consensus sequence 2 is based on the sequences of the acyl-specific condensation domains from the ramoplanin (RAMO) locus (Actinoplanes sp. ATCC 33076), the 023C locus from Streptomyces aizunensis (NRRL B-11277), and 070B, a putative lipopeptide locus from an organism in Ecopia's private culture collection.
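The identity threshold in this definition is applied to a BLASTP alignment. As a hedged illustration, the sketch below computes percent identity from an already-aligned query/subject pair (such as the two aligned strings of a BLASTP hit) and tests it against the 45% cut-off. It assumes the convention of identical positions divided by alignment length with gap columns counted, and it compares residues case-insensitively because the consensus sequences use case only to mark conservation. The helper names are hypothetical, and the sketch does not perform the alignment itself.

```python
def percent_identity(query_aln: str, subject_aln: str) -> float:
    """Identical non-gap positions divided by alignment length, as a percentage."""
    if not query_aln or len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must be non-empty and of equal length")
    identities = sum(
        1
        for q, s in zip(query_aln.upper(), subject_aln.upper())
        if q == s and q != "-"  # a gap never counts as an identity
    )
    return 100.0 * identities / len(query_aln)


def meets_identity_threshold(query_aln: str, subject_aln: str,
                             threshold: float = 45.0) -> bool:
    """True if the aligned pair reaches the threshold (45% for an acyl-specific C-domain)."""
    return percent_identity(query_aln, subject_aln) >= threshold


print(percent_identity("ACDEFGHIKL", "acdefghikv"))  # 90.0
```

Counting gap columns in the denominator makes long gapped alignments harder to pass, which is the conservative reading of "at least 45% identity".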
The consensus sequences were generated as follows. First, the listed sequences were aligned with the ClustalX 1.81 program using default settings. Then a profile hidden Markov model (HMM) was built from the alignment file with the hmmbuild program of the HMMER 2.2 package (Sean Eddy, Washington University; world-wide-web hmmer.wustl.edu/) and calibrated with the hmmcalibrate program of the HMMER package, both using default settings. Briefly, a profile hidden Markov model is a statistical description of a sequence family's consensus. HMMER is a freely distributable implementation of profile HMM software for protein sequence analysis and is available from the above web site. Finally, the consensus sequences were generated from the HMM with the hmmemit program of the HMMER package using the -c option, which predicts a single majority-rule consensus sequence from the HMM's probability distribution. Highly conserved amino acid residues (p >= 0.5) are shown in upper case in the consensus sequence; all others are shown in lower case.
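The upper/lower case convention can be illustrated without HMMER. The Python sketch below is a simplified stand-in for `hmmemit -c`: instead of the HMM's match-state emission probabilities (which include priors), it uses raw residue frequencies in each alignment column, writes the most common residue in upper case when its frequency is at least 0.5 and in lower case otherwise, and drops columns whose most common symbol is a gap, roughly as an insert position would be. The function name and the gap rule are assumptions of this sketch.

```python
from collections import Counter


def majority_consensus(aligned: list[str], cutoff: float = 0.5) -> str:
    """Majority-rule consensus from alignment columns (simplified analogue of hmmemit -c)."""
    if not aligned or len({len(seq) for seq in aligned}) != 1:
        raise ValueError("expected one or more aligned sequences of equal length")
    n_seqs = len(aligned)
    consensus = []
    for column in zip(*aligned):
        counts = Counter(residue.upper() for residue in column)
        # most_common() orders equal counts by first occurrence in the column
        residue, count = counts.most_common(1)[0]
        if residue == "-":
            continue  # column whose most common symbol is a gap: dropped
        frequency = count / n_seqs
        # upper case marks p >= cutoff; lower case marks a weaker majority
        consensus.append(residue if frequency >= cutoff else residue.lower())
    return "".join(consensus)


print(majority_consensus(["GLDS", "GIES", "AMDS", "GVKS"]))  # GlDS
```

In the usage line, the second column (L, I, M, V) has no residue above 0.25, so it is emitted in lower case, while D reaches exactly 0.5 and stays upper case.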
A "polynucleotide encoding an acyl-specific condensation domain (C-domain)" refers to a polynucleotide encoding an acyl-specific C-domain.
Representative examples of a polynucleotide encoding an acyl-specific C-domain of the invention include the polynucleotide encoding the acyl-specific C-domain residing in the ramoplanin biosynthetic locus from Actinoplanes sp. ATCC 33076 (SEQ ID NO: 5), the polynucleotide encoding the acyl-specific C-domain residing in the A21978C locus in Streptomyces roseosporus NRRL 11379 (SEQ ID NO: 7), the polynucleotide encoding the acyl-specific C-domain residing in the A54145 locus in Streptomyces fradiae ATCC 18158 (SEQ ID NO: 9), the polynucleotide encoding the acyl-specific C-domain residing in a lipopeptide biosynthetic locus in Streptomyces ghanaensis NRRL B-12104 (SEQ ID NO: 11), the polynucleotide encoding the acyl-specific C-domain residing in a lipopeptide biosynthetic locus in Streptomyces refuineus NRRL 3143 (SEQ ID NO: 13), the polynucleotide encoding the acyl-specific C-domain residing in a lipopeptide biosynthetic locus in Streptomyces aizunensis NRRL B-11277 (SEQ ID NO: 15), the polynucleotide encoding the acyl-specific C-domain residing in a lipopeptide biosynthetic locus in Actinoplanes nipponensis FD 24834 ATCC 31145 (SEQ ID NO: 17), the polynucleotide encoding the acyl-specific C-domain residing in a biosynthetic locus of a Streptomyces sp. in Ecopia's private culture collection (SEQ ID NO: 19), and the polynucleotide encoding the acyl-specific C-domain residing in the calcium-dependent antibiotic biosynthetic locus from Streptomyces coelicolor A3(2) (SEQ ID NO: 21). Certain embodiments expressly exclude polynucleotides encoding the acyl-specific C-domain residing in the calcium-dependent antibiotic biosynthetic locus from Streptomyces coelicolor A3(2) (SEQ ID NO: 21 and nucleotide sequences encoding the polypeptide sequence of the C-domain of the NRPS protein of GenBank accession no. CAB38518; coordinates 195135 to 217526 of nucleotide accession AL939115 represent the nucleotide sequence of the NRPS of CAB38518). Other embodiments exclude polypeptide sequences originating from an organism other than an organism of the actinomycetes taxon.
The acyl-specific C-domains of SEQ ID NOS: 6, 8, 10, 12, 14, 16, 18 and 20 were compared, using the BLASTP algorithm with default parameters, to the sequences of the National Center for Biotechnology Information (NCBI) nonredundant protein database and to the sequences of the DECIPHER® database of microbial genes, pathways and natural products (Ecopia BioSciences Inc., St-Laurent, Canada).
The accession numbers of the top GenBank hits of this BLAST analysis are presented in Table 1 along with the corresponding E values. The E value helps determine whether two sequences display sufficient similarity to justify an inference of homology: it is the number of alignments with a score at least equal to the observed alignment score that would be expected to occur by chance in a search of a database of the same size. An E value of 0.00 indicates a match so strong that the expected number of chance alignments rounds to zero, i.e., an essentially certain homolog. The E values are calculated as described in Altschul et al., 1990, J. Mol. Biol. 215(3):403-410, and Gish et al., 1993, Nature Genetics 3:266-272.
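For orientation, the Karlin-Altschul statistics behind these E values can be sketched in a few lines. The Python example below converts a raw alignment score S to a bit score S' = (λS - ln K) / ln 2 and then estimates E = m·n·2^(-S'), where m and n are the query and database lengths. It deliberately ignores the effective-length (edge) corrections that BLAST applies, and the raw score, lengths and λ/K values in the usage lines are illustrative assumptions (λ = 0.267 and K = 0.041 are the values commonly reported by BLASTP for BLOSUM62 with gap costs 11/1), not data from the analysis above.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, query_length: int, database_length: int) -> float:
    """E = m * n * 2**(-S'), without BLAST's effective-length corrections."""
    return query_length * database_length * 2.0 ** (-bits)


# Illustrative inputs only: raw score 120, lambda = 0.267, K = 0.041,
# a 450-residue query and a database of 10**9 residues.
bits = bit_score(120, lam=0.267, k=0.041)
print(round(bits, 1), expect_value(bits, query_length=450, database_length=10**9))
```

Because E falls by half for every additional bit, very strong hits quickly produce E values below the printed precision, which is why they are reported as 0.00.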

Table 1: Top GenBank BLASTP hits and corresponding E values for the acyl-specific C-domains of SEQ ID NOS: 6, 8, 10, 12, 14, 16, 18 and 20

As used herein, the term "adenylating enzyme" or ADLE means a member of a family of proteins involved in N-acyl capping for lipopeptide biosynthesis.
Representative adenylating enzymes of the invention include the adenylating enzyme residing in the ramoplanin biosynthetic locus from Actinoplanes sp. ATCC 33076 (SEQ ID NO: 22), the adenylating enzyme residing in the A21978C locus in Streptomyces roseosporus NRRL 11379 (SEQ ID NO: 24), the adenylating enzyme residing in a lipopeptide biosynthetic locus in Streptomyces ghanaensis NRRL B-12104 (SEQ ID NO: 26), the adenylating enzyme residing in a lipopeptide biosynthetic locus in Streptomyces refuineus NRRL 3143 (SEQ ID NO: 28), the adenylating enzyme residing in a lipopeptide biosynthetic locus in Streptomyces aizunensis NRRL B-11277 (SEQ ID NO: 30), and the adenylating enzyme residing in a lipopeptide biosynthetic locus in Actinoplanes nipponensis FD 24834 ATCC 31145 (SEQ ID NO: 32). The adenylating enzyme may be a portion of a fusion protein; for example, the adenylating enzyme residing in the A54145 locus in Streptomyces fradiae ATCC 18158 is residues 1 to 648 of a fusion protein designated ADLF (SEQ ID NO: 34).
The adenylating enzyme is defined structurally as a polypeptide sequence that produces an alignment with at least 55% identity to the following consensus sequence using the BLASTP 2.0.10 algorithm (with the filter option -F set to false, the gap opening penalty -G set to 11, the gap extension penalty -E set to 1, and all remaining options set to default values):
>Consensus sequence 3
vsavmvdlaagpsvpaaLRahAearPdRtAvvfVrDtdradgtasLsYae LDrrARavAvwLrarIapGdRvLLLhPaGpeFvaAyLgCLYAGIvAVPAP LPGgysherrRVvgIAaDagaga'VLTdadteAeVreWIaEtGLpc~LPVIA vDpIAadgDPgaWrpPgIradtVAvLQYTSGSTGsPKGVvVIrHgNLLaNa rsLsrsfgltedtvfGGWLPIyHDMGLfGILIPaLfIGatvVLMSPsAFI rRPhIW Lrl IDRfgvvfSAAPDFAYDLCvRRVtDEQiAgLDL~~RW RwAaN GSEPIrAaTIRaFaeRFApAGLRpeaLtPCYGLAEATIfVSgksagplrt rrVDpaaLEdHrfeeAvpGrpaREiVsCGrvpdIevRIVDPgtgrpLPdG aVGEIwLRGpSVaaGYWgrpEataetFgavtDGgDGpwLRTGDLGALyeG ELYVTGRiKEILiVhGRNIYPhDiEhELRAaHdELagavGAaFaVpapGg GeEvIVVvHEVrprvpaDeIpaLAsAmRaTvaREFGvpaagVvLvRRGTV rRTTSGKvQRramReLFItGeLapvHaelgphlqaaaagearaatalApa Stv
where consensus sequence 3 is based on the ADLE polypeptide sequences of the DAPT, A410, 009H, 024A, RAMO and 023C lipopeptide loci as described above and on residues 1 to 648 of the ADLF polypeptide sequence (as defined hereinafter) of the A541 lipopeptide locus. Consensus sequence 3 was generated as described above in relation to consensus sequences 1 and 2.
A "polynucleotide encoding an adenylating enzyme" or a "polynucleotide encoding ADLE" refers to a polynucleotide encoding a member of the ADLE family of proteins involved in N-acyl capping for lipopeptide biosynthesis.
Representative polynucleotides encoding adenylating enzymes of the invention include the polynucleotide encoding the adenylating enzyme residing in the ramoplanin biosynthetic locus from Actinoplanes sp. ATCC 33076 (SEQ ID NO: 21), the polynucleotide encoding the adenylating enzyme residing in the A21978C locus in Streptomyces roseosporus NRRL 11379 (SEQ ID NO: 23), the polynucleotide encoding the adenylating enzyme residing in a lipopeptide biosynthetic locus in Streptomyces ghanaensis NRRL B-12104 (SEQ ID NO: 25), the polynucleotide encoding the adenylating enzyme residing in a lipopeptide biosynthetic locus in Streptomyces refuineus NRRL 3143 (SEQ ID NO: 27), the polynucleotide encoding the adenylating enzyme residing in a lipopeptide biosynthetic locus in Streptomyces aizunensis NRRL B-11277 (SEQ ID NO: 29), and the polynucleotide encoding the adenylating enzyme residing in a lipopeptide biosynthetic locus in Actinoplanes nipponensis FD 24834 ATCC 31145 (SEQ ID NO: 31). The polynucleotide encoding an adenylating enzyme may be a portion of a gene encoding a fusion protein; for example, the polynucleotide encoding the adenylating enzyme residing in the A54145 locus in Streptomyces fradiae ATCC 18158 is nucleotides 1 to 1944 of the polynucleotide encoding a fusion protein designated ADLF (SEQ ID NO: 33). The ADLE portion of the ADLF fusion protein is sometimes designated with an asterisk "*" in the figures.
The ADLE polypeptides of SEQ ID NOS: 24, 26, 28, 30, 31, 32, and residues 1 to 648 of SEQ ID NO: 34, i.e. the portion of the ADLF fusion protein representing an ADLE protein, were compared, using the BLASTP algorithm with default parameters, to the sequences of the National Center for Biotechnology Information (NCBI) nonredundant protein database and to the sequences of the DECIPHER® database of microbial genes, pathways and natural products (Ecopia BioSciences Inc., St-Laurent, Canada). The accession numbers of the top GenBank hits of this BLAST analysis are presented in Table 2 along with the corresponding E values.

Table 2: Top GenBank BLASTP hits and corresponding E values for the ADLE polypeptides and the ADLE portion (residues 1 to 648) of the ADLF fusion protein

As used herein, the term "acyl carrier protein" or ACPH refers to a member of a family of proteins involved in N-acyl capping for lipopeptide biosynthesis.
Representative acyl carrier proteins of the invention include the acyl carrier protein residing in the ramoplanin biosynthetic locus from Actinoplanes sp. ATCC 33076 (SEQ ID NO: 36), the acyl carrier protein residing in the A21978C locus in Streptomyces roseosporus NRRL 11379 (SEQ ID NO: 38), the acyl carrier protein residing in a lipopeptide biosynthetic locus in Streptomyces ghanaensis NRRL B-12104 (SEQ ID NO: 40), the acyl carrier protein residing in a lipopeptide biosynthetic locus in Streptomyces refuineus NRRL 3143 (SEQ ID NO: 42), the acyl carrier protein residing in a lipopeptide biosynthetic locus in Streptomyces aizunensis NRRL B-11277 (SEQ ID NO: 44), and the acyl carrier protein residing in a lipopeptide biosynthetic locus in Actinoplanes nipponensis FD 24834 ATCC 31145 (SEQ ID NO: 46). The acyl carrier protein may be a portion of a fusion protein; for example, the acyl carrier protein residing in the A54145 locus in Streptomyces fradiae ATCC 18158 is residues 649 to 743 of a fusion protein designated ADLF (SEQ ID NO: 34). The ACPH portion of the ADLF fusion protein is sometimes designated with a double asterisk "**" in the figures.
The acyl carrier protein (ACPH) of the invention is defined structurally as a polypeptide sequence that produces an alignment with at least 50% identity to the following consensus sequence using the BLASTP 2.0.10 algorithm (with the filter option -F set to false, the gap opening penalty -G set to 11, the gap extension penalty -E set to 1, and all remaining options set to default values):
>Consensus sequence 4
MsdltappArhTPeeIRaWLrecvAdyVgIppaelatDvPLtdYGLDSVy aIaLCAeiEDhIGievdptLLWDhPTIdeLsaaLaprlarr
where consensus sequence 4 is based on the ACPH polypeptide sequences of the DAPT, A410, 009H, 024A, RAMO and 023C lipopeptide loci and on residues 649 to 743 of the ADLF polypeptide sequence of the A541 lipopeptide locus. A "polynucleotide encoding an ACPH" is defined as a nucleotide sequence encoding an acyl carrier protein as defined above. Consensus sequence 4 was generated as described above in relation to consensus sequences 1 and 2.
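The three structural definitions differ only in which consensus sequence is used and in the identity cut-off (45% for the acyl-specific C-domain against consensus sequence 1 or 2, 55% for ADLE against consensus sequence 3, and 50% for ACPH against consensus sequence 4). The Python sketch below applies those cut-offs, assuming a hypothetical input that maps each consensus name to the percent identity of the BLASTP 2.0.10 alignment (-F false, -G 11, -E 1) of a candidate against that consensus. It performs no alignment itself, and the names are illustrative only.

```python
# Which family each consensus sequence defines, and the identity cut-off per family,
# as stated in the definitions; the dictionary keys are hypothetical labels.
CONSENSUS_FAMILY = {
    "consensus 1": "acyl-specific C-domain",
    "consensus 2": "acyl-specific C-domain",
    "consensus 3": "ADLE",
    "consensus 4": "ACPH",
}
FAMILY_THRESHOLD = {"acyl-specific C-domain": 45.0, "ADLE": 55.0, "ACPH": 50.0}


def matching_families(identity_by_consensus: dict[str, float]) -> list[str]:
    """Return the families whose cut-off is reached by at least one of their consensus sequences."""
    best: dict[str, float] = {}
    for consensus, identity in identity_by_consensus.items():
        family = CONSENSUS_FAMILY[consensus]  # raises KeyError for an unknown label
        best[family] = max(identity, best.get(family, 0.0))
    return sorted(family for family, pct in best.items() if pct >= FAMILY_THRESHOLD[family])


print(matching_families({"consensus 1": 40.0, "consensus 2": 46.5,
                         "consensus 3": 54.9, "consensus 4": 61.0}))
```

Taking the maximum over consensus sequences 1 and 2 reflects the wording "one of the two following consensus sequences" in the acyl-specific C-domain definition.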

A "polynucleotide encoding an acyl carrier protein" or a "polynucleotide encoding ACPH" refers to a polynucleotide encoding a member of the ACPH family of proteins involved in N-acyl capping for lipopeptide biosynthesis.
Representative polynucleotides encoding acyl carrier proteins of the invention include the polynucleotide encoding the acyl carrier protein residing in the ramoplanin biosynthetic locus from Actinoplanes sp. ATCC 33076 (SEQ ID NO: 35), the polynucleotide encoding an acyl carrier protein residing in the A21978C locus in Streptomyces roseosporus NRRL 11379 (SEQ ID NO: 37), the polynucleotide encoding the acyl carrier protein residing in a lipopeptide biosynthetic locus in Streptomyces ghanaensis NRRL B-12104 (SEQ ID NO: 39), the polynucleotide encoding an acyl carrier protein residing in a lipopeptide biosynthetic locus in Streptomyces refuineus NRRL 3143 (SEQ ID NO: 41), the polynucleotide encoding an acyl carrier protein residing in a lipopeptide biosynthetic locus in Streptomyces aizunensis NRRL B-11277 (SEQ ID NO: 43), and the polynucleotide encoding the acyl carrier protein residing in a lipopeptide biosynthetic locus in Actinoplanes nipponensis FD 24834 ATCC 31145 (SEQ ID NO: 45). The polynucleotide encoding an acyl carrier protein may be a portion of a gene encoding a fusion protein; for example, the polynucleotide encoding the acyl carrier protein residing in the A54145 locus in Streptomyces fradiae ATCC 18158 is nucleotides 1945 to 2229 of the polynucleotide encoding the fusion protein designated ADLF (SEQ ID NO: 33).
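The nucleotide ranges given for the ADLF open reading frame follow directly from the residue ranges by codon arithmetic: residue r occupies nucleotides 3(r - 1) + 1 to 3r. The Python sketch below, with a hypothetical function name, assumes that nucleotide 1 is the first base of the start codon and that the stop codon is not counted; under those assumptions it reproduces both ranges stated above.

```python
def residue_range_to_nucleotides(first: int, last: int) -> tuple[int, int]:
    """Map a 1-based inclusive residue range to the 1-based inclusive range of its codons."""
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return 3 * (first - 1) + 1, 3 * last


# ADLF: residues 1-648 (ADLE portion) and 649-743 (ACPH portion) of SEQ ID NO: 34.
print(residue_range_to_nucleotides(1, 648))    # (1, 1944)
print(residue_range_to_nucleotides(649, 743))  # (1945, 2229)
```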
The ACPH polypeptides of SEQ ID NOS: 36, 38, 40, 42, 44 and 46, and residues 649 to 743 of SEQ ID NO: 34, i.e. the ACPH portion of the ADLF fusion protein, were compared, using the BLASTP algorithm with default parameters, to the sequences of the National Center for Biotechnology Information (NCBI) nonredundant protein database and to the sequences of the DECIPHER® database of microbial genes, pathways and natural products (Ecopia BioSciences Inc., St-Laurent, Canada).
The accession numbers of the top GenBank hits of this BLAST analysis are presented in Table 3 along with the corresponding E values.

Table 3: Top GenBank BLASTP hits and corresponding E values for the ACPH polypeptides and the ACPH portion (residues 649 to 743) of the ADLF fusion protein

As used herein, the term "ADLF" refers to a single open reading frame located in the A54145 locus (SEQ ID NO: 33), formed by the genes encoding the ADLE and ACPH proteins fused together. The gene product of the open reading frame of SEQ ID NO: 33 is provided in SEQ ID NO: 34, wherein residues 1 to 648 of SEQ ID NO: 34 represent an ADLE protein and residues 649 to 743 of SEQ ID NO: 34 represent an ACPH protein. It is expected that a similar fusion of ADLE and ACPH homologues may occur in other lipopeptide biosynthetic loci.
It is also expected that other permutations of fusion proteins involving protein families of the invention may be found in lipopeptide loci, for example a fusion of ADLE and ACPH with the acyl-specific C-domain, or a fusion of ACPH with the acyl-specific C-domain.
Cosmid clones containing genes and proteins of the invention have been deposited with the International Depositary Authority of Canada, Bureau of Microbiology, Health Canada, 1015 Arlington Street, Winnipeg, Manitoba, Canada 3R2 under the terms of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for Purposes of Patent Procedure. An E. coli DH10B strain harboring cosmid clone 0080H, containing the ACPH gene and the acyl-specific C-domain in the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076, was deposited on September 19, 2001 and assigned accession number IDAC 190901-3.
An E. coli DH10B strain harboring cosmid clone 00800, containing the ADLE gene in the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076, was deposited on September 19, 2001 and assigned accession number IDAC 190901-2. An E. coli DH10B strain harboring cosmid clone 0240K, containing the ADLE and ACPH genes and the acyl-specific C-domain in the biosynthetic locus for the lipopeptide from Streptomyces refuineus subsp. thermotolerans, was deposited on February 26, 2002 and assigned accession number IDAC 260202-5. An E. coli DH10B strain harboring cosmid clone 1840M, containing the gene encoding the ADLF fusion protein and the acyl-specific C-domain in the biosynthetic locus for the A54145 lipopeptide from Streptomyces fradiae, was deposited on February 26, 2002 and assigned accession number IDAC 260202-1.
The E. coli strain deposits are referred to herein as "the deposited strains".

The sequences of the nucleotides encoding members of the protein families ADLE, ADLF, ACPH and the acyl specific C-domains of the invention present in the deposited strains as well as the amino acid sequences of the corresponding polypeptides are controlling in the event of any conflict with any description of sequences herein. A license may be required to make, use or sell the deposited strains, nucleic acids therein or compounds derived therefrom, and no such license is hereby granted.
As used herein, the term "a polypeptide involved in lipopeptide synthesis" refers to any polypeptide defined herein as an acyl-specific C-domain, an adenylating enzyme, or an acyl carrier protein. A "polynucleotide involved in lipopeptide synthesis" refers to a nucleotide sequence encoding a polypeptide involved in lipopeptide synthesis as defined herein.
As used herein, "a condition of high stringency" refers to any one of the hybridization conditions described herein, and includes other "high stringency" conditions known in the art. In one condition, a polymer membrane containing immobilized denatured nucleic acids is first prehybridized for 30 minutes at 45 °C in a solution consisting of 0.9 M NaCl, 50 mM NaH2PO4, pH 7.0, 5.0 mM Na2EDTA, 0.5% SDS, Denhardt's, and 0.5 mg/ml polyriboadenylic acid. Approximately 2 x 10^7 cpm (specific activity 4-9 x 10^8 cpm/µg) of 32P end-labeled oligonucleotide probe is then added to the solution. After 12-16 hours of incubation, the membrane is washed for 30 minutes at room temperature in 1X SET (150 mM NaCl, 20 mM Tris hydrochloride, pH 7.8, mM Na2EDTA) containing 0.5% SDS, followed by a 30-minute wash in fresh 1X SET at Tm-10 °C for the oligonucleotide probe, where Tm is the melting temperature of the probe. Stringency may be varied by conducting the hybridization at varying temperatures below the melting temperatures of the probes. For oligonucleotide probes between 14 and 70 nucleotides in length, the melting temperature (Tm) in degrees Celsius may be calculated using the formula Tm = 81.5 + 16.6(log [Na+]) + 0.41(% G+C) - (600/N), where N is the length of the oligonucleotide. If the hybridization is carried out in a solution containing formamide, the melting temperature may be calculated using the equation Tm = 81.5 + 16.6(log [Na+]) + 0.41(% G+C) - 0.63(% formamide) - (600/N), where N is the length of the probe. For probes over 200 nucleotides in length, the hybridization may be carried out at 15-25 °C below the Tm.
For shorter probes, such as oligonucleotide probes, the hybridization may be conducted at 5-10 °C below the Tm. Preferably, the hybridization is conducted in 6X SSC for shorter probes and in 50% formamide-containing solutions for longer probes.
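The two melting-temperature formulas above can be sketched as a small Python function. This is an illustration only, not part of the described protocol: it assumes [Na+] is given in mol/L, that the logarithm is base 10, and that G+C content and formamide concentration are entered as percentages (0-100), as in the commonly used form of this empirical formula. The function name `oligo_tm` is hypothetical.

```python
import math


def oligo_tm(na_molar: float, gc_percent: float, length: int,
             formamide_percent: float = 0.0) -> float:
    """Estimate the Tm (degrees Celsius) of an oligonucleotide probe.

    Assumptions: na_molar is [Na+] in mol/L, log is base 10, and
    gc_percent / formamide_percent are percentages (0-100).
    With formamide_percent=0 this reduces to the formamide-free formula.
    """
    if not 14 <= length <= 70:
        raise ValueError("formula is stated for probes of 14-70 nucleotides")
    if na_molar <= 0:
        raise ValueError("[Na+] must be positive")
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.63 * formamide_percent - 600.0 / length)


# Hypothetical 20-mer with 50% G+C at 1.0 M Na+ (log10(1.0) = 0):
tm = oligo_tm(na_molar=1.0, gc_percent=50.0, length=20)
wash_temp = tm - 10.0  # the Tm-10 °C wash step described above
print(f"Tm = {tm:.1f} °C, wash at {wash_temp:.1f} °C")  # Tm = 72.0 °C, wash at 62.0 °C
```

Keeping the formamide term as an optional argument lets one function cover both equations, since the formamide-free formula is the special case of 0% formamide.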
As used herein, the term "homology" refers to the optimal alignment of sequences (either nucleotides or amino acids), which may be conducted by computerized implementations of algorithms. "Homology", with regard to polynucleotides, for example, may be determined by analysis with BLASTN version 2.0 using the default parameters. "Homology", with respect to polypeptides (i.e., amino acids), may be determined using a program such as BLASTP version 2.2.2 with the default parameters, which aligns the polypeptides or fragments being compared and determines the extent of amino acid identity or similarity between them. It will be appreciated that amino acid "homology" includes conservative substitutions, i.e., those that substitute a given amino acid in a polypeptide by another amino acid of similar characteristics. Typically seen as conservative substitutions are the following replacements: replacement of an aliphatic amino acid such as Ala, Val, Leu and Ile with another aliphatic amino acid; replacement of a Ser with a Thr or vice versa; replacement of an acidic residue such as Asp or Glu with another acidic residue; replacement of a residue bearing an amide group, such as Asn or Gln, with another residue bearing an amide group; exchange of a basic residue such as Lys or Arg with another basic residue; and replacement of an aromatic residue such as Phe or Tyr with another aromatic residue. A "homology of 70% or higher" includes a homology of, for example, 70%, 75%, 80%, 85%, 90%, 95%, and up to 100% (identical) between two or more nucleotide or amino acid sequences. A "homology of at least 45%" includes a homology of, for example, 45%, 50%, 60%, 70%, 80%, 90%, and up to 100% (identical) between two or more nucleotide or amino acid sequences.
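To make the distinction between identity and conservative similarity concrete, the following Python sketch scores two sequences that have already been aligned (for example, by BLASTP), using the conservative-substitution groups listed above. It is a simplified illustration, not how BLAST computes its statistics: it assumes gaps are written as '-', counts gapped columns in the denominator, and treats each listed group as one class of interchangeable residues. The function and constant names are hypothetical.

```python
# Conservative-substitution classes taken from the list above.
CONSERVATIVE_GROUPS = (
    frozenset("AVLI"),  # aliphatic
    frozenset("ST"),    # Ser <-> Thr
    frozenset("DE"),    # acidic
    frozenset("NQ"),    # amide-bearing
    frozenset("KR"),    # basic
    frozenset("FY"),    # aromatic
)


def is_conservative(a: str, b: str) -> bool:
    """True if residues a and b fall in the same conservative class."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def identity_and_similarity(aln1: str, aln2: str) -> tuple[float, float]:
    """Return (% identity, % similarity) over the aligned columns.

    Columns containing a gap ('-') count toward the length but never
    toward identity or similarity.
    """
    if len(aln1) != len(aln2) or not aln1:
        raise ValueError("expected two non-empty aligned sequences of equal length")
    identical = similar = 0
    for a, b in zip(aln1.upper(), aln2.upper()):
        if a == "-" or b == "-":
            continue
        if a == b:
            identical += 1
            similar += 1
        elif is_conservative(a, b):
            similar += 1
    n = len(aln1)
    return 100.0 * identical / n, 100.0 * similar / n


print(identity_and_similarity("ALSDKF", "VLTEKF"))  # (50.0, 100.0)
```

In the usage line, Leu, Lys and Phe are identical while A/V, S/T and D/E are conservative pairs, so the pair is 50% identical but 100% similar.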

The present invention provides a method for detecting a polypeptide involved in lipopeptide biosynthesis or a polynucleotide encoding such a polypeptide.
In one embodiment, the method of the present invention provides one or more reference sequences and compares a candidate sequence (either a specific single candidate sequence or a candidate database sequence) with the one or more reference sequences. The sequence homology is determined for the sequences compared. A candidate sequence sharing at least 45% homology to one or more reference sequences is considered to be a candidate polypeptide, or a candidate polynucleotide encoding a candidate polypeptide, which is involved in lipopeptide biosynthesis. Preferably, a candidate polypeptide sequence sharing at least 45% homology to consensus sequence 1 or 2 is considered a candidate acyl-specific C-domain polypeptide, a candidate polypeptide sequence sharing at least 55% homology to consensus sequence 3 is considered a candidate adenylating enzyme, and a candidate polypeptide sequence sharing at least 50% homology to consensus sequence 4 is considered a candidate acyl-carrier protein.
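The preferred thresholds above amount to a small decision table. The Python sketch below applies them to homology values that are assumed to have been computed already (for example, with BLASTP against consensus sequences 1-4, i.e., SEQ ID NOS: 1-4); the dictionary keys and the function name are hypothetical, and a match only flags a candidate for the functional confirmation described in the text.

```python
# Preferred minimum homology (%) to each consensus sequence, per the text above.
THRESHOLDS = {
    "consensus_1": (45.0, "candidate acyl-specific C-domain"),
    "consensus_2": (45.0, "candidate acyl-specific C-domain"),
    "consensus_3": (55.0, "candidate adenylating enzyme"),
    "consensus_4": (50.0, "candidate acyl-carrier protein"),
}


def classify_candidate(homology: dict[str, float]) -> list[str]:
    """Map {consensus id: % homology} to the candidate classes it satisfies."""
    labels: list[str] = []
    for consensus_id, score in homology.items():
        rule = THRESHOLDS.get(consensus_id)
        if rule is None:
            continue  # not one of the four consensus sequences; ignore
        minimum, label = rule
        if score >= minimum and label not in labels:
            labels.append(label)
    return labels


# Hypothetical homology values for one candidate polypeptide:
print(classify_candidate({"consensus_1": 48.2, "consensus_3": 40.0}))
# ['candidate acyl-specific C-domain']
```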
The involvement of these identified sequences in lipopeptide biosynthesis may be confirmed by first expressing the polypeptide from the polynucleotide candidates and then performing functional analysis according to methods known in the art and as described herein in Examples 1-2.
In another embodiment of the invention, the subject method compares one or more reference sequences against sequences within a candidate database of a specific organism. This will determine whether the specific organism may contain a polypeptide involved in lipopeptide biosynthesis or a polynucleotide encoding such a polypeptide. If it is determined that a specific organism may contain such a polynucleotide sequence encoding a polypeptide for lipopeptide biosynthesis, proteins from the candidate database (e.g., a part of the whole genome sequence) may be expressed and analyzed according to methods known in the art and as described herein in Examples 1-2.
In a preferred embodiment, the reference sequences used in the subject method are selected from the group consisting of polynucleotide or polypeptide sequences representing an acyl-specific C-domain, an ADLE, an ACPH, and an ADLF in one or more of the biosynthetic loci selected from the group consisting of RAMO, DAPT, A541, 009H, 024A, 023C, A410, 0708 and CADA.
In another preferred embodiment, the reference sequences may further include one or more reference polypeptides having at least 45% sequence homology to SEQ ID NO: 1 or SEQ ID NO: 2, one or more reference polypeptides having at least 55% sequence homology to SEQ ID NO: 3, one or more reference polypeptides having at least 50% sequence homology to SEQ ID NO: 4, or one or more reference polynucleotides encoding such polypeptide sequences.
Also within the scope of the present invention are a memory system for storing data that can be accessed by a computer, a computer-readable medium comprising a computer program and data for sequence comparison, and a computer system for performing sequence comparison of the present invention.
The computer system of the present invention will provide one or more reference polynucleotide or polypeptide sequences selected from the group consisting of polynucleotide or polypeptide sequences representing an acyl-specific C-domain, an adenylating enzyme (ADLE), an acyl carrier protein (ACPH), or a fusion of the two (ADLF) in one or more of the biosynthetic loci selected from RAMO, DAPT, A541, 009H, 024A, 023C, A410, 0708 and CADA.
Additionally or alternatively, the computer system of the present invention will provide one or more reference polypeptides comprising the consensus sequences of the present invention, i.e., one or more reference polypeptides having at least 45% sequence homology to SEQ ID NO: 1 or SEQ ID NO: 2, one or more reference polypeptides having at least 55% sequence homology to SEQ ID NO: 3, one or more reference polypeptides having at least 50% sequence homology to SEQ ID NO: 4, or one or more reference polynucleotides encoding such polypeptide sequences.
The computer system of the invention may also provide candidate polynucleotide or polypeptide sequence(s). The candidate polynucleotide or polypeptide may exist as a specific single sequence or it may be a candidate database, e.g., a part of the entire genome sequence of an organism, or protein family sequences.

The computer system of the invention will perform sequence comparison between one or more candidate sequences and one or more reference sequences.
The computer system will also determine the level of homology of two or more sequences compared and identify a candidate sequence which shares at least 45% homology with SEQ ID NO: 1 or SEQ ID NO: 2, and in some embodiments additionally identify a candidate sequence which shares at least 55% homology with SEQ ID NO: 3 or a candidate sequence which shares at least 50% homology with SEQ ID NO: 4.
The memory and computer system of the present invention permit the quick development of methods to search candidate databases and individual candidate sequences for their sequence homology against one or more reference sequences. In addition, the memory and computer system of the present invention will also permit the prediction of protein sequences from polynucleotide sequences, the prediction of homologous protein domains between two or more polypeptides, and the analysis of structure and function from sequence data.
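Predicting a protein sequence from a polynucleotide sequence comes down to translating codons with the genetic code. The Python sketch below assumes the standard genetic code and reads a single frame starting at the first base; stop codons are written as '*' and unrecognized codons (e.g., containing N) as 'X'. Alternative bacterial start codons and the other five reading frames are not handled, and the names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA (or RNA) sequence; a trailing partial codon is ignored."""
    seq = dna.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


print(translate("ATGGCTAAGTGGTAA"))  # MAKW*
```

Building the 64-entry table from a 64-letter string keeps the sketch short; `itertools.product` yields codons with the first position varying slowest, which matches the TCAG ordering of the string.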
The computer may be programmed to implement a process for effecting the identification, analysis, or modeling of a polypeptide or polynucleotide sequence.
In one embodiment the memory of the present invention contains data representing a polypeptide with 70% sequence homology to any one sequence selected from the group consisting of: SEQ ID NOs. 1, 2, 3, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, and 48. The preferred process by which data for a source database according to the present invention may be obtained is illustrated in Figures 12 and 13.
One use of the memory and computer system involves studying an organism's genome (e.g., database candidate sequences) to determine the sequence homology between the polynucleotide/polypeptide sequences in the genome and one or more reference polynucleotide or polypeptide sequences. Such information is of significant interest in assessing whether an organism contains a lipopeptide biosynthesis locus or any polynucleotide/polypeptide involved in lipopeptide biosynthesis.

Another use of the memory and computer system involves studying one or more specific candidate sequences to determine the sequence homology between the specific candidate polynucleotide/polypeptide sequences and one or more reference polynucleotide or polypeptide sequences. Such information helps to determine whether the specific candidate sequence is involved in lipopeptide biosynthesis.
Where a specific polynucleotide candidate sequence or polynucleotide database candidate sequences are being analyzed, the memory and computer system may permit the prediction of an Open Reading Frame (ORF) from a candidate sequence. The ORF corresponds to a nucleotide sequence which could potentially be translated into a polypeptide. Such a stretch of sequence is uninterrupted by a stop codon. An ORF that represents the coding sequence for a full protein generally begins with an ATG "start" codon and terminates with one of the three "stop" codons.
For the purposes of this application, an ORF may be any part of a coding sequence, with or without start and/or stop codons. For an ORF to be considered as a good candidate for coding for a bona fide cellular protein, a minimum size requirement is often set, for example, a stretch of DNA that would code for a protein of 50 amino acids or more.
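The ORF definition above (an ATG start, an in-frame stop codon and a minimum length) can be turned into a simple scanner. This Python sketch works under stated assumptions: it searches only the three forward frames, starts each ORF at the first ATG after the previous stop, requires a stop codon, and applies the 50-amino-acid minimum given as an example in the text. A production gene finder would also scan the reverse complement and consider alternative start codons. The function name is hypothetical.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def find_orfs(dna: str, min_aa: int = 50) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for ATG-initiated ORFs on the forward strand.

    Coordinates are 0-based; `end` is exclusive and includes the stop codon.
    The length test counts encoded amino acids, excluding the stop codon.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_aa:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG after this stop
    return orfs


# Hypothetical test sequence: ATG + 59 codons + TAA encodes a 60-residue peptide in frame 2.
demo = "CC" + "ATG" + "GCT" * 59 + "TAA" + "GG"
print(find_orfs(demo))  # [(2, 185, 2)]
```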
To make the above sequence information manipulation easy to perform and understand, sophisticated computer database systems may be used. In one embodiment, the reference sequences are electronically recorded and annotated with information available from public sequence databases. Examples of such databases include GenBank (NCBI) and the Comprehensive Microbial Resource database (The Institute for Genomic Research). The resulting information is stored in a relational database that may be employed to determine homologies between the reference sequences and genes within and among genomes.
To identify homologies between the sequences, one or more sequence alignment algorithms such as BLAST (Basic Local Alignment Search Tool) or FAST (using the Smith-Waterman algorithm) may be employed. In a particularly preferred embodiment, these two alignment protocols are used in combination. Both of these algorithms look for regions of similarity between two sequences; the Smith-Waterman algorithm is generally more tolerant of gaps and is used to provide a higher-resolution match after the BLAST search provides a preliminary match. These algorithms determine (1) the alignment between similar regions of the two sequences, and (2) the percent identity between the sequences. For example, an alignment may be calculated by matching, base-by-base or amino acid-by-amino acid, the regions of substantial similarity.
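For readers unfamiliar with the Smith-Waterman algorithm mentioned above, the Python sketch below shows its core: a dynamic-programming matrix whose cells are floored at zero (which is what makes the alignment local), a traceback from the highest-scoring cell, and a percent-identity calculation over the resulting alignment. It is a teaching sketch, not the FAST or BLAST implementation: the match, mismatch and linear gap scores are illustrative assumptions, whereas protein searches normally use a substitution matrix such as BLOSUM62 with affine gap penalties.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Local alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,                                    # start a new local alignment
                          H[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,                    # gap in b
                          H[i][j - 1] + gap)                    # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identical, non-gap columns as a percentage of the alignment length."""
    same = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * same / len(aln_a) if aln_a else 0.0


score, top, bottom = smith_waterman("ACGTACGT", "ACGACGT")
print(score, top, bottom, percent_identity(top, bottom))  # 12 ACGTACGT ACG-ACGT 87.5
```

The quadratic table makes this practical only for pairwise comparisons, which is consistent with the text's use of Smith-Waterman to refine matches that a faster BLAST search has already found.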
Figure 11 is a block diagram of a computer system according to one embodiment of the invention. The system shown in Figure 11 for performing the sequence comparison processing of the invention may be a general purpose computer used alone or in connection with a specialized processing computer. Such processing may be performed by a single platform or by a distributed processing platform.
In addition, such processing and functionality can be implemented in the form of special purpose hardware or in the form of software being run by a general purpose computer.
Any data handled in such processing or created as a result of such processing can be stored in a temporary memory, such as the RAM of a given computer system or subsystem. In addition, or in the alternative, such data may be stored in longer-term storage devices, for example, magnetic disks, rewritable optical disks and so on. For purposes of the disclosure herein, computer-readable media may comprise any form of data storage mechanism, including such existing memory technologies as well as hardware or circuit representations of such structures and of such data.
The computer system 40 (Figure 11) may include an operating system (e.g., UNIX) on which runs a relational database management system, a World Wide Web application, and a World Wide Web server. The software on the computer system may assume numerous configurations. For example, it may be provided on a single machine or distributed over multiple machines.
The World Wide Web application includes the executable code necessary for generation of database language statements [e.g., Structured Query Language (SQL) statements]. Generally, the executables will include embedded SQL statements.
In addition, the World Wide Web application may include a configuration file which contains pointers and addresses to the various software entities that comprise the server, as well as to the various external and internal databases which must be accessed to service user requests. The configuration file also directs requests for server resources to the appropriate hardware, as may be necessary should the server be distributed over two or more separate computers.
A World Wide Web browser may be used for providing a user interface 10 (Figure 11). Through the Web browser, a user may construct search requests for retrieving data from a sequence database and/or a genomic database. Thus, the user will typically point and click on user interface elements such as buttons, pull-down menus, scroll bars, etc. conventionally employed in graphical user interfaces.
The requests so formulated with the user's Web browser are transmitted to a Web application which formats them to produce a query that can be employed to extract the pertinent information from sequence databases or genomic databases.
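As an illustration of how a search request might be turned into an embedded SQL statement, the Python sketch below uses the standard-library `sqlite3` module as a stand-in for the relational database management system. The schema, table names, genome and candidate names, and homology values are all hypothetical; only the reference identifiers (SEQ ID NO: 24, the RAMO ADLE, and SEQ ID NO: 38, the RAMO ACPH) come from this document. The request parameters are passed as bound `?` placeholders rather than pasted into the SQL text, which guards against SQL injection from user-supplied input.

```python
import sqlite3

SCHEMA = """
CREATE TABLE reference (ref_id TEXT PRIMARY KEY, family TEXT NOT NULL, locus TEXT);
CREATE TABLE hit (
    candidate TEXT NOT NULL,
    genome    TEXT NOT NULL,
    ref_id    TEXT NOT NULL REFERENCES reference(ref_id),
    homology  REAL NOT NULL
);
"""


def query_hits(con: sqlite3.Connection, family: str, min_homology: float) -> list[tuple]:
    """Translate a (protein family, % cutoff) search request into a parameterized query."""
    sql = (
        "SELECT h.genome, h.candidate, h.ref_id, h.homology "
        "FROM hit AS h JOIN reference AS r ON h.ref_id = r.ref_id "
        "WHERE r.family = ? AND h.homology >= ? "
        "ORDER BY h.homology DESC"
    )
    return con.execute(sql, (family, min_homology)).fetchall()


con = sqlite3.connect(":memory:")
con.executescript(SCHEMA)
con.executemany("INSERT INTO reference VALUES (?, ?, ?)", [
    ("SEQ_ID_NO_24", "ADLE", "RAMO"),
    ("SEQ_ID_NO_38", "ACPH", "RAMO"),
])
con.executemany("INSERT INTO hit VALUES (?, ?, ?, ?)", [  # hypothetical search results
    ("orf_17", "genome_X", "SEQ_ID_NO_24", 62.0),
    ("orf_42", "genome_Y", "SEQ_ID_NO_24", 48.0),
    ("orf_08", "genome_X", "SEQ_ID_NO_38", 71.5),
])
print(query_hits(con, "ADLE", 55.0))  # [('genome_X', 'orf_17', 'SEQ_ID_NO_24', 62.0)]
```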
When network 40 employs a World Wide Web server, it supports a TCP/IP protocol. Local networks such as this are sometimes referred to as "Intranets." An advantage of such Intranets is that they allow easy communication with public domain databases residing on the World Wide Web (e.g., the GenBank World Wide Web site).
Thus, in a particularly preferred embodiment of the present invention, users can directly access data (via hypertext links, for example) residing on Internet databases using an HTML interface provided by Web browsers and Web servers.
Example 1: Conserved genes and proteins involved in N-acylation in lipopeptides
The acyl-specific C-domains and the ADLE, ADLF and ACPH protein families of the invention were discovered by identifying, characterizing and comparing several full-length biosynthetic loci, each producing a lipopeptide of known structure and each residing in a microorganism reported to produce that lipopeptide.
RAMO: Ramoplanin is a lipopeptide produced by Actinoplanes sp. ATCC 33076 (see US Patent No. 4,303,646). Ramoplanin is a glycosylated lipodepsipeptide of known structure (see, for example, US Patent No. 4,427,656). The full-length biosynthetic locus for ramoplanin from Actinoplanes sp. (RAMO) was cloned and sequenced (Fig. 1a). The open reading frames in RAMO were identified and a function was attributed to each protein encoded by the open reading frames. RAMO is described in detail in co-pending US application USSN 09/976,059 and in PCT international application PCT/CA01/01462, published as WO 02/31155.
DAPT: A21978C is a lipopeptide produced by Streptomyces roseosporus.
The structure of A21978C is known. While some progress had been reported towards elucidation of the biosynthetic locus responsible for the production of A21978C in Streptomyces roseosporus (DAPT), the full locus was not known. Transposon mutagenesis techniques had been used to locate DAPT [McHenney et al. (1998) J. Bact. Vol. 180, pp. 143-151], and DNA fragments derived therefrom had been used for insertional mutagenesis experiments that demonstrated inactivation of A21978C production. Analysis of the DNA sequence of the fragments revealed the presence of NRPS genes involved in the biosynthesis of A21978C. These genetic and biological data demonstrated beyond doubt that the identified pathway was indeed responsible for A21978C expression. However, the full biosynthetic locus for A21978C had not been reported.
The method used to clone DAPT, a partial locus formed of seven complete open reading frames (ORFs) and one partial ORF (Fig. 1a), is disclosed in USSN 60/342,133. Actinomycetes generally produce lipopeptides using NRPS proteins, and a number of the ORFs discovered corresponded to NRPS proteins. Moreover, one of the NRPS ORFs discovered contained the partial NRPS sequences previously demonstrated to be part of the A21978C locus, thereby confirming the identity of DAPT. The module and domain organization of the ORFs designated 7 to 9 in USSN 60/342,133 is consistent with that expected for biosynthesis of A21978C, as described in detail in USSN 60/342,133. The nature and order of the amino acid residues specified by ORFs 7 to 9 coincide with the exact chemical structure of A21978C (see Table 3 and Fig. 1 of USSN 60/342,133). This analysis, as described in detail in USSN 60/342,133, demonstrates beyond doubt that DAPT is indeed the biosynthetic locus for A21978C from S. roseosporus.
A541: Streptomyces fradiae strain NRRL 18158 was known to produce the lipopeptide antibiotic complex A54145 of known structure. However, the biosynthetic locus for A54145 in Streptomyces fradiae (A541) was not known. We cloned, sequenced and annotated A541, as disclosed in detail in USSN 60/342,133, USSN 60/372,789 and in co-pending USSN 10/XXX,XXX entitled "Genes and Proteins Involved in the Biosynthesis of Lipopeptides", filed concurrently with the present application and also claiming priority from USSN 60/342,133 and USSN 60/372,789. The contents of USSN 10/XXX,XXX are incorporated herein in their entirety for all purposes.
A541 contains three complete NRPS genes and one partial NRPS gene (Fig. 1b). Analysis of the NRPS ORFs revealed the presence of conserved domains involved in the recognition, activation, modification and condensation of amino acids. A total of 13 modules responsible for the condensation of 13 amino acid residues were identified, as expected given that A54145 is composed of 13 amino acids. The adenylation domains were examined in order to determine the specificity of the amino acids that they activate and tether to the cognate thiolation domain of the NRPS. The nature and order of the amino acid residues specified by the NRPS ORFs exactly correspond to the nature and order of the amino acid residues found in the A54145 chemical structure (see Table 4 and Figure 2 of USSN 60/372,789). The methylation domain in module 5 of ORF 8 (as disclosed in USSN 60/372,789), a module specifying glycine, corresponds to the fifth residue of A54145, which is an N-methylated glycine (sarcosine). The nature and order of the amino acids specified by the NRPS genes, as well as the presence of domains involved in the modification of some of the amino acids, confirm that A541 is indeed the biosynthetic locus for A54145 in S. fradiae.
RAMO, DAPT and A541 were analyzed and compared. All three loci contain NRPS loading modules that begin with a condensation domain instead of the conventional adenylation-thiolation domains (Figs. 1a and 1b; SEQ ID NOS: 6, 8 and 10, respectively). Such modules would generally be considered incapable of initiating peptide assembly, on the assumption that the C-domain would likely interfere with this initiation process (see, for example, Linne and Marahiel, 2000, Biochemistry, Vol. 39, pp. 10439-10447). The nucleotide sequences of the members of this conserved family of unusual NRPS C-domains in RAMO, DAPT and A541 are disclosed as SEQ ID NOS: 5, 7 and 9, respectively. The corresponding polypeptide sequences of these C-domains in RAMO, DAPT and A541 are disclosed as SEQ ID NOS: 6, 8 and 10, respectively.
These C-domains were assessed by computer comparison with proteins found in the GenBank database of protein sequences (National Center for Biotechnology Information, National Library of Medicine, Bethesda, MD, USA) using the BLASTP algorithm (Altschul et al., supra) and the results are presented in Table 1.
Amino acid sequence comparison analysis indicates that the RAMO, DAPT and A541 C-domains are related to condensation domains found in other lipopeptide-encoding NRPS systems.
The RAMO, DAPT and A541 C-domains were also compared to a collection of condensation domains derived from various lipopeptide NRPSs obtained from GenBank or disclosed herein. Figure 2 shows the evolutionary relatedness of these C-domains. Apart from RAMO, DAPT and A541, Figure 2 refers to additional lipopeptide biosynthetic loci by four-letter designations, wherein CADA is the biosynthetic locus for the calcium-dependent antibiotic, FENG is the biosynthetic locus for fengycin, SURF is the biosynthetic locus for surfactin, SYRI is the biosynthetic locus for syringomycin, SERR is the biosynthetic locus for serrawettin, LICH is the biosynthetic locus for lichenysin, ITUR is the biosynthetic locus for iturin, and MYSU is the biosynthetic locus for mycosubtilin. All C-domains included in this analysis are full-length C-domains. The convention used to identify and distinguish C-domains in Figure 2 is as follows. NRPS C-domain sequences obtained from the GenBank database are denoted by accessions that begin with three letters followed by digits (usually five). These first eight characters identifying each C-domain correspond to the GenBank accession number. The lower-case "n" denotes "NRPS domain", and "CD" followed by two digits denotes "C-domain" and its number relative to the other C-domains contained in that polypeptide sequence. For example, "AAC80285nCD06_SYRI" represents the amino acid sequence corresponding to the sixth C-domain contained in GenBank entry AAC80285 for an NRPS from the syringomycin biosynthetic locus. The NRPS C-domain sequences that are disclosed for the first time in this application, in U.S. provisional patent application USSN 60/342,133 or in U.S. patent application USSN 09/976,059 follow a similar nomenclature (nCD00) but are denoted by nine-character accessions beginning with three numbers.
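The labeling convention for GenBank-derived C-domains can be captured by a regular expression. In the Python sketch below, the separator between the two-digit domain number and the four-character locus code is treated as an optional underscore; that separator, like the function name, is an assumption, and the pattern covers only the eight-character GenBank form (three letters plus five digits), not the nine-character identifiers used for newly disclosed sequences.

```python
import re

# <3 letters + 5 digits accession> "nCD" <2-digit domain number> [_] <4-character locus code>
GENBANK_C_DOMAIN = re.compile(
    r"^(?P<accession>[A-Z]{3}\d{5})nCD(?P<index>\d{2})_?(?P<locus>[A-Z0-9]{4})$"
)


def parse_c_domain_label(label: str) -> tuple[str, int, str]:
    """Split a GenBank-derived C-domain label into (accession, domain number, locus)."""
    match = GENBANK_C_DOMAIN.match(label)
    if match is None:
        raise ValueError(f"unrecognized C-domain label: {label!r}")
    return match["accession"], int(match["index"]), match["locus"]


print(parse_c_domain_label("AAC80285nCD06_SYRI"))  # ('AAC80285', 6, 'SYRI')
```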
Analysis of a Clustal alignment of the C-domains clearly shows that these domains are evolutionarily related to C-domains found in the starter modules of known N-acylated lipopeptides such as the calcium-dependent antibiotic (CADA) (Fig. 1e, domain 22), surfactin (SURF), syringomycin (SYRI) and mycosubtilin (MYCO), among others (Fig. 2). Moreover, these special C-domains are evolutionarily distant from the regular condensation domains found in NRPSs that catalyze amide bond formation (condensation) between two adjacent amino acids (Fig. 2). Alignment of these unusual C-domains demonstrates the conservation of motifs and specific amino acid residues important for their catalytic activity (Fig. 3). Based on these observations, the unusual C-domains are considered to catalyze N-acyl peptide linkages between a fatty acid and the amino-terminal group of an amino acid.
A conserved family of activating enzymes (ADLE) was also found to be common to RAMO, DAPT and A541, although the gene encoding the activating enzyme in A541 was fused together with the gene encoding an acyl-carrier protein to form a single ORF (ADLF). The nucleotide sequences of the members of the conserved family of activating enzymes in RAMO, DAPT and A541 are disclosed as SEQ ID NOS: 23, 25 and 35, respectively. The polypeptide sequences of these activating enzymes are disclosed as SEQ ID NOS: 24, 26 and 36, respectively. The ADLE activating-enzyme portion of the ADLF fusion protein is referred to as SEQ ID NO: 36*.
A conserved family of acyl carrier proteins (ACPH) was also found to be common to RAMO, DAPT and A541, although the gene encoding the acyl carrier protein in A541 was fused together with the gene encoding the activating enzyme to form a single ORF (ADLF). The nucleotide sequences of the members of the conserved family of acyl carrier proteins in RAMO, DAPT and A541 are disclosed as SEQ ID NOS: 37, 39 and 35, respectively. The polypeptide sequences of these acyl carrier proteins are disclosed as SEQ ID NOS: 38, 40 and 36, respectively. The ACPH acyl carrier portion of the ADLF fusion protein is referred to as SEQ ID NO: 36**.

The biological function of the ADLE, ADLF and ACPH ORFs was assessed by amino acid sequence similarity analysis. The ADLE family of proteins shows similarity to various acyl-CoA ligase enzymes, whereas the ACPH family of proteins has sequence similarity to acyl carrier proteins found in acyl-condensing polyketide synthase enzymatic systems (Tables 2 and 3). Clustal alignment of ADLE ORFs shows the conservation of domains and residues important for their enzymatic function (Fig. 4). Alignment of ACPH ORFs shows their overall sequence conservation and the absolute conservation of the serine residue that is modified by phosphopantetheinylation to form the active holo-acyl carrier protein (Fig. 5). Both the ADLE and ACPH protein families are evolutionarily closely related to corresponding protein families from other lipopeptide loci (Fig. 6).
The ADLE and ACPH proteins as well as the acyl-specific C-domains of the invention are widely conserved throughout the biosynthetic loci of structurally diverse lipopeptides, including glycosylated lipopeptides and acidic glycopeptides.
The only structural feature common to ramoplanin, A21978C and A54145 is a peptide backbone appended with a fatty acyl group at the N-terminal amino acid residue. Based on these correlations, the ADLE and ACPH proteins and the unusual C-domain are considered to be responsible for activating and tethering fatty acyl groups and for catalyzing the formation of the N-acyl peptide linkage.
Example 2: Biosynthesis of N-acylated peptides:
Despite the significant overall evolutionary distance between the lipopeptide-producing microorganisms described in this invention, they all contain closely related C-domains that are used for peptide N-acylation, a step which doubles as the peptide chain initiation step. Without intending to be limited to any particular biosynthetic scheme or mechanism of action, the ADLE, ACPH and unusual NRPS C-domain of the present invention can explain formation of the N-acyl peptide linkage found in lipopeptides. Figure 7 illustrates a mechanism for NRPS chain initiation in which the fatty acyl group primes the synthesis of the peptide by the NRPS. CoA-linked fatty acyl precursors are channeled from the primary metabolic pool and modified while still attached to CoA by accessory enzymes such as oxidoreductases, epoxidases, desaturases, etc., encoded by genes of primary metabolism or by genes within the biosynthetic locus. The mature fatty acyl-CoA intermediate is then recognized by the cognate adenylating enzyme and transferred onto the phosphopantetheinyl prosthetic arm of the free holo-ACP, releasing CoA-SH and utilizing ATP in the process.
It is alternatively contemplated that the adenylating enzyme may recognize free fatty acyl substrates and transfer them onto the phosphopantetheinyl prosthetic arm of the free holo-ACP, utilizing ATP in the process. Once the fatty acyl group is tethered onto the free holo-ACP, the C-domain of the first module carries out a reaction in which the carbonyl group of the activated fatty acyl is condensed with the amino group of the amino acid substrate that had been previously activated and tethered by the first module of the NRPS. Hence, peptide chain initiation and N-acylation are closely coupled. Subsequent peptide elongation and termination steps can then proceed as with typical NRPS modules.
Figure 8 illustrates the above-described amino acid N-acylation mechanism using specific examples in known lipopeptide biosynthetic pathways. In ramoplanin biosynthesis, an ADLE enzyme activates specific fatty acid moieties and subsequently tethers them onto the phosphopantetheinyl prosthetic arm of the ACPH (disclosed herein as SEQ ID NOS: 24 and 38, respectively). The carbonyl group of the activated fatty acyl is then condensed with the amino group of the asparagine residue (Asn) that had been previously activated by and tethered to the first module of the NRPS. The condensation reaction is catalyzed by the acyl-specific C-domain of the first module of the NRPS, disclosed herein as SEQ ID NO: 6 (Figs. 1a and 8).
In another example, biosynthesis of the acylated peptide chain of antibiotic A54145 is initiated by activation and tethering of specific fatty acid units onto the ACPH component of the ADLF protein disclosed herein as SEQ ID NO: 36. ADLF represents the fusion of the two protein families, ADLE and ACPH, required for activation of fatty acids in lipopeptide biosynthesis. Once the fatty acid is activated, the acyl-specific C-domain of the first module, disclosed herein as SEQ ID NO: 10, catalyzes the condensation of the carbonyl group of the fatty acyl and the amino group of the tryptophan residue (Trp) that had been previously activated by and tethered to the first module of the NRPS (Figs 1b and 8).
The same mechanism for peptide N-acylation may be present in other microorganisms. Supporting this hypothesis, other lipopeptide NRPS enzymes identified in very diverse microorganisms also contain a specialized C-domain in the first module. Examples include the syringomycin biosynthetic locus from Pseudomonas syringae pv. syringae (Guenzi et al. (1998) J. Biol. Chem. Vol. 273, pp. 32857-32863); the serrawettin W2 biosynthetic locus from Serratia liquefaciens MG1 (Lindum et al. (1998) Vol. 180, pp. 6384-6388); the fengycin biosynthetic loci from Bacillus subtilis b213 and A1/3 (Steller et al. (1999) Chem. Biol. Vol. 6, pp. 31-41); the surfactin biosynthetic locus from Bacillus subtilis; the lichenysin biosynthetic locus from Bacillus licheniformis (Konz et al. (1999) J. Bact. Vol. 181, pp. 133-140); and the "calcium-dependent antibiotic" (CADA) biosynthetic locus from Streptomyces coelicolor A3(2) (Hojati et al. (2002) Chem. Biol. Vol. 9, pp. 1175-1187).
The CADA biosynthetic locus does not appear to have an adenylating enzyme homologue, but it does contain a free acyl carrier protein that may participate, together with the unusual C-domain of the first NRPS module, in the N-acylation mechanism.
Therefore, certain fatty acids may require specialized enzymes to transfer the fatty acyl moiety onto the acyl carrier protein, but once tethered onto the free acyl carrier protein the mechanism is analogous to that outlined in Figure 7. Notably, the fatty acyl moiety of CDA is unique in that it contains an epoxy modification.
Hence such fatty acids may be transferred onto the ACP by some other specialized enzyme.
It is possible that the N-acylation mechanism of the present invention extends beyond bacteria to even more diverse microorganisms such as lower eukaryotes. For example, the fungi Aspergillus nidulans var. roseus, Glarea lozoyensis, and Aspergillus japonicus var. aculeatus are known to produce the antifungal lipopeptides echinocandin B, pneumocandin B0, and aculeacin A, respectively (Hino et al. (2001) Journal of Industrial Microbiology and Biotechnology Vol. 27, pp. 157-162). Based on the overall similarity between fungal and bacterial NRPS systems, and on our demonstration that very diverse NRPS systems employ the same mechanism of N-acylation, the mechanism of peptide N-acylation described in this invention is likely to be operative in these and/or other lipopeptide-producing lower eukaryotes as well.
Although the disclosed mechanism for peptide N-acylation is apparently widespread among very diverse microorganisms, it is not the only means by which lipopeptides can be generated. For example, the lipopeptides mycosubtilin and iturin A, produced by Bacillus subtilis ATCC and RB14, respectively, are each assembled by multifunctional hybrid polypeptides comprising fused fatty acid synthase, aminotransferase, and NRPS activities (Duitman et al. (1999) Proc. Natl. Acad. Sci. USA Vol. 96, pp. 13294-13299; Tsuge et al. (2001) J. Bact. Vol. 183, pp. 6265-6273).
This alternative mechanism of peptide N-acylation may be more evolutionarily restricted: to the best of our knowledge, it has been identified only in members of the genus Bacillus, and the lipopeptides produced by these biosynthetic loci form a distinct sub-group that contains a β-amino fatty acyl moiety linked to the amino terminus of the peptide core. Although this mechanism of N-acylation does not involve ADLE and ACPH homologues, the C-domains that condense the β-amino fatty acyl moiety to the first amino acid of both mycosubtilin and iturin cluster within the highlighted group of acyl-specific C-domains, as shown in Figure 2.
The widespread N-acylation mechanism for peptide natural products provides a knowledge-based approach for discovery and identification of lipopeptide biosynthetic loci in microorganisms. The highly conserved nucleotide sequences that are distinguishing signatures of the adenylating enzyme, the acyl carrier protein, and/or the specialized C-domain involved in the N-acylation mechanism can be identified and used as probes to screen libraries of microbial genomic DNA, in order to rapidly identify, isolate, and characterize lipopeptide biosynthetic loci in microorganisms of interest. The sequences of the ADLE and ACPH proteins and of the acyl-specific C-domain can also be used for in silico screening of large collections of microorganisms. Such a genetic screen has an advantage over traditional fermentation approaches in that organisms having the genetic potential to produce lipopeptide natural products can be identified without the laborious fermentation, isolation, and characterization of the lipopeptide natural product. In addition, organisms that normally produce lipopeptides only at very low or undetectable levels, or only under very specialized growth conditions, can nevertheless be readily identified using this genetic approach.
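At its simplest, a genome-level version of this screen asks which signature families have a sufficiently significant best hit in each strain. In the sketch below, the family labels (`acyl_C`, `ADLE`, `ACPH`), the 1e-10 cutoff, the strain names and the decision rule (the acyl-specific C-domain is required while ADLE/ACPH hits are supporting evidence, motivated by the CADA and 070B cases in this document) are all assumptions for illustration; the source does not specify a cutoff or decision rule.

```python
from typing import Dict, List

SIGNATURES = ("acyl_C", "ADLE", "ACPH")
E_CUTOFF = 1e-10  # assumed significance threshold, not from the source


def signature_hits(best_e: Dict[str, float], cutoff: float = E_CUTOFF) -> List[str]:
    """Signature families whose best hit in one genome is at or below cutoff."""
    return [fam for fam in SIGNATURES if best_e.get(fam, float("inf")) <= cutoff]


def flag_candidates(library: Dict[str, Dict[str, float]],
                    cutoff: float = E_CUTOFF) -> Dict[str, List[str]]:
    """Keep strains with an acyl-specific C-domain hit and report which
    signature families support each call."""
    flagged: Dict[str, List[str]] = {}
    for strain, best_e in library.items():
        hits = signature_hits(best_e, cutoff)
        if "acyl_C" in hits:  # required; ADLE/ACPH are supporting evidence
            flagged[strain] = hits
    return flagged


# Made-up best-hit E values per strain (hypothetical data).
library = {
    "strain_A": {"acyl_C": 1e-50, "ADLE": 1e-100, "ACPH": 1e-20},
    "strain_B": {"ADLE": 1e-30},
    "strain_C": {"acyl_C": 1e-40},
}
print(flag_candidates(library))
```

Treating a lone ADLE hit (strain_B) as insufficient is a design choice: adenylating enzymes are common outside lipopeptide loci, whereas the acyl-specific C-domain is the more specific marker described here.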
Example 3: Identification of putative lipopeptide biosynthetic locus 009H:
The sequences of the ADLE, ACPH and the acyl-specific C-domain were used in silico to screen DECIPHER® (Ecopia BioSciences Inc; CA 2,352,451), a proprietary database of bacterial secondary metabolism loci. To facilitate sequence comparisons, a protein domain database was generated that is part of the DECIPHER® database and comprises domains from multimodular proteins such as NRPSs and polyketide synthases, as well as equivalent domains found in non-modular proteins.
Protein sequences from loci RAMO, DAPT and A541 corresponding to acyl-specific C-domains (SEQ ID NOS: 6, 8 and 10, respectively), ADLE ORFs (SEQ ID NOS: 24, 26 and 36*) and ACPH ORFs (SEQ ID NOS: 38, 40 and 36**) were compared to the DECIPHER® domain database using the BLASTP algorithm (Altschul et al., supra). In addition, consensus sequences of the acyl-specific C-domain and of the ADLE and ACPH proteins, generated using the HMMER software package as described herein and disclosed as SEQ ID NOS: 1, 2, 3 and 4, were also compared to the DECIPHER® domain database.
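HMMER derives its consensus sequences from a profile hidden Markov model built from a multiple alignment. As a much simpler stand-in that conveys the idea of a consensus, the sketch below takes a per-column majority vote over an existing alignment and drops columns whose most common symbol is a gap. It is not the HMMER algorithm; `majority_consensus` and the toy alignments are hypothetical.

```python
from collections import Counter
from typing import List


def majority_consensus(alignment: List[str], gap: str = "-") -> str:
    """Per-column majority-rule consensus of equal-length aligned sequences.

    Gap-majority columns are dropped, so the result is ungapped. Ties go to
    the symbol seen first in the column (Counter.most_common ordering).
    """
    if not alignment:
        raise ValueError("empty alignment")
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(*alignment):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != gap:
            out.append(symbol)
    return "".join(out)


# Toy alignment of three short aligned fragments.
print(majority_consensus(["HHILLDG", "HHIVLDG", "HHV-LDG"]))
```

A profile HMM additionally weights sequences and models insertions and deletions position by position, which is why HMMER consensus sequences can differ from a plain majority vote.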
Determination of sequence homology is assisted by the E value, which indicates whether two sequences display sufficient similarity to justify an inference of homology. An E value reported as 0.00 means that the expected number of chance matches is smaller than the reported precision, as for a very close homolog. E values are calculated as described in Altschul et al. (1990) J. Mol. Biol. 215(3): 403-410 and in Altschul et al. (1993) Nature Genetics 3: 226-272.
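For reference, the Karlin-Altschul statistics underlying BLAST E values give E = K·m·n·exp(-λS) for a raw score S, a query of length m and a database of total length n; equivalently E = m·n·2^(-S'), where the bit score is S' = (λS - ln K)/ln 2. The sketch below implements both forms. It ignores the effective-length (edge-effect) corrections that BLAST applies, and the λ, K and length values in the usage lines are illustrative assumptions, not parameters from this document. It also shows why a table printed to two decimal places reports 0.00 for any sufficiently small E value.

```python
import math


def bit_score(raw: float, lam: float, k: float) -> float:
    """Normalized (bit) score S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw - math.log(k)) / math.log(2)


def evalue_from_raw(raw: float, lam: float, k: float, m: int, n: int) -> float:
    """Karlin-Altschul expectation E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw)


def evalue_from_bits(bits: float, m: int, n: int) -> float:
    """Equivalent form E = m * n * 2**(-S'), with S' in bits."""
    return m * n * 2.0 ** (-bits)


# Assumed parameters: 450-residue query against a 1e6-residue database.
lam, k, m, n = 0.267, 0.041, 450, 10**6
for raw in (50, 100, 200):
    e = evalue_from_raw(raw, lam, k, m, n)
    print(raw, f"{bit_score(raw, lam, k):.1f} bits", f"E={e:.2e}", f"shown as {e:.2f}")
```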

Comparison of acyl-specific C-domain sequences with sequences derived from over 450 loci in the DECIPHER® database revealed a condensation domain, disclosed herein as SEQ ID NO: 12, in locus 009H of Streptomyces ghanaensis (NRRL ~-12104). Table 4 shows that SEQ ID NO: 12 has higher sequence similarity to the acyl-specific C-domains of RAMO, DAPT and A541 (which condense an acyl group to the amino-terminal group of an amino acid) than to a typical NRPS condensation domain that catalyzes the joining of two amino acids, as exemplified by the C-domain of the first module of ramoplanin ORF13, described in detail in PCT/CA01/01462.
Table 4
SEQ ID NO | Probing Domain | E value | Target Domain SEQ ID NO
1 | Consensus C1 | 4e-54 | 12
2 | Consensus C2 | 1e-115 | 12
6 | RAMO C-domain | 4e-44 | 12
8 | DAPT C-domain | 4e-T5 | 12
10 | A541 C-domain | 1e-E.5 | 12
- | RAMO ORF13, C-domain | 4e-11 | 12
3 | Consensus ADLE | 0.00 | 28
24 | RAMO ADLE | 1e-118 | 28
26 | DAPT ADLE | 1e-141 | 28
36* | A541 ADLE | 1e-141 | 28
4 | Consensus ACPH | 2e-22 | 42
38 | RAMO ACPH | 4e-115 | 42
40 | DAPT ACPH | 6e-115 | 42
36** | A541 ACPH | 3e-07 | 42

Similarly, ADLE domains with SEQ ID NOS: 3, 24, 26 and 36* as well as ACPH domains with SEQ ID NOS: 4, 38, 40 and 36** were compared to the DECIPHER® domain database. This comparison indicated the presence of proteins with high sequence homology to the ADLE and ACPH sequences, disclosed as SEQ ID NOS: 28 and 42 respectively, also found in the 009H locus. The relatedness of SEQ ID NOS: 12, 28 and 42 to acyl-specific C-domains, ADLE and ACPH proteins was further confirmed by Clustal sequence alignment showing the conservation of specific protein domains, and by phylogenetic analysis (Figs 3-6).
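The Table 4 comparison can be made quantitative by converting E values to -log10(E) and asking by how many orders of magnitude the best acyl-specific probe beats the typical, amino acid-joining C-domain probe against the same target. The sketch uses the two consensus-probe rows and the RAMO ORF13 C-domain row for target SEQ ID NO: 12; the function names are hypothetical, and the margin is an illustrative summary rather than a criterion used in the source.

```python
import math
from typing import Dict


def neg_log10(e: float) -> float:
    """-log10(E); larger means more significant. E == 0 maps to infinity."""
    return math.inf if e == 0 else -math.log10(e)


def acyl_specific_margin(acyl_probes: Dict[str, float], typical_c: float) -> float:
    """Orders of magnitude by which the best acyl-specific probe beats the
    typical C-domain probe against the same target domain."""
    best = max(neg_log10(e) for e in acyl_probes.values())
    return best - neg_log10(typical_c)


# Target SEQ ID NO: 12, values from Table 4.
acyl_probes = {"Consensus C1": 4e-54, "Consensus C2": 1e-115}
typical_c = 4e-11  # RAMO ORF13, C-domain
margin = acyl_specific_margin(acyl_probes, typical_c)
print(f"best acyl-specific probe wins by {margin:.1f} orders of magnitude")
```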
Closer inspection of locus 009H shows the presence of 4 NRPS ORFs composed of 13 modules (Fig. 1b). The first NRPS ORF begins with the acyl-specific C-domain (SEQ ID NO: 12) instead of a typical adenylation domain. The ADLE and ACPH proteins (SEQ ID NOS: 28 and 42, respectively) are found in close proximity to the NRPS carrying the acyl-specific C-domain, indicating that all three enzymes are part of the same biosynthetic locus. The simultaneous presence of these three enzymes, together with the N-terminal location of the acyl-specific C-domain and the presence of a multienzymatic NRPS complex, is consistent with the biosynthesis of an N-acylated lipopeptide specified by locus 009H.
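The architectural evidence used for locus 009H (and for the loci in the following examples) can be expressed as a small check over annotated ORFs. In this sketch the `Orf` and `Evidence` types, the `C_acyl` domain label, the 10 kb proximity window and the convention of counting modules as adenylation (A) domains are assumptions introduced for illustration; the toy locus is made up and is not locus 009H.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Orf:
    name: str
    start: int        # start coordinate (bp) within the sequenced locus
    family: str       # "NRPS", "ADLE", "ACPH" or "other"
    domains: List[str] = field(default_factory=list)  # N- to C-terminal


@dataclass
class Evidence:
    acyl_c_n_terminal: bool   # an NRPS ORF begins with an acyl-specific C-domain
    adle_nearby: bool         # ADLE-family ORF within `window` bp of such an ORF
    acph_nearby: bool         # ACPH-family ORF within `window` bp of such an ORF
    modules: int              # NRPS modules, counted here as A domains


def n_acyl_evidence(orfs: List[Orf], window: int = 10_000) -> Evidence:
    nrps = [o for o in orfs if o.family == "NRPS"]
    capped = [o for o in nrps if o.domains[:1] == ["C_acyl"]]

    def near(family: str) -> bool:
        return any(o.family == family and abs(o.start - c.start) <= window
                   for o in orfs for c in capped)

    return Evidence(
        acyl_c_n_terminal=bool(capped),
        adle_nearby=near("ADLE"),
        acph_nearby=near("ACPH"),
        modules=sum(d == "A" for o in nrps for d in o.domains),
    )


# Hypothetical locus: ADLE and ACPH just upstream of an N-acyl-capped NRPS.
locus = [
    Orf("adle", 2_000, "ADLE"),
    Orf("acph", 3_500, "ACPH"),
    Orf("nrps1", 5_000, "NRPS", ["C_acyl", "A", "T", "C", "A", "T"]),
    Orf("nrps2", 20_000, "NRPS", ["C", "A", "T", "TE"]),
]
print(n_acyl_evidence(locus))
```

Reporting each criterion separately, rather than a single yes/no call, keeps cases such as an incomplete locus (acyl-specific C-domain present, ADLE and ACPH not detected) visible instead of silently rejecting them.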
Example 4: Identification of putative lipopeptide biosynthetic locus 023C:
In silico screening of the DECIPHER® database with consensus protein sequences (SEQ ID NOS: 1 and 2) and with sequences from loci RAMO, DAPT and A541 corresponding to acyl-specific C-domains (SEQ ID NOS: 6, 8 and 10, respectively) further revealed the presence of an acyl-specific C-domain in locus 023C of Streptomyces aizunensis NRRL S-11277. As shown in Table 5, sequence comparison demonstrates that the 023C acyl-specific C-domain, disclosed herein as SEQ ID NO: 16, is more closely related to the N-acyl capping C-domains from RAMO, DAPT and A541 than to typical NRPS condensation domains, represented by the C-domain of the first module of ramoplanin ORF13, described in detail in PCT/CA01/01462.
Table 5
SEQ ID NO | Probing Domain | E value | Target Domain SEQ ID NO
1 | Consensus C1 | 1e-152 | 16
2 | Consensus C2 | 4e-53 | 16
6 | RAMO C-domain | 6e-82 | 16
8 | DAPT C-domain | 8e-45 | 16
10 | A541 C-domain | 1e-32 | 16
- | RAMO ORF13, C-domain | 3e-0!~ | 16
3 | Consensus ADLE | 0.00 | 32
24 | RAMO ADLE | 1e-1L'.6 | 32
26 | DAPT ADLE | 1e-146 | 32
36* | A541 ADLE | 1e-134 | 32
4 | Consensus ACPH | 1e-2~~ | 46
38 | RAMO ACPH | 2e-15 | 46
40 | DAPT ACPH | 9e-15 | 46
36** | A541 ACPH | 3e-07 | 46

Proteins related to the ADLE and ACPH families, disclosed herein as SEQ ID NOS: 32 and 46, were also found in locus 023C (Table 5). The relatedness of SEQ ID NOS: 16, 32 and 46 to acyl-specific C-domains, ADLE and ACPH proteins was further confirmed by Clustal alignment showing the conservation of specific protein domains and amino acid residues important for catalytic activity (Figures 3-5), and by phylogenetic analysis (Figure 6).
Analysis of locus 023C shows the presence of 6 NRPS ORFs composed of 28 modules (Fig. 1c). The first NRPS ORF begins with the acyl-specific C-domain (SEQ ID NO: 16), indicative of the N-acyl capping mechanism (Fig. 7). Moreover, ADLE and ACPH proteins involved in fatty acid activation and tethering (SEQ ID NOS: 32 and 46, respectively) are also found in locus 023C near the NRPS ORF, indicating that locus 023C is likely to encode an N-acylated lipopeptide metabolite.
Example 5: Identification of putative lipopeptide biosynthetic locus 024A:
Screening of the DECIPHER® database through protein homology analysis with sequences corresponding to acyl-specific C-domains (SEQ ID NOS: 1, 2, 6, 8 and 10) revealed the presence of an acyl-specific C-domain in locus 024A of Streptomyces refuineus NRRL 3143. As shown in Table 6, BLASTP analysis demonstrates that the 024A-encoded C-domain (SEQ ID NO: 14) is more closely related to domains condensing acyl groups to amino acids than to domains condensing two amino acids, as exemplified by the C-domain of the first module of ramoplanin ORF13, described in detail in PCT/CA01/01462.
Table 6
SEQ ID NO | Probing Domain | E value | Target Domain SEQ ID NO
1 | Consensus C1 | 2e-50 | 14
2 | Consensus C2 | 1e-150 | 14
6 | RAMO C-domain | 3e-43 | 14
8 | DAPT C-domain | 9e-83 | 14
10 | A541 C-domain | 0.00 | 14
- | RAMO ORF13, C-domain | 4e-16 | 14
3 | Consensus ADLE | 0.00 | 30
24 | RAMO ADLE | 1e-112 | 30
26 | DAPT ADLE | 1e-155 | 30
36* | A541 ADLE | 0.00 | 30
4 | Consensus ACPH | 6e-23 | 44
38 | RAMO ACPH | 4e-12 | 44
40 | DAPT ACPH | 2e-12 | 44
36** | A541 ACPH | 5e-32 | 44

ADLE and ACPH related proteins, disclosed herein as SEQ ID NOS: 30 and 44, were also found in locus 024A (Table 6). Sequence alignments of all three proteins (SEQ ID NOS: 14, 30 and 44) show conservation of domains and amino acid residues important for the catalytic activity of the corresponding enzymes (Figs 3-5). Additionally, these proteins are evolutionarily related to members of the acyl-specific C-domain, ADLE and ACPH protein families, as indicated by phylogenetic analysis (Fig. 6).
Analysis of the complete 024A locus (Fig. 1c and USSN 60/342,133, USSN 60/372,789 and co-pending USSN 10/XXX,XXX) reveals the presence of 4 NRPS ORFs composed of 13 modules. Consistent with an N-acyl peptide capping mechanism, the acyl-specific C-domain (SEQ ID NO: 14) is located at the N-terminal position of the first NRPS ORF. Moreover, the ADLE and ACPH ORFs (SEQ ID NOS: 30 and 44, respectively) are immediately adjacent to the acyl-specific C-domain, suggesting a functional interaction between the three proteins. Based on these observations, locus 024A was predicted, and subsequently shown, to direct the biosynthesis of an N-acylated lipopeptide (see Example 8).
Example 6: Identification of lipopeptide 41,012 biosynthetic locus A410:
Protein homology comparison of sequences specifying acyl-specific C-domains (SEQ ID NOS: 1, 2, 6, 8 and 10) with sequences found in the DECIPHER® database revealed the presence of a related C-domain, disclosed herein as SEQ ID NO: 18, in locus A410 of Actinoplanes nipponensis Routien ATCC 31145. This microorganism has been shown to synthesize an acidic polypeptide antibiotic of undetermined chemical structure, compound 41,012, that belongs to the amphomycin group of N-acylated lipopeptides (US 4,001,397). As shown in Table 7, BLASTP analysis demonstrates that the A410-encoded C-domain (SEQ ID NO: 18) is more closely related to domains condensing acyl groups to amino acids than to domains condensing two amino acids, as exemplified by the C-domain of the first module of ramoplanin ORF13, described in detail in PCT/CA01/01462.
Table 7
SEQ ID NO | Probing Domain | E value | Target Domain SEQ ID NO
1 | Consensus C1 | 9e-70 | 18
2 | Consensus C2 | 1e-121 | 18
6 | RAMO C-domain | 3e-54 | 18
8 | DAPT C-domain | 5e-75 | 18
10 | A541 C-domain | 3e-52 | 18
- | RAMO ORF13, C-domain | 1e-13 | 18
3 | Consensus ADLE | 0.00 | 34
24 | RAMO ADLE | 1e-111 | 34
26 | DAPT ADLE | 1e-137 | 34
36* | A541 ADLE | 1e-141 | 34
4 | Consensus ACPH | 4e-31 | 48
38 | RAMO ACPH | 1e-14 | 48
40 | DAPT ACPH | 5e-14 | 48
36** | A541 ACPH | 2e-10 | 48

ADLE and ACPH related proteins, disclosed herein as SEQ ID NOS: 34 and 48, were also found in locus A410 (Table 7). Sequence alignments of all three proteins (SEQ ID NOS: 18, 34 and 48) show the conservation of domains and amino acid residues important for the catalytic activity of these enzymes (Figs 3-5). Additionally, these proteins are evolutionarily related to members of the acyl-specific C-domain, ADLE and ACPH protein families, as indicated by phylogenetic analysis (Fig. 6).
Locus A410 specifies 3 NRPS ORFs composed of 11 modules (Fig. 1d). Consistent with an N-acyl peptide capping mechanism, the acyl-specific C-domain (SEQ ID NO: 18) is located at the N-terminal position of the first NRPS ORF. Moreover, the ADLE and ACPH ORFs (SEQ ID NOS: 34 and 48, respectively) are found adjacent to the acyl-specific C-domain, indicating that locus A410 specifies an N-acylated lipopeptide, consistent with the described characteristics of antibiotic compound 41,012.
Example 7: Identification of putative lipopeptide biosynthetic locus 070B:
In silico screening of the DECIPHER® database with sequences corresponding to acyl-specific C-domains (SEQ ID NOS: 1, 2, 6, 8 and 10) revealed the presence of an acyl-specific C-domain in locus 070B of Streptomyces sp. (Ecopia BioSciences, strain 070). As shown in Table 8, BLASTP analysis demonstrates that the 070B-encoded C-domain (SEQ ID NO: 20) is more closely related to domains condensing acyl groups to amino acids than to domains condensing two amino acids, as exemplified by the C-domain of the first module of ramoplanin ORF13, described in detail in PCT/CA01/01462.
Table 8
SEQ ID NO | Probing Domain | E value | Target Domain SEQ ID NO
1 | Consensus C1 | 1e-153 | 20
2 | Consensus C2 | 2e-67 | 20
6 | RAMO C-domain | 3e-82 | 20
8 | DAPT C-domain | 3e-47 | 20
10 | A541 C-domain | 6e-48 | 20
- | RAMO ORF13, C-domain | 9e-16 | 20

Sequence alignment of the 070B acyl-specific C-domain (SEQ ID NO: 20) with related domains from various lipopeptide biosynthetic ORFs shows conservation of domains and amino acid residues important for the catalytic activity of these enzymes (Fig. 3). Additionally, this protein is evolutionarily related to members of the acyl-specific C-domain family, as indicated by phylogenetic analysis (Fig. 6).
In contrast to the other loci presented herein, ADLE and ACPH related proteins were not detected in 070B.
Analysis of the 070B locus found in the DECIPHER® database shows the presence of an incomplete NRPS ORF composed of three modules (Fig. 1d). Consistent with the biosynthesis of an N-acylated lipopeptide, the acyl-specific C-domain is located at the N-terminus of the NRPS ORF. The lack of ADLE and ACPH sequences may be due to the locus sequence not yet being complete. Alternatively, 070B may be similar to the CADA locus in Streptomyces coelicolor A3(2), which specifies an N-acylated lipopeptide and lacks ADLE and ACPH related enzymes. Despite the potential absence of ADLE and ACPH in 070B, the presence and location of the acyl-specific C-domain clearly indicate that 070B specifies an N-acylated lipopeptide.
Example 8: Biosynthesis of an N-acylated lipopeptide by locus 024A:
Locus 024A in Streptomyces refuineus subsp. thermotolerans NRRL 3143 was shown to possess several characteristics of an N-acylated lipopeptide-encoding locus, namely an acyl-specific C-domain (SEQ ID NO: 14) located at the N-terminus of the first NRPS ORF involved in the assembly of the polypeptide, ADLE and ACPH family proteins (SEQ ID NOS: 30 and 44, respectively), and an NRPS multienzymatic system composed of 13 modules (see Example 5 and Fig. 1c).
Protein homology analysis of the acyl-specific C-domain and the ADLE and ACPH proteins against other proteins in the DECIPHER® database indicated high homology of these proteins with the corresponding proteins found in the A541 locus (SEQ ID NOS: 10, 36* and 36**) that specifies production of antibiotic A54145 in Streptomyces fradiae NRRL 18158 (Table 6 in Example 5). Closer inspection of the two loci revealed the presence of an identical NRPS system that could be responsible for the synthesis of a 024A polypeptide scaffold identical to that of A54145 (Figs 1b and 1c and USSN 60/342,133, USSN 60/372,789 and co-pending USSN 10/XXX,XXX).
Based on these observations and on the known growth conditions for expressing lipopeptide A54145 in Streptomyces fradiae (US 4,977,083), Streptomyces refuineus subsp. thermotolerans was grown under identical culture conditions to assess possible induction of locus 024A and to determine the nature of the specified product.
Streptomyces fradiae and Streptomyces refuineus subsp. thermotolerans were grown at 30°C for 48 hours on a rotary shaker in 25 mL of a seed medium consisting of glucose (10 g/L), potato starch (30 g/L), soy flour (20 g/L), Pharmamedia (20 g/L), and CaCO3 (2 g/L) in tap water. Five mL of this seed culture was used to inoculate 500 mL of production medium in a 4 L baffled flask. The production medium consisted of glucose (25 g/L), soy grits (18.75 g/L), blackstrap molasses (3.75 g/L), casein (1.25 g/L), sodium acetate (8 g/L), and CaCO3 (3.13 g/L) in tap water, and fermentation proceeded for 7 days at 30°C on a rotary shaker. The production culture was centrifuged and filtered to remove mycelia and solid matter. The pH was adjusted to 6.4, and 46 mL of Diaion HP20 was added and stirred for 30 minutes. The HP20 resin was collected by Buchner filtration and washed successively with 140 mL water and 90 mL 15% CH3CN/H2O, and the wash was discarded. The HP20 resin was then eluted with 140 mL 50% CH3CN/H2O (fraction HP20 E2). This pool was passed over a 5 mL Amberlite IRA68 column (acetate cycle), and the flow-through (fraction IRA FT) was reserved for bioassay. The column was washed with 25 mL 50% CH3CN/H2O and eluted with 25 mL 50% CH3CN/H2O containing 0.1 N HOAc (fraction IRA E1), and then eluted with 25 mL 50% CH3CN/H2O containing 1.0 N HOAc (fraction IRA E2). Biological activity was followed during purification by bioassay with Micrococcus luteus in Nutrient Agar containing 5 mM CaCl2.
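The batch arithmetic implied by the production-medium recipe can be checked quickly. The sketch below scales the g/L concentrations given above to the 500 mL production volume and computes the inoculum ratio (5 mL of seed culture into 500 mL); the dictionary and function names are hypothetical.

```python
from typing import Dict

# Production medium from the text, in g/L.
PRODUCTION_MEDIUM_G_PER_L: Dict[str, float] = {
    "glucose": 25.0,
    "soy grits": 18.75,
    "blackstrap molasses": 3.75,
    "casein": 1.25,
    "sodium acetate": 8.0,
    "CaCO3": 3.13,
}


def grams_for_volume(recipe: Dict[str, float], volume_ml: float) -> Dict[str, float]:
    """Grams of each component needed for volume_ml of medium."""
    return {name: g_per_l * volume_ml / 1000.0 for name, g_per_l in recipe.items()}


batch = grams_for_volume(PRODUCTION_MEDIUM_G_PER_L, 500)  # one 4 L baffled flask
inoculum_pct = 5 / 500 * 100                               # 5 mL seed into 500 mL
for name, grams in batch.items():
    print(f"{name:20s} {grams:7.3f} g")
print(f"inoculum: {inoculum_pct:.1f}% v/v")
```

Each 500 mL flask therefore receives 12.5 g glucose, 9.375 g soy grits and 1.565 g CaCO3, among the other components, with a 1% (v/v) inoculum.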
Figure 9a is a photograph of a plate generated during extraction of an anionic lipopeptide from Streptomyces fradiae. Figure 9a shows an enrichment of activity based on IRA67 anion exchange chromatography, consistent with expression of an acidic lipopeptide. This activity is concentrated during the extraction procedure, as indicated by the increased diameter of the lysis rings. A54145 was detected via HPLC/MS in fraction IRA E2, as evidenced by a mass ion ES2+ = 830.5 consistent with the structures of A54145C,D (US 4,994,270).
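Assuming that "ES2+ = 830.5" denotes a doubly protonated ion [M+2H]2+ observed in positive mode (the text does not state the adduct), the corresponding neutral mass follows from M = z·(m/z - m_proton). The sketch below applies this relation; `neutral_mass` is a hypothetical helper, and the result is only the mass implied by that assumption.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton, Da


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M of an [M+zH]z+ ion observed at m/z."""
    return charge * (mz - PROTON_MASS_DA)


m = neutral_mass(830.5, 2)
print(f"M = {m:.2f} Da")
```

Under this assumption the observed ion corresponds to a neutral mass of about 1659.0 Da.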
Figure 9b is a photograph of a plate generated during a similar extraction scheme performed on extracts from Streptomyces refuineus subsp. thermotolerans. Figure 9b shows a similar enrichment of activity based on IRA67 anion exchange chromatography, consistent with expression of an acidic lipopeptide. This activity is concentrated during the extraction procedure, as indicated by the increased diameter of the lysis rings. A mass ion of ES2+ = 830.5, identical to that of A54145, was present in fraction IRA E2, confirming that an N-acylated acidic lipopeptide identical to A54145C,D is produced by 024A in Streptomyces refuineus subsp. thermotolerans.
Example 9: Use of the N-acyl capping cassette to engineer peptide synthetases capable of producing novel lipopeptides:
The availability and understanding of lipopeptide N-acyl capping components increases the potential for redesigning (un)natural products using engineered peptide synthetases. It has been demonstrated that, using known molecular biology techniques, functional hybrid peptide synthetases can be engineered that are capable of producing rationally designed peptide products (Mootz et al. (2000) Proc. Natl. Acad. Sci. USA Vol. 97, pp. 5848-5853). Moreover, it has been postulated that through domain swapping, change of substrate specificity by mutagenesis, and induced termination to release a defined shortened product, it may be possible to obtain a recombinant NRPS system that produces antipain, a potent cathepsin inhibitor produced by Streptomyces roseus whose biosynthetic machinery is unknown (Doekel and Marahiel (2001) Metab. Eng. Vol. 3, pp. 64-77). Mootz et al. (supra) described genetic engineering of an NRPS system to produce a peptide product that does not occur naturally, and Doekel and Marahiel (supra) described a prophetic example of engineering an NRPS system to make the known natural product antipain.
The following outlines a strategy whereby the NRPS biosynthetic machinery of a non-lipopeptide natural product, complestatin, can be modified to produce an N-acylated analogue of complestatin (Fig. 10).
Streptomyces lavendulae produces complestatin, a cyclic peptide natural product that antagonizes pharmacologically relevant protein-protein interactions, including formation of the C4b,2b complex in the complement cascade and gp120 binding in the HIV life cycle. Complestatin, a member of the vancomycin group of natural products, consists of an alpha-ketoacyl hexapeptide backbone modified by oxidative phenolic couplings and halogenations. The entire complestatin biosynthetic and regulatory gene cluster, spanning ca. 50 kb, was cloned and sequenced (Chiu et al. (2001) Proc. Natl. Acad. Sci. USA Vol. 98, pp. 8548-8553). It includes four NRPS genes, comA, comB, comC, and comD (Fig. 10, panel a). The comA gene encodes an NRPS composed of a loading module that incorporates hydroxyphenylglycine (HPG, or a derivative thereof) followed by a module that incorporates tryptophan (Trp), the first two residues of complestatin. Through domain swapping, the loading module and the C-domain of the tryptophan-incorporating module can be replaced by one of the acyl-specific C-domains disclosed herein. Preferably, the acyl-specific C-domain of the A541, DAPT, or 024A loci would be used, as these domains are naturally specific for condensing an acyl moiety to a tryptophan residue. In addition to this domain swapping, the ADLE and ACPH genes would also be introduced into the system to provide a means of generating activated acyl substrates for the acyl-specific C-domain. Thus, Figure 10b depicts a rationally designed recombinant NRPS system that should give rise to N-acylated complestatin analogue(s). The recombinant NRPS system depicted in Figure 10b could be employed either in vivo, using an appropriate recombinant host, or in vitro, using purified enzymes supplemented with the appropriate substrates.
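The proposed comA edit can be pictured as a list operation on the module architecture: remove the loading module, then replace the C-domain of the Trp-incorporating module with an acyl-specific C-domain. In the sketch below, `swap_in_acyl_cap`, the `C_acyl` label and the two-module ComA layout (HPG loading module as A-T, Trp module as C-A-T) are hypothetical; the text specifies only a loading module followed by a Trp module. The sketch does not model introduction of the ADLE and ACPH genes.

```python
from typing import List, Tuple

# (residue incorporated by the module, ordered domain labels)
Module = Tuple[str, List[str]]


def swap_in_acyl_cap(orf: List[Module], acyl_c: str = "C_acyl") -> List[Module]:
    """Drop the loading module and replace the C-domain of the following
    module with an acyl-specific C-domain; later modules are kept as-is."""
    if len(orf) < 2:
        raise ValueError("expected a loading module plus at least one module")
    residue, domains = orf[1]
    if not domains or domains[0] != "C":
        raise ValueError("second module must begin with a C domain")
    capped = (residue, [acyl_c] + domains[1:])  # new list; input is not mutated
    return [capped] + list(orf[2:])


# Hypothetical ComA layout: HPG loading module (A-T), then a Trp module (C-A-T).
coma = [("Hpg", ["A", "T"]), ("Trp", ["C", "A", "T"])]
print(swap_in_acyl_cap(coma))
```

The recombinant ComA derivative then begins with the acyl-specific C-domain, the same N-terminal arrangement observed in the natural lipopeptide loci described above.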
One approach whereby N-acylated complestatin analogue(s) could be generated in vivo would involve the use of Streptomyces lavendulae, the complestatin producer, as the host strain. Briefly, the N-acyl capping cassette would replace the comA gene. This could be accomplished either by inactivation of the comA gene on the Streptomyces lavendulae chromosome followed by introduction of a plasmid expressing the ADLE, ACPH, and the recombinant ComA derivative, or by physically replacing, by way of a double recombination (Keiser et al., supra), the comA gene on the Streptomyces lavendulae chromosome with a cassette containing genes encoding the ADLE, ACPH, and the recombinant ComA derivative. The resulting recombinant strains could be further modified to include genes involved in the biosynthesis of the acyl moieties and/or could be supplied with acyl moieties or precursors thereof in the fermentation medium.
One approach whereby N-acylated complestatin analogue(s) could be generated in vitro would involve over-expression of the ADLE, ACPH, recombinant ComA, ComB, ComC, and ComD polypeptides in an appropriate host, for example E. coli, followed by preparation of an extract or purified fraction thereof and use of that preparation together with appropriate substrates as outlined in Mootz et al. (2000). It is expected that, in the absence of accessory proteins, the product of this in vitro system might lack certain modifications, such as the cross-linking of residues catalyzed by specific complestatin cytochrome P450 enzymes.
All patents, patent applications, and published references cited herein are hereby incorporated by reference in their entirety. While this invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.

SEQUENCE LISTING
Applicant name: ECOPIA BIOSCIENCES INC.
FARNET, Chris
STAFFA, Alfredo
ZAZOPOULOS, Emmanuel
Title of invention: COMPOSITIONS, METHODS AND SYSTEMS FOR DISCOVERY OF LIPOPEPTIDES
Correspondence address: 7290 Frederick-Banting, Saint-Laurent, Quebec, H4S 2A1
Current Application Data
Expected Filing Date: December 26, 2002
Patent Agent Information
Name: Ywe J. Looper
Reference Number: 10961
File reference: 3002-10CA
Number of SEQ ID NOs: 48
Software: PatentIn version 3.0
Information for SEQ ID NO: 1
Length: 435
Type: PRT
Organism: Artificial
Feature:
Other information: HMMer software generated consensus sequence
Sequence: 1
Gly Gly Leu Arg Glu Leu Met Ala Gly Gln Leu Ala Val Trp His Ala Gln Gln Leu Ala Pro Glu Asn Pro Val Tyr Asn Val Gly Glu Tyr Val Glu Ile Asp Gly Glu Val Asp Leu Asp Leu Leu Val Ala Ala Val Arg Arg Val Met Glu Glu Ala Asp Ala Ala Arg Leu Arg Phe Arg Glu Val Asp Gly Val Pro Arg Gln Tyr Phe Ala Glu Asp Glu Asp Tyr Pro Val Glu Val Ile Asp Val Ser Ala Glu Ala Asp Pro Arg Ala Ala Ala Glu Ser Leu Met Ala Ala Asp Leu Arg Arg Pro Arg Asp Leu Arg Asp Gly Glu Leu Tyr Thr Gln Lys Ile Tyr Lys Val Gly Glu Asp Leu Val Phe Trp Tyr Gln Arg Ala His His Ile Ile Leu Asp Gly Arg Ser Ala Gly Leu Val Ala Ser Arg Val Ala Ala Val Tyr Ser Ala Leu Ala Ala Gly Gly Asp Val Glu Glu Gly Ala Leu Pro Ser Ser Ser Val Leu Met Asp Ala Glu Asp Glu Tyr Arg Ala Ser Glu Glu Phe Glu Leu Asp Arg Glu Tyr Trp Arg Glu Ala Leu Ala Gly Leu Pro Glu Glu Val Ser Leu Gly Ala Asn Glu Pro Ser Arg Leu Pro Arg Glu Pro Val Arg His Glu Glu Asp Val Ser Asp Ala Ala Ala Ala Glu Leu Arg Ala Ala Ala Arg Arg Leu Gly Thr Ser Leu Ala Gly Leu Ala Ile Ala Ala Ala Ala Leu Tyr Gln His Arg Leu Thr Gly Gln Arg Asp Val Val Val Gly Val Pro Val Ala Gly Arg Ser Lys Thr Ala Glu Leu Asp Ile Pro Gly Met Thr Ala Asn Val Val Pro Val Arg Leu Ala Val Ala Pro Lys Thr Thr Val Ala Glu Leu Val Arg Gln Val Ala Arg Gly Val Arg Asp Gly Leu Arg His Gln Arg Tyr Arg Tyr Glu Asp Ile Leu Asp Asp Leu Lys Leu Val Gly Arg Asp Gly Leu Tyr Pro Leu Leu Val Asn Val Leu Ser Phe Asp Tyr Asp Leu Arg Phe Gly Asp Ala Val Ser Val Ala His Gly Leu Ser Ala Gly Pro Val Asp Asp Val Ser Ile Asp Val Tyr Asp Arg Ser Ser Asp Gly Ser Met Lys Val Val Val Asp Val Asn Pro Asp Lets Thr Asp Arg Ser Asp Ala Asp Glu Val Ala Arg Lys Phe Leu Ala Leu Leu Arg Trp Leu Ala Glu Ser Asp Ala Glu Glu Pro Val Ala Arg Ile Asp Leu Leu Asp Glu Asp
Information for SEQ ID NO: 2
Length: 451
Type: PRT

Organism: Artificial
Feature:
Other information: HMMer software generated consensus sequence
Sequence: 2
Ser Val Arg His Gly Val Thr Ala Ala Gln Arg Gly Val Trp Val Ala Gln Gln Leu Arg Pro Asp Ser Arg Leu Tyr Asn Cys Gly Leu Tyr Leu Glu Leu Asp Gly Ala Leu Asp Pro Ala Val Leu Ser Arg Ala Val Arg Arg Thr Leu Ala Glu Thr Glu Ala Leu Arg Ser Arg Phe Glu Glu Asp Asp Asp Gly Ala Leu Leu Gln Arg Val Leu Ala Pro Ala Pro Asp Glu Gln Thr Arg Leu Leu Glu Asp Gly Val Pro Tyr Thr Pro Val Leu Leu Arg His Ile Asp Leu Ser Gly Asp Asp Asp Pro Glu Ala Ala Ala Arg Arg Trp Met Asp Ala Asp Leu Ala Glu Pro Val Asp Leu Asp Arg Ala Gly Thr Ser Arg His Ala Leu Leu Thr Leu Gly Gly Asp Arg His Leu Leu Tyr Leu Gly Tyr His His Ile Ala Leu Asp Gly Phe Gly Ala Ala Leu Tyr Leu Asp Arg Leu Ala Ala Val Tyr Arg Ala Leu Arg Thr Gly Arg Glu Pro Pro Pro Cys Pro Phe Gly Pro Leu Asp Arg Leu Val Ala Glu Glu Ala Ala Tyr Arg Asp Ser Ala Arg His Arg Arg Asp Arg Ala Tyr Trp Thr Gly Arg Phe Ala Asp Leu Pro Glu Pro Val Gly Leu Ala Gly Arg Ala Ala Ala Ala Ala Pro Ala Pro Leu Arg Arg Thr Val Arg Leu Pro Pro Glu Arg Thr Ala Ala Leu Ala Ala Ala Ala Glu Ala Thr Gly Ser Arg Trp Pro Ala Val Val Ile Ala Ala Val Ala Ala Phe Leu Arg Arg Leu Ala Gly Ala Glu Glu Val Val Val Gly Leu Pro Val Thr Ala Arg Val Thr Arg Ala Ala Leu Arg Thr Pro Gly Met Leu Ala Asn Val Leu Pro Leu Arg Leu Glu Val Arg Gln Gly Ala Ser Phe Ala Ala Leu Leu Glu Glu Thr Ser Arg Ala Leu Ser Ala Leu Leu Arg His Gln Arg Phe Arg Gly Glu Asp Leu Gly Arg Glu Leu Gly Leu Ala Gly Glu Arg Ala Gly Leu Ala Pro Thr Thr Val Asn Val Met Ala Phe Ala Pro Val Leu Asp Phe Gly Asp Cys Arg Ala Val Val His Gln Leu Ser Ser Gly Pro Val Glu Asp Leu Ala Ile Asn Leu Tyr Gly Thr Pro Gly Thr Gly Asp Glu Leu Arg Val Thr Val Ala Ala Asn Pro Ala Leu Tyr Thr Ala Asp Asp Val Ala Ser Leu Gln Glu Arg Leu Val Arg Phe Leu Ala Ala Leu Gly Ala Asp Pro Ala Ala Pro Val Gly Arg Val Arg Leu Leu Asp Pro Ala
Information for SEQ ID NO: 3
Length: 603
Type: PRT
Organism: Artificial Feature:
Other information: HMMer software generated consensus sequence Sequence: 3 Val Ser Ala Val Met Val Asp Leu Ala Ala Gly Pro Ser Val Pro Ala Ala Leu Arg Ala His Ala Glu Ala Arg Pro Asp Arg Thr Ala Val Val Phe Val Arg Asp Thr Asp Arg Ala Asp Gly Thr Ala Ser Leu Ser Tyr Ala Glu Leu Asp Arg Arg Ala Arg Ala Val Ala Val Trp Leu Arg Ala Arg Leu Ala Pro Gly Asp Arg Val Leu Leu Leu His Pro Ala Gly Pro Glu Phe Val Ala Ala Tyr Leu Gly Cys Leu Tyr Ala Gly Leu Val Ala Val Pro Ala Pro Leu Pro Gly Gly Tyr Ser His Glu Arg Arg Arg Val Val Gly Ile Ala Ala Asp Ala Gly Ala Gly Ala Val Leu Thr Asp Ala Asp Thr Glu Ala Glu Val Arg Glu Trp Leu Ala Glu Thr Gly Leu Pro Gly Leu Pro Val Leu Ala Val Asp Pro Leu Ala Ala Asp Gly Asp Pro Gly Ala Trp Arg Pro Pro Gly Leu Arg Ala Asp Thr Val Ala Val Leu Gln Tyr Thr Ser Gly Ser Thr Gly Ser Pro Lys Gly Val Val Val Thr His Gly Asn Leu Leu Ala Asn Ala Arg Ser Leu Ser Arg Ser Phe Gly Leu Thr Glu Asp Thr Val Phe Gly Gly Trp Leu Pro Leu Tyr His Asp Met Gly Leu Phe Gly Leu Leu Leu Pro Ala Leu Phe Leu Gly Ala Thr Val Val Leu Met Ser Pro Ser Ala Phe Leu Arg Arg Pro His Leu Trp Leu Arg Leu Ile Asp Arg Phe Gly Val Val Phe Ser Ala Ala Pro Asp Phe Ala Tyr Asp Leu Cys Val Arg Arg Val Thr Asp Glu Gln Ile Ala Gly Leu Asp Leu Ser Arg Trp Arg Trp Ala Ala Asn Gly Ser Glu Pro Ile Arg Ala Ala Thr Leu Arg Ala Phe Ala Glu Arg Phe Ala Pro Ala Gly Leu Arg Pro Glu Ala Leu Thr Pro Cys Tyr Gly Leu Ala Glu Ala Thr Leu Phe Val Ser Gly Lys Ser Ala Gly Pro Leu Arg Thr Arg Arg Val Asp Pro Ala Ala Leu Glu Asp His Arg Phe Glu Glu Ala Val Pro Gly Arg Pro Ala Arg Glu Ile Val Ser Cys Gly Arg Val Pro Asp Leu Glu Val Arg Ile Val Asp Pro Gly Thr Gly Arg Pro Leu
Pro Asp Gly Ala Val Gly Glu Ile Trp Leu Arg Gly Pro Ser Val Ala Ala Gly Tyr Trp Gly Arg Pro Glu Ala Thr Ala Glu Thr Phe Gly Ala Val Thr Asp Gly Gly Asp Gly Pro Trp Leu Arg Thr Gly Asp Leu Gly Ala Leu Tyr Glu Gly Glu Leu Tyr Val Thr Gly Arg Ile Lys Glu Leu Leu Ile Val His Gly Arg Asn Leu Tyr Pro His Asp Ile Glu His Glu Leu Arg Ala Ala His Asp Glu Leu Ala Gly Ala Val Gly Ala Ala Phe Ala Val Pro Ala Pro Gly Gly Gly Glu Glu Val Leu Val Val Val His Glu Val Arg Pro Arg Val Pro Ala Asp Glu Leu Pro Ala Leu Ala Ser Ala Met Arg Ala Thr Val Ala Arg Glu Phe Gly Val Pro Ala Ala Gly Val Val Leu Val Arg Arg Gly Thr Val Arg Arg Thr Thr Ser Gly Lys Val Gln Arg Arg Ala Met Arg Glu Leu Phe Leu Thr Gly Glu Leu Ala Pro Val His Ala Glu Leu Gly Pro His Leu Gln Ala Ala Ala Ala Gly Glu Ala Arg Ala Ala Thr Ser Leu Ala Pro Ala Ser Thr Val
Information for SEQ ID NO: 4 Length: 91 Type: PRT
Organism: Artificial Feature:
Other information: HMMer software generated consensus sequence Sequence: 4 Met Ser Asp Leu Thr Ala Pro Pro Ala Arg His Thr Pro Glu Glu Leu Arg Ala Trp Leu Arg Glu Cys Val Ala Asp Tyr Val Gly Leu Pro Pro Ala Glu Ile Ala Thr Asp Val Pro Leu Thr Asp Tyr Gly Leu Asp Ser Val Tyr Ala Leu Ala Leu Cys Ala Glu Ile Glu Asp His Leu Gly Ile Glu Val Asp Pro Thr Leu Leu Trp Asp His Pro Thr Ile Asp Glu Leu Ser Ala Ala Leu Ala Pro Arg Leu Ala Arg Arg
Information for SEQ ID NO: 5 Length: 1305 Type: DNA
Organism: Actinoplanes sp. Sequence: 5 cctgacctgcgcccgctcacgcccgcccagctcgccgtctggcacgcgcagcagctcgcc 60 ccgcacagccccgtctatcaggtcggcgagttcgtcgagatcgacggcgagtgcgacccc 120 gatctcctggtggcggcgttgcgtcaggtcatgggcgaggccgagagcgcccggctgcgg 180 ttccgcgtgatcgacggtacgccgtggcagtacgtcgccgaggacggcgacgacccgatc 240 caggtcgtggacctcggcgcggccgcggacccgcgcgccgcggcgctgggccgcatggcg 300 gccgacctcgaccggcccggcgacctgcgcgacggcccgctcgtcgagcaccacgtctac 360 ctgctcggcgagggccgggtcatctggtaccaccgcgcgcaccacatcgtctgcgacggc 420 ggcagcctcggcattgtcgcctcccgggtggccggcgtctattccgcgctcgcggccggt 480 ggtgacgtccggccgggtgcgctgccgccgctgtcggtgttgctgtcggccgccgacgcc 540 tacgagcgctccggcgaccgcgaccgggaccgcgagcactggcgctccgcgctggcgggc 600 ctgcccgccgagctgctcgcgggcgcgggccggccgcggccgctgcccggaccgccggtg 660 cgccacgagcacgacctctccgcggcggaggcgggccggctgcgcgcgggggcgcggcgg 720 ctgcggaccagcgtggcgcaggccggcatcgcggccgcggccctctaccagcaccggctc 780 accggcgcccgggacgtgctggtggcggtgcccgtcgccggccgcaccacccgcccggag 840 ttcgacgtgcccggcatgacgtcgaacgtggtgccggtgcgcctcgcggtcacgcccgcc 900 acgaccgtcggcgagctgctgcgcgacgtcgcccgtggtgtccgcgacggcctgcggcac 960 cagcggtacccgtacccgaacatcgtggacgacctcggcctggccgaccgtgccgcgctg 1020 cgcccggtgaccgtcaacgccctggcgctgggacggccgctgcgcttcggctcggcggtg 1080 ggtgtgcgctccggcctgtcggcgggcccggtggacgacgtcaccatcggcctctacgaa 1140 aaggtcagcggcggcggcatgcagacgatcgccgagctgaaccccgagcgcacggaccgc 1200 ccggacgcggcggaggtctcccgctggttccgtacgctgctgcgcgggctggccgagagc 1260 gacgccggcgacccggtggcccgcatcgacatcgtcgacgagccc 1305
Information for SEQ ID NO: 6 Length: 435 Type: PRT
Organism: Actinoplanes sp.
Sequence: 6 Pro Asp Leu Arg Pro Leu Thr Pro Ala Gln Leu Ala Val Trp His Ala Gln Gln Leu Ala Pro His Ser Pro Val Tyr Gln Val Gly Glu Phe Val Glu Ile Asp Gly Glu Cys Asp Pro Asp Leu Leu Val Ala Ala Leu Arg Gln Val Met Gly Glu Ala Glu Ser Ala Arg Leu Arg Phe Arg Val Ile Asp Gly Thr Pro Trp Gln Tyr Val Ala Glu Asp Gly Asp Asp Pro Ile Gln Val Val Asp Leu Gly Ala Ala Ala Asp Pro Arg Ala Ala Ala Leu Gly Arg Met Ala Ala Asp Leu Asp Arg Pro Gly Asp Leu Arg Asp Gly Pro Leu Val Glu His His Val Tyr Leu Leu Gly Glu Gly Arg Val Ile Trp Tyr His Arg Ala His His Ile Val Cys Asp Gly Gly Ser Leu Gly Ile Val Ala Ser Arg Val Ala Gly Val Tyr Ser Ala Leu Ala Ala Gly Gly Asp Val Arg Pro Gly Ala Leu Pro Pro Leu Ser Val Leu Leu Ser Ala Ala Asp Ala Tyr Glu Arg Ser Gly Asp Arg Asp Arg Asp Arg Glu His Trp Arg Ser Ala Leu Ala Gly Leu Pro Ala Glu Leu Leu Ala Gly Ala Gly Arg Pro Arg Pro Leu Pro Gly Pro Pro Val Arg His Glu His Asp Leu Ser Ala Ala Glu Ala Gly Arg Leu Arg Ala Gly Ala Arg Arg Leu Arg Thr Ser Val Ala Gln Ala Gly Ile Ala Ala Ala Ala Leu Tyr Gln His Arg Leu Thr Gly Ala Arg Asp Val Leu Val Ala Val Pro Val Ala Gly Arg Thr Thr Arg Pro Glu Phe Asp Val Pro Gly Met Thr Ser Asn Val Val Pro Val Arg Leu Ala Val Thr Pro Ala Thr Thr Val Gly Glu Leu Leu Arg Asp Val Ala Arg Gly Val Arg Asp Gly Leu Arg His Gln Arg Tyr Pro Tyr Pro Asn Ile Val Asp Asp Leu Gly Leu Ala Asp Arg Ala Ala Leu Val Asn Leu Ala Arg Pro Val Thr Ala Leu Gly Arg Pro Leu Arg Phe Gly Val Ser Gly Gly Ser Ala Val Arg Leu Ser Ala Gly Pro Val Asp Gly Leu Glu Lys Asp Val Thr Ile Tyr Val Ser Gly Gly Gly Met Gln Leu Asn Gly Arg Thr Ile Ala Glu Pro Thr Asp Arg Pro Asp Ala Ala Trp Phe Thr Leu Glu Val Ser Arg Arg Leu Arg Gly Leu Ala Glu Ser Pro Val Arg Ile Asp Ala Gly Asp Ala Asp Ile Val Asp Glu Pro
Information for SEQ ID NO: 7 Length: 1305 Type: DNA

Organism: Streptomyces roseosporus Sequence: 7 tcgcagcgcctcggcgtcaccgccgcccaacagagcgtctggctcgccggccagctggcg 60 gacgaccaccgcctgtaccactgtgcggcgtacctgtcactcaccgggtccatcgacccg 120 cggacactcggcacggcggtccggcggaccctcgacgagaccgaggcgctgcgtacccgg 180 ttcgtaccgcaggacggggaactgctgcagatcctcgaacccggtgccggacagctcctg 240 ctggaagccgacttctccggcgacccggaccccgagcgggcggcacacgactggatgcac 300 gcggcgctcgccgcaccggtccgcctcgaccgcgccgggaccgccacccacgccctgctc 360 accctcggcccgtcccgccacctgctgtacttcggctaccaccacatcgcgctcgacggc 420 tacggtgccctgctccacctgcgccgcctcgcccacgtctacaccgccctcagcaacggg 480 gacgaccccggcccctgcccgttcggccccctggccggtgtcctcacggaggaggcggcc 540 taccgtgactccgacaaccatcggcgcgacggggaattctggacccggtccctcgccggt 600 gcggacgaggcccccgggctgagcgagcgggaggccggcgctctcgccgtcccgctgcgc 660 cgcaccgtggagctgtccggcgaacggacggagaagctggccgcctcggccgcggccact 720 ggagctcgctggtcgtcactgctcgrcgccgccaccgccgcgttcgtacgccgccacgct 780 gccgccgacgacaccgtcatcggcctgcccgtcaccgcccggctcaccgggccggcgctg 840 cgtaccccgtgcatgctcgccaacgacgtgccgctgcgcctcgacgcccggctcgatgcc 900 ccgttcgccgcgctccttgccgacaccacccgcgccgtcggcacgctggcgcgccaccag 960 cggttccgcggggaagaactccaccggaacctggggggcgtcggccgcaccgcgggcctg 1020 gcgcgggtcaccgtcaacgtcctggcgtatgtcgacaacatccggttcggcgactgccgg 1080 gccgtggtccacgagttgtcctcgggaccggtccgcgacttccacatcaactcctacggc 1140 acccccggcacccccgacggcgtccagctggtcttcagcggtaaccccgccctgtacacg 1200 gccaccgatctggccgaccaccaggagcggttcctgcgcttcctcgacgctgtgaccgcc 1260 gacccggacctgccgaccggaagacaccgcctcctgtcgccgggc 1305
Information for SEQ ID NO: 8 Length: 435 Type: PRT
Organism: Streptomyces roseosporus Sequence: 8 Ser Gln Arg Leu Gly Val Thr Ala Ala Gln Gln Ser Val Trp Leu Ala Gly Gln Leu Ala Asp Asp His Arg Leu Tyr His Cys Ala Ala Tyr Leu Ser Leu Thr Gly Ser Ile Asp Pro Arg Thr Leu Gly Thr Ala Val Arg Arg Thr Leu Asp Glu Thr Glu Ala Leu Arg Thr Arg Phe Val Pro Gln Asp Gly Glu Leu Leu Gln Ile Leu Glu Pro Gly Ala Gly Gln Leu Leu Leu Glu Ala Asp Phe Ser Gly Asp Pro Asp Pro Glu Arg Ala Ala His Asp Trp Met His Ala Ala Leu Ala Ala Pro Val Arg Leu Asp Arg Ala Gly Thr Ala Thr His Ala Leu Leu Thr Leu Gly Pro Ser Arg His Leu Leu Tyr Phe Gly Tyr His His Ile Ala Leu Asp Gly Tyr Gly Ala Leu Leu His Leu Arg Arg Leu Ala His Val Tyr Thr Ala Leu Ser Asn Gly Asp Asp Pro Gly Pro Cys Pro Phe Gly Pro Leu Ala Gly Val Leu Thr Glu Glu Ala Ala Tyr Arg Asp Ser Asp Asn His Arg Arg Asp Gly Glu Phe Trp Thr Arg Ser Leu Ala Gly Ala Asp Glu Ala Pro Gly Leu Ser Glu Arg Glu Ala Gly Ala Leu Ala Val Pro Leu Arg Arg Thr Val Glu Leu Ser Glu Arg Thr Glu Lys Leu Ala Ala Ser Ala Ala Ala Thr Gly Gly Ala Trp Ser Ser Leu Leu Val Ala Ala Thr Ala Ala Phe Val Arg Arg Arg Ala Ala Ala Asp Asp Thr Val Ile Gly Leu Pro Val Thr His Ala Arg Thr Gly Pro Ala Leu Arg Thr Pro Cys Met Leu Ala Asn Leu Asp Val Leu Arg Leu Asp Ala Arg Leu Asp Ala Pro Phe Ala Ala Pro Leu Leu Asp Thr Thr Arg Ala Val Gly Thr Leu Ala Arg His Gln Ala Arg Phe Gly Glu Glu Leu His Arg Asn Leu Gly Gly Val Gly Arg Arg Thr Ala Leu Ala Arg Val Thr Val Asn Val Leu Ala Tyr Val Asp Gly Asn Ile Phe Gly Asp Cys Arg Ala Val Val His Glu Leu Ser Ser Arg Gly Pro Arg Asp Phe His Ile Asn Ser Tyr Gly Thr Pro Gly Thr Val Pro Asp Val Gln Leu Val Phe Ser Gly Asn Pro Ala Leu Tyr Thr Gly Ala Thr Leu Ala Asp His Gln Glu Arg Phe Leu Arg Phe Leu Asp Asp Ala Val Ala Asp Pro Asp Leu Pro Thr Gly Arg His Arg Leu Leu Thr Ser Pro Gly
Information for SEQ ID NO: 9 Length: 1359 Type: DNA
Organism: Streptomyces fradiae Sequence: 9 gcacaccgtgtggccgccac ggacggcgca gcgtctgcgc 60 gtcggcccag accgggatct ggggacgacaggctctacgc ttcctcgaac tcgaccacgt ggtggaggag ctgcggcctc gtgctgagcgaggcgatccg gccgacaccg aggcgctgcg caccgcgttc ccgcgccgtc cgggaggacgcggacggcgc cacgtcctcg cccggccgcc gagcacgcag gctggagcag acccgcctct tccacgccga cccgagcggc ggaaccccct cccgctccgc gtccctggac 300 tggatggaccggcaacgggcgcaaccctgggacctcgcgtcgggcgacacctgccgtcat 360 accctgatccccctcggcggcgaccgctcgctgctgcacctgcgttaccaccacctcgcc 420 ctggacgggtacggcgccgcgctctatctggaccggctcgcggcggtctaccgcgcgctg 480 cgcaccggccatcaaccgcccccctgcgcgttcgcgccgctggcccgcctggtcgaggag 540 gaccacgcctaccggaactccgcccgtcaccgcgcggacgccaatcactggcgcgaccgc 600 ttcgcggacctcccgcgccccaccagcctcgccgacgccaccacgcccgcggcgcccacc 660 acgcccgccacgcccgccgcgcccgccgcgcccgacgaactgcggcgcaccgtgcgcctg 720 tccgccgcccggtccgccgcgctgcgccgtgcctcggaccggagcggccgaccctggccc 780 gtgtacgccacggccgcggtggccgccttcctgagccgactcgcgccgggggaggaggtc 840 gtcgtcggcctcccggtcaccgccagggtgacccccgccgcggtgcgcacaccggggatg 900 ctcgccaacgtcgtaccgcttcgcctgcccgtccggcagggcatgtcgacggcggagctg 960 ctggagctgaccgcggccgagatcagcaccacactgcgccaccagcgccaccgcaccgag 1020 gacatcgggcgggcgctcggactccacggcgctccgccagccaccacactcgtgaacgtc 1080 atggcgttcgccccggtcctcgacttcggcgactgccgggccccggtgcaccagctctcg 1140 gccggaccggtggaggacctggtcgtcaacctcctcggcaccccgggcgacggcggcgag 1200 agcgacggcaccgagctggagatcactgtcgccgccaacccccgcctccactcggcggac 1260 gcggtggcctcgctggccgcgcggctcgcggagttcctcacgcacatggggcaggacgcc 1320 gaggcgcccctcggccggacccggctgctcgacgcggag 1359
Information for SEQ ID NO: 10 Length: 453 Type: PRT
Organism: Streptomyces fradiae Sequence: 10 Ala His Arg Val Ala Ala Thr Ser Ala Gln Thr Gly Ile Trp Thr Ala Gln Arg Leu Arg Gly Asp Asp Arg Leu Tyr Ala Cys Gly Leu Phe Leu Glu Leu Asp His Val Val Glu Glu Val Leu Ser Glu Ala Ile Arg Arg Ala Val Ala Asp Thr Glu Ala Leu Arg Thr Ala Phe Arg Glu Asp Ala Asp Gly Ala Leu Glu Gln His Val Leu Ala Arg Pro Pro Ser Thr Gln Thr Arg Leu Phe His Ala Asp Pro Ser Gly Gly Thr Pro Ser Arg Ser Ala Ser Leu Asp Trp Met Asp Arg Gln Arg Ala Gln Pro Trp Asp Leu Ala Ser Gly Asp Thr Cys Arg His Thr Leu Ile Pro Leu Gly Gly Asp Arg Ser Leu Leu His Leu Arg Tyr His His Leu Ala Leu Asp Gly Tyr Gly Ala Ala Leu Tyr Leu Asp Arg Leu Ala Ala Val Tyr Arg Ala Leu Arg Thr Gly His Gln Pro Pro Pro Cys Ala Phe Ala Pro Leu Ala Arg Leu Val Glu Glu Asp His Ala Tyr Arg Asn Ser Ala Arg His Arg Ala Asp Ala Asn His Trp Arg Asp Arg Phe Ala Asp Leu Pro Arg Pro Thr Ser Leu Ala Asp Ala Thr Thr Pro Ala Ala Pro Thr Thr Pro Ala Thr Pro Ala Ala Pro Ala Ala Pro Asp Glu Leu Arg Arg Thr Val Arg Leu Ser Ala Ala Arg Ser Ala Ala Leu Arg Arg Ala Ser Asp Arg Ser Gly Arg Pro Trp Pro Val Tyr Ala Thr Ala Ala Val Ala Ala Phe Leu Ser Arg Leu Ala Pro Gly Glu Glu Val Val Val Gly Leu Pro Val Thr Ala Arg Val Thr Pro Ala Ala Val Arg Thr Pro Gly Met Leu Ala Asn Val Val Pro Leu Arg Leu Pro Val Arg Gln Gly Met Ser Thr Ala Glu Leu Leu Glu Leu Thr Ala Ala Glu Ile Ser Thr Thr Leu Arg His Gln Arg His Arg Thr Glu Asp Ile Gly Arg Ala Leu Gly Leu His Gly Ala Pro Pro Ala Thr Thr Leu Val Asn Val Met Ala Phe Ala Pro Val Leu Asp Phe Gly Asp Cys Arg Ala Pro Val His Gln Leu Ser Ala Gly Pro Val Glu Asp Leu Val Val Asn Leu Leu Gly Thr Pro Gly Asp Gly Gly Glu Ser Asp Gly Thr Glu Leu Glu Ile Thr Val Ala Ala Asn Pro Arg Leu His Ser Ala Asp Ala Val Ala Ser Leu Ala Ala Arg Leu Ala Glu Phe Leu Thr His Met Gly Gln Asp Ala Glu Ala Pro Leu Gly Arg Thr Arg Leu Leu Asp Ala Glu
Information for SEQ ID NO: 11 Length: 1296 Type: DNA
Organism: Streptomyces ghanaensis Sequence: 11 tcggtgcgtcatggggtgctggccgcgcagcgagaggtctgggtggcccagcaactgcgg 60 ccgctcagccctcggttcaactgcggcgttttcctggacgtcggcgaggccctcgacgcc 120 gccgtgctccgccgcgccgtgacccgtgccctggaggagacggagacgctgcgctcactg 180 ttcgccgaacaggacggcgacggcgagatcatccggaccacgcggccggcccccgacgac 240 tgcgtgacgacaatcgacgtgcgcgacgcggacgacccggtcgccgcggcacggcggtgg 300 atggacgccgacctggccgagccggtcgacctgcggcacgacccgagctaccggcacgtg 360 ctgttccggatcggcgagcggcgctccatcttctacttccgctaccaccacatcacgctc 420 gacggtttcgggcagaccctgtacctgaaccgggtggccgacctctacacggccctggcc 480 accgccaccgagccggacgcggccccgttcggcggcctggaccgcctgctggacgaggaa 540 cggcagtacgaggactccgccaggtgcgccgaggaccgggcccactggcacaccaccgcc 600 cggtccctcgccgagggacgcggcagcggcccggcggccgcctcggaccaggtgctccgc 660 gacacggtgcggctgccgcgggaactgaccgacgcggtgtgcgcccacgcgcggtcgcac 720 ggttcgcgctggaccgcggtgatgctgggtgccgtggcggcctgtgcccggcggcggcag 780 ggcgacgacgcggtcgtgatcgacctgccggtgaccgcccgcacgacccgggccgcgctg 840 acgacgcccggcatgatgtcgaacgtgctgccgctgcggctggaggtcgcgcgcgacgcg 900 gacctgcgcgccctcacggaggaggtgtcccgggcactgccggcgacactccggcaccag 960 cgcttccgcggcgaggagctgtaccgcgagctcggcgcgggcggcgcgcggggacacctc 1020 tcggtgaacgtgatgccgttcgaccaccaggtgcggttcggcaccgcgccggcgaccctg 1080 caccaactggccaacggccaggtgcacgaggtggcgatcgacgtgtacgggacccccgac 1140 aagggcggcgacatccacgtcaccgtgcacgccaacgcccggacgcacaccgtcgaggac 1200 gtccggcagtggcaccgggagctgcgccgcatgctcgtccacctcctcggcggaccgggc 1260 cgcacggtcggcgaggccgaactgctcgacgaggcc 1296
Information for SEQ ID NO: 12 Length: 432 Type: PRT
Organism: Streptomyces ghanaensis Sequence: 12 Ser Val Arg His Gly Val Leu Ala Ala Gln Arg Glu Val Trp Val Ala Gln Gln Leu Arg Pro Leu Ser Pro Arg Phe Asn Cys Gly Val Phe Leu Asp Val Gly Glu Ala Leu Asp Ala Ala Val Leu Arg Arg Ala Val Thr Arg Ala Leu Glu Glu Thr Glu Thr Leu Arg Ser Leu Phe Ala Glu Gln Asp Gly Asp Gly Glu Ile Ile Arg Thr Thr Arg Pro Ala Pro Asp Asp Cys Val Thr Thr Ile Asp Val Arg Asp Ala Asp Asp Pro Val Ala Ala Ala Arg Arg Trp Met Asp Ala Asp Leu Ala Glu Pro Val Asp Leu Arg His Asp Pro Ser Tyr Arg His Val Leu Phe Arg Ile Gly Glu Arg Arg Ser Ile Phe Tyr Phe Arg Tyr His His Ile Thr Leu Asp Gly Phe Gly Gln Thr Leu Tyr Leu Asn Arg Val Ala Asp Leu Tyr Thr Ala Leu Ala Thr Ala Thr Glu Pro Asp Ala Ala Pro Phe Gly Gly Leu Asp Arg Leu Leu Asp Glu Glu Arg Gln Tyr Glu Asp Ser Ala Arg Cys Ala Glu Asp Arg Ala His Trp His Thr Thr Ala Arg Ser Leu Ala Glu Gly Arg Gly Ser Gly Pro Ala Ala Ala Ser Asp Gln Val Leu Arg Asp Thr Val Arg Leu Pro Arg Glu Leu Thr Asp Ala Val Cys Ala His Ala Arg Ser His Gly Ser Arg Trp Thr Ala Val Met Leu Gly Ala Val Ala Ala Cys Ala Arg Arg Arg Gln Gly Asp Asp Ala Val Val Ile Asp Leu Pro Val Thr Ala Arg Thr Thr Arg Ala Ala Leu Thr Thr Pro Gly Met Met Ser Asn Val Leu Pro Leu Arg Leu Glu Val Ala Arg Asp Ala Asp Leu Arg Ala Leu Thr Glu Glu Val Ser Arg Ala Leu Pro Ala Thr Leu Arg His Gln Arg Phe Arg Gly Glu Glu Leu Tyr Arg Glu Leu Gly Ala Gly Gly Ala Arg Gly His Leu Ser Val Asn Val Met Pro Phe Asp His Gln Val Arg Phe Gly Thr Ala Pro Ala Thr Leu His Gln Leu Ala Asn Gly Gln Val His Glu Val Ala Ile Asp Val Tyr Gly Thr Pro Asp Lys Gly Gly Asp Ile His Val Thr Val His Ala Asn Ala Arg Thr His Thr Val Glu Asp Val Arg Gln Trp His Arg Glu Leu Arg Arg Met Leu Val His Leu Leu Gly Gly Pro Gly Arg Thr Val Gly Glu Ala Glu Leu Leu Asp Glu Ala
Information for SEQ ID NO: 13 Length: 1314 Type: DNA
Organism: Streptomyces refuineus Sequence: 13 gcagaccgcgtggccgccacctcggcccagtccgggatctggacggcacagcggctgcgc 60 tcggatgaccggctctacacctgcggcctctacctcgaactcgaccacgtggtggaggag 120 gtgctgggcgaggcgatcggccgtgcggtcgccgacaccgaggcgctgcgcaccgccttc 180 ggggaggacggggacggcgcgctggaacagcgcgtgctcgcgcggccgccggacacgcag 240 acacggctgttccggctggacctgggcggagacgaccggccccgcgccgaggccctggac 300 tggatggaccggcagcaggcggaaccgtgggacctcgccgccggcgacacctgccggcac 360 accctgatccgcctcggcggccaccgcaccgtcctgcacctgcgctaccaccacctcgcc 420 ctggacgggttcggtgccgcgctctacctggacaggatcgcggcggtgtaccgggcgctg 480 cgcaccggccgggagacgcccccctgcaccttcgcgccgctggcccgcctcgtggaggag 540 gaccgcgcctaccggcggtccgcccgccaccgcagggacgccgaccactggcggacgcgc 600 ttcgcggacctcccccgccccaccagcctcgccggcgccgccgcgcccgccgcgcccgcc 660 gcgctgcgccacacggtccgcgtgtccgcggccgacaccgccgcactgggcctgcgggcg 720 gaccggagcggcagcacctggccggtgttcgccacggccgcggtggccgccttcctgagc 780 cgcctcgcgccgggggaggaggtcgtcgtcggcttcccggtcaccgccagggtcacgccc 840 gccgcggtgcgcacgccggggatgctggcgaacgtcgtgccgctccggatccgggtgcgg 900 caggggatgtcgttcgccgcgctgctggaccggaccgcggccgagatcggcgccacgctg 960 cggcaccagcgccaccgcaccgaggacatcggccgggcgctcggcctccccccgcacggc 1020 gcccagccggccccgaccctggtcaacgtcatggccttcgccccggtgctcgacttcggc 1080 gactgcctctcgccggtgcaccagctgtcggccggcccggtcgaggacctggcggtcaac 1140 ctgctcggcacccccggggacggccgggagctggagatcaccgtcgccgccaaccccctg 1200 ctccactcggaggacgcggtggcgtcgctggccgcgcggctggcggagttcctggcgcgc 1260 gcgggcgagcacgccgacgccccgatcggccggacacgcctgctcggcgcggcg 1314
Information for SEQ ID NO: 14 Length: 438 Type: PRT
Organism: Streptomyces refuineus Sequence: 14 Ala Asp Arg Val Ala Ala Thr Ser Ala Gln Ser Gly Ile Trp Thr Ala Gln Arg Leu Arg Ser Asp Asp Arg Leu Tyr Thr Cys Gly Leu Tyr Leu Glu Leu Asp His Val Val Glu Glu Val Leu Gly Glu Ala Ile Gly Arg Ala Val Ala Asp Thr Glu Ala Leu Arg Thr Ala Phe Gly Glu Asp Gly Asp Gly Ala Leu Glu Gln Arg Val Leu Ala Arg Pro Pro Asp Thr Gln Thr Arg Leu Phe Arg Leu Asp Leu Gly Gly Asp Asp Arg Pro Arg Ala Glu Ala Leu Asp Trp Met Asp Arg Gln Gln Ala Glu Pro Trp Asp Leu Ala Ala Gly Asp Thr Cys Arg His Thr Leu Ile Arg Leu Gly Gly His Arg Thr Val Leu His Leu Arg Tyr His His Leu Ala Leu Asp Gly Phe Gly Ala Ala Leu Tyr Leu Asp Arg Ile Ala Ala Val Tyr Arg Ala Leu Arg Thr Gly Arg Glu Thr Pro Pro Cys Thr Phe Ala Pro Leu Ala Arg Leu Val Glu Glu Asp Arg Ala Tyr Arg Arg Ser Ala Arg His Arg Arg Asp Ala Asp His Trp Arg Thr Arg Phe Ala Asp Leu Pro Arg Pro Thr Ser Leu Ala Gly Ala Ala Ala Pro Ala Ala Pro Ala Ala Leu Arg His Thr Val Arg Val Ser Ala Ala Asp Thr Ala Ala Leu Gly Leu Arg Ala Asp Arg Ser Gly Ser Thr Trp Pro Val Phe Ala Thr Ala Ala Val Ala Ala Phe Leu Ser Arg Leu Ala Pro Gly Glu Glu Val Val Val Gly Phe Pro Val Thr Ala Arg Val Thr Pro Ala Ala Val Arg Thr Pro Gly Met Leu Ala Asn Val Val Pro Leu Arg Ile Arg Val Arg Gln Gly Met Ser Phe Ala Ala Leu Leu Asp Arg Thr Ala Ala Glu Ile Gly Ala Thr Leu Arg His Gln Arg His Arg Thr Glu Asp Ile Gly Arg Ala Leu Gly Leu Pro Pro His Gly Ala Gln Pro Ala Pro Thr Leu Val Asn Val Met Ala Phe Ala Pro Val Leu Asp Phe Gly Asp Cys Leu Ser Pro Val His Gln Leu Ser Ala Gly Pro Val Glu Asp Leu Ala Val Asn Leu Leu Gly Thr Pro Gly Asp Gly Arg Glu Leu Glu Ile Thr Val Ala Ala Asn Pro Leu Leu His Ser Glu Asp Ala Val Ala Ser Leu Ala Ala Arg Leu Ala Glu Phe Leu Ala Arg Ala Gly Glu His Ala Asp Ala Pro Ile Gly Arg Thr Arg Leu Leu Gly Ala Ala
Information for SEQ ID NO: 15 Length: 1296 Type: DNA
Organism: Streptomyces aizunensis Sequence: 15 ggtgggctccgggaaatgatggcgggccagctcgcgatctggtacagccatcagttggcg 60 cccgagaacccgtgcttcaacggtgccgagtacctggcgcttgacggagacgtggatctg 120 ggcctcctggtgaaggcctcgcagcggctgatggaagaggcggacgccgcccggctgcgg 180 atccgtgaagtggacgggcagccgaggcagtacttccacgacgtggaggactaccccgtc 240 gaggtcatcgacatcagctccgaggccgatccccaggcggcggccgagagcctgatgtgg 300 gaagacctgcggggcgagcggggagcggccgaccgctccctctacaccatcaagatctac 360 acggccggtccccggctcaccttctggtaccagcgggcctaccacgtgatcctggacggc 420 cgcagcgcgggcctggtggtcggccgcctgtcgcaggtgtacaacaccctgctccagggc 480 ggttccgtggaagagggcgccctgccctccagcaccgtcctgatggacgcggaacgcgag 540 taccggacctccgaggcccacgaggccgaccgggagtactggcgcggcgtcctcgcgggc 600 ctccccgaggccgagggcctcggcagcaactacggcggccgcgcccagcgcgcccccatc 660 cggttcgtggagagcgtcggcgacgccgtcgccacggacctgaagacggccgcccgcggg 720 ctggggacgaacttcgccggcctgatgatcagcgccgccgccctctaccagcaccacctc 780 accggacagcaggacgtggtcgtcggcgtcccggtcagcggccgctccggaacgcgcgac 840 ctcgccattccgttcatgaccaacaacgtccttccgatccgggtgacgatcgccccggac 900 acctcggtcgccgacctcgtgcggcagaccacgcgcgccgtgatgaagggcctgcgccac 960 cagcgctaccgctacgagcacatgctcaacgacgcgatgctcggcgagggcggtctctgg 1020 gacctgctcatcaacgtgatgtccttcgacatctacgccctccccttcggcgactgcacc 1080 gtcaccgcgcacaacctctccagcggccccgtcgacagcacgcgcatcgacgtgtacgac 1140 cgctccggcctgaagatcgccgtcgacgtcaaccccgacgcccccgacctgtcgccgggc 1200 gacgaggtctgccgtcgcttcctggcgctcgcgcactggctcgtctcggtcgatcccgcc 1260 gaaccggtcggccgctccggcctgctggacgcggac 1296
Information for SEQ ID NO: 16 Length: 432 Type: PRT
Organism: Streptomyces aizunensis Sequence: 16 Gly Gly Leu Arg Glu Met Met Ala Gly Gln Leu Ala Ile Trp Tyr Ser His Gln Leu Ala Pro Glu Asn Pro Cys Phe Asn Gly Ala Glu Tyr Leu Ala Leu Asp Gly Asp Val Asp Leu Gly Leu Leu Val Lys Ala Ser Gln Arg Leu Met Glu Glu Ala Asp Ala Ala Arg Leu Arg Ile Arg Glu Val Asp Gly Gln Pro Arg Gln Tyr Phe His Asp Val Glu Asp Tyr Pro Val Glu Val Ile Asp Ile Ser Ser Glu Ala Asp Pro Gln Ala Ala Ala Glu Ser Leu Met Trp Glu Asp Leu Arg Gly Glu Arg Gly Ala Ala Asp Arg Ser Leu Tyr Thr Ile Lys Ile Tyr Thr Ala Gly Pro Arg Leu Thr Phe Trp Tyr Gln Arg Ala Tyr His Val Ile Leu Asp Gly Arg Ser Ala Gly Leu Val Val Gly Arg Leu Ser Gln Val Tyr Asn Thr Leu Leu Gln Gly Gly Ser Val Glu Glu Gly Ala Leu Pro Ser Ser Thr Val Leu Met Asp Ala Glu Arg Glu Tyr Arg Thr Ser Glu Ala His Glu Ala Asp Arg Glu Tyr Trp Arg Gly Val Leu Ala Gly Leu Pro Glu Ala Glu Gly Leu Gly Ser Asn Tyr Gly Gly Arg Ala Gln Arg Ala Pro Ile Arg Phe Val Glu Ser Val Gly Asp Ala Val Ala Thr Asp Leu Lys Thr Ala Ala Arg Gly Leu Gly Thr Asn Phe Ala Gly Leu Met Ile Ser Ala Ala Ala Leu Tyr Gln His His Leu Thr Gly Gln Gln Asp Val Val Val Gly Val Pro Val Ser Gly Arg Ser Gly Thr Arg Asp Leu Ala Ile Pro Phe Met Thr Asn Asn Val Leu Pro Ile Arg Val Thr Ile Ala Pro Asp Thr Ser Val Ala Asp Leu Val Arg Gln Thr Thr Arg Ala Val Met Lys Gly Leu Arg His Gln Arg Tyr Arg Tyr Glu His Met Leu Asn Asp Ala Met Leu Gly Glu Gly Gly Leu Trp Asp Leu Leu Ile Asn Val Met Ser Phe Asp Ile Tyr Ala Leu Pro Phe Gly Asp Cys Thr Val Thr Ala His Asn Leu Ser Ser Gly Pro Val Asp Ser Thr Arg Ile Asp Val Tyr Asp Arg Ser Gly Leu Lys Ile Ala Val Asp Val Asn Pro Asp Ala Pro Asp Leu Ser Pro Gly Asp Glu Val Cys Arg Arg Phe Leu Ala Leu Ala His Trp Leu Val Ser Val Asp Pro Ala Glu Pro Val Gly Arg Ser Gly Leu Leu Asp Ala Asp
Information for SEQ ID NO: 17 Length: 1293 Type: DNA
Organism: Actinomycete Sequence: 17 gtggatcgtcgtcccgtctccgccgcccagctgggcatctgggtcgcgcagcaggtgctg 60 ccggacagtcctctgtacaactgcggctgctactacgagatcggcgcggccgatcccggg 120 ctgctcgaccgcgcggtccggcacacgctggccgagaccgaggcgctgcggtcgcgcttc 180 gagacgatcgacgaccagctgtggcagctcgtcgggccggccgaccccgagccgctggag 240 gtcgtcgacctgcgcgcggagcccgacccggaggcggccgcccggcgctggatgggggcc 300 gcgatggccgaggtccggcccctgggccgggccccgctgagccgccaggccgtgctgctg 360 ctcggcgcggaccgccggctgtggttccacggctaccaccacgccgtgctggacggcttc 420 ggccagtccgtctacgccgcccgggtggcgcaggtctacgccgccctggccgccggccgg 480 accccgccggagcgggtcttcgcgacgctggacgaggcgcacgccgacgccgcggtggac 540 cccgcgtcccgccggttcgccgccgaccgggactactggctcggcgccttcgccgaccgg 600 ccggagccggtcgggctggccggccgggccggcgccgccgggccgacccagctgcgccgg 660 atccgcccgctgccgccggggtgcgccgcccggttcgcggcggcggccgaggcggtgggc 720 agcacctggccggccgccgtgatcgccgcggtggccgcctactaccaccggatgaccggg 780 cgcgaggagatcgtcttcgcgctgccgctggccggccgccgcggccgggcctcgctgagc 840 acgcccggggccctggtcaacgtgctgccgatccggctctcggtgagctcccgggccacc 900 ttcgccgagctggcccggcaggccggccgccggctggccgacgtgctgcgccatcagcgc 960 ttccgcggcgagcagctgttccaggagctgggcctgtccggcgagcgcgcgttctggggg 1020 cccacggtcaacgtgatgggcttcggcggcgacctggccctgggcccggtcaccgcggtg 1080 ccgcacccgctcgcgaccggcccggtccaggacctcaagatcaacttctacggtacgccg 1140 gccacgggcgtccgcctcgaactcgacgccgacccggcccgctacgacgcggtggccgtc 1200 gcggagcaccaggaccggctgatccggctgctgacggcgctcgggcacgacccggcgacc 1260 cggatcggcgccgtcgacctgctcgaccccgcc 1293
Information for SEQ ID NO: 18 Length: 431 Type: PRT
Organism: Actinomycete Sequence: 18 Val Asp Arg Arg Pro Val Ser Ala Ala Gln Leu Gly Ile Trp Val Ala Gln Gln Val Leu Pro Asp Ser Pro Leu Tyr Asn Cys Gly Cys Tyr Tyr Glu Ile Gly Ala Ala Asp Pro Gly Leu Leu Asp Arg Ala Val Arg His Thr Leu Ala Glu Thr Glu Ala Leu Arg Ser Arg Phe Glu Thr Ile Asp Asp Gln Leu Trp Gln Leu Val Gly Pro Ala Asp Pro Glu Pro Leu Glu Val Val Asp Leu Arg Ala Glu Pro Asp Pro Glu Ala Ala Ala Arg Arg Trp Met Gly Ala Ala Met Ala Glu Val Arg Pro Leu Gly Arg Ala Pro Leu Ser Arg Gln Ala Val Leu Leu Leu Gly Ala Asp Arg Arg Leu Trp Phe His Gly Tyr His His Ala Val Leu Asp Gly Phe Gly Gln Ser Val Tyr Ala Ala Arg Val Ala Gln Val Tyr Ala Ala Leu Ala Ala Gly Arg Thr Pro Pro Glu Arg Val Phe Ala Thr Leu Asp Glu Ala His Ala Asp Ala Ala Val Asp Pro Ala Ser Arg Arg Phe Ala Ala Asp Arg Asp Tyr Trp Leu Gly Ala Phe Ala Asp Arg Pro Glu Pro Val Gly Leu Ala Gly Arg Ala Gly Ala Ala Gly Pro Thr Gln Leu Arg Arg Ile Arg Pro Leu Pro Pro Gly Cys Ala Ala Arg Phe Ala Ala Ala Ala Glu Ala Val Gly Ser Thr Trp Pro Ala Ala Val Ile Ala Ala Val Ala Ala Tyr Tyr His Arg Met Thr Gly Arg Glu Glu Ile Val Phe Ala Leu Pro Leu Ala Gly Arg Arg Gly Arg Ala Ser Leu Ser Thr Pro Gly Ala Leu Val Asn Val Leu Pro Ile Arg Leu Ser Val Ser Ser Arg Ala Thr Phe Ala Glu Leu Ala Arg Gln Ala Gly Arg Arg Leu Ala Asp Val Leu Arg His Gln Arg Phe Arg Gly Glu Gln Leu Phe Gln Glu Leu Gly Leu Ser Gly Glu Arg Ala Phe Trp Gly Pro Thr Val Asn Val Met Gly Phe Gly Gly Asp Leu Ala Leu Gly Pro Val Thr Ala Val Pro His Pro Leu Ala Thr Gly Pro Val Gln Asp Leu Lys Ile Asn Phe Tyr Gly Thr Pro Ala Thr Gly Val Arg Leu Glu Leu Asp Ala Asp Pro Ala Arg Tyr Asp Ala Val Ala Val Ala Glu His Gln Asp Arg Leu Ile Arg Leu Leu Thr Ala Leu Gly His Asp Pro Ala Thr Arg Ile Gly Ala Val Asp Leu Leu Asp Pro Ala
Information for SEQ ID NO: 19 Length: 1305 Type: DNA
Organism: Streptomyces sp. (Ecopia strain) Sequence: 19 ggcggccgtcgtgagctgatggccggacagcttggcttatggcatgcgcagcaactgaat 60 ccggataatccgatctataacatgggtgaatacatagagattcgcggaaaggtggacacg 120 agcttattcgaggcggctgtgcgaagggtcgtcctggaagtcgacggttttggtctgcgc 180 tttgagggaggtgctgacgaagttccgcggcaatattttggcctgcggagcgattggctg 240 tttcatgtgatcgacgtgagcggcgaggaggacccccgttccagcgcggagagttggatg 300 cgggcggacatgcgacgcccggtggatctccaggtcggtgaactcttcacccaggccatc 360 atcaaggtggacgaagatctcttcttctggtatcagcgaatacaccacatcatcgcggac 420 ggactcgcgggaccccggatagcctcccgagtggctgcggtctacacggcactgtcggcc 480 ggcgaacccctcgcggacagtgcgcctccctcgagttccgtactgatggacgccgatgcc 540 gactaccgggcgtccccagaattcgaactggaccggcagtactggacggagcgtctttcc 600 gatcgccctcaaacggtcagcttgagcggccaggaaccttccaccacaccccatgaactg 660 acacggcatacgctccacatcccacccgacgccgctgcggaactcagaagctccgcccgt 720 cggctgggaacgagcctctcgggtctggccgtagccgcgagcgccgcctacctgcatcgc 780 gcaacgggacaagaggacatcattctcggggtccccgtaatgggcagaaaaaccgcgctg 840 cgggacatccccggaatgacggcgaatatcgttcctctgcgcctcgctgtgcagccgaag 900 gccacggtgagggagctcgtgaagcaggtatctcgcggagtacgagacgccttgcggcat 960 cagcgataccgctacgaggacatcctcagagacctgaagctcgtggggcgcgacggactc 1020 taccccctactggtcaatatcgtctcctttgactacgatttgagatttggtgacgccccc 1080 agcattgcgcacgggctcggcggcataaacttcaacgacctgtcgatttccgtgtacgac 1140 aggtcgtccgacggaagcatgtccgtggttgtggacgccaatcccgacctttacagccgt 1200 ggagcggtgcaagagcatgccgcgaaattcctcgacgtgatgaactggatggcgcgttcc 1260 gctgcggaggaacgcatccaccagatcacgttgatgagccgctcc 1305
Information for SEQ ID NO: 20 Length: 435 Type: PRT
Organism: Streptomyces sp. (Ecopia strain) Sequence: 20 Gly Gly Arg Arg Glu Leu Met Ala Gly Gln Leu Gly Leu Trp His Ala Gln Gln Leu Asn Pro Asp Asn Pro Ile Tyr Asn Met Gly Glu Tyr Ile Glu Ile Arg Gly Lys Val Asp Thr Ser Leu Phe Glu Ala Ala Val Arg Arg Val Val Leu Glu Val Asp Gly Phe Gly Leu Arg Phe Glu Gly Gly Ala Asp Glu Val Pro Arg Gln Tyr Phe Gly Leu Arg Ser Asp Trp Leu Phe His Val Ile Asp Val Ser Gly Glu Glu Asp Pro Arg Ser Ser Ala Glu Ser Trp Met Arg Ala Asp Met Arg Arg Pro Val Asp Leu Gln Val Gly Glu Leu Phe Thr Gln Ala Ile Ile Lys Val Asp Glu Asp Leu Phe Phe Trp Tyr Gln Arg Ile His His Ile Ile Ala Asp Gly Leu Ala Gly Pro Arg Ile Ala Ser Arg Val Ala Ala Val Tyr Thr Ala Leu Ser Ala Gly Glu Pro Leu Ala Asp Ser Ala Pro Pro Ser Ser Ser Val Leu Met Asp Ala Asp Ala Asp Tyr Arg Ala Ser Pro Glu Phe Glu Leu Asp Arg Gln Tyr Trp Thr Glu Arg Leu Ser Asp Arg Pro Gln Thr Val Ser Leu Ser Gly Gln Glu Pro Ser Thr Thr Pro His Glu Leu Thr Arg His Thr Leu His Ile Pro Pro Asp Ala Ala Ala Glu Leu Arg Ser Ser Ala Arg Arg Leu Gly Thr Ser Leu Ser Gly Leu Ala Val Ala Ala Ser Ala Ala Tyr Leu His Arg Ala Thr Gly Gln Glu Asp Ile Ile Leu Gly Val Pro Val Met Gly Arg Lys Thr Arg Asp Pro Gly Met Thr Ala Leu Ile Ala Asn Ile Val Pro Leu Arg Val Gln Lys Ala Thr Val Leu Ala Pro Arg Glu Leu Val Lys Gln Val Gly Val Asp Ala Leu Arg Ser Arg Arg His Gln Arg Tyr Arg Tyr Glu Leu Arg Leu Lys Leu Val Asp Ile Asp Gly Arg Asp Gly Leu Tyr Pro Val Asn Val Ser Phe Asp Leu Leu Ile Tyr Asp Leu Arg Phe Gly Asp Ser Ile His Gly Leu Gly Ala Pro Ala Gly Ile Asn Phe Asn Asp Leu Ser Val Asp Arg Ser Ser Ser Ile Tyr Asp Gly Ser Met Ser Val Val Ala Asn Asp Leu Tyr Ser Val Asp Pro Arg Gly Ala Val Gln Glu His Lys Phe Asp Val Met Asn Ala Ala Leu Trp Met Ala Arg Ser Ala Ala Arg Ile Gln Ile Thr Leu Glu Glu His Met Ser Arg Ser
Information for SEQ ID NO: 21 Length: 1338 Type: DNA
Organism: Streptomyces coelicolor Sequence: 21 tcggttcggc acggtctgac cacgaggtgtggctcgccca gcagctggat 60 gagcgcgcag ccgcgtggcg cgcactaccg tgcctggagatcgacggacc cctggaccac 120 gacgggatcc gcggtgctga gccgcgccct gtggccggtacggagacgct ctgctcgcgc 180 gcggctcacc ttcctcaccg acgaggaggg cgcgcgtactgcccgcccgc gccggagggt 240 ccggccgtac tcggccgccg tcgaggaccc ccgtacacccccgtgctgct gcgccacatc 300
ggacggggtg gacctctccg gtcacgagga gaggcccagcggtggatgga ccgggaccgc 360 ccccgagggc gcgacgccgc tgccgctgga ctgagcagccacgcgctgtt cacgctcggc 420 ccggcccggc gggggccggc acctgtacta caccacatcgtgatcgacgg caccagcatg 480
cctgggcgtc gccctgttct acgagcggct taccgcgcgctgcgggacgg gcgtgcggtg 540 ggccgaggtg cccgcggccg ccttcgggga atggtcgcgggcgaggaggc ctaccgcgcg 600 cacggaccgg tcggcgcggt acgagcgtga tggaccggcctgttcaccga ccgccccgag 660 ccgggcctac cccgtctcgctcaccgggcgcggcggcggccgggccctcgcgccgaccgtgaggagcctg 720 ggcctgcccccggagcgcacggaggtgctcggccgggccgccgaggcgaccggtgcgcac 780 tgggcgcgcgtggtcatcgccggtgtggccgccttcctgcaccggacgacgggcgcccgg 840 gacgtcgtggtgtcggtgccggtcaccgggcgctacggcgcgaacgcccggatcaccccc 900 ggcatggtctccaaccggctgccgctgcggctggcggtgcgccccggcgagagtttcgcg 960 cgggtggtcgagaccgtgtccgaggcgatgagcggcctcctggcgcacagccgcttccgc 1020 ggcgaggacctcgaccgggagctgggcggcgcgggggtgtcggggcccaccgtcaacgtc 1080 atgccgtacatcaggccggtggacttcggcggtccggtcggcctgatgcgcagcatcagt 1140 tcgggtccgaccaccgatctgaacatcgtgctgaccggcacccccgagtccggcctgcgc 1200 gtcgacttcgagggcaacccgcaggtgtacggcggccaggacctgacggtgctgcaggaa 1260 cgcttcgtccggttcctggcggagctggcggccgaccccgcagccaccgtcgacgaggtc 1320 gcgctgctgacgccggac 1338
Information for SEQ ID NO: 22 Length: 446 Type: PRT
Organism: Streptomyces coelicolor Sequence: 22 Ser Val Arg His Gly Leu Thr Ser Ala Gln His Glu Val Trp Leu Ala Gln Gln Leu Asp Pro Arg Gly Ala His Tyr Arg Thr Gly Ser Cys Leu Glu Ile Asp Gly Pro Leu Asp His Ala Val Leu Ser Arg Ala Leu Arg Leu Thr Val Ala Gly Thr Glu Thr Leu Cys Ser Arg Phe Leu Thr Asp Glu Glu Gly Arg Pro Tyr Arg Ala Tyr Cys Pro Pro Ala Pro Glu Gly Ser Ala Ala Val Glu Asp Pro Asp Gly Val Pro Tyr Thr Pro Val Leu Leu Arg His Ile Asp Leu Ser Gly His Glu Asp Pro Glu Gly Glu Ala Gln Arg Trp Met Asp Arg Asp Arg Ala Thr Pro Leu Pro Leu Asp Arg Pro Gly Leu Ser Ser His Ala Leu Phe Thr Leu Gly Gly Gly Arg His Leu Tyr Tyr Leu Gly Val His His Ile Val Ile Asp Gly Thr Ser Met Ala Leu Phe Tyr Glu Arg Leu Ala Glu Val Tyr Arg Ala Leu Arg Asp Gly Arg Ala Val Pro Ala Ala Ala Phe Gly Asp Thr Asp Arg Met Val Ala Gly Glu Glu Ala Tyr Arg Ala Ser Ala Arg Tyr Glu Arg Asp Arg Ala Tyr Trp Thr Gly Leu Phe Thr Asp Arg Pro Glu Pro Val Ser Leu Thr Gly Arg Gly Gly Gly Arg Ala Leu Ala Pro Thr Val Arg Ser Leu Gly Leu Pro Pro Glu Arg Thr Glu Val Leu Gly Arg Ala Ala Glu Ala Thr Gly Ala His Trp Ala Arg Val Val Ile Ala Gly Val Ala Ala Phe Leu His Arg Thr Thr Gly Ala Arg Asp Val Val Val Ser Val Pro Val Thr Gly Arg Tyr Gly Ala Asn Ala Arg Ile Thr Pro Gly Met Val Ser Asn Arg Leu Pro Leu Arg Leu Ala Val Arg Pro Gly Glu Ser Phe Ala Arg Val Val Glu Thr Val Ser Glu Ala Met Ser Gly Leu Leu Ala His Ser Arg Phe Arg Gly Glu Asp Leu Asp Arg Glu Leu Gly Gly Ala Gly Val Ser Gly Pro Thr Val Asn Val Met Pro Tyr Ile Arg Pro Val Asp Phe Gly Gly Pro Val Gly Leu Met Arg Ser Ile Ser Ser Gly Pro Thr Thr Asp Leu Asn Ile Val Leu Thr Gly Thr Pro Glu Ser Gly Leu Arg Val Asp Phe Glu Gly Asn Pro Gln Val Tyr Gly Gly Gln Asp Leu Thr Val Leu Gln Glu Arg Phe Val Arg Phe Leu Ala Glu Leu Ala Ala Asp Pro Ala Ala Thr Val Asp Glu Val Ala Leu Leu Thr Pro Asp
Information for SEQ ID NO: 23 Length: 1763 Type: DNA
Organism: Actinoplanes sp. Sequence: 23 tggtcatcgacgccgccacccaacccaccgttcccgacgccttccgggcgcaggcgatcg 60 cgcgccccggcgagcccgccctcgtggtgctccccggcgacccggacgccgagcccgtca 120 ccctcacgtacgccgagctcgaccgccgcgccgcggcgcgggcggcctggctcgccgccc 180 ggttcccggccggggagcgcatcctcatcgccctgcccaccggcgccgagttcgtcgagc 240 tctacctggcgtgcctctacgccggcctggtcgccgtgccggcgcccccgcccggagggt 300 cgtccggcgcctccgagcgcaccgtcggcatcgcggccgactgctcccccgccctggccg 360 tcgtcaacgccgacgacgcggcgccgctcaccgccgtcctgcgcgagcgcggcctgtccg 420 gcctgccggtcggtgcgcttccgcccctcgcggcggaagcgatccgcccgccccgcgggc 480 cccggccggactcgctggccgtcctgcagtacagctcgggctccaccggctcgcccaagg 540 gcgtgatgctcagccaccgggccgtgctggccaacctccgcgcgttcgaccgcagcagcg 600 ggcacaacagcgacgacgtgttcggcagctggctgccgctgcaccacgacatgggcctgt 660 tcgccatgctcaccgcgggcctgctgaacggcgccggcgtcgtgctgatgtcgccgacgg 720 ccttcgtccgccggccggcggactggctgcggatgatggaccgctaccgggtcaccatct 780 ccgccgcgcccaacttcgcgtacgacctgtgcgtgcgcgccgtgcgggacgagcagatcg 840 ccggcctcgacctgtcccgcatccgcacgctctacaacggatcggagccggtcaacccgg 900 ccaccgtccgggcgttcaccgagcgcttcgcccccttcggcctgcacacccacgcggtga 960 acccctgctacggcatggccgagttcaccgcgtacgtgtcgacgaaagtcttcgaggcgc 1020 cggcggtctttcttcccgccgaccctcgcgcgctggaggacgccgcgtcgccggccctgc 1080 gcccggccgaccccgccgcggcccgggagataccgggtgtcggccgggtgcccgacttcg 1140 aggtgctcatcgtcgacccggacgggctacggccgctgcccgagggccgggtcggcgaga 1200 tctggctgcgcgggcccggcgcgggcgccggctactggggcaggaccgagctcaaccccg 1260 gcatcttcgacgccaggcccgcgggcgacggccaggacggcggctgggtgcgaacgggtg 1320 acctgggtgcgctgaccggaggcgagctgttcctcaccggacgcctcaaggagctgctca 1380 tcgtgcacggccgcaacctggccccgcacgacctcgagcgggaggcccgggccgcgcacg 1440 acgcggtggaccaccagatcggggcggcgttcggggtgccggcgcccgacgagcggatcg 1500 tgctggtgcaggaggtgcatccgcgcacgccgctcgacgagctgccgcgggtggcgagcg 1560 ccgtcagccgccggctcaccgtctccttcggcgtgccggtacgcaacgtgctgctggtgc 1620 ggcgcggcacggtgcgccggaccacgagcggcaagatccgccggaccgcggtccgcgagc 1680 ggttcctggccggcggcatcacggcgctgcacgccgagctcgagccggcgctgcggccgg 1740 tgcaggcgggcgcgggccgatga 1763 Information for SEQ ID NO: 24 Length: 587 Type: PRT
Organism: Actinoplanes sp.
Sequence: 24 Met Val Ile Asp Ala Ala Thr Gln Pro Thr Val Pro Asp Ala Phe Arg Ala Gln Ala Ile Ala Arg Pro Gly Glu Pro Ala Leu Val Val Leu Pro Gly Asp Pro Asp Ala Glu Pro Val Thr Leu Thr Tyr Ala Glu Leu Asp Arg Arg Ala Ala Ala Arg Ala Ala Trp Leu Ala Ala Arg Phe Pro Ala Gly Glu Arg Ile Leu Ile Ala Leu Pro Thr Gly Ala Glu Phe Val Glu Leu Tyr Leu Ala Cys Leu Tyr Ala Gly Leu Val Ala Val Pro Ala Pro Pro Pro Gly Gly Ser Ser Gly Ala Ser Glu Arg Thr Val Gly Ile Ala Ala Asp Cys Ser Pro Ala Leu Ala Val Val Asn Ala Asp Asp Ala Ala Pro Leu Thr Ala Val Leu Arg Glu Arg Gly Leu Ser Gly Leu Pro Val Gly Ala Leu Pro Pro Leu Ala Ala Glu Ala Ile Arg Pro Pro Arg Gly Pro Arg Pro Asp Ser Leu Ala Val Leu Gln Tyr Ser Ser Gly Ser Thr Gly Ser Pro Lys Gly Val Met Leu Ser His Arg Ala Val Leu Ala Asn Leu Arg Ala Phe Asp Arg Ser Ser Gly His Asn Ser Asp Asp Val Phe Gly Ser Trp Leu Pro Leu His His Asp Met Gly Leu Phe Ala Met Leu Thr Ala Gly Leu Leu Asn Gly Ala Gly Val Val Leu Met Ser Pro Thr Ala Phe Val Arg Arg Pro Ala Asp Trp Leu Arg Met Met Asp Arg Tyr Arg Val Thr Ile Ser Ala Ala Pro Asn Phe Ala Tyr Asp Leu Cys Val Arg Ala Val Arg Asp Glu Gln Ile Ala Gly Leu Asp Leu Ser Arg Ile Arg Thr Leu Tyr Asn Gly Ser Glu Pro Val Asn Pro Ala Thr Val Arg Ala Phe Thr Glu Arg Phe Ala Pro Phe Gly Leu His Thr His Ala Val Asn Pro Cys Tyr Gly Met Ala Glu Phe Thr Ala Tyr Val Ser Thr Lys Val Phe Glu Ala Pro Ala Val Phe Leu Pro Ala Asp Pro Arg Ala Leu Glu Asp Ala Ala Ser Pro Ala Leu Arg Pro Ala Asp Pro Ala Ala Ala Arg Glu Ile Pro Gly Val Gly Arg Val Pro Asp Phe Glu Val Leu Ile Val Asp Pro Asp Gly Leu Arg Pro Leu Pro Glu Gly Arg Val Gly Glu Ile Trp Leu Arg Gly Pro Gly Ala Gly Ala Gly Tyr Trp Gly Arg Thr Glu Leu Asn Pro Gly Ile Phe Asp Ala Arg Pro Ala Gly Asp Gly Gln Asp Gly Gly Trp Val Arg Thr Gly Asp Leu Gly Ala Leu Thr Gly Gly Glu Leu Phe Leu Thr Gly Arg Leu Lys Glu Leu Leu Ile Val His Gly Arg Asn Leu Ala Pro His Asp Leu Glu Arg Glu Ala Arg Ala Ala His Asp Ala Val Asp His Gln Ile Gly Ala Ala Phe Gly Val Pro Ala Pro Asp Glu Arg Ile Val Leu Val Gln Glu Val His Pro
Arg Thr Pro Leu Asp Glu Leu Pro Arg Val Ala Ser Ala Val Ser Arg Arg Leu Thr Val Ser Phe Gly Val Pro Val Arg Asn Val Leu Leu Val Arg Arg Gly Thr Val Arg Arg Thr Thr Ser Gly Lys Ile Arg Arg Thr Ala Val Arg Glu Arg Phe Leu Ala Gly Gly Ile Thr Ala Leu His Ala Glu Leu Glu Pro Ala Leu Arg Pro Val Gln Ala Gly Ala Gly Arg Information for SEQ ID NO: 25 Length: 1803 Type: DNA
Organism: Streptomyces roseosporus Sequence: 25 gtgcctgccgtgagtgagagccgctgtgccgggcagggcctggtgggggcactgcggacc 60 tgggcacggacacgtgcccgggagactgccgtggttctcgtacgggacaccggaaccacc 120 gacgacacggcgtcggtggactacggacagctggacgagtgggccagaagcatcgcggtg 180 accctccgacagcaactcgcgccggggggacgggcacttctgctgctgccgtccggcccg 240 gagttcacggccgcgtacctcggctgcctgtacgcgggtctggccgccgtaccggcgccg 300 ctgcccggggggcgccacttcgaacgccgccgtgtcgcggccatcgccgccgacagcgga 360 gccggcgtggtgctgaccgtcgcgggtgagaccgcctccgtccacgactggctgaccgag 420 accacggccccggctactcgcgtcgtggccgtggacgaccgggcggcgctcggcgacccg 480 gcgcagtgggacgacccgggcgtcgcgcccgacgacgtggctctcatccagtacacctcg 540 ggctcgaccggcaaccccaagggcgtggtcgtgacccacgccaacctgctggcgaacgcg 600 cggaatctcgccgaggcctgcgagctgaccgccgccactcccatgggcggctggctgccc 660 atgtaccacgacatggggctcctgggcacgctgacaccggccctgtacctcggcaccacg 720 tgcgtgctgatgagctccacggcattcatcaaacggccgcacctgtggctacggaccatc 780 gaccggttcggcctggtctggtcgtcggctcccgacttcgcgtacgacatgtgtctgaag 840 cgcgtcaccgacgagcagatcgccgggctggacctgtcccgctggcggtgggccggcaac 900 ggcgcggagcccatccgggcagccaccgtacgggccttcggcgaacggttcgcccggtac 960 ggcctgcgccccgaggcgctcaccgccggctacgggctggccgaggccaccctgttcgtg 1020 tcgaggtcgcaggggctgcacacggcacgagtcgccaccgccgccctcgaacgccacgaa 1080 ttccgcctcgccgtacccggcgaggcagcccgggagatcgtcagctgcggtcccgtcggc 1140 cacttccgcgcccgcatcgtcgaacccggcgggcaccgtgttctgccgcccggccaggtc 1200 ggcgagctggtcctccagggagccgccgtctgcgccggctactggcaggccaaggaggag 1260 accgagcagaccttcggcctcaccctcgacggcgaggacggtcactggctgcgcaccggc 1320 gatctcgccgccctgcacgaagggaatctccacatcaccggccgctgcaaagaggccctg 1380 gtgatacgaggacgcaatctgtacccgcaggacatcgagcacgaactccgcctgcaacac 1440 ccggaacttgagagcgtcggcgccgcgttcaccgtcccggcggcacctggcacgccgggc 1500 ttgatggtggtccacgaagtccgcaccccggtccccgccgacgaccacccggccctggtc 1560 agcgccctgcgggggacgatcaaccgcgaattcggactcgacgcccagggcatcgccctg 1620 gtgagccgcggcaccgtactgcgtaccaccagcggcaaggtccgccggggcgccatgcgt 1680 gacctctgcctccgcggggagctgaacatcgtccacgcggacaagggctggcacgccatc 1740 gccggcacgg ccggagagga catcgccccc actgaccacg ctccacatcc gcaccccgcg 1800 taa 1803 Information for SEQ ID NO: 26 Length: 600 Type: PRT
Organism: Streptomyces roseosporus Sequence: 26 Val Pro Ala Val Ser Glu Ser Arg Cys Ala Gly Gln Gly Leu Val Gly Ala Leu Arg Thr Trp Ala Arg Thr Arg Ala Arg Glu Thr Ala Val Val Leu Val Arg Asp Thr Gly Thr Thr Asp Asp Thr Ala Ser Val Asp Tyr Gly Gln Leu Asp Glu Trp Ala Arg Ser Ile Ala Val Thr Leu Arg Gln Gln Leu Ala Pro Gly Gly Arg Ala Leu Leu Leu Leu Pro Ser Gly Pro Glu Phe Thr Ala Ala Tyr Leu Gly Cys Leu Tyr Ala Gly Leu Ala Ala Val Pro Ala Pro Leu Pro Gly Gly Arg His Phe Glu Arg Arg Arg Val Ala Ala Ile Ala Ala Asp Ser Gly Ala Gly Val Val Leu Thr Val Ala Gly Glu Thr Ala Ser Val His Asp Trp Leu Thr Glu Thr Thr Ala Pro Ala Thr Arg Val Val Ala Val Asp Asp Arg Ala Ala Leu Gly Asp Pro Ala Gln Trp Asp Asp Pro Gly Val Ala Pro Asp Asp Val Ala Leu Ile Gln Tyr Thr Ser Gly Ser Thr Gly Asn Pro Lys Gly Val Val Val Thr His Ala Asn Leu Leu Ala Asn Ala Arg Asn Leu Ala Glu Ala Cys Glu Leu Thr Ala Ala Thr Pro Met Gly Gly Trp Leu Pro Met Tyr His Asp Met Gly Leu Leu Gly Thr Leu Thr Pro Ala Leu Tyr Leu Gly Thr Thr Cys Val Leu Met Ser Ser Thr Ala Phe Ile Lys Arg Pro His Leu Trp Leu Arg Thr Ile Asp Arg Phe Gly Leu Val Trp Ser Ser Ala Pro Asp Phe Ala Tyr Asp Met Cys Leu Lys Arg Val Thr Asp Glu Gln Ile Ala Gly Leu Asp Leu Ser Arg Trp Arg Trp Ala Gly Asn Gly Ala Glu Pro Ile Arg Ala Ala Thr Val Arg Ala Phe Gly Glu Arg Phe Ala Arg Tyr Gly Leu Arg Pro Glu Ala Leu Thr Ala Gly Tyr Gly Leu Ala Glu Ala Thr Leu Phe Val Ser Arg Ser Gln Gly Leu His Thr Ala Arg Val Ala Thr Ala Ala Leu Glu Arg His Glu Phe Arg Leu Ala Val Pro Gly Glu Ala Ala Arg Glu Ile Val Ser Cys Gly Pro Val Gly His Phe Arg Ala Arg Ile Val Glu Pro Gly Gly His Arg Val Leu Pro Pro Gly Gln Val Gly Glu Leu Val Leu Gln Gly Ala Ala Val Cys Ala Gly Tyr Trp Gln Ala Lys Glu Glu Thr Glu Gln Thr Phe Gly Leu Thr Leu Asp Gly Glu Asp Gly His Trp Leu Arg Thr Gly Asp Leu Ala Ala Leu His Glu Gly Asn Leu His Ile Thr Gly Arg Cys Lys Glu Ala Leu Val Ile Arg Gly Arg Asn Leu Tyr Pro Gln Asp Ile Glu His Glu Leu Arg Leu Gln His
Pro Glu Leu Glu Ser Val Gly Ala Ala Phe Thr Val Pro Ala Ala Pro Gly Thr Pro Gly Leu Met Val Val His Glu Val Arg Thr Pro Val Pro Ala Asp Asp His Pro Ala Leu Val Ser Ala Leu Arg Gly Thr Ile Asn Arg Glu Phe Gly Leu Asp Ala Gln Gly Ile Ala Leu Val Ser Arg Gly Thr Val Leu Arg Thr Thr Ser Gly Lys Val Arg Arg Gly Ala Met Arg Asp Leu Cys Leu Arg Gly Glu Leu Asn Ile Val His Ala Asp Lys Gly Trp His Ala Ile Ala Gly Thr Ala Gly Glu Asp Ile Ala Pro Thr Asp His Ala Pro His Pro His Pro Ala Information for SEQ
ID NO: 27 Length: 1785 Type: DNA
Organism: Streptomyces ghanaensis Sequence: 27 atggtcaacgtcagtgaagcgcggagtgtccccgaactcctgcggcatcacgcgagttcg 60 gcacccgaccgggaggcgctgcgctacctgcgcgacaccacggggacggacgggaccccg 120 ctcacctaccgggaagtggaccgcgctgccgccgccgtggcacggcgcctctcccggagc 180 ttcgaggcgggcgaccggctgctgctcctgcactccttcggcccggacttcatcgtgggc 240 ttcctcgcctgcctctacgccggcatggtggccgttcccgcgccgctgcccggcagatac 300 cgccatgaacgcagacgggtgctgagcatcgcccacgacagcggcgccgtcgcggtgctc 360 accgacgacgcgagctccgcggaggtcggcgagtggatgcgcgaggagggcctggacggc 420 ctccccctcatcgccaccgactggtccgccgaggagcccggtgccttcaccccggccgcg 480 gacctcggacgcgagacgctcgccatgctccagtacacctccggctccacgggcgagccg 540 aagggcgtgatggtgacgcacgggaacctgctgcggaacgtcaccgcgctgagccgtgcg 600 ttcggcctcgacgagcacacccacttcggcggctggatcccccatttccacgacatgggc 660 ctgatcgggctgctgctgccctccctcttcctgcgcagcaggtgcgtgctgatgagcccg 720 tccgccttcatccgccgaccgcacacctggctgaagatgatcgacgacttcgacgtcgcg 780 tggtcggcggcccccgatttcgcctaccaactgtgctgccgacgagtcaccgacgagcag 840 ctcggcagcctcgacctctcgcgctggcgctacgcgggcaacggctcggaacccatccac 900 gccggcaccatcaccgccttcgccgagcggttcgccgccgccgggttccgcgccgagtcg 960 ctgtccccgtgctacggcctcgccgagtcgacggtctacgtctccggcggtccctccgcc 1020 cggatcaccgcggtcgacgcccagtcgctggaggaccaccggctcggcgaggccgtaccg 1080 ggacggccgcaccgctcgctggtgagctgcggcgcgccggcggacgtcgacctccggatc 1140 gtcgacccgcggaccggggacccgctgccggacggcgcggtcggggagatctggctgcgg 1200 ggaggcagcgtcgccgtcggctactgggacaaccccgcggcgtccgccgagaccttcggc 1260 gccgtcatcgacggggtggagggccgctatctgcggaccggtgacctcggcgcgctgtac 1320 gacggggagctgtacgtcacgggccggatcaaggaaatgatcaccgtgcacgggaggaac 1380 gtctacccacaggacgtcgagcaggaactgcgcgccgcccaccaggaactcgccggctgc 1440 gtcggcgccgtcttcgccctcacggatacgggccccgacccggtcctggtcgtgtcccac 1500 gaggtccgggcgggcctcggtgcggacgtactggaggcgctggcccgggacatgaagcag 1560 acggtggccc gcgagatggg catgacggcg tcgtgcgtgg tgctcctgcg ccgcgggacg 1620 gtacgccgga cgacgagcgg caagatccag cgcgacgcga tgcggaagct gttccgggac 1680 ggcgagctga agccccttca cacgcactgg cacatgccga ggcagcgcgc cgcggtccac 1740 gggagcagct cggcgcagag cctggccgag gagtccacgg tatga 1785 Information for SEQ ID NO: 28 Length: 594 Type: PRT
Organism: Streptomyces ghanaensis Sequence: 28 Met Val Asn Val Ser Glu Ala Arg Ser Val Pro Glu Leu Leu Arg His His Ala Ser Ser Ala Pro Asp Arg Glu Ala Leu Arg Tyr Leu Arg Asp Thr Thr Gly Thr Asp Gly Thr Pro Leu Thr Tyr Arg Glu Val Asp Arg Ala Ala Ala Ala Val Ala Arg Arg Leu Ser Arg Ser Phe Glu Ala Gly Asp Arg Leu Leu Leu Leu His Ser Phe Gly Pro Asp Phe Ile Val Gly Phe Leu Ala Cys Leu Tyr Ala Gly Met Val Ala Val Pro Ala Pro Leu Pro Gly Arg Tyr Arg His Glu Arg Arg Arg Val Leu Ser Ile Ala His Asp Ser Gly Ala Val Ala Val Leu Thr Asp Asp Ala Ser Ser Ala Glu Val Gly Glu Trp Met Arg Glu Glu Gly Leu Asp Gly Leu Pro Leu Ile Ala Thr Asp Trp Ser Ala Glu Glu Pro Gly Ala Phe Thr Pro Ala Ala Asp Leu Gly Arg Glu Thr Leu Ala Met Leu Gln Tyr Thr Ser Gly Ser Thr Gly Glu Pro Lys Gly Val Met Val Thr His Gly Asn Leu Leu Arg Asn Val Thr Ala Leu Ser Arg Ala Phe Gly Leu Asp Glu His Thr His Phe Gly Gly Trp Ile Pro His Phe His Asp Met Gly Leu Ile Gly Leu Leu Leu Pro Ser Leu Phe Leu Arg Ser Arg Cys Val Leu Met Ser Pro Ser Ala Phe Ile Arg Arg Pro His Thr Trp Leu Lys Met Ile Asp Asp Phe Asp Val Ala Trp Ser Ala Ala Pro Asp Phe Ala Tyr Glu Leu Cys Cys Arg Arg Val Thr Asp Glu Gln Leu Gly Ser Leu Asp Leu Ser Arg Trp Arg Tyr Ala Gly Asn Gly Ser Glu Pro Ile His Ala Gly Thr Ile Thr Ala Phe Ala Glu Arg Phe Ala Ala Ala Gly Phe Arg Ala Glu Ser Leu Ser Pro Cys Tyr Gly Leu Ala Glu Ser Thr Val Tyr Val Ser Gly Gly Pro Ser Ala Arg Ile Thr Ala Val Asp Ala Gln Ser Leu Glu Asp His Arg Leu Gly Glu Ala Val
Pro Gly Arg Pro His Arg Ser Leu Val Ser Cys Gly Ala Pro Ala Asp Val Asp Leu Arg Ile Val Asp Pro Arg Thr Gly Asp Pro Leu Pro Asp Gly Ala Val Gly Glu Ile Trp Leu Arg Gly Gly Ser Val Ala Val Gly Tyr Trp Asp Asn Pro Ala Ala Ser Ala Glu Thr Phe Gly Ala Val Ile Asp Gly Val Glu Gly Arg Tyr Leu Arg Thr Gly Asp Leu Gly Ala Leu Tyr Asp Gly Glu Leu Tyr Val Thr Gly Arg Ile Lys Glu Met Ile Thr Val His Gly Arg Asn Val Tyr Pro Gln Asp Val Glu Gln Glu Leu Arg Ala Ala His Gln Glu Leu Ala Gly Cys Val Gly Ala Val Phe Ala Leu Ser Asp Thr Gly Pro Asp Pro Val Leu Val Val Ser His Glu Val Arg Ala Gly Leu Gly Ala Asp Val Leu Glu Ala Leu Ala Arg Asp Met Lys Gln Thr Val Ala Arg Glu Met Gly Met Thr Ala Ser Cys Val Val Leu Leu Arg Arg Gly Thr Val Arg Arg Thr Thr Ser Gly Lys Ile Gln Arg Asp Ala Met Arg Lys Leu Phe Arg Asp Gly Glu Leu Lys Pro Leu His Thr His Trp His Met Pro Arg Gln Arg Ala Ala Val His Gly Ser Ser Ser Ala Gln Ser Leu Ala Glu Glu Ser Thr Val Information for SEQ
ID NO: 29 Length: 1806 Type: DNA
Organism: Streptomyces refuineus Sequence: 29 gtggctcacgtgagcggacccccagcagacccgccggccggctcccacctggtggccgcg 60 atccgcgcgacggccgaggccgaccccgagcgcaaggccgtcggcttcgtccgggatccg 120 gaacgcgaaggtgaggaggcgctgcggagctactcctggctcgacgacagggcccgccgc 180 atcgccgtcctcctccgcggggcgcggctcggcgcgggctcgcgcgtcctgctgctcttc 240 ccgcagtccgcggagttcgcggcggcctacgccggatgcctctacggggggatggtcgcc 300 gtccccgcgcccctgcccacgggaacctccctggagaccgcacgcgtcgccggcatcgcc 360 cgggacgccggggcgggcgccgtcctcaccgtctccgacaccgaggcggaggtccggcgg 420 tgggcggccgagaccggtctgggcgacctgcccctgttctccgtcgacgaactgcccgac 480 gacaccgacccgggggagtggcgggagccggagatccgggccggcaccgtggcggtgctg 540 cagtacacctccggctccaccggcagccccaagggggtcgtcgtcacccacggcgcgctc 600 gccgacaacgtccgcagcctcctgtccgggttcgacctgggaaccggcgcccggctgggc 660 ggctggctgccgatgtaccacgacatggggctgttcgggctgctgagcccggcgctgttc 720 agcggcggcgccgccgtgctgatgagcggcagcgccttcctgcgcaggccgcacctgtgg 780 ccgacgctgatcgaccgcttcggcgtggtcttctccgcggcgcccgacttcgcctacgac 840 tactgcgtacggcgggtggagcccgagcaggtggaccggctcgacctctcgcgctggcgc 900 tgggcggccaacggctcggagcccatccgggccgagacgctccgcgccttcaccaaggag 960 ttcgcccccgcggggctgccccacgacgcgatgaccccctgctacggactggccgaggcg 1020 accctgctggtctccctgtcggcgggcgagctgcgcacccggcgggtggacgccgcggca 1080 ctggagaaccaccgcttcgtcgaggcggccgcgggccgcccgtcccgcgaggtcgtctcg 1140 tgcggccggcccccggccctggaggtccgcgtggccgaccccgcgaccggagagcccgtc 1200 acgggcgatgcggtgggcgagatccaggtgcggggcgcgagcgtggccggcggctactgg 1260 cggaaaccggaggcgaccgccgagacgttcgtcacggccgcggacggctccgggccctgg 1320 ctgcgcaccggcgacctcggcgccctgtacgagggcgagctgtacgtcaccggccgcatc 1380 aaggaactcctcatcgtgcacggccgcaacatctacccgcacgacgtcgagcgcgaactg 1440 cgcgcccaccacgacgagctcggcgcgatcggcgccgtcttctccgtccccacggaggag 1500 ggcgaggccgtcgtggtcacgcacgaggtggtcccgtccgtccgggacgaccggggcccc 1560 gcgctggtgacggcggtacgggcgacgctcgcccgggagttcggcctggcaccggccggg 1620 gtggtgctggtgcgccgcggccgcaccccgcgcaccagcagcggcaaggtgcagcgccgc 1680 ctggccgcccggctcttccgcaccggggaactcgcccaggtccacgccgaccccggtgcc 1740 caccggctcgtggcggcgctccgcgaggcggacggcctgcgcgacgcccccgcgtccacg 1800 acatga 1806
Information for SEQ ID NO: 30 Length: 601 Type: PRT
Organism: Streptomyces refuineus Sequence: 30 Val Ala His Val Ser Gly Pro Pro Ala Asp Pro Pro Ala Gly Ser His Leu Val Ala Ala Ile Arg Ala Thr Ala Glu Ala Asp Pro Glu Arg Lys Ala Val Gly Phe Val Arg Asp Pro Glu Arg Glu Gly Glu Glu Ala Leu Arg Ser Tyr Ser Trp Leu Asp Asp Arg Ala Arg Arg Ile Ala Val Leu Leu Arg Gly Ala Arg Leu Gly Ala Gly Ser Arg Val Leu Leu Leu Phe Pro Gln Ser Ala Glu Phe Ala Ala Ala Tyr Ala Gly Cys Leu Tyr Gly Gly Met Val Ala Val Pro Ala Pro Leu Pro Thr Gly Thr Ser Leu Glu Thr Ala Arg Val Ala Gly Ile Ala Arg Asp Ala Gly Ala Gly Ala Val Leu Thr Val Ser Asp Thr Glu Ala Glu Val Arg Arg Trp Ala Ala Glu Thr Gly Leu Gly Asp Leu Pro Leu Phe Ser Val Asp Glu Leu Pro Asp Asp Thr Asp Pro Gly Glu Trp Arg Glu Pro Glu Ile Arg Ala Gly Thr Val Ala Val Leu Gln Tyr Thr Ser Gly Ser Thr Gly Ser Pro Lys Gly Val Val Val Thr His Gly Ala Leu Ala Asp Asn Val Arg Ser Leu Leu Ser Gly Phe Asp Leu Gly Thr Gly Ala Arg Leu Gly Gly Trp Leu Pro Met Tyr His Asp Met Gly Leu Phe Gly Leu Leu Ser Pro Ala Leu Phe Ser Gly Gly Ala Ala Val Leu Met Ser Gly Ser Ala Phe Leu Arg Arg Pro His Leu Trp Pro Thr Leu Ile Asp Arg Phe Gly Val Val Phe Ser Ala Ala Pro Asp Phe Ala Tyr Asp Tyr Cys Val Arg Arg Val Glu Pro Glu Gln Val Asp Arg Leu Asp Leu Ser Arg Trp Arg Trp Ala Ala Asn Gly Ser Glu Pro Ile Arg Ala Glu Thr Leu Arg Ala Phe Thr Lys Glu Phe Ala Pro Ala Gly Leu Pro His Asp Ala Met Thr Pro Cys Tyr Gly Leu Ala Glu Ala Thr Leu Leu Val Ser Leu Ser Ala Gly Glu Leu Arg Thr Arg Arg Val Asp Ala Ala Ala Leu Glu Asn His Arg Phe Val Glu Ala Ala Ala Gly Arg Pro Ser Arg Glu Val Val Ser Cys Gly Arg Pro Pro Ala Leu Glu Val Arg Val Ala Asp Pro Ala Thr Gly Glu Pro Val Thr Gly Asp Ala Val Gly Glu Ile Gln Val Arg Gly Ala Ser Val Ala Gly Gly Tyr Trp Arg Lys Pro Glu Ala Thr Ala Glu Thr Phe Val Thr Ala Ala Asp Gly Ser Gly Pro Trp Leu Arg Thr Gly Asp Leu Gly Ala Leu Tyr Glu Gly Glu Leu Tyr Val Thr Gly Arg Ile Lys Glu Leu Leu Ile Val His Gly Arg Asn Ile Tyr Pro His Asp Val Glu Arg Glu Leu Arg Ala His His Asp Glu Leu Gly Ala Ile Gly Ala Val Phe Ser Val Pro Thr Glu Glu Gly Glu Ala Val Val Val Thr His Glu Val Val Pro Ser Val Arg Asp Asp Arg Gly Pro Ala Leu Val Thr Ala Val Arg Ala Thr Leu Ala Arg Glu Phe Gly Leu Ala Pro Ala Gly Val Val Leu Val Arg Arg Gly Arg Thr Pro Arg Thr Ser Ser Gly Lys Val Gln Arg Arg Leu Ala Ala Arg Leu Phe Arg Thr Gly Glu Leu Ala Gln Val His Ala Asp Pro Gly Ala His Arg Leu Val Ala Ala Leu Arg Glu Ala Asp Gly Leu Arg Asp Ala Pro Ala Ser Thr Thr Information for SEQ ID NO: 31 Length: 1743 Type: DNA
Organism: Streptomyces aizunensis Sequence: 31 atgaccctcgaacccagcgtgctccatctgctgcgccggcacgccgtcgaccgggcagag 60 cggaccgccgtcaccttcgtccacgacttcgacgcggccgacggctcgcggagcctgaac 120 tacgccgaactcgacgcggaggcacgtcgcgtcgcgtcctggctccaggagcgctgtgcg 180 cccggagaccgggtgctgctgctgcacccggccggtctgcccttcgtcaccgcgttcctc 240 gcctgcctctacgcgggtgtcatcgcggtgccgtctccgatgccgggccagttccagtac 300 cagcagcgccgcgtgacgacgatcgcccgcgatgccggtgtcagcgtggcgctcaccgac 360 acgggccagctgcccgaggcgcagcagtggatggccgacacccgcctcgaactgccggtc 420 gccgcgagcgacgcccccggcttcggtgacgcgtcgcgctggcgcgaccccggcgccacc 480 gcccaggacgtggtgctgctgcagtacacctccggctcgaccggtgaccccaagggcgtc 540 atggtcacgcacgccaacctgctgcacaacgccgacagcctgagccgttccctcggcttc 600 accgaggacaccaacttcggcggctggatcccgctctaccacgacatgggcctgatgggg 660 cagctgctgccgggtctcttcctgggcagcagcgtcgcgctgatgtcgccgatggcgttc 720 ctcaagcgcccgcaccactggctcgcgctgatcgaccgctacgacatcggcttctccgcc 780 gcgcccaacttcgcgtacgagctgtgcctgcgccgggtcaccgacgcgcagatcgccgca 840 ctcgacctgt cgcgctggca gttcgccgcc aacggctccg agccgatcca ggccagcacc 900 ctgcgggagttcgcggagcgcttcggcccggccggcttccgggccgagcagctcgccccg 960 tgctacggcatggccgaggcgacggtcttcatctccggccgctcgacccggccgccgcgg 1020 atccgcgccgtcgacccgcaggcgctggagaagcacgtcgtccaggaccccgagccgggc 1080 ggcctcgtgcgcgaactcgtcggctgcggcgacgtacccgacctcgacgtgcgcatcgtc 1140 gaagcgggcacgcgcacggtgctgacggacggcacgaccggcgagatctggctgcgcggg 1200 ccgagcgtcgcggccggctactggaaccggccggaggtgaccgaggagatcttccgcgcc 1260 cacaccgccgacggcgacgggccctacatgcgcaccggcgacctcggagtgctgctcgac 1320 ggcgaaatctacgtcacgggccgcaccaaggacctgctgatcgtcaacggccgcaacctc 1380 tacccgcacgacctcgaacacgaactgcggctctcccacgccccgttggcgaccctcgcc 1440 ggtacggcgttcaccgtccccgccccgcaggaagaggtcgtggtcgtgcacgaggtgcgc 1500 ggccgcttcagccaggaagagctgcgcgagctggccatcggcatgcgcgccaccgtgcac 1560 cgcgagttcggcgtgcacaccgcgggcatcgtgctgatgcggcccggcacggtccgcaag 1620 accaccagcggcaaggtgcagcgcgccgagatgcgcggcctgttcctcgcgggcgccctc 1680 gccccgctgtacgaggagatggcgcccggagtccaggcggcgatggccggagccgccggg 1740 tga 1743 Information for SEQ ID NO: 32 Length: 580 Type: PRT
Organism: Streptomyces aizunensis Sequence: 32 Met Thr Leu Glu Pro Ser Val Leu His Leu Leu Arg Arg His Ala Val Asp Arg Ala Glu Arg Thr Ala Val Thr Phe Val His Asp Phe Asp Ala Ala Asp Gly Ser Arg Ser Leu Asn Tyr Ala Glu Leu Asp Ala Glu Ala Arg Arg Val Ala Ser Trp Leu Gln Glu Arg Cys Ala Pro Gly Asp Arg Val Leu Leu Leu His Pro Ala Gly Leu Pro Phe Val Thr Ala Phe Leu Ala Cys Leu Tyr Ala Gly Val Ile Ala Val Pro Ser Pro Met Pro Gly Gln Phe Gln Tyr Gln Gln Arg Arg Val Thr Thr Ile Ala Arg Asp Ala Gly Val Ser Val Ala Leu Thr Asp Thr Gly Gln Leu Pro Glu Ala Gln Gln Trp Met Ala Asp Thr Arg Leu Glu Leu Pro Val Ala Ala Ser Asp Ala Pro Gly Phe Gly Asp Ala Ser Arg Trp Arg Asp Pro Gly Ala Thr Ala Gln Asp Val Val Leu Leu Gln Tyr Thr Ser Gly Ser Thr Gly Asp Pro Lys Gly Val Met Val Thr His Ala Asn Leu Leu His Asn Ala Asp Ser Leu Ser Arg Ser Leu Gly Phe Thr Glu Asp Thr Asn Phe Gly Gly Trp Ile Pro Leu Tyr His Asp Met Gly Leu Met Gly Gln Leu Leu Pro Gly Leu Phe Leu Gly Ser Ser Val Ala Leu Met Ser Pro Met Ala Phe Leu Lys Arg Pro His His Trp Leu Ala Leu Ile Asp Arg Tyr Asp Ile Gly Phe Ser Ala Ala Pro Asn Phe Ala Tyr Glu Leu Cys Leu Arg Arg Val Thr Asp Ala Gln Ile Ala Ala Leu Asp Leu Ser Arg Trp Gln Phe Ala Ala Asn Gly Ser Glu Pro Ile Gln Ala Ser Thr Leu Arg Glu Phe Ala Glu Arg Phe Gly Pro Ala Gly Phe Arg Ala Glu Gln Leu Ala Pro Cys Tyr Gly Met Ala Glu Ala Thr Val Phe Ile Ser Gly Arg Ser Thr Arg Pro Pro Arg Ile Arg Ala Val Asp Pro Gln Ala Leu Glu Lys His Val Val Gln Asp Pro Glu Pro Gly Gly Leu Val Arg Glu Leu Val Gly Cys Gly Asp Val Pro Asp Leu Asp Val Arg Ile Val Glu Ala Gly Thr Arg Thr Val Leu Thr Asp Gly Thr Thr Gly Glu Ile Trp Leu Arg Gly Pro Ser Val Ala Ala Gly Tyr Trp Asn Arg Pro Glu Val Thr Glu Glu Ile Phe Arg Ala His Thr Ala Asp Gly Asp Gly Pro Tyr Met Arg Thr Gly Asp Leu Gly Val Leu Leu Asp Gly Glu Ile Tyr Val Thr Gly Arg Thr Lys Asp Leu Leu Ile Val Asn Gly Arg Asn Leu Tyr Pro His Asp Leu Glu His Glu Leu Arg Leu Ser His Ala Pro Leu Ala Thr Leu Ala Gly Thr Ala Phe Thr Val
Pro Ala Pro Gln Glu Glu Val Val Val Val His Glu Val Arg Gly Arg Phe Ser Gln Glu Glu Leu Arg Glu Leu Ala Ile Gly Met Arg Ala Thr Val His Arg Glu Phe Gly Val His Thr Ala Gly Ile Val Leu Met Arg Pro Gly Thr Val Arg Lys Thr Thr Ser Gly Lys Val Gln Arg Ala Glu Met Arg Gly Leu Phe Leu Ala Gly Ala Leu Ala Pro Leu Tyr Glu Glu Met Ala Pro Gly Val Gln Ala Ala Met Ala Gly Ala Ala Gly Information for SEQ ID NO: 33 Length: 1767 Type: DNA
Organism: Actinomycete Sequence: 33 atggtggagctgggctcggccgaaagcattccggcggtgctgcgccggcacgcggagaat 60 acgcccgaccgcgccgcccacgcctttgtcaccgacctcgacgaggccggcggcgtcgcc 120 tggctcagccacgccgagctggaccgccgggcccgggccgtggccgcgcagctgtccgcg 180 cacgccgctcccggcgaccggatgctgctgctgcacccggccggcccggacttcctgatc 240 gcgctgctcggctgcctgcacgccggtctgatcgcggtgccgtcgccgctgcccggccgc 300 tacgcccatcagcggcgccgggtccggctgatcgcggccgacgccgacgtgacggccgtg 360 ctgaccgaccgggccacccgcgcggaggtcgtcgagtgggccgccgagcagggactgccc 420 gacatcgcggtgctgacccccgacccggaggccgacccgggtgactggcagccgccgccg 480 ctgagccgggacacggtcgccgtcctgcagtacacctccggctccaccggcaaccccaag 540 ggcgtggtcatcgaccacggcaacatcctgagcaacgccgccacgatcatcgcggtgacc 600 gggatccggcccggcaccgtgatcggcgggtggctgccgcacttccacgacatgggcctg 660 atgggcctgctgctgccgccgctgctggccggggcgacgacggtgctgagcagccccgtc 720 tcgttcctgaagcgcccgctgagctggctgcggatgatcgaccggtacggcgtcgagatc 780 accgccgcccccgacttcgcctacgacctgtgcgtcgccaaggtcaccgacgccgagctg 840 gccacgctcgacctgtcccgctggcgggtcgccatcaacggctccgagccggtccgggcc 900 gccgtgctcacccggttccggcagcgcttcgccgccgccgggctgcggcccgaggtgctg 960 accccgagcttcggcatggccgaggcgacgctgttcgtctccggcgacccggccaccccg 1020 ttcgtcgtccgccgcgtcgacaccgaccggctggcccggcaccggttcgagccggccccg 1080 gacggcgggccgggccgcgacgtggtggcctgcggcgcgccggccggcgtcgaggtgcgc 1140 atcgtcgaccccggcagcggcgacccgttgccggacggcgccgtcggcgagatctggctg 1200 cgcggcccgtcgatcggccgcggctactgggggcgcgccgcgaacaccgcgggcttcggc 1260 gcggtcaccagcatcggcgacgccgggtacctgcggaccggcgacctgggcaccctgtac 1320 gagggccagctctacgtcaccggccgccgcaaggacatgctggtgctgcgcggccgcaac 1380 tactacccgcaggacatcgagcacgagctgcgggcgcaccaccccgagctggccgggcgc 1440 gtcggcgcctgcttcgccgtgcgatcgcgcgacggggcgggcggcggcgaggaggtcctc 1500 gtggtcacccacgaggtgcgcgggatctccgatccggaccggctgcgtacccttgccggg 1560 gccatgcggctcacggtggcccgggagttcggcgtgccgagcgctgcggtgctgctgctg 1620 cgccccggcgcggtggcccgtaccaccagcggcaagatccagcgctcggcgatgcgggag 1680 ctgttcgagaccggcgcgctggagccggtcggcggcgaggtggacgaccggctggtcgcc 1740 accgcggcgctgggcgcggcccgatga 1767 Information for SEQ ID NO: 34 Length: 588 Type: PRT
Organism: Actinomycete Sequence: 34 Met Val Glu Leu Gly Ser Ala Glu Ser Ile Pro Ala Val Leu Arg Arg His Ala Glu Asn Thr Pro Asp Arg Ala Ala His Ala Phe Val Thr Asp Leu Asp Glu Ala Gly Gly Val Ala Trp Leu Ser His Ala Glu Leu Asp Arg Arg Ala Arg Ala Val Ala Ala Gln Leu Ser Ala His Ala Ala Pro Gly Asp Arg Met Leu Leu Leu His Pro Ala Gly Pro Asp Phe Leu Ile Ala Leu Leu Gly Cys Leu His Ala Gly Leu Ile Ala Val Pro Ser Pro Leu Pro Gly Arg Tyr Ala His Gln Arg Arg Arg Val Arg Leu Ile Ala Ala Asp Ala Asp Val Thr Ala Val Leu Thr Asp Arg Ala Thr Arg Ala Glu Val Val Glu Trp Ala Ala Glu Gln Gly Leu Pro Asp Ile Ala Val Leu Thr Pro Asp Pro Glu Ala Asp Pro Gly Asp Trp Gln Pro Pro Pro Leu Ser Arg Asp Thr Val Ala Val Leu Gln Tyr Thr Ser Gly Ser Thr Gly Asn Pro Lys Gly Val Val Ile Asp His Gly Asn Ile Leu Ser Asn Ala Ala Thr Ile Ile Ala Val Thr Gly Ile Arg Pro Gly Thr Val Ile Gly Gly Trp Leu Pro His Phe His Asp Met Gly Leu Met Gly Leu Leu Leu Pro Pro Leu Leu Ala Gly Ala Thr Thr Val Leu Ser Ser Pro Val Ser Phe Leu Lys Arg Pro Leu Ser Trp Leu Arg Met Ile Asp Arg Tyr Gly Val Glu Ile Thr Ala Ala Pro Asp Phe Ala Tyr Asp Leu Cys Val Ala Lys Val Thr Asp Ala Glu Leu Ala Thr Leu Asp Leu Ser Arg Trp Arg Val Ala Ile Asn Gly Ser Glu Pro Val Arg Ala Ala Val Leu Thr Arg Phe Arg Gln Arg Phe Ala Ala Ala Gly Leu Arg Pro Glu Val Leu Thr Pro Ser Phe Gly Met Ala Glu Ala Thr Leu Phe Val Ser Gly Asp Pro Ala Thr Pro Phe Val Val Arg Arg Val Asp Thr Asp Arg Leu Ala Arg His Arg Phe Glu Pro Ala Pro Asp Gly Gly Pro Gly Arg Asp Val Val Ala Cys Gly Ala Pro Ala Gly Val Glu Val Arg Ile Val Asp Pro Gly Ser Gly Asp Pro Leu Pro Asp Gly Ala Val Gly Glu Ile Trp Leu Arg Gly Pro Ser Ile Gly Arg Gly Tyr Trp Gly Arg Ala Ala Asn Thr Ala Gly Phe Gly Ala Val Thr Ser Ile Gly Asp Ala Gly Tyr Leu Arg Thr Gly Asp Leu Gly Thr Leu Tyr Glu Gly Gln Leu Tyr Val Thr Gly Arg Arg Lys Asp Met Leu Val Leu Arg Gly Arg Asn Tyr Tyr Pro Gln Asp Ile Glu His Glu Leu Arg Ala His His Pro Glu Leu Ala Gly Arg Val Gly Ala Cys Phe Ala Val Arg Ser Arg
Asp Gly Ala Gly Gly Gly Glu Glu Val Leu Val Val Thr His Glu Val Arg Gly Ile Ser Asp Pro Asp Arg Leu Arg Thr Leu Ala Gly Ala Met Arg Leu Thr Val Ala Arg Glu Phe Gly Val Pro Ser Ala Ala Val Leu Leu Leu Arg Pro Gly Ala Val Ala Arg Thr Thr Ser Gly Lys Ile Gln Arg Ser Ala Met Arg Glu Leu Phe Glu Thr Gly Ala Leu Glu Pro Val Gly Gly Glu Val Asp Asp Arg Leu Val Ala Thr Ala Ala Leu Gly Ala Ala Arg Information for SEQ ID NO: 35 Length: 2169 Type: DNA
Organism: Streptomyces fradiae Sequence: 35 ttgaccgtccgggcggagcaccggaaagcgtcgaccctgccgccggggaacccggccgtc 60 agcagcggcgactccgcgtcccgccgggagaagagggccgccgctgggagcagttcttcg 120 gcggacccgctggccggcccccacctggtggccgcgatctccgcgacggccgaggccgac 180 ccggggcgcaaggccgtcggtctcgtccgggatccggagcgcgagggcgaggaggcgctg 240 cggagctacgcctggctcgacgacaccgcccgccgcatcgccgtcctcctgcgtgcggcc 300 gggctggaaacgggcgcacgcgtgctgctgctcttcccgcagtccgcggagttcgcggcg 360 gcctacgccgggtgtctctacgcgggcatggttgccgtccccgcgccccttccgaccggc 420 acctcccatgaggccgcacgcgtcgtcggcatcgcgaaggactccgaggcaggcgccgtc 480 ctcaccgtctccgaaaccgaggcggacgtccggcaatgggcggcccgcaccggcctgggc 540 gcgctgcccctccactgcgtcgacgaactgcccggcgacgccgaccccgacacgtggcgg 600 gaaccggagatccgggccgacaccgtggcggtcctccagtacacctccggctccaccggc 660 agccccaagggggtcgtcgtcacccacggcgcgctcgccgacaacgtgcgcagcctgctc 720 acgggcttcgatctgggatccggcgcccggctgggcggctggctgccgatgtaccacgac 780 atggggctgttcggcctgctgagcccggcactgttcagcggcggagccgccgtgctgatg 840 agcggcagcgccttcctgcgccgcccgtcccagtggctgaggctgatcgaccgcttcggc 900 ctcgtcttctcggcggcgcccgacttcgcctacgactactgcgtacggcgggtgagaccc 960 gaggagacggacgggctcgacctgtcgcgctggcgctgggcggccaacggctccgagccc 1020 atccgcgccgagacgctgcgcgccttcgccaaggagttcgccccggccggactccacccg 1080 aacgccaccaccccttgctacggactggccgaggcgaccctgctggtgtccctgcccacg 1140 ggtgagctgcgcacccgacgggtggacgtcgcggaactggagaaccaccgcttcgtcgaa 1200 gcggccgtgggacgcccctcccgcgagatcgtgtcctgcggccggcccccgtccctggag 1260 atccgcgtcgtcgaccccgcgaccggcaagtccgtcacgggcggcgacggagccggcgag 1320 accagggtgggcgagatcagagtgcgcggcgcgagcgtcgccaggggctactggcagaaa 1380 ccggaggcgaccgccgagacgttcgtcatggacgcggacggctccgggccctggctgcgc 1440 accggcgacctcggcgctctgtacgagggcgagctgtacgtcaccggccgtatcaaggaa 1500 ctcctcatcgtgcacggccgcaacatctacccccatgacatcgagcacgaactgcgcgcc 1560 cgccacgccgaactcggcgctgtcggggccgccttctccctcagcaccgaatcgggcgag 1620 gttgtggtcgtcacccatgaggtgaaccccaccgtccggcccgagcagggtcccgagctg 1680 gtgaccgccctgcgtgcgacgctcgcgcgggagttcggcctcgccccggccggggtggtg 1740 ctggtgcgccgcggccgcatcccgcgcaccagcagcggcaaggtgcaacgccgcctgacc 1800 gcccggctgttcagcacgggggaactcgcccaggtccatgccgaccccggcgcccaccgc 1860 ctcctggcggaactcagggaggcgcacgaccgcggcggcgccttcccgcccccctccccg 1920 cccgccagccaggaccccgaggccctgcggcagcggctgcgcgagctgtgcgccgactgt 1980 ctcggcgtccccgtggactccctcgccacggacgcccccctcaccgactacgggatgacc 2040 tccgtcaccggcaccgccctgtgcgggatggtggaggagtacctggacgtcgaatgcgac 2100 ctggaactgctctggcaggagccgacgatcgacgggctcgcctcccggctggcctcgcgc 2160 accgtgcgc 2169 Information for SEQ ID NO: 36 Length: 723 Type: PRT
Organism: Streptomyces fradiae Sequence: 36 Leu Thr Val Arg Ala Glu His Arg Lys Ala Ser Thr Leu Pro Pro Gly Asn Pro Ala Val Ser Ser Gly Asp Ser Ala Ser Arg Arg Glu Lys Arg Ala Ala Ala Gly Ser Ser Ser Ser Ala Asp Pro Leu Ala Gly Pro His Leu Val Ala Ala Ile Ser Ala Thr Ala Glu Ala Asp Pro Gly Arg Lys Ala Val Gly Leu Val Arg Asp Pro Glu Arg Glu Gly Glu Glu Ala Leu Arg Ser Tyr Ala Trp Leu Asp Asp Thr Ala Arg Arg Ile Ala Val Leu Leu Arg Ala Ala Gly Leu Glu Thr Gly Ala Arg Val Leu Leu Leu Phe Pro Gln Ser Ala Glu Phe Ala Ala Ala Tyr Ala Gly Cys Leu Tyr Ala Gly Met Val Ala Val Pro Ala Pro Leu Pro Thr Gly Thr Ser His Glu Ala Ala Arg Val Val Gly Ile Ala Lys Asp Ser Glu Ala Gly Ala Val Leu Thr Val Ser Glu Thr Glu Ala Asp Val Arg Gln Trp Ala Ala Arg Thr Gly Leu Gly Ala Leu Pro Leu His Cys Val Asp Glu Leu Pro Gly Asp Ala Asp Pro Asp Thr Trp Arg Glu Pro Glu Ile Arg Ala Asp Thr Val Ala Val Leu Gln Tyr Thr Ser Gly Ser Thr Gly Ser Pro Lys Gly Val Val Val Thr His Gly Ala Leu Ala Asp Asn Val Arg Ser Leu Leu Thr Gly Phe Asp Leu Gly Ser Gly Ala Arg Leu Gly Gly Trp Leu Pro Met Tyr His Asp Met Gly Leu Phe Gly Leu Leu Ser Pro Ala Leu Phe Ser Gly Gly Ala Ala Val Leu Met Ser Gly Ser Ala Phe Leu Arg Arg Pro Ser Gln Trp Leu Arg Leu Ile Asp Arg Phe Gly Leu Val Phe Ser Ala Ala Pro Asp Phe Ala Tyr Asp Tyr Cys Val Arg Arg Val Arg Pro Glu Glu Thr Asp Gly Leu Asp Leu Ser Arg Trp Arg Trp Ala Ala Asn Gly Ser Glu Pro Ile Arg Ala Glu Thr Leu Arg Ala Phe Ala Lys Glu Phe Ala Pro Ala Gly Leu His Pro Asn Ala Thr Thr Pro Cys Tyr Gly Leu Ala Glu Ala Thr Leu Leu Val Ser Leu Pro Thr Gly Glu Leu Arg Thr Arg Arg Val Asp Val Ala Glu Leu Glu Asn His Arg Phe Val Glu Ala Ala Val Gly Arg Pro Ser Arg Glu Ile Val Ser Cys Gly Arg Pro Pro Ser Leu Glu Ile Arg Val Val Asp Pro Ala Thr Gly Lys Ser Val Thr Gly Gly Asp Gly Ala Gly Glu Thr Arg Val Gly Glu Ile Arg Val Arg Gly Ala Ser Val Ala Arg Gly Tyr Trp Gln Lys Pro Glu Ala Thr Ala Glu Thr Phe Val Met Asp Ala Asp Gly Ser Gly Pro Trp Leu Arg Thr Gly Asp Leu
Gly Ala Leu Tyr Glu Gly Glu Leu Tyr Val Thr Gly Arg Ile Lys Glu Leu Leu Ile Val His Gly Arg Asn Ile Tyr Pro His Asp Ile Glu His Glu Leu Arg Ala Arg His Ala Glu Leu Gly Ala Val Gly Ala Ala Phe Ser Leu Ser Thr Glu Ser Gly Glu Val Val Val Val Thr His Glu Val Asn Pro Thr Val Arg Pro Glu Gln Gly Pro Glu Leu Val Thr Ala Leu Arg Ala Thr Leu Ala Arg Glu Phe Gly Leu Ala Pro Ala Gly Val Val Leu Val Arg Arg Gly Arg Ile Pro Arg Thr Ser Ser Gly Lys Val Gln Arg Arg Leu Thr Ala Arg Leu Phe Ser Thr Gly Glu Leu Ala Gln Val His Ala Asp Pro Gly Ala His Arg Leu Leu Ala Glu Leu Arg Glu Ala His Asp Arg Gly Gly Ala Phe Pro Pro Pro Ser Pro Pro Ala Ser Gln Asp Pro Glu Ala Leu Arg Gln Arg Leu Arg Glu Leu Cys Ala Asp Cys Leu Gly Val Pro Val Asp Ser Leu Ala Thr Asp Ala Pro Leu Thr Asp Tyr Gly Met Thr Ser Val Thr Gly Thr Ala Leu Cys Gly Met Val Glu Glu Tyr Leu Asp Val Glu Cys Asp Leu Glu Leu Leu Trp Gln Glu Pro Thr Ile Asp Gly Leu Ala Ser Arg Leu Ala Ser Arg Thr Val Arg Information for SEQ ID NO: 37 Length: 273 Type: DNA

Organism: Actinoplanes sp.
Sequence: 37 atgtccgagaccgacctgtccgccgcccggcacacgcccgagcagatccgctcctggctg 60 atcgaccggatcgcctactacgtgatgctgccgacccaggagatcgagccggacgtgtcc 120 ctggccgagtacggcctggactcggtgtacgcgttcgcgctctgcggcgagatcgaggac 180 acgctcggcatcccgatcgagccgaccctgctgtgggacgtcgacaccgtcgccaccctc 240 accgcccacctcgccgaccgcgtcaaccgataa 273
Information for SEQ ID NO: 38 Length: 90 Type: PRT

Organism: Actinoplanes sp.

Sequence: 38 Met Ser Glu Thr Asp Leu Ala Arg Thr Pro Glu Gln Ser Ala His Ile Arg Ser Trp Leu Ile Asp Ala Tyr Val Met Leu Pro Arg Ile Tyr Thr Gln Glu Ile Glu Pro Asp Leu Ala Tyr Gly Leu Asp Val Ser Glu Ser Val Tyr Ala Phe Ala Leu Glu Ile Asp Thr Leu Gly Cys Gly Glu Ile Pro Ile Glu Pro Thr Leu Asp Val Thr Val Ala Thr Leu Trp Asp Leu Thr Ala His Leu Ala Asp Asn Arg Arg Val
Information for SEQ ID NO: 39 Length: 270 Type: DNA

Organism: Streptomyces roseosporus Sequence: 39 atgaacccgc ccgaagcggt agcgaggtcaccgcgtcLgat caccggacag 60 cagcacgccc atcgccgagt tcgtgaacga cggatcgccggtgacgcacc cctgaccgac 120 gacacccgac catggcctcg actccgtctc ctctgcgcgcaggtcgagga ccgctacggg 180 cggagttgcc atcgaggtcg acccggagct gtccccacactcaacgagtt cgtccaggca 240 gctgtggagc ctgatgcccc agttggccga 270 ccgcacctga
Information for SEQ ID NO: 40 Length: 89 Type: PRT

Organism: Streptomyces roseosporus Sequence: 40 Met Asn Pro Pro Glu Ala Thr Pro Glu Val Thr Ala Val Ser Ser Trp Ile Thr Gly Gln Ile Ala Val Asn Thr Pro Asp Arg Glu Phe Glu Ile Ala Gly Asp Ala Pro Leu His Gly Asp Ser Val Ser Thr Asp Leu Gly Val Ala Leu Cys Ala Gln Asp Arg Gly Ile Glu Val Val Glu Tyr Asp Pro Glu Leu Leu Trp Ser Thr Leu Glu Phe Val Gln Val Pro Asn Ala Leu Met Pro Gln Leu Ala Thr Asp Arg
Information for SEQ ID NO: 41 Length: 273 Type: DNA

Organism: Streptomyces ghanaensis Sequence: 41 atggaagcag cacagactcc gccgaactcggcgactggct cacccgcacg 60 ccgaacggcc gtggccgact acgtcaggtg gagatcgacccggacgtgcc gctgtccgat 120 tgatccggcg tacggcctcg actcgatctc gtgtgtgccgacatcgagga ccacttcggt 180 ggcgaccacg ctgcccgtcg aagtgacgct caccccacgataggca<~act gtcgcaggca 240 gatctgggac ctggccgagg agctcgaaac tga 273 cgccgtccgc
Information for SEQ ID NO: 42 Length: 90 Type: PRT
Organism: Streptomyces ghanaensis Sequence: 42 Met Glu Ala Ala Gln Thr Pro Arg Thr Ala Ala Glu Leu Gly Asp Trp Leu Thr Arg Thr Val Ala Asp Tyr Val Arg Cys Asp Pro Ala Glu Ile Asp Pro Asp Val Pro Leu Ser Asp Tyr Gly Leu Asp Ser Ile Ser Ala Thr Thr Val Cys Ala Asp Ile Glu Asp His Phe Gly Leu Pro Val Glu Val Thr Leu Ile Trp Asp His Pro Thr Ile Gly Lys Leu Ser Gln Ala Leu Ala Glu Glu Leu Glu Thr Ala Val Arg
Information for SEQ ID NO: 43 Length: 300 Type: DNA
Organism: Streptomyces refuineus Sequence: 43 atgtccctgt ccccgccttc ttcgtccccg ccttcttccc cgcccccttc tccgccgcac 60 gaccccgacg ccctgcggca gtggctgcgc gagcagtgcg ccgactgcct cggcgtcccc 120 ccggcatccc tcgccaccga cgtccccctc accgactacg gcatgacctc cgtcaccggg 180 accgccctgt gcggcatggt ggaggaccac ctggacgtcg agtgcgacct gagcctgctc 240 tggcaggagc agacgatcga cggcatcacc tcccggctgg cctcgcgcac cgcgcgctga 300
Information for SEQ ID NO: 44 Length: 99 Type: PRT
Organism: Streptomyces refuineus Sequence: 44 Met Ser Leu Ser Pro Pro Ser Ser Ser Pro Pro Ser Ser Pro Pro Pro Ser Pro Pro His Asp Pro Asp Ala Leu Arg Gln Trp Leu Arg Glu Gln Cys Ala Asp Cys Leu Gly Val Pro Pro Ala Ser Leu Ala Thr Asp Val Pro Leu Thr Asp Tyr Gly Met Thr Ser Val Thr Gly Thr Ala Leu Cys Gly Met Val Glu Asp His Leu Asp Val Glu Cys Asp Leu Ser Leu Leu Trp Gln Glu Gln Thr Ile Asp Gly Ile Thr Ser Arg Leu Ala Ser Arg Thr Ala Arg
Information for SEQ ID NO: 45 Length: 276 Type: DNA
Organism: Streptomyces aizunensis Sequence: 45 atgtccgacatcaccgcccccccggccacggccgatgccgcagaggtccgcacctggctg 60 cgcgaatgcgtggccacctacgtccggcttcccgccgaggacatcgacgtgaacctgccg 120 ctgtccgagtacggtctcgactccgtgtacgtgctcagcctgtgcgccgacatcgaggac 180 cgctacggcatcgaggtcgagcccaccctgctctgggaccaccccgccatcggcccgatc 240 gccgacgcgctgaccccgctgctcgccgctcgctag 276
Information for SEQ ID NO: 46 Length: 91 Type: PRT
Organism: Streptomyces aizunensis Sequence: 46 Met Ser Asp Ile Thr Ala Pro Pro Ala Thr Ala Asp Ala Ala Glu Val Arg Thr Trp Leu Arg Glu Cys Val Ala Thr Tyr Val Arg Leu Pro Ala Glu Asp Ile Asp Val Asn Leu Pro Leu Ser Glu Tyr Gly Leu Asp Ser Val Tyr Val Leu Ser Leu Cys Ala Asp Ile Glu Asp Arg Tyr Gly Ile Glu Val Glu Pro Thr Leu Leu Trp Asp His Pro Ala Ile Gly Pro Ile Ala Asp Ala Leu Thr Pro Leu Leu Ala Ala Arg
Information for SEQ ID NO: 47 Length: 267 Type: DNA
Organism: Actinomycete Sequence: 47 atgtccgacc tcaccgccgt ccccacgccg gagagcctgc gcgcctggct cgtcgactgc 60 gtcgccgacc acctcggccg cgcgcccgcc ggcatcgcca ccgacgtgcc gctgaccacg 120 tacggcctgg actccgtcta cgcgttgtcg atcgccgcgg agctcgagga ccacctggac 180 gtctcgctcg atccgaccct gatctgggac cacccgacga tcgacgccct cagcgcggcc 240 ctggtggccg agctgcgttc cgcctga 267
Information for SEQ ID NO: 48 Length: 88 Type: PRT
Organism: Actinomycete Sequence: 48 Met Ser Asp Leu Thr Ala Val Pro Thr Pro Glu Ser Leu Arg Ala Trp Leu Val Asp Cys Val Ala Asp His Leu Gly Arg Ala Pro Ala Gly Ile Ala Thr Asp Val Pro Leu Thr Thr Tyr Gly Leu Asp Ser Val Tyr Ala Leu Ser Ile Ala Ala Glu Leu Glu Asp His Leu Asp Val Ser Leu Asp Pro Thr Leu Ile Trp Asp His Pro Thr Ile Asp Ala Leu Ser Ala Ala Leu Val Ala Glu Leu Arg Ser Ala
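Each DNA entry above ends in a stop codon, so its protein partner should be one residue shorter than a third of the DNA length: 273 nt gives 91 codons and a 90-residue protein (SEQ ID NOs: 37/38 and 41/42), and the same arithmetic gives 270 to 89, 300 to 99, 276 to 91 and 267 to 88 for the other pairs. Translating the DNA is also one way to check residue order in the extracted protein listings, some of which (for example SEQ ID NO: 38) appear to have been reordered by column-wise text extraction. The following is a minimal Python sketch, assuming the standard codon assignments (which bacterial translation table 11 shares) and reading frame 1 from the first base; the helper names are illustrative and not part of the patent, and non-ATG start codons are not given special handling.

```python
# Standard codon table, bases ordered T, C, A, G; "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def clean_listing(text: str) -> str:
    """Drop spaces and position numbers; reject any other non-ACGT character."""
    seq = "".join(ch for ch in text if not ch.isspace() and not ch.isdigit()).upper()
    bad = sorted(set(seq) - set("ACGT"))
    if bad:
        raise ValueError(f"non-nucleotide characters in listing: {bad}")
    return seq


def translate(text: str) -> list:
    """Translate frame 1 up to the first stop codon, as three-letter codes."""
    seq = clean_listing(text)
    residues = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for pos in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        residues.append(THREE_LETTER[aa])
    return residues
```

Applied to the first 60-nt row of SEQ ID NO: 37 (position number included), `translate` returns Met Ser Glu Thr Asp Leu Ser Ala Ala Arg His Thr Pro Glu Gln Ile Arg Ser Trp Leu: the same twenty residues that open the extracted SEQ ID NO: 38 listing, but in a different order. Rejecting unknown characters, rather than skipping them, keeps extraction debris such as the "L" in SEQ ID NO: 39 from silently shifting the reading frame.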

Claims (26)

1. An isolated polynucleotide encoding an acyl-specific C-domain, wherein said isolated polynucleotide encodes a polypeptide which comprises at least 45% sequence identity to at least one sequence selected from SEQ ID NOS: 1 and 2.
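Claim 1 turns on a percent-identity threshold but, as reproduced here, does not fix the alignment method or scoring. As a minimal sketch only, the Python below computes identity from a plain Needleman-Wunsch global alignment with unit match, mismatch and gap scores, counting identical columns over all alignment columns including gaps. Tools such as BLASTP, with a substitution matrix and affine gap penalties, can report different values for the same pair, so the scoring parameters and the hypothetical `meets_threshold` helper are assumptions, and any example sequences are placeholders rather than SEQ ID NOs: 1 or 2.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity over a Needleman-Wunsch global alignment of a and b."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back one optimal path, counting identical and total columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * identical / columns if columns else 0.0


def meets_threshold(candidate: str, reference: str, threshold: float = 45.0) -> bool:
    """Hypothetical helper: True if identity to the reference is >= threshold (%)."""
    return global_identity(candidate, reference) >= threshold
```

For instance, `global_identity("ACGT", "ACGA")` aligns the four columns without gaps and returns 75.0. The quadratic table is fine for domain-sized inputs of a few hundred residues; genome-scale screening would normally use a dedicated aligner instead.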
2. An isolated polynucleotide comprising a sequence selected from the group consisting of:
(a) a sequence selected from the group consisting of SEQ ID NOS: 5, 7, 9, 11, 13, 15, 17 and 19;
(b) a sequence that is complementary to (a);
(c) a sequence which hybridizes to said sequence of (a) or (b) under conditions of high stringency; and
(d) a sequence which has at least 70% or higher homology to said sequence of (a), (b), or (c).
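Item (b) of claim 2 covers the complementary strand. Assuming "complementary" is read in the usual molecular-biology sense of the reverse complement written 5' to 3', a small Python sketch could look like this; the function name is illustrative, and only unambiguous A/C/G/T bases are accepted so that extraction debris is caught rather than complemented.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement (read 5' to 3') of a DNA sequence."""
    seq = "".join(seq.split())  # tolerate spaces between 10-nt groups
    bad = sorted(set(seq) - set("ACGTacgt"))
    if bad:
        raise ValueError(f"non-nucleotide characters: {bad}")
    return seq.translate(_COMPLEMENT)[::-1]
```

For example, `reverse_complement("atgtccgag")`, the first three codons of SEQ ID NO: 37, returns `"ctcggacat"`, and applying the function twice returns the original sequence.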
3. The isolated polynucleotide of claim 1, wherein said acyl-specific C-domain is involved in lipopeptide acyl-capping.
4. The isolated polynucleotide of claim 3, wherein said isolated polynucleotide resides in a gene locus selected from the group consisting of:
(a) the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076;
(b) the biosynthetic locus for A21978C from Streptomyces roseosporus NRRL 11379;
(c) the biosynthetic locus for A54145 from Streptomyces fradiae ATCC 18158;
(d) the biosynthetic locus for the calcium-dependent antibiotic from Streptomyces coelicolor A3(2);
(e) the biosynthetic locus for a lipopeptide natural product from Streptomyces ghanaensis NRRL B-12104;
(f) the biosynthetic locus for a lipopeptide natural product from Streptomyces refuineus NRRL 3143;
(g) the biosynthetic locus for a lipopeptide natural product from Streptomyces aizunensis NRRL B-11277;
(h) the biosynthetic locus for a lipopeptide natural product from Actinoplanes nipponensis FD 24834 ATCC 31145; and
(i) the biosynthetic locus for a lipopeptide natural product from a Streptomyces sp. organism.
5. Two or more isolated polynucleotides, wherein the first polynucleotide is a polynucleotide of claim 1, and the second polynucleotide encodes a polypeptide selected from the group consisting of:
(j) a polypeptide having at least 55% sequence identity to SEQ ID NO: 3; and
(k) a polypeptide having at least 50% sequence identity to SEQ ID NO: 4.
6. An isolated polynucleotide comprising a sequence selected from the group consisting of:
(a) a sequence selected from the group consisting of SEQ ID NOs. 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45 and 47;
(b) a sequence that is complementary to (a);
(c) a sequence which hybridizes to said sequence of (a) or (b) under conditions of high stringency; and
(d) a sequence which has at least 70% or higher homology to said sequence of (a), (b), or (c).
7. The isolated polynucleotide of claim 6, wherein said isolated polynucleotide resides in a biosynthetic locus selected from the group consisting of:
(a) the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076;
(b) the biosynthetic locus for A21978C from Streptomyces roseosporus NRRL 11379;
(c) the biosynthetic locus for A54145 from Streptomyces fradiae ATCC 18158;
(d) the biosynthetic locus for a lipopeptide natural product from Streptomyces ghanaensis NRRL B-12104;
(e) the biosynthetic locus for a lipopeptide natural product from Streptomyces refuineus NRRL 3143;
(f) the biosynthetic locus for a lipopeptide natural product from Streptomyces aizunensis NRRL B-11277;
(g) the biosynthetic locus for a lipopeptide natural product from Actinoplanes nipponensis FD 24834 ATCC 31145; and
(h) the biosynthetic locus for a lipopeptide natural product from a Streptomyces sp. organism.
8. An isolated acyl-specific C-domain, encoded by a polynucleotide which comprises a sequence selected from the group consisting of:
(a) a sequence selected from the group consisting of SEQ ID NOs. 5, 7, 9, 11, 13, 15, 17 and 19;
(b) a sequence that is complementary to (a);
(c) a sequence which hybridizes to said sequence of (a) or (b) under conditions of high stringency; and
(d) a sequence which has at least 70% or higher homology to said sequence of (a), (b), or (c).
9. An isolated acyl-specific C-domain comprising at least 45% sequence homology to at least one sequence selected from SEQ ID NO. 1 and SEQ ID NO. 2.
10. An isolated acyl-specific C-domain comprising a polypeptide sequence selected from the group consisting of:
(a) a sequence selected from the group consisting of SEQ ID NOs. 6, 8, 10, 12, 14, 16, 18, 20 and 22; and
(b) a sequence which has at least 70% or higher homology to said sequence of (a).
11. Two or more isolated polypeptides, wherein the first isolated polypeptide is an acyl-specific C-domain according to claim 9; and the second isolated polypeptide is selected from the group consisting of:
(a) a polypeptide having at least 55% identity to SEQ ID NO. 3; and
(b) a polypeptide having at least 50% identity to SEQ ID NO. 4.
12. An isolated polypeptide comprising a polypeptide selected from the group consisting of:
(a) SEQ ID NOs. 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46 and 48; and
(b) a sequence which has at least 70% or higher homology to said sequence of (a).
13. An N-acyl-capping cassette comprising at least one acyl-specific C-domain polypeptide and another polypeptide selected from the group consisting of an adenylating protein and an acyl-carrier protein.
14. A computer readable medium comprising:
(a) a computer program stored on said media containing instructions sufficient to implement a process for effecting the identification, analysis, or modeling of a representation of a polynucleotide or polypeptide sequence;
(b) data stored on said media representing a sequence of a polynucleotide selected from the group consisting of:
i) a polynucleotide encoding an acyl-specific C-domain, said polynucleotide encoding a polypeptide having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2;
ii) a polynucleotide encoding a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and
iii) a polynucleotide encoding a polypeptide having at least 50% sequence identity with SEQ ID NO: 4; and
(c) a data structure reflecting the underlying organization and structure of said data to facilitate said computer program access to data elements corresponding to logical sub-components of the sequence, said data structure being inherent in said program and in the way in which said computer program organizes and accesses said data.
15. A computer readable medium comprising:
(a) a computer program stored on said media containing instructions sufficient to implement a process for effecting the identification, analysis, or modeling of a representation of a polypeptide sequence;
(b) data stored on said media representing a sequence of a polypeptide selected from the group consisting of:
i) a polypeptide representing an acyl-specific C-domain and having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2;
ii) a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and
iii) a polypeptide having at least 50% sequence identity with SEQ ID NO: 4; and
(c) a data structure reflecting the underlying organization and structure of said data to facilitate said computer program access to data elements corresponding to logical sub-components of the sequence, said data structure being inherent in said program and in the way in which said computer program organizes and accesses said data.
16. A memory for storing data that can be accessed by a computer programmed to implement a process for effecting the identification, analysis, or modeling of a sequence of a polynucleotide or a polypeptide, said memory comprising data representing a polynucleotide selected from the group consisting of:
(a) a polynucleotide encoding an acyl-specific C-domain, said polynucleotide encoding a polypeptide having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2;
(b) a polynucleotide encoding a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and
(c) a polynucleotide encoding a polypeptide having at least 50% sequence identity with SEQ ID NO: 4.
17. A memory for storing data that can be accessed by a computer programmed to implement a process for effecting the identification, analysis, or modeling of a sequence of a polypeptide, said memory comprising data representing a polypeptide selected from the group consisting of:
(a) a polypeptide having at least 45% sequence identity with either SEQ ID NO: 1 or SEQ ID NO: 2;
(b) a polypeptide having at least 55% sequence identity with SEQ ID NO: 3; and
(c) a polypeptide having at least 50% sequence identity with SEQ ID NO: 4.
18. A method for detecting a polypeptide involved in lipopeptide biosynthesis or a polynucleotide encoding such a polypeptide comprising the step of identifying:
(a) a polypeptide having at least 45% sequence identity to SEQ ID NO:1 or SEQ ID NO:2; or
(b) a polynucleotide encoding a polypeptide having at least 45% sequence identity to SEQ ID NO:1 or SEQ ID NO:2, and wherein said at least 45% sequence identity indicates a polypeptide involved in lipopeptide biosynthesis.
19. A method according to claim 18 wherein the identifying step comprises the steps of:
(a) providing a reference polynucleotide or polypeptide sequence selected from the group consisting of polynucleotide or polypeptide sequences representing an acyl-specific domain;
(b) comparing said reference sequence to one or more candidate polynucleotide or polypeptide sequences stored on a computer readable medium;
(c) determining the level of homology between said reference sequence and said one or more candidate sequences; and
(d) identifying a candidate sequence which shares at least 70% homology with said reference sequence.
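Steps (b) to (d) of claim 19 amount to scoring each stored candidate against a reference and keeping those at or above 70%. The Python sketch below is one way to express that loop. To stay short it assumes the candidates have already been aligned to the reference (equal-length strings with "-" for gaps, as in rows of a multiple alignment); that pre-alignment is an assumption, not a step recited in the claim, and the function names and example sequences are hypothetical.

```python
def aligned_identity(reference: str, candidate: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' = gap)."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must be aligned to the same length")
    # Ignore columns that are gaps in both rows (common in multiple alignments).
    columns = [(r, c) for r, c in zip(reference, candidate) if not (r == "-" and c == "-")]
    if not columns:
        return 0.0
    identical = sum(1 for r, c in columns if r == c)
    return 100.0 * identical / len(columns)


def screen(reference: str, candidates: dict, threshold: float = 70.0) -> list:
    """Score every candidate, keep those >= threshold (%), best first."""
    hits = []
    for name, seq in candidates.items():
        pct = aligned_identity(reference, seq)
        if pct >= threshold:
            hits.append((name, pct))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

With the reference `"MSETDLSA"` and candidates `"MSETDLSA"`, `"MSETDAAA"` and `"MAAAAAAA"`, the scores are 100%, 75% and 25%, so only the first two pass the 70% cutoff.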
20. The method of claim 19, wherein said reference sequence is a polypeptide of SEQ ID NOS. 6, 8, 10, 12, 14, 16, 18, 20, 22 or a polynucleotide encoding a polypeptide of SEQ ID NOS. 6, 8, 10, 12, 14, 16, 18, 20 or 22.
21. The method of claim 19 further comprising determining structural motifs common to said candidate sequence and said reference sequence.
22. The method of claim 18 further comprising the step of identifying, in proximity to the polypeptide of a) or the polynucleotide of b), at least:
c) one polypeptide having at least 55% sequence identity to SEQ ID NO: 3 or one polynucleotide sequence encoding a polypeptide having at least 55% sequence identity to SEQ ID NO: 3; or
d) one polypeptide having at least 50% sequence identity to SEQ ID NO: 4 or one polynucleotide sequence encoding a polypeptide having at least 50% sequence identity to SEQ ID NO: 4.
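Claim 22 adds a co-localization test: an acyl-specific C-domain hit counts only if an adenylating-protein homolog (SEQ ID NO: 3-like) or an acyl-carrier-protein homolog (SEQ ID NO: 4-like) lies "in proximity". The claim as reproduced here gives no distance, so the 10 kb window below is purely an assumed parameter, as are the hypothetical `Hit` record and its `kind` labels. The sketch also assumes homology hits have already been called and placed in contig coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A homology hit placed on a contig (hypothetical record)."""
    name: str
    start: int  # contig coordinate, start <= end
    end: int
    kind: str  # "c_domain", "adenylating" (SEQ ID NO: 3-like) or "acp" (SEQ ID NO: 4-like)


def gap_bp(a: Hit, b: Hit) -> int:
    """Distance between the nearest ends of two hits; 0 if they overlap."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def capping_cassettes(hits, window_bp: int = 10_000) -> dict:
    """Map each C-domain hit to partner hits within window_bp (assumed cutoff)."""
    partners = [h for h in hits if h.kind in ("adenylating", "acp")]
    result = {}
    for c in (h for h in hits if h.kind == "c_domain"):
        near = [p.name for p in partners if gap_bp(c, p) <= window_bp]
        if near:
            result[c.name] = near
    return result
```

A C-domain hit at 1,000 to 4,000 bp with an adenylating hit at 4,100 to 5,600 bp and an ACP hit at 5,700 to 6,000 bp would be reported with both partners, while a second C-domain hit 44 kb away would not. In practice the window would more likely be tuned to typical gene-cluster sizes than fixed in advance.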
23. The method according to claim 22 wherein:
(a) the polypeptide of c) or d) is a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40, or a polypeptide having at least 70% sequence identity to a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40; or
(b) the nucleotide of c) or d) is a nucleotide encoding a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40 or a nucleotide encoding a polypeptide having at least 70% sequence identity to a polypeptide of SEQ ID NO: 24, 26, 28, 30, 32, 34, 36, 38 or 40.
24. A computer system comprising:
(a) a database of reference sequences, wherein the reference sequences encode proteins involved in lipid biosynthesis, and wherein the reference sequences include one or more of:
(i) a polypeptide sequence representing any acyl-specific C-domain or a polynucleotide encoding an acyl-specific C-domain; and
(b) a user interface capable of:
(i) receiving a test sequence for comparing against each of the reference sequences in the database; and
(ii) displaying the results of the comparison.
25. A computer system of claim 24 wherein the reference sequences further include one or more of:
(iv) a polypeptide sequence representing an adenylating enzyme or a polynucleotide encoding an adenylating enzyme; and
(v) a polypeptide sequence representing an acyl carrier protein or a polynucleotide encoding an acyl carrier protein.
26. A computer system of claim 25 wherein:
(a) the reference sequence of (i) is selected from SEQ ID NOS: 1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 and 22;
(b) the reference sequence of (iv) is selected from SEQ ID NOS: 3, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33 and 34; and
(c) the reference sequence of (v) is selected from SEQ ID NO: 4, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47 and 48.
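Claims 24 to 26 describe a reference database grouped into C-domain, adenylating-enzyme and acyl-carrier-protein categories, plus an interface that compares a test sequence against every entry and displays the results. A rough Python sketch follows. For speed it scores with shared 3-mer (Jaccard) similarity, a swapped-in screening heuristic and not an alignment-based identity; the class, method names, labels and sequences are hypothetical placeholders, not the claimed SEQ ID NOs.

```python
def kmers(seq: str, k: int = 3) -> set:
    """Set of overlapping k-length substrings (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Shared k-mers divided by all distinct k-mers in either sequence."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


class ReferenceDB:
    """Hypothetical in-memory store: label -> (category, sequence)."""

    def __init__(self) -> None:
        self._refs = {}

    def add(self, label: str, category: str, seq: str) -> None:
        self._refs[label] = (category, seq)

    def compare(self, test_seq: str, k: int = 3) -> list:
        """Score test_seq against every reference, best match first."""
        rows = [(label, cat, jaccard(test_seq, seq, k)) for label, (cat, seq) in self._refs.items()]
        return sorted(rows, key=lambda row: row[2], reverse=True)


def display(rows: list) -> None:
    """Minimal text 'user interface' for the comparison results."""
    for label, category, score in rows:
        print(f"{label:<16}{category:<24}{score:.2f}")
```

After `db.add("ref_acp", "acyl carrier protein", "MSETDLSAAR")`, calling `display(db.compare("MSETDLSAAR"))` lists that entry first with a score of 1.00. A production system would more likely hand the comparison step to an alignment tool and keep only the storage and display logic.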
CA002412226A 2001-12-26 2002-12-24 Compositions, methods and systems for discovery of lipopeptides Abandoned CA2412226A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US34213301P 2001-12-26 2001-12-26
US60/342,133 2001-12-26
US37278902P 2002-04-17 2002-04-17
US60/372,789 2002-04-17

Publications (1)

Publication Number Publication Date
CA2412226A1 (en) 2003-06-22

Family

ID=26992837

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002412226A Abandoned CA2412226A1 (en) 2001-12-26 2002-12-24 Compositions, methods and systems for discovery of lipopeptides

Country Status (5)

Country Link
EP (2) EP1458868A2 (en)
JP (1) JP2005514067A (en)
AU (2) AU2002351636A1 (en)
CA (1) CA2412226A1 (en)
WO (2) WO2003060128A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102007017861A1 (en) * 2007-04-13 2008-10-16 Philipps-Universität Marburg Protein for the chemoenzymatic production of L-threo-hydroxyaspartate

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
CA1337758C (en) * 1988-04-11 1995-12-19 Eli Lilly And Company Peptide antibiotics
AU6510799A (en) * 1998-10-07 2000-04-26 Maxygen, Inc. Dna shuffling to produce nucleic acids for mycotoxin detoxification
US6927286B1 (en) * 1999-01-06 2005-08-09 The Regents Of The University Of California Bleomycin gene cluster components and their uses
DE19951196A1 (en) * 1999-10-22 2001-05-10 Mohamed A Marahiel Tailored peptide synthetases and their use
AU2001245263A1 (en) * 2000-01-21 2001-07-31 Kosan Biosciences, Inc. Method for cloning polyketide synthase genes
DE60109952T2 (en) * 2000-10-13 2006-02-09 Ecopia Biosciences Inc., Saint-Laurent RAMOPLANIN BIOSYNTHESIS GENE CLUSTER
WO2002059322A2 (en) * 2000-10-17 2002-08-01 Cubist Pharmaceuticlas, Inc. Compositions and methods relating to the daptomycin biosynthetic gene cluster
CA2352451C (en) * 2001-07-24 2003-04-08 Ecopia Biosciences Inc. High throughput method for discovery of gene clusters

Also Published As

Publication number Publication date
WO2003060127A3 (en) 2004-04-29
EP1458868A2 (en) 2004-09-22
AU2002351637A1 (en) 2003-07-30
EP1461434A2 (en) 2004-09-29
JP2005514067A (en) 2005-05-19
WO2003060127A2 (en) 2003-07-24
AU2002351636A1 (en) 2003-07-30
WO2003060128A2 (en) 2003-07-24
WO2003060128A3 (en) 2004-06-10

Similar Documents

Publication Publication Date Title
Konz et al. The bacitracin biosynthesis operon of Bacillus licheniformis ATCC 10716: molecular characterization of three multi-modular peptide synthetases
US10047363B2 (en) NRPS-PKS gene cluster and its manipulation and utility
KR101261870B1 (en) Polymyxin B or E synthetase and gene cluster thereof
JP6430250B2 (en) Gene cluster for biosynthesis of glyceromycin and methylglyceromycin
Arrebola et al. A nonribosomal peptide synthetase gene (mgoA) of Pseudomonas syringae pv. syringae is involved in mangotoxin biosynthesis and is required for full virulence
US11858967B2 (en) Compositions and methods for enhanced production of enduracidin in a genetically engineered strain of streptomyces fungicidicus
US7462705B2 (en) Nucleic acids encoding an enediyne polyketide synthase complex
CA2365904A1 (en) Mitomycin biosynthetic gene cluster
JP2006503554A (en) Polynucleotide and polypeptide encoded by the polynucleotide and involved in the synthesis of diketopiperazine derivatives
WO2011130719A2 (en) A biosynthetic pathway for heterologous expression of a nonribosomal peptide synthetase drug and analogs
US20100035256A1 (en) Enduracidin biosynthetic gene cluster from streptomyces fungicidicus
CA2412226A1 (en) Compositions, methods and systems for discovery of lipopeptides
US7235651B2 (en) Genes and proteins involved in the biosynthesis of lipopeptides
US20030211567A1 (en) Compositions, methods and systems for discovery of lipopeptides
WO2002101051A2 (en) Genes and proteins for the biosynthesis of anthramycin
WO2003089641A2 (en) Dual condensation/epimerization domain in non-ribosomal peptide synthetase systems
US8329430B2 (en) Polymyxin synthetase and gene cluster thereof
CA2445687C (en) Compositions, methods and systems for the discovery of enediyne natural products
WO2005021586A2 (en) Metabolic engineering of viomycin biosynthesis
Borg Molecular Genetic and Physical Analysis of Calcium-Dependent Antibiotics from Streptomyces Species
EP2586791A1 (en) Gene cluster for biosynthesis of griselimycin and methylgriselimycin

Legal Events

Date Code Title Description
EEER Examination request
FZDE Discontinued
FZDE Discontinued

Effective date: 20111228