WO2003089641A2

WO2003089641A2 - Dual condensation/epimerization domain in non-ribosomal peptide synthetase systems

Info

Publication number: WO2003089641A2
Application number: PCT/CA2003/000575
Authority: WO
Inventors: Chris M. Farnet; Alfredo Staffa
Original assignee: Ecopia Biosciences Inc.
Priority date: 2002-04-17
Filing date: 2003-04-17
Publication date: 2003-10-30
Also published as: WO2003089641A3; AU2003222690A1; CA2430580A1

Abstract

The present invention relates to domains of non-ribosomal peptide synthetases (NRPSs) that exhibit dual condensation and epimerization activities. The 'dual condensation/epimerization NRPS domains' of the present invention allow for the incorporation of non-proteinogenic substrates, such as D-amino acids, into peptide products. These dual condensation/epimerization NRPS domains may further be used to modify the stereochemistry of synthesized peptides at selected amino acid sites.

Description

TITLE OF INVENTION

DUAL CONDENSATION/EPIMERIZATION DOMAIN IN NON-RIBOSOMAL PEPTIDE SYNTHETASE SYSTEMS

RELATED APPLICATIONS

[001] The present application is a Continuation-in-part of Application No. 09/976,059 filed on October 15, 2001 which claims priority from Application No. 60/239,924 filed on October 13, 2000. The present application further claims the benefit of Application No. 60/372,790 filed on April 17, 2002. The full disclosure of each of these applications is incorporated herein by reference.

FIELD OF THE INVENTION

[002] The present invention relates to non-ribosomal peptide synthetases (NRPSs). More specifically, the present invention concerns domains of NRPSs that exhibit dual condensation and epimerization activities allowing for the incorporation of non-proteinogenic substrates, such as D-amino acids, into peptide products. These dual condensation/epimerization NRPS domains may further be used to modify the stereochemistry of synthesized peptides at selected amino acid sites.

BACKGROUND

[003] Many low molecular weight peptides produced by microorganisms are synthesized non-ribosomally on large multifunctional proteins termed non-ribosomal peptide synthetases (NRPSs). NRPSs are modular proteins that consist of one or more polyfunctional polypeptides each of which is made up of modules. The amino- to carboxy-terminal order and specificities of the individual modules correspond to the sequential order and identity of the amino acid residues of the peptide product. Each NRPS module recognizes a specific amino acid substrate and catalyzes the stepwise condensation to form a growing peptide chain. The identity of the amino acid recognized by a particular unit can be determined by comparison with other units of known specificity. In many peptide synthetases, there is a strict correlation between the order of repeated units in a peptide synthetase and the order in which the respective amino acids appear in the peptide product, making it possible to correlate peptides of known structure with putative genes encoding their synthesis, as demonstrated by the identification of the mycobactin biosynthetic gene cluster from the genome of Mycobacterium tuberculosis.

[004] The modules of a peptide synthetase are composed of smaller units or "domains" that each carry out a specific role in the recognition, activation, modification and joining of amino acid precursors to form the peptide product. One type of domain, the adenylation (A) domain, is responsible for selectively recognizing and activating the amino acid that is to be incorporated by a particular unit of the peptide synthetase. This activation step is ATP-dependent and involves the transient formation of an amino-acyl-adenylate. The activated amino acid is covalently attached to the peptide synthetase through another type of domain, the thiolation (T) domain, that is generally located adjacent to the A domain. The T domain is post-translationally modified by the covalent attachment of a phosphopantetheinyl prosthetic arm to a conserved serine residue. The activated amino acid substrates are tethered onto the non-ribosomal peptide synthetase via a thioester bond to the phosphopantetheinyl prosthetic arm of the respective T domains. Amino acids joined to successive units of the peptide synthetase are subsequently covalently linked together by the formation of amide bonds catalyzed by another type of domain, the condensation (C) domain. NRPS modules can also occasionally contain additional functional domains that carry out auxiliary reactions, the most common being epimerization of an amino acid substrate from the L- to the D- form. This reaction is catalyzed by a domain referred to as an epimerization (E) domain that is generally located adjacent to the T domain of a given NRPS module. Thus, a typical NRPS module has the following domain organization: C-A-T-(E).

[005] Product assembly by NRPSs involves three distinct phases, namely chain initiation, chain elongation, and chain termination. Polypeptide chain initiation is carried out by specialized modules termed "starter modules" that comprise an A domain and a T domain. Elongation modules have, in addition, a C domain that is located upstream of the A domain. Elongation domains cannot initiate peptide bond formation due to interference by the C domain. The peptide intermediates are covalently tethered to the NRPS during translocations as an elongating series of acyl-S-enzymes. To release the mature peptide product from the NRPS, the terminal acyl-S-enzyme bond must be broken. This process is the chain termination step and is usually catalyzed by a C-terminal thioesterase (TE) domain. Thioesterase-mediated release of the mature peptide from the NRPS enzyme involves the transient formation of an acyl-O-TE intermediate that is then hydrolyzed or hydrolyzed and concomitantly cyclized to release the mature peptide.
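The domain organization described above can be illustrated with a minimal Python sketch. It is an illustration only and not part of the disclosed invention: the `Module` class, the helper functions and the three-module example assembly line (with its substrates) are hypothetical. The sketch assumes only what the background states, namely that starter modules comprise A and T domains without an upstream C domain, that elongation modules are organized as C-A-T-(E), and that a canonical E domain converts the tethered amino acid from the L- to the D- form.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    """One NRPS module (hypothetical model for illustration)."""
    substrate: str      # amino acid recognized by the A domain
    domains: List[str]  # ordered domain labels, e.g. ["C", "A", "T", "E"]

    def is_starter(self) -> bool:
        # Starter modules carry an A and a T domain but no upstream C domain.
        return "C" not in self.domains and {"A", "T"} <= set(self.domains)

    def configuration(self) -> str:
        # A canonical E domain epimerizes the tethered residue from L to D.
        return "D" if "E" in self.domains else "L"


def organization(modules: List[Module]) -> str:
    """Render the domain organization of an assembly line as text."""
    return " | ".join("-".join(m.domains) for m in modules)


def predicted_peptide(modules: List[Module]) -> List[str]:
    """Residue order follows module order (amino- to carboxy-terminal)."""
    if not modules or not modules[0].is_starter():
        raise ValueError("assembly line must begin with a starter module")
    return [f"{m.configuration()}-{m.substrate}" for m in modules]


# Hypothetical three-module assembly line ending in a thioesterase (TE) domain.
example = [
    Module("Phe", ["A", "T", "E"]),
    Module("Pro", ["C", "A", "T"]),
    Module("Leu", ["C", "A", "T", "TE"]),
]
```

For this hypothetical example, `organization(example)` gives `A-T-E | C-A-T | C-A-T-TE`, and `predicted_peptide(example)` lists the residues in module order with the stereochemistry implied by the presence or absence of a canonical E domain.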

SUMMARY OF THE INVENTION

[006] We have discovered and isolated nucleic acid and polypeptide sequences for specialized condensation domains that direct the incorporation of non-proteinogenic substrates, such as D-amino acids (and occasionally non-chiral amino acids) into peptide products synthesized by non-ribosomal peptide synthetase systems that lack canonical epimerization domains (referred to herein and in USSN 60/372,790 as E-less NRPSs).

[007] In one aspect, the invention provides an isolated polynucleotide encoding a dual condensation/epimerization NRPS domain, wherein said polynucleotide encodes a polypeptide having at least 45% sequence identity to SEQ ID NO: 139. Certain embodiments expressly exclude one or more sequences, in particular the nucleotide sequence corresponding to the C-domains of NRPS protein of GenBank accession no. AAC80285 (SEQ ID NOS: 18, 20, 22 and 24); the nucleotide sequence corresponding to the C-domains of NRPS protein of GenBank accession no. AAF99707, AAO72424 and AAO72425 (SEQ ID NOS: 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58 and 60) and the nucleotide sequence corresponding to the C-domains of NRPS protein of GenBank accession no. CAD17793 (SEQ ID NOS: 62 and 64). Other sequences can be excluded without departing from the scope of the invention.

[008] In a related aspect the invention provides an isolated polynucleotide comprising a sequence selected from the group consisting of: (a) a sequence selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136 and 138; (b) a sequence that is complementary to (a); (c) a sequence which hybridizes to said sequence of (a) or (b) under conditions of high stringency; and (d) a sequence which has at least 60% or higher similarity to said sequence of (a), (b), or (c) as measured using the BLASTP 2.2.5 algorithm. Certain embodiments expressly exclude one or more sequences, in particular the nucleotide sequence corresponding to the C-domains of NRPS protein of GenBank accession no. AAC80285 (SEQ ID NOS: 18, 20, 22 and 24); the nucleotide sequence corresponding to the C-domains of NRPS protein of GenBank accession no. AAF99707, AAO72424 and AAO72425 (SEQ ID NOS: 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 56, 58 and 60) and the nucleotide sequence corresponding to the C-domains of NRPS protein of GenBank accession no. CAD17793 (SEQ ID NOS: 62 and 64). Other sequences can be excluded without departing from the scope of the invention.

[009] In one embodiment, the isolated polynucleotide encoding a dual condensation/epimerization domain is derived from an organism of the actinomycetes taxon. In a further embodiment, the isolated polynucleotide encoding a dual condensation/epimerization NRPS domain resides in a gene locus selected from the group consisting of: the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076; the biosynthetic locus for syringomycin from Pseudomonas syringae pv. syringae strain B301D; the biosynthetic locus for syringopeptin from Pseudomonas syringae pv. syringae strain B301D; the biosynthetic locus for a peptide natural product from Ralstonia solanacearum GMI1000; the biosynthetic locus for a lipopeptide natural product from Streptomyces aizunensis NRRL B-11277; the biosynthetic locus for a lipopeptide natural product from Streptomyces griseofuscus NRRL B-5429; the biosynthetic locus for a lipopeptide natural product from Kitasatosporia sp. ECO-03; the biosynthetic locus for a lipopeptide natural product from Streptomyces sp. ECO-38; the biosynthetic locus for a lipopeptide natural product from Streptomyces sp. ECO-59; the biosynthetic locus for a lipopeptide natural product from Streptomyces viridifaciens NRRL ISP-5239; the biosynthetic locus for a lipopeptide natural product from Actinomadura sp. ATCC 39334; and the biosynthetic locus for a lipopeptide natural product from Micromonospora chersina ATCC 53710. In still a further embodiment of the invention, the dual condensation/epimerization NRPS domain encoded by the isolated polynucleotide is involved in epimerization and condensation of amino acids in E-less NRPS systems. In one embodiment some dual condensation/epimerization NRPS domains reside in cosmids 008CH and 008CK having accession numbers IDAC 190901-3 and IDAC 190901-1, respectively.

[0010] In another embodiment, the isolated polynucleotide encoding a dual condensation/epimerization NRPS domain does not reside in the biosynthetic locus for syringomycin from Pseudomonas syringae pv. syringae strain B301D, the biosynthetic locus for syringopeptin from Pseudomonas syringae pv. syringae strain B301D, or the biosynthetic locus for a peptide natural product from Ralstonia solanacearum GMI1000.

[0011] The invention also provides an isolated dual condensation/epimerization NRPS domain comprising at least 45% sequence identity to SEQ ID NO: 139. Certain embodiments expressly exclude one or more sequences, in particular the polypeptide sequence corresponding to the C-domains of NRPS protein of GenBank accession no. AAC80285 (SEQ ID NOS: 17, 19, 21 and 23); the polypeptide sequence corresponding to the C-domains of NRPS protein of GenBank accession no. AAF99707, AAO72424 and AAO72425 (SEQ ID NOS: 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57 and 59) and the polypeptide sequence corresponding to the C-domains of NRPS protein of GenBank accession no. CAD17793 (SEQ ID NOS: 61 and 63). Other sequences can be excluded without departing from the scope of the invention.

[0012] In a related aspect the invention provides an isolated dual condensation/epimerization NRPS domain comprising a polypeptide sequence selected from the group consisting of: (a) a sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135 and 137; and (b) a sequence which has at least 60% or higher sequence similarity to said sequence of (a) as determined using the BLASTP 2.2.5 algorithm. Certain embodiments expressly exclude one or more sequences, in particular the polypeptide sequence corresponding to the C-domains of NRPS protein of GenBank accession no. AAC80285 (SEQ ID NOS: 17, 19, 21 and 23); the polypeptide sequence corresponding to the C-domains of NRPS protein of GenBank accession no. AAF99707, AAO72424 and AAO72425 (SEQ ID NOS: 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57 and 59) and the polypeptide sequence corresponding to the C-domains of NRPS protein of GenBank accession no. CAD17793 (SEQ ID NOS: 61 and 63).

[0013] In another related aspect the invention provides an isolated dual condensation/epimerization NRPS domain comprising a polypeptide sequence selected from the group consisting of: (a) a sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135 and 137; and (b) a sequence which has at least 70% or higher sequence similarity to said sequence of (a) as determined using the BLASTP 2.2.5 algorithm. Certain embodiments expressly exclude one or more sequences, in particular the polypeptide sequence corresponding to the C-domains of NRPS protein of GenBank accession no. AAC80285 (SEQ ID NOS: 17, 19, 21 and 23); the polypeptide sequence corresponding to the C-domains of NRPS protein of GenBank accession no. AAF99707, AAO72424 and AAO72425 (SEQ ID NOS: 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57 and 59) and the polypeptide sequence corresponding to the C-domains of NRPS protein of GenBank accession no. CAD17793 (SEQ ID NOS: 61 and 63).

[0014] In one embodiment, the isolated dual condensation/epimerization NRPS domain is derived from an organism of the actinomycetes taxon. In another embodiment, the dual condensation/epimerization NRPS domain is not derived from the biosynthetic locus for syringomycin from Pseudomonas syringae pv. syringae strain B301D, the biosynthetic locus for syringopeptin from Pseudomonas syringae pv. syringae strain B301D and the biosynthetic locus for a peptide natural product from Ralstonia solanacearum GMI1000.

[0015] The present invention further contemplates several uses or applications of the dual condensation/epimerization NRPS domain encoded by the isolated polynucleotide described herein. In one such use or application, the dual condensation/epimerization NRPS domain is involved in the incorporation of a D-amino acid or non-chiral amino acid into a peptide product. Other uses include: use of the dual condensation/epimerization NRPS domain for epimerization and condensation of amino acids in E-less NRPS systems; use of the dual condensation/epimerization NRPS domain to modify the stereochemistry of a synthesized peptide compound in vivo, using an appropriate recombinant host; use of the dual condensation/epimerization NRPS domain to modify the stereochemistry of a synthesized peptide compound in vitro, using purified enzymes supplemented with appropriate substrates; use of the dual condensation/epimerization NRPS domain to modify the stereochemistry of ramoplanin; and use of the dual condensation/epimerization NRPS domain to modify the stereochemistry of the complestatin molecule at a specific amino acid component. In one application, the dual condensation/epimerization NRPS domain is genetically substituted for a regular condensation domain in an expression system.

[0016] The present invention also encompasses expression vectors and cultured cells comprising the isolated polynucleotide encoding the dual condensation/epimerization NRPS domain. It also includes a method of producing the dual condensation/epimerization NRPS domain, the method comprising culturing the cells under conditions permitting expression of the dual condensation/epimerization NRPS domain and purifying the dual condensation/epimerization NRPS domain from the cell or the medium of the cell.

BRIEF DESCRIPTION OF THE DRAWINGS

[0017] Figures 1a, 1b, 1c, 1d, 1e, 1f, 1g, 1h and 1i each represent a schematic view of the domain organization of NRPS systems. Figure 1a shows epimerization domain-containing NRPS systems from Bacillus licheniformis ATCC 10716 (BACI) and Bacillus subtilis strain 168 (SURF); Figure 1b shows epimerization domain-containing NRPS systems from Bacillus subtilis b213 (PLIP) and Streptomyces coelicolor A3(2) (CADA); Figure 1c shows E-less NRPS systems from Actinoplanes sp. ATCC 33076 (RAMO) and Pseudomonas syringae pv. syringae strain B301D (SYRI); Figure 1d shows E-less NRPS systems from Pseudomonas syringae pv. syringae strain B301D (SYRP) and Ralstonia solanacearum GMI1000; Figure 1e shows E-less NRPS system from Streptomyces aizunensis NRRL B-11277 (023C); Figure 1f shows E-less NRPS systems from Streptomyces griseofuscus NRRL B-5429 (034F) and Kitasatosporia sp. ECO-03 (040G); Figure 1g shows E-less NRPS systems from Streptomyces sp. ECO-38 (084B) and Streptomyces sp. ECO-59 (107A); Figure 1h shows E-less NRPS systems from Streptomyces viridifaciens NRRL ISP-5239 (143F) and Actinomadura sp. ATCC 39334 (153A); Figure 1i shows E-less NRPS system from Micromonospora chersina ATCC 53710 (263B). Dual condensation/epimerization domains in Figures 1c to 1i are shown by a shaded circle.

[0018] Figure 2 is a dendrogram showing the evolutionary relatedness of C-domains from various NRPS systems with a clearly branching cluster of representative condensation/epimerization NRPS domains of the invention (designated by a shaded circle) involved in the incorporation of D-amino acids into peptide products synthesized by E-less NRPS systems. Condensation domains located downstream of canonical epimerization domains are designated by black circles. Condensation domains involved in N-acylation of lipopeptides are designated by a clear circle. Regular C-domains that catalyze condensation of amino acids in the L-configuration are designated by hatched circles.

[0019] Figure 3 represents the ramoplanin NRPS system present in Actinoplanes sp. ATCC 33076. The NRPS complex is composed of three polypeptides, 008CHP_09, 008CHP_10 and 008CHP_11, composed of tripartite modules (C-A-T). Representative condensation/epimerization NRPS domains of the invention are represented by a shaded circle. 008CH and 008CK are deposited cosmids containing genes of the invention.

[0020] Figure 4 illustrates the use of NRPS biosynthetic machinery to modify the stereochemistry of a synthesized peptide compound. Replacement of a regular condensation domain of the ramoplanin NRPS system (shown as a hatched circle) with a dual condensation/epimerization NRPS domain (shown as a shaded circle) alters the stereochemistry of ramoplanin at two different amino acid sites.

[0021] Figure 5 illustrates the use of NRPS biosynthetic machinery to modify the stereochemistry of a synthesized peptide compound wherein a dual condensation/epimerization NRPS domain from the ramoplanin NRPS system is used to modify the stereochemistry of the complestatin molecule at a specific amino acid component.

DETAILED DESCRIPTION

[0022] Dual condensation/epimerization NRPS domains from NRPS systems in a variety of organisms were discovered and analyzed. For convenience, peptide biosynthetic loci and organisms containing NRPS systems with representative dual condensation/epimerization NRPS domains of the invention are referred to herein and in priority application USSN 60/372,790 by reference to a source designation wherein "BACI" refers to the biosynthetic locus for bacitracin from Bacillus licheniformis ATCC 10716 (Konz et al. (1997), Chem. Biol., 4, 927-937); "SURF" refers to the biosynthetic locus for surfactin from Bacillus subtilis strain 168 (Cosmina et al. (1993), Mol. Microbiol., 8, 821-831); "PLIP" refers to the locus for plipastatin from Bacillus subtilis b213 (Steller et al. (1998), Chem. Biol., 6, 31-41); "CADA" refers to the calcium-dependent antibiotic from Streptomyces coelicolor A3(2) (Chong et al. (1998), Microbiology, 144, 193-199); "RAMO" refers to the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076; "SYRI" refers to the biosynthetic locus for syringomycin from Pseudomonas syringae pv. syringae strain B301D (Guenzi et al. (1998), J. Biol. Chem., 273, 32857-32863); "SYRP" refers to the biosynthetic locus for syringopeptin from Pseudomonas syringae pv. syringae strain B301D (Scholz-Schroeder et al. (2003), Mol. Plant Microbe Interact., 16, 271-280); URSO refers to the biosynthetic locus for a peptide from Ralstonia solanacearum (Salanoubat et al. (2002), Nature, 415, 497-502); 023C refers to the biosynthetic locus for a peptide from Streptomyces aizunensis NRRL B-11277; 034F refers to the biosynthetic locus for a peptide from Streptomyces griseofuscus NRRL B-5429; 040G refers to the biosynthetic locus for a peptide from Kitasatosporia sp. ECO-03; 084B refers to the biosynthetic locus for a peptide from Streptomyces sp. ECO-38; 107A refers to the biosynthetic locus for a peptide from Streptomyces sp. ECO-59; 143F refers to the biosynthetic locus for a peptide from Streptomyces viridifaciens NRRL ISP-5239; 153A refers to the biosynthetic locus for a peptide from Actinomadura sp. ATCC 39334; and 263B refers to the biosynthetic locus for a peptide from Micromonospora chersina ATCC 53710. All ECO numbers refer to organisms present in Ecopia's private culture collection.

[0023] Unless defined otherwise, the scientific and technological terms and nomenclature used herein have the same meaning as commonly understood by a person of ordinary skill to which this invention pertains. Generally, the procedures for cell cultures, infection, molecular biology methods and the like are common methods used in the art. Such standard techniques can be found in reference manuals such as for example Sambrook et al. (1989, Molecular Cloning - A Laboratory Manual, Cold Spring Harbor Laboratories) and Ausubel et al. (1994, Current Protocols in Molecular Biology, Wiley, New York).

[0024] As used herein, the terms "polynucleotide" and "nucleic acid molecule" refer to a polymer of nucleotides. Non-limiting examples thereof include DNA (e.g. genomic DNA, cDNA), RNA molecules (e.g. mRNA) and chimeras thereof. The nucleic acid molecule can be obtained by cloning techniques or synthesized. DNA can be double-stranded or single-stranded (coding strand or non-coding strand [antisense]).

[0025] As used herein, the term "gene" is well known in the art and relates to a nucleic acid sequence defining a single protein or polypeptide. A "structural gene" defines a DNA sequence which is transcribed into RNA and translated into a protein having a specific amino acid sequence thereby giving rise to a specific polypeptide or protein. It will be readily recognized by the person of ordinary skill that the nucleic acid sequence of the present invention can be incorporated into any one of numerous established kit formats which are well known in the art.

[0026] A "heterologous" (e.g. a heterologous gene) region of a DNA molecule is a subsegment of DNA within a larger segment that is not found in association therewith in nature. The term "heterologous" can be similarly used to define two polypeptidic segments not joined together in nature. Non-limiting examples of heterologous genes include reporter genes such as luciferase, chloramphenicol acetyl transferase, β-galactosidase, and the like which can be juxtaposed or joined to heterologous control regions or to heterologous polypeptides.

[0027] The term "vector" is commonly known in the art and defines a plasmid DNA, phage DNA, viral DNA and the like, which can serve as a DNA vehicle into which DNA of the present invention can be cloned. Numerous types of vectors exist and are well known in the art.

[0028] The term "expression" defines the process by which a gene is transcribed into mRNA (transcription), the mRNA then being translated (translation) into one or more polypeptides (or proteins).

[0029] The terminology "expression vector" defines a vector or vehicle as described above but designed to enable the expression of an inserted sequence following transformation into a host. The cloned gene (inserted sequence) is usually placed under the control of control element sequences such as promoter sequences. The placing of a cloned gene under such control sequences is often referred to as being operably linked to control elements or sequences.

[0030] Operably linked sequences may also include two segments that are transcribed onto the same RNA transcript. Thus, two sequences, such as a promoter and a "reporter sequence" are operably linked if transcription commencing in the promoter will produce an RNA transcript of the reporter sequence. In order to be "operably linked" it is not necessary that two sequences be immediately adjacent to one another.

[0031] Expression control sequences will vary depending on whether the vector is designed to express the operably linked gene in a prokaryotic or eukaryotic host or both (shuttle vectors) and can additionally contain transcriptional elements such as enhancer elements, termination sequences, tissue-specificity elements, and/or translational initiation and termination sites.

[0032] Prokaryotic expressions are useful for the preparation of large quantities of the protein encoded by the DNA sequence of interest. This protein can be purified according to standard protocols that take advantage of the intrinsic properties thereof, such as size and charge (e.g. SDS gel electrophoresis, gel filtration, centrifugation, ion exchange chromatography...). In addition, the protein of interest can be purified via affinity chromatography using polyclonal or monoclonal antibodies. The purified protein can be used for therapeutic applications.

[0033] The DNA construct can be a vector comprising a promoter that is operably linked to an oligonucleotide sequence of the present invention, which is in turn, operably linked to a heterologous gene, such as the gene for the luciferase reporter molecule. "Promoter" refers to a DNA regulatory region capable of binding directly or indirectly to RNA polymerase in a cell and initiating transcription of a downstream (3' direction) coding sequence. For purposes of the present invention, the promoter is bound at its 3' terminus by the transcription initiation site and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter will be found a transcription initiation site (conveniently defined by mapping with S1 nuclease), as well as protein binding domains (consensus sequences) responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain "TATA" boxes and "CCAT" boxes. Prokaryotic promoters contain -10 and -35 consensus sequences, which serve to initiate transcription and the transcript products contain Shine-Dalgarno sequences, which serve as ribosome binding sequences during translation initiation. Promoters suitable for expressing the polypeptide or fragment thereof in bacteria include the E. coli lac or trp promoters, the lacI promoter, the lacZ promoter, the T3 promoter, the T7 promoter, the gpt promoter, the lambda P_R promoter, the lambda P_L promoter, promoters from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), and the acid phosphatase promoter. The vector may also include appropriate sequences for modulating expression levels, an origin of replication and a selectable marker.

[0034] As used herein, the designation "functional derivative" denotes, in the context of a functional derivative of a sequence whether a nucleic acid or amino acid sequence, a molecule that retains a dual condensation/epimerization activity (either functional or structural) that is substantially similar to that of the original sequence. This functional derivative or equivalent may be a natural derivative or may be prepared synthetically. Such derivatives include amino acid sequences having substitutions, deletions, or additions of one or more amino acids, provided that the dual condensation/epimerization activity of the protein is conserved. The same applies to derivatives of nucleic acid sequences which can have substitutions, deletions, or additions of one or more nucleotides, provided that the biological activity of the sequence is generally maintained. When relating to a protein sequence, the substituting amino acid generally has chemico-physical properties which are similar to that of the substituted amino acid. The similar chemico-physical properties include similarities in charge, bulkiness, hydrophobicity, hydrophilicity and the like. The term "functional derivatives" is intended to include "fragments", "segments", "variants", "analogs" or "chemical derivatives" of the subject matter of the present invention.

[0035] Thus, the term "variant" refers herein to a protein or nucleic acid molecule which is substantially similar in structure and dual condensation/epimerization activity to the protein or nucleic acid of the present invention.

[0036] The functional derivatives of the present invention can be synthesized chemically or produced through recombinant DNA technology. All these methods are well known in the art.

[0037] As used herein, the term "purified" refers to a molecule having been separated from a cellular component. Thus, for example, a "purified protein" has been purified to a level not found in nature. A "substantially pure" molecule is a molecule that is lacking in most other cellular components.

[0038] For certainty, the sequences and polypeptides useful to practice the invention include without being limited thereto mutants, homologs, subtypes, alleles and the like. It shall be understood that generally, the sequences of the present invention should encode a functional (albeit defective) interaction domain. It will be clear to the person of ordinary skill that whether an interaction domain of the present invention, variant, derivative, or fragment thereof retains its function in binding to its partner can be readily determined by using the teachings and assays of the present invention and the general teachings of the art.

[0039] A host cell or indicator cell has been "transfected" by exogenous or heterologous DNA (e.g. a DNA construct) when such DNA has been introduced inside the cell. The transfecting DNA may or may not be integrated (covalently linked) into chromosomal DNA making up the genome of the cell. In prokaryotes, yeast, and mammalian cells for example, the transfecting DNA may be maintained on an episomal element such as a plasmid. With respect to eukaryotic cells, a stably transfected cell is one in which the transfecting DNA has become integrated into a chromosome so that it is inherited by daughter cells through chromosome replication. This stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones comprised of a population of daughter cells containing the transfecting DNA. Transfection methods are well known in the art (Sambrook et al., 1989, supra; Ausubel et al., 1994 supra). The use of a mammalian cell as indicator can provide the advantage of furnishing an intermediate factor, which permits for example the interaction of two polypeptides which are tested, that might not be present in lower eukaryotes or prokaryotes. Of course, such an advantage might be rendered moot if both polypeptides tested interact directly. It will be understood that extracts from mammalian cells for example could be used in certain embodiments, to compensate for the lack of certain factors.

[0040] The expression a "dual condensation/epimerization NRPS domain" of the present invention is defined structurally as a polypeptide sequence that produces an alignment with at least 45% identity with the following consensus sequence (SEQ ID NO: 139) using the BLASTP 2.2.5 algorithm, with the filter option -F set to false, the gap opening penalty -G set to 11, the gap extension penalty -E set to 1, and all remaining options set to default values:

ADIYPLAPLQEGILFHHLIadggedDaYVIpavleFDSReRLdaFlgALQ qViDRHDILRTavvWeGLrEPVQVVwRhAeLpVeevtLdpagiaadpvaq LdaaaglrmDLgrAPLIrlhvAadpgggrWLaLLrfHHLVqDHTALevLI aEiqAfLaGrgdeLPePIPFRnFVAQARIGvsraEHErFFaeLLGDVtEP TAPFGLIDVrGDGsgveearlpldaeLaaRLReqARrLGVSpATIfHLAW ARVLgavSGRdDVVFGTVLfGRMqaGaGADRvpGIFINTLPVRVrlggqg VldAVramRaqLAeLLeHEHAPLALAQRASGVpaptPLFTsLLNYRHsav aavsaealaawagaeleGirlLssrERTNYPLtVsVDDIGdgFsLtVqAv aplDaerVcallhTAIenLVdALEqaPdtpLsavdVLpaaERrrlLveWN dtaadyvpaatvpeLFeAQVartP

where consensus sequence SEQ ID NO: 139 is based on the sequences of representative dual condensation/epimerization NRPS domains from the following E-less NRPS-containing peptide biosynthetic loci: RAMO (SEQ ID NOS: 01, 03, 05, 07, 09, 11, 13 and 15); SYRI (SEQ ID NOS: 17, 19, 21 and 23); SYRP (SEQ ID NOS: 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57 and 59); URSO (SEQ ID NOS: 61 and 63); 023C (SEQ ID NOS: 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87 and 89); 034F (SEQ ID NOS: 91, 93, 95 and 97); 040G (SEQ ID NOS: 99 and 101); 084B (SEQ ID NOS: 103, 105, 107 and 109); 107A (SEQ ID NOS: 111 and 113); 143F (SEQ ID NOS: 115, 117, 119 and 121); 153A (SEQ ID NOS: 123, 125, 127 and 129); and 263B (SEQ ID NOS: 131, 133, 135 and 137).

[0041] The consensus sequence was generated as follows. First, the listed sequences were aligned with the ClustalX 1.81 program using default settings. Then a profile hidden Markov model (HMM) was made from the alignment file with the hmmbuild program of the HMMER 2.2 package (Sean Eddy, Washington University; world-wide-web hmmer.wustl.edu/) and was calibrated with the hmmcalibrate program of the HMMER package, both using default settings. Briefly, a profile hidden Markov model is a statistical description of a sequence family's consensus. HMMER is a freely distributable implementation of profile HMM software for protein sequence analysis and is available from the above web site. Finally, the consensus sequence was generated from the HMM with the hmmemit program of the HMMER package using the -c option so as to predict a single majority rule consensus sequence from the HMM's probability distribution. Highly conserved amino acid residues (p>=0.5) are shown in upper case in the consensus sequence; others are shown in lower case.
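The upper/lower case convention of the consensus sequence can be illustrated with a simplified sketch. The code below is not the HMMER procedure described above: as an assumption made for illustration only, it takes the most frequent residue of each column of a toy alignment directly from observed counts, rather than from an HMM's probability distribution. The function name, the toy sequences and the handling of gap columns are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned, threshold=0.5, gap="-"):
    """Return a majority-rule consensus for equal-length aligned sequences.

    Assumption: frequencies come from raw column counts, not from a profile
    HMM. The most frequent residue of a column is written in upper case when
    its frequency is >= threshold and in lower case otherwise. Columns whose
    most frequent symbol is a gap are skipped.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*aligned):
        counts = Counter(symbol.upper() for symbol in column)
        residue, count = counts.most_common(1)[0]
        if residue == gap:
            continue
        frequency = count / len(column)
        consensus.append(residue if frequency >= threshold else residue.lower())
    return "".join(consensus)


# Hypothetical toy alignment (not taken from the sequence listing).
toy_alignment = ["ADIYPL", "ADLYPL", "AEIFPL", "SDVYPL", "TDMYPL"]
print(majority_consensus(toy_alignment))
```

In the toy alignment the third column has no residue reaching a frequency of 0.5, so its most frequent residue is emitted in lower case.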

[0042] The expression "a polynucleotide encoding a dual condensation/epimerization NRPS domain" refers to a polynucleotide encoding a dual condensation/epimerization NRPS domain. Representative examples of polynucleotides encoding dual condensation/epimerization NRPS domains of the invention include the polynucleotides encoding the dual condensation/epimerization NRPS domains residing in RAMO, i.e. SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14 and 16; SYRI, i.e. SEQ ID NOS: 18, 20, 22 and 24; SYRP, i.e. SEQ ID NOS: 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58 and 60; URSO, i.e. SEQ ID NOS: 62 and 64; 023C, i.e. SEQ ID NOS: 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88 and 90; 034F, i.e. SEQ ID NOS: 92, 94, 96 and 98; 040G, i.e. SEQ ID NOS: 100 and 102; 084B, i.e. SEQ ID NOS: 104, 106, 108 and 110; 107A, i.e. SEQ ID NOS: 112 and 114; 143F, i.e. SEQ ID NOS: 116, 118, 120 and 122; 153A, i.e. SEQ ID NOS: 124, 126, 128 and 130; and 263B, i.e. SEQ ID NOS: 132, 134, 136 and 138.

[0043] Cosmid clones containing genes and proteins of the invention have been deposited with the International Depositary Authority of Canada, Bureau of Microbiology, Health Canada, 1015 Arlington Street, Winnipeg, Manitoba, Canada R3E 3R2 under the terms of the Budapest Treaty on the International Recognition of the Deposit of Microorganisms for Purposes of Patent Procedure. An E. coli DH10B strain harboring cosmid clone 008CH containing a polynucleotide encoding dual condensation/epimerization NRPS domains SEQ ID NOS: 2, 4, 6 and 8 in the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076 was deposited on September 19, 2001 and assigned accession number IDAC 190901-3 (Fig. 3). An E. coli DH10B strain harboring cosmid clone 008CK containing a polynucleotide encoding dual condensation/epimerization NRPS domains SEQ ID NOS: 10, 12, 14 and 16 in the biosynthetic locus for ramoplanin from Actinoplanes sp. ATCC 33076 was deposited on September 19, 2001 and assigned accession number IDAC 190901-1 (Fig. 3). The E. coli strain deposits are referred to herein as "the deposited strains". The sequences of the nucleotides encoding the dual condensation/epimerization NRPS domains of the invention present in the deposited strains as well as the amino acid sequences of the corresponding polypeptides are controlling in the event of any conflict with any description of sequences herein. A license may be required to make, use or sell the deposited strains, nucleic acids therein or compounds derived therefrom, and no such license is hereby granted.

[0044] The expression "a condition of high stringency" refers to any one of the hybridization conditions described herein, and includes other "high stringency" conditions known in the art. In one condition, a polymer membrane containing immobilized denatured nucleic acids is first prehybridized for 30 minutes at 45 °C in a solution consisting of 0.9 M NaCl, 50 mM NaH₂PO₄, pH 7.0, 5.0 mM Na₂EDTA, 0.5% SDS, 10X Denhardt's, and 0.5 mg/ml polyriboadenylic acid. Approximately 2 x 10⁷ cpm (specific activity 4-9 x 10⁸ cpm/µg) of ³²P end-labeled oligonucleotide probe are then added to the solution. After 12-16 hours of incubation, the membrane is washed for 30 minutes at room temperature in 1X SET (150 mM NaCl, 20 mM Tris hydrochloride, pH 7.8, 1 mM Na₂EDTA) containing 0.5% SDS, followed by a 30 minute wash in fresh 1X SET at Tm-10 °C for the oligonucleotide probe, where Tm is the melting temperature of the probe. Stringency may be varied by conducting the hybridization at varying temperatures below the melting temperatures of the probes. For oligonucleotide probes between 14 and 70 nucleotides in length, the melting temperature (Tm) in degrees Celsius may be calculated using the formula: Tm = 81.5 + 16.6(log [Na⁺]) + 0.41(fraction G+C) - (600/N), where N is the length of the oligonucleotide. If the hybridization is carried out in a solution containing formamide, the melting temperature may be calculated using the equation Tm = 81.5 + 16.6(log [Na⁺]) + 0.41(fraction G+C) - (0.63% formamide) - (600/N), where N is the length of the probe. For probes over 200 nucleotides in length, the hybridization may be carried out at 15-25 °C below the Tm. For shorter probes, such as oligonucleotide probes, the hybridization may be conducted at 5-10 °C below the Tm. Preferably, the hybridization is conducted in 6X SSC for shorter probes and in 50% formamide containing solutions for longer probes.
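The melting temperature formulas above can be evaluated with a short sketch. Two points are assumptions rather than statements of the text: the logarithm is taken as base 10, and the G+C term is entered as a percentage (0 to 100), as in the commonly used form of this equation, although the text writes "fraction G+C". The function and argument names are hypothetical.

```python
import math


def probe_tm(na_molar, gc_percent, length, formamide_percent=0.0):
    """Estimate probe melting temperature (degrees Celsius).

    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(G+C) - 0.63*(% formamide) - 600/N

    Assumptions: base-10 logarithm; G+C and formamide given as percentages.
    With formamide_percent = 0 this reduces to the formamide-free formula.
    """
    if na_molar <= 0:
        raise ValueError("sodium concentration must be positive")
    if length <= 0:
        raise ValueError("probe length must be positive")
    return (
        81.5
        + 16.6 * math.log10(na_molar)
        + 0.41 * gc_percent
        - 0.63 * formamide_percent
        - 600.0 / length
    )


def stringent_wash_temperature(tm_celsius):
    """Temperature of the second wash described in the text: Tm - 10 degrees C."""
    return tm_celsius - 10.0


# Illustrative 20-mer with 50% G+C at 1.0 M Na+ (hypothetical values).
tm = probe_tm(na_molar=1.0, gc_percent=50.0, length=20)
print(round(tm, 1), round(stringent_wash_temperature(tm), 1))
```

At 1.0 M Na⁺ the logarithmic term vanishes, which makes the contribution of the remaining terms easy to check by hand.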

[0045] The term "homology" refers to the optimal alignment of sequences (either nucleotides or amino acids), which may be conducted by computerized implementations of algorithms. "Homology", with regard to polynucleotides, for example, may be determined by analysis with BLASTN version 2.0 using the default parameters, which aligns the polynucleotides or fragments being compared and determines the extent of nucleotide identity between them. "Homology", with respect to polypeptides (i.e., amino acids), may be determined using a program, such as BLASTP version 2.2.5 with the default parameters, which aligns the polypeptides or fragments being compared and determines the extent of amino acid identity or similarity between them. It will be appreciated that amino acid "homology" includes conservative substitutions, i.e. those that substitute a given amino acid in a polypeptide by another amino acid of similar characteristics. The following replacements are typically regarded as conservative substitutions: replacement of an aliphatic amino acid such as Ala, Val, Leu and Ile with another aliphatic amino acid; replacement of a Ser with a Thr or vice versa; replacement of an acidic residue such as Asp or Glu with another acidic residue; replacement of a residue bearing an amide group, such as Asn or Gln, with another residue bearing an amide group; exchange of a basic residue such as Lys or Arg with another basic residue; and replacement of an aromatic residue such as Phe or Tyr with another aromatic residue. A "homology of 70% or higher" includes a homology of, for example, 70%, 75%, 80%, 85%, 90%, 95%, and up to 100% (identical) between two or more nucleotide or amino acid sequences. A "homology of at least 45%" includes a homology of, for example, 45%, 50%, 60%, 70%, 80%, 90%, and up to 100% (identical) between two or more nucleotide or amino acid sequences.
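The conservative substitution classes listed above can be encoded as a small lookup, sketched below. The groups contain only the residues named in the text as examples ("such as"), so the table is illustrative rather than exhaustive. The group labels and the function name are hypothetical.

```python
# One-letter codes for the example residues named in the text.
CONSERVATIVE_GROUPS = {
    "aliphatic": {"A", "V", "L", "I"},   # Ala, Val, Leu, Ile
    "hydroxyl": {"S", "T"},              # Ser, Thr
    "acidic": {"D", "E"},                # Asp, Glu
    "amide": {"N", "Q"},                 # Asn, Gln
    "basic": {"K", "R"},                 # Lys, Arg
    "aromatic": {"F", "Y"},              # Phe, Tyr
}


def is_conservative_substitution(residue_a, residue_b):
    """Return True if two different residues belong to the same example group."""
    a, b = residue_a.upper(), residue_b.upper()
    if a == b:
        return False  # identical residues are identities, not substitutions
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS.values())


print(is_conservative_substitution("L", "I"))  # aliphatic for aliphatic
print(is_conservative_substitution("D", "K"))  # acidic for basic
```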

[0046] We observed the surprising existence of NRPS systems lacking canonical epimerization domains but nonetheless appearing to direct the incorporation of D-amino acids (and occasionally non-chiral amino acids) into their resulting peptide products. We further noted several conserved motifs shared between C and E domains and investigated our hypothesis that E-less NRPS systems may contain specialized dual function C domains that can epimerize the amino acid activated by the upstream module. We identified unusual C domains downstream of the T domain carrying the amino acid destined to be epimerized. This position is where one would expect to find a canonical E domain. These unusual C domains positioned downstream of modules that incorporate D-amino acids (and occasionally non-chiral amino acids) appeared to have dual condensation and epimerization activity and hence we refer to them as "dual condensation/epimerization NRPS domains". These dual condensation/epimerization NRPS domains in E-less NRPS systems allow for accurate predictions of the stereochemistry of the peptide product.

[0047] Condensation domains from the E-less peptide biosynthetic clusters RAMO, SYRI, SYRP, URSO, 023C, 034F, 040G, 084B, 107A, 143F, 153A, and 263B were identified and analyzed. Figure 1 lists the NRPS synthetase complexes from which condensation domains were compared. The NRPSs from which condensation domains were analyzed were as follows: BACI open reading frames (ORFs) with GenBank accession numbers AAC06346 to AAC06348; SURF ORFs with GenBank accession numbers CAB12142, CAB12143 and CAB12145; PLIP ORFs with GenBank accession numbers CAB13713 to CAB13717; CADA ORFs with GenBank accession numbers CAB38517, CAB38518 and CAD55498; RAMO ORFs with Ecopia accession numbers 008CHP_09, 008CHP_10 and 008CHP_11 (ORFs 12, 13 and 14, respectively, as defined in US application USSN 09/976,059); SYRI ORFs with GenBank accession numbers AAC80285 and AAA85160; SYRP ORFs with GenBank accession numbers AAF99707, AAO72424 and AAO72425; URSO ORFs with GenBank accession numbers CAD17792 and CAD17793; 023C ORFs with Ecopia accession numbers 023CSPF09, 023CSPF10, 023CSP_11, 023CCP_01, 023CCP_06, 023CYP_01 and 023CYP_02; 034F ORFs with Ecopia accession numbers 034CMP_76 to _78; 040G ORFs with Ecopia accession numbers 040RP_21 and 040CRPN20; 084B ORFs with Ecopia accession numbers 084CBP_46 to 48; 107A ORFs with Ecopia accession numbers 107CAP08, 107CAPC02, 107CAP 2 and 107CAPN10; 143F ORFs with Ecopia accession numbers 143KKP_39 and 143KK_40; 153A ORFs with Ecopia accession numbers 153CAP_08, 153CAP_13, 153CAP_12 and 153CAP 10; and 263B ORFs with Ecopia accession numbers 263CRP_46, 263CRPN44 and 263CRPN18.

[0048] The amino acid sequences of the condensation domains from the NRPS systems listed in Figure 1 were subjected to multiple sequence alignment using ClustalW™ (Thompson et al. (1994), Nucl. Acids Res., 22, 4673-4680). In Figure 2, NRPS C domain sequences obtained from the GenBank database are denoted by accessions beginning with three letters and followed by digits (usually numbering 5). These first eight characters correspond to the GenBank accession number, followed by a lower case "n" denoting an NRPS domain, followed by the letters "CD" and two digits denoting "C domain" and its number relative to the other C domains contained on that polypeptide sequence. Thus, "AAC80285nCD02|SYRI" represents the amino acid sequence corresponding to the second C domain contained on the GenBank entry AAC80285 for an NRPS from the syringomycin biosynthetic locus. NRPS C domain sequences having Ecopia accession numbers follow the same nomenclature (nCDOO) but are characterized by a root of nine-character accessions beginning with three numbers.
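The labelling scheme used in Figure 2 can be parsed mechanically, as sketched below. The pattern assumes the common case of a GenBank accession made of three letters and five digits, an Ecopia accession of nine characters beginning with three digits, and a locus name after the "|" separator. The function name, the tuple type and the Ecopia example label are hypothetical and serve only as an illustration.

```python
import re
from typing import NamedTuple


class CDomainLabel(NamedTuple):
    accession: str       # e.g. "AAC80285"
    source: str          # "GenBank" or "Ecopia"
    domain_number: int   # position among the C domains of that polypeptide
    locus: str           # e.g. "SYRI"


# accession + "n" (NRPS domain) + "CD" + two digits + "|" + locus name
_LABEL_PATTERN = re.compile(
    r"^(?P<accession>[A-Z]{3}\d{5}|\d{3}[A-Z0-9_]{6})"
    r"nCD(?P<number>\d{2})\|(?P<locus>\w+)$"
)


def parse_c_domain_label(label):
    """Split a Figure 2 style label into accession, source, number and locus."""
    match = _LABEL_PATTERN.match(label)
    if match is None:
        raise ValueError(f"unrecognized C domain label: {label!r}")
    accession = match.group("accession")
    source = "GenBank" if accession[0].isalpha() else "Ecopia"
    return CDomainLabel(
        accession, source, int(match.group("number")), match.group("locus")
    )


print(parse_c_domain_label("AAC80285nCD02|SYRI"))
# Hypothetical Ecopia-style label, constructed by analogy for illustration.
print(parse_c_domain_label("008CHP_10nCD03|RAMO"))
```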

[0049] The condensation domains shown in Figure 2 are divided into four classes, each identified by circles. Empty circles depict condensation domains in the first module of lipopeptide NRPS systems, which condensation domains are involved in the N-acyl capping mechanism described in co-pending application USSN 10/329,027. The teachings of USSN 10/329,027 are incorporated herein by reference. Hatched circles depict condensation domains that follow modules that incorporate L-amino acids. C domains that cluster above the N-acyl capping C domains in Figure 2 (empty circles) carry out a condensation reaction between a D-form amino acid (or occasionally a non-chiral amino acid) activated by the upstream module and the amino acid activated by the cognate module. These "D-specific" C domains can be divided into two categories: one category of "D-specific" C domains (black circles) includes those that follow C-A-T-E modules and the other category of "D-specific" C domains (shaded circles) includes specialized C domains that follow C-A-T modules that incorporate D-amino acids (or non-chiral amino acids). Phylogenetic analysis of the amino acid sequences of these specialized C domains suggests, at the very least, that they condense the D-form of the upstream amino acid to the cognate amino acid, as they surprisingly cluster together with C domains that follow C-A-T-E modules that incorporate D-amino acids in E-containing NRPS systems (Fig. 2). Therefore epimerization of amino acid residues by E-less NRPSs is likely to occur post activation but prior to condensation. These specialized C domains from E-less NRPSs, referred to herein as dual condensation/epimerization NRPS domains, direct the epimerization of the amino acid activated by the upstream module.

[0050] Without being limited to any particular mechanism, the dual condensation/epimerization activity may involve the recruitment of a cellular enzyme that provides the epimerization activity in trans, akin to the trans-acting adenylation domain in the syringomycin and ramoplanin NRPS systems. In this scenario, one would predict that a highly purified syringomycin or ramoplanin NRPS complex would be incapable of generating products with D-amino acids without addition of a cellular extract containing the trans-acting epimerization activity. Alternatively, the dual condensation/epimerization NRPS domains may directly catalyze the epimerization of the amino acid activated by the upstream module prior to its condensation with the amino acid activated by the cognate module. In this scenario, the dual condensation/epimerization NRPS domains would have the inherent ability to carry out both a condensation reaction and an epimerization reaction. Such dual function has been found in other domains, for example, "cyclization" domains (also sometimes referred to as heterocyclization domains) which are capable of carrying out both a condensation reaction and a cyclization reaction (Doekel and Marahiel, supra).

[0051] The dual condensation/epimerization NRPS domains of the invention are expected to be found in a variety of E-less NRPS systems producing peptide products containing D- or non-chiral amino acids. NRPS systems may be found in a variety of microorganisms. Preferred microorganisms expected to contain the dual condensation/epimerization NRPS domains include but are not limited to bacteria of the order Actinomycetales, also referred to as actinomycetes. Preferred genera of actinomycetes include Nocardia, Geodermatophilus, Actinoplanes, Micromonospora, Nocardioides, Saccharothrix, Amycolatopsis, Kutzneria, Saccharomonospora, Saccharopolyspora, Kitasatospora, Streptomyces, Microbispora, Streptosporangium and Actinomadura. The taxonomy of actinomycetes is complex and reference is made to Goodfellow (1989) Suprageneric classification of actinomycetes, Bergey's Manual of Systematic Bacteriology, Vol. 4, Williams and Wilkins, Baltimore, pp 2322-2339, and to Embley and Stackebrandt (1994), The molecular phylogeny and systematics of the actinomycetes, Annu. Rev. Microbiol., 48, 257-289, for genera that may also contain dual condensation/epimerization NRPS domains of the present invention. One skilled in the art would understand that an NRPS complex that lacks canonical epimerization domains but encodes a peptide containing amino acid residue(s) in the D-stereochemistry would likely be a candidate to contain dual condensation/epimerization NRPS domains of the present invention.

[0052] Many of the E-less NRPSs with dual condensation/epimerization NRPS domains produce lipopeptides and many of these E-less NRPSs include a trans-acting A domain. They have been identified in very diverse microbes including four genera belonging to the high GC gram positive bacteria commonly known as actinomycetes, namely Actinoplanes, Actinomadura, Micromonospora and Streptomyces, as well as the gram negative bacterium Pseudomonas syringae pv. syringae and the proteobacterium Ralstonia solanacearum. E-less NRPSs with the dual condensation/epimerization NRPS domains are expected to be widespread in nature.

[0053] The dual condensation/epimerization NRPS domains of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135, 137 and 139 were compared using the BLASTP algorithm with the default parameters to the sequences of the National Center for Biotechnology Information (NCBI) nonredundant protein database and to sequences of the DECIPHER® database of microbial genes, pathways and natural products (Ecopia BioSciences Inc., St-Laurent, Canada). The accession numbers of the top GenBank hits of this BLAST analysis are presented in Table 1 along with the corresponding E values. The E value assists in the determination of whether two sequences display sufficient similarity to justify an inference of homology. The E value relates the expected number of chance alignments with an alignment score at least equal to the observed alignment score. An E value of 0.00 indicates a perfect homolog. The E values are calculated as described in Altschul et al. (1990), J. Mol. Biol. 215, 403-410; Gish et al. (1993), Nature Genetics 3, 266-272.
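The meaning of the E value can be illustrated with a minimal sketch based on the standard relation between a normalized (bit) score S' and the expected number of chance alignments, E = m · n · 2^(-S'), where m is the query length and n the database length. This is a simplification offered under stated assumptions: it omits the effective-length corrections that BLAST applies, and the function name and the numerical values are hypothetical rather than taken from Table 1.

```python
def expected_chance_alignments(bit_score, query_length, database_length):
    """Approximate E value from a bit score: E = m * n * 2**(-S').

    Assumption: raw query and database lengths are used in place of the
    effective lengths computed by BLAST, so values are only indicative.
    """
    if query_length <= 0 or database_length <= 0:
        raise ValueError("lengths must be positive")
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical 450-residue query against a database of 5e8 residues.
for score in (30.0, 60.0, 300.0):
    print(score, expected_chance_alignments(score, 450, 500_000_000))
```

Higher scores give smaller E values; for very high scores the value becomes small enough to be reported as 0.00.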

[Table 1: top GenBank BLAST hits and corresponding E values for the dual condensation/epimerization NRPS domains]

[0054] Such specialized C domains are expected to prove valuable genetic engineering tools for the combinatorial construction of hybrid NRPS modules. These hybrid NRPS modules may then be used to modify the stereochemistry of a synthesized peptide compound. For example, a regular condensation domain of a given NRPS system may be replaced with a dual condensation/epimerization NRPS domain selected so as to alter the stereochemistry of polypeptides at specific amino acid sites.

[0055] Recombinant NRPS systems may be employed either in vivo, using an appropriate recombinant host, or in vitro, using purified enzymes supplemented with the appropriate substrates.

[0056] The use of dual condensation/epimerization NRPS domains for modifying the stereochemistry of peptide compounds presents major advantages over insertion of canonical epimerization domains in specific NRPS locations. Insertion of foreign domains in the NRPS complex may interfere with the activities of existing domains. However, use of dual condensation/epimerization NRPS domains instead of epimerization domains to specifically alter the stereochemistry of peptide compounds is expected to alleviate this problem as both regular condensation domains and dual condensation/epimerization NRPS domains are structurally similar.

EXAMPLE 1: Presence of dual condensation/epimerization domains in the ramoplanin non-ribosomal peptide synthetase (NRPS) biosynthetic system

[0057] Ramoplanin is a lipopeptide produced by Actinoplanes sp. ATCC 33076 (see US Patent No. 4,303,646). Ramoplanin is a glycosylated lipodepsipeptide of known structure (see, for example, US Patent No. 4,427,656). The full-length biosynthetic locus for ramoplanin from Actinoplanes sp., referred to herein as RAMO, was cloned and sequenced using the genome scanning method as described by Zazopoulos et al. (2003), Nature Biotechnol., 21, 187-190. The open reading frames in RAMO were identified and a function was attributed to each protein encoded by the open reading frames. RAMO is described in detail in co-pending US application USSN 09/976,059 and in PCT international application PCT/CA01/01462, published as WO 02/31155, both of which are incorporated herein by reference.

[0058] Ramoplanin is composed of 17 amino acid residues, of which 8 are D-enantiomers (Ciabatti et al. (1989), J. Antibiotics, 42, 254-267). Analysis of the RAMO locus revealed the presence of an NRPS system composed of 4 ORFs specifying a total of 16 modules involved in amino acid activation and condensation resulting in the synthesis of the ramoplanin peptide backbone (Figs 1c and 4a). Iterative use of ORF 008CHP_09 ensures N-acylation and condensation of the first two amino acid residues, namely L-asparagine (L-Asn) and L-hydroxyasparagine (L-(OH)Asn). ORF 008CKP_04, composed of an adenylation domain fused to a thiolation domain, activates L-alloThreonine (L-aThr) and interacts in trans with the sixth module of 088CHP_04. All NRPS modules are exclusively composed of a condensation-adenylation-thiolation tripartite module (C-A-T) that represents the minimal domain organization found in NRPS systems.

[0059] Determination of the stereochemistry of all amino acid residues found in the ramoplanin molecule implies the presence of epimerization domains (E-domains) in modules 1, 2, 3 and 5 of ORF 008CHP_10 as well as in modules 1, 3 and 7 of ORF 008CHP_11 that incorporate amino acid residues having a D-stereochemistry in the ramoplanin compound. However, the obvious lack of E-domains in the RAMO NRPS system suggests that alternative mechanisms for the generation and incorporation of D-amino acids may occur in the biosynthesis of ramoplanin. Sequence similarities between condensation domains (C-domains) and epimerization domains (Marahiel et al. (1997), Chem. Rev., 97, 2651-2673) prompted a detailed comparative analysis of all condensation domains present in the ramoplanin NRPS system.

[0060] The RAMO C-domains were compared to a collection of condensation domains derived from various peptide NRPSs obtained from GenBank or disclosed herein. Figure 2 shows the evolutionary relatedness of all RAMO C-domains, which clearly indicates the presence of three distinct classes of condensation domains. The first class comprises the unique acyl-specific C-domain that is found in ORF 008CHP_09 (Fig. 2, empty circle). This domain catalyzes the condensation of an acyl unit to the first amino acid incorporated in ramoplanin (L-Asn) as well as the condensation of the second amino acid residue, L-(OH)-Asn, found in the ramoplanin molecule. This type of condensation domain clusters with related domains found in lipopeptide-specifying NRPS systems and is described in detail in co-pending US application USSN 10/329,027. A second class of C-domains is defined by domains that follow modules incorporating L-amino acids (Fig. 2, hatched circles). Condensation domains found in modules 1, 5 and 7 of ORF 008CHP_10 as well as in modules 1, 3, 5 and 7 of ORF 008CHP_11 belong to this class.

[0061] Surprisingly, a third class of C-domains composed of domains that follow modules incorporating D- or non-chiral amino acids is also present in the ramoplanin NRPS complex (Fig. 2, gray circles). These domains are found in modules 2, 3, 4 and 6 of ORF 008CHP_10 (SEQ ID NOS: 01, 03, 05 and 07, respectively) as well as in modules 2, 4, 6 and 8 of ORF 008CHP_11 (SEQ ID NOS: 09, 11, 13 and 15, respectively). The relative location of these domains following thiolation domains of modules that incorporate D-amino acids is reminiscent of that of epimerization domains. Clustering of these C-domains further supports the idea that they may have a dual function and catalyze both epimerization and condensation reactions. Indeed, these dual condensation/epimerization NRPS domains would be involved in the epimerization reaction, from L- to D-stereochemistry, of the amino acid activated by the preceding module and the condensation of this amino acid to the one specified by the cognate module. The location of all dual condensation/epimerization NRPS domains in the ramoplanin NRPS system agrees in all cases with the expected location of D- or non-chiral amino acids in the ramoplanin molecule (Fig. 1c).
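The positional rule described above, in which the residue activated by a module is predicted to be D (or non-chiral) when the C domain of the next module is a dual condensation/epimerization NRPS domain, can be sketched as follows. The class labels, the function name and the treatment of ORF 008CHP_10 as a stand-alone list of seven modules are assumptions made for illustration; the class assignments are those given in the text for modules 1 to 7 of that ORF.

```python
def predict_configurations(c_domain_classes):
    """Predict the configuration of the residue incorporated by each module.

    c_domain_classes[i] is the class of the C domain of module i + 1:
    "dual" for a dual condensation/epimerization domain, any other label
    for a regular C domain. The residue of a module is predicted from the
    C domain of the following module, which acts on it. The last module has
    no downstream C domain in the list, so it is reported as "unknown".
    """
    predictions = []
    for index in range(len(c_domain_classes)):
        if index + 1 >= len(c_domain_classes):
            predictions.append("unknown")
        elif c_domain_classes[index + 1] == "dual":
            predictions.append("D or non-chiral")
        else:
            predictions.append("L")
    return predictions


# C domain classes of modules 1-7 of ORF 008CHP_10 as stated in the text:
# regular (L-following) in modules 1, 5 and 7; dual in modules 2, 3, 4 and 6.
orf_008CHP_10 = ["L", "dual", "dual", "dual", "L", "dual", "L"]
predicted = predict_configurations(orf_008CHP_10)
d_modules = [i + 1 for i, p in enumerate(predicted) if p == "D or non-chiral"]
print(d_modules)
```

For this ORF the sketch flags modules 1, 2, 3 and 5, the same modules in which epimerization domains would have been expected from the ramoplanin structure.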

EXAMPLE 2: Presence of dual condensation/epimerization NRPS domains in the syringomycin NRPS system

[0062] Syringomycin, produced by phytopathogenic strains of Pseudomonas syringae pv. syringae, is a cyclic lipodepsipeptide that exhibits phytotoxic activity and a wide spectrum of antimicrobial and antifungal properties (Bender et al. (1999), Microbiol. Mol. Biol. Rev., 63, 266-292). The syringomycin non-ribosomal peptide synthetase biosynthetic gene cluster of P. syringae pv. syringae strain B301D was sequenced and characterized (Guenzi et al. (1998), J. Biol. Chem., 273, 32857-32863). This led to the unexpected finding that the syringomycin NRPS system did not contain any epimerization domains. Syringomycin contains two D-amino acids (D-Ser in position 2 and D-2,4-diaminobutyric acid in position 3) and, thus, the authors were expecting to find epimerization domains in the corresponding modules 2 and 3 of the syringomycin NRPS complex. These same authors ruled out the possibility that the adenylation domain of module 2 might specifically recognize and activate D-Ser. Consequently, the means by which D-amino acids were incorporated into syringomycin remained a mystery.

[0063] Analysis of C-domains present in the syringomycin NRPS system confirms that 4 out of 9 domains belong to the class of dual condensation/epimerization NRPS domains (Figs 1c and 2). The relative location of these dual condensation/epimerization NRPS domains is in agreement with the stereochemistry of the amino acid that is activated by the preceding module. For example, the dual condensation/epimerization NRPS domain of SEQ ID NO: 19 in module 3 of ORF AAC80285 is located downstream of module 2 that incorporates D-Serine (D-Ser); the dual condensation/epimerization NRPS domain of SEQ ID NO: 21 in module 4 of ORF AAC80285 is located downstream of module 3 that incorporates D-2,4-diaminobutyric acid (D-DAB); and the dual condensation/epimerization NRPS domain of SEQ ID NO: 23 in module 8 of ORF AAC80285 is located downstream of module 7 that incorporates non-chiral 2,3-dehydroaminobutyric acid (DHAB).

[0064] It is noteworthy that the C domain from module 2 of the syringomycin synthetase clusters among the dual condensation/epimerization NRPS domains, predicting that the amino acid incorporated by the upstream module is in the D-configuration. This is inconsistent with the reported stereochemistry of the amino acid in position 1 of syringomycin, L-Ser (Fukuchi et al. (1992), J. Chem. Soc. Perkin, 1, 1149-1157). One possible explanation for this discrepancy is that the assessment of the stereochemistry for the amino acid in position 1 of syringomycin is incorrect. Alternatively, this discrepancy may have arisen from the fact that different P. syringae pv. syringae strains were used for structure determination (strain SC1) and for sequencing of the biosynthetic cluster (strain B301D). As the structure of the syringomycin produced by strain B301D has not been confirmed to contain an L-Ser in position 1, it is possible that this strain produces a syringomycin that is distinct from that produced by strain SC1. Alternatively, a free-standing racemase that is able to catalyze epimerization of amino acids could be involved in the conversion of D-Ser to the L-enantiomer. This mechanism occurs in Microcystis aeruginosa, where the racemase McyF is involved in the conversion of L- to D-Glutamate in microcystin biosynthesis (Nishizawa et al. (2001), Microbiology, 147, 1235-1241). Yet another possibility is that, although the C domain in module 2 of the syringomycin synthetase displays significant overall homology to the specialized dual condensation/epimerization NRPS domains of E-less NRPSs, it may have acquired fine mutations that inactivate the epimerization function. If this is true, it may be possible to delineate the residues or motifs that confer upon these specialized C domains the ability to carry out a dual function.

EXAMPLE 3: Presence of condensation/epimerization dual domains in the syringopeptin NRPS system

[0065] In silico sequence comparison of dual condensation/epimerization NRPS domains specified by the ramoplanin NRPS system with protein sequences contained in the GenBank protein database revealed sequence similarities between these domains and condensation domains from the syringopeptin NRPS system. Syringopeptins are lipodepsipeptide phytotoxins and are produced by strains of P. syringae pv. syringae. The syringopeptin cluster present in strain B301D has been sequenced and encodes 22 NRPS modules involved in syringopeptin peptide synthesis (Scholz-Schroeder et al. (2003), Mol. Plant Microbe Interact., 16, 271-280).

[0066] Analysis of a Clustal sequence alignment of C-domains from the syringopeptin NRPS cluster shows that all C-domains, with the exception of the C-domain in module 1 of ORF AAF99707 and the C-domains in modules 10, 11 and 12 in ORF AAO72424, belong to the class of dual condensation/epimerization NRPS domains (Figs 1d and 2). This observation leads to the prediction that amino acid residues 1 to 18 are in the D-specific configuration whereas amino acid residues 19 to 22 are in the L-form (Fig. 2). Experimental analysis of the stereochemistry of the amino acid components of syringopeptin 22-A confirms that 17 out of 22 amino acids are in the D-configuration (Ballio et al. (1995), Eur. J. Biochem., 234, 747-758). Examination of the relative position of C/E dual domains confirms the prediction of stereochemistry for the syringopeptin amino acid components, as all NRPS modules preceding a dual condensation/epimerization NRPS domain incorporate D-amino acids with the exception of three modules: module 4 of ORF AAF99707 incorporates L-Valine (L-Val) instead of the predicted D-Val residue; module 2 of ORF AAO72424 incorporates L-Alanine (L-Ala) instead of the predicted D-Ala amino acid component; and module 11 of ORF AAO72424 incorporates D-2,4-diaminobutyric acid (D-DAB) instead of the predicted L-form of the amino acid (Fig. 1d). Observed discrepancies may be due to incorrect assignments of stereochemistry and/or possible presence of mutations in the dual condensation/epimerization NRPS domains inactivating the epimerization function of the domains. Alternatively, freestanding racemases could modify the stereochemistry of specific amino acids and thus contribute to the observed differences.

EXAMPLE 4: Detection of dual condensation/epimerization NRPS domains in NRPS systems present in diverse microorganisms

[0067] In silico sequence analysis of dual condensation/epimerization NRPS domains specified by the ramoplanin NRPS system with protein sequences contained in the GenBank database revealed sequence similarities between these domains and C-domains from an unknown NRPS gene cluster present in Ralstonia solanacearum strain GMI1000 (Salanoubat et al. (2002), Nature, 415, 497-502), referred to herein as URSO (Fig. 1d). Clustal sequence alignment of C-domains specified by the URSO gene cluster indicates the presence of two C/E dual domains in modules 1 and 3 of NRPS ORF CAD17793 (Fig. 2). Based on the relative location of the two dual condensation/epimerization NRPS domains, the peptide encoded by URSO is predicted to contain D-amino acid components (D-aa4 and D-aa6) incorporated by NRPS modules 4 and 6 (Fig. 1d).
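The positional rule used in these predictions (the residue incorporated by the module preceding a dual condensation/epimerization domain is predicted to be in the D-configuration) can be sketched as follows. This is a minimal illustration, not the analysis pipeline actually used; the function name is hypothetical, and the cluster-wide module numbers assigned to ORF CAD17793 are an assumption inferred from the D-aa4 and D-aa6 predictions.

```python
def predict_d_residues(ce_modules):
    """Return residue positions predicted to be D-configured.

    ce_modules: cluster-wide module numbers whose C-domain is a dual
    condensation/epimerization domain. The residue incorporated by the
    preceding module (n - 1) is the one predicted to be epimerized.
    """
    return sorted(m - 1 for m in ce_modules if m > 1)

# Assumption: ORF CAD17793 begins at cluster-wide module 5, so its
# modules 1 and 3 correspond to cluster-wide modules 5 and 7.
urso_ce_modules = {5, 7}
print(predict_d_residues(urso_ce_modules))  # [4, 6]
```

Under that numbering assumption the sketch reproduces the D-aa4 and D-aa6 prediction stated for URSO.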

[0068] Using the same approach, the proprietary Ecopia BioSciences DECIPHER® database that is populated with a variety of gene clusters involved in microbial secondary metabolism was assessed for the presence of NRPS gene clusters containing dual condensation/epimerization NRPS domains. In this way, several NRPS clusters were detected from a great variety of microorganisms. Representative examples include the following gene clusters.

[0069] Unreported NRPS cluster found in Streptomyces aizunensis strain NRRL B-11277: Locus 023C is predicted to encode a lipopeptide as it is composed of a NRPS multienzymatic system starting with an N-acyl-specific condensation domain that was previously shown to condense acyl groups to the amino group of amino acids (USSN 10/329,027; Fig. 2). This gene cluster contains 28 NRPS modules all composed of minimal tripartite modules (C-A-T). The sequence of the first component of the 023C NRPS complex is broken into two portions, an N-terminal fragment (023CSPF09) and a C-terminal fragment (023CSPF10), due to an apparent frameshift in the region corresponding to the first A domain. ORF 023CYP_11 corresponds to a freestanding adenylation domain that interacts in trans with module 21 located in ORF 023CYP_01 of the NRPS system (Fig. 1e). Clustal sequence alignment analysis of C-domains indicates the presence of 14 C/E dual domains that are predicted to epimerize and condense amino acid components D-aa1, D-aa2, D-aa3, D-aa6, D-aa7, D-aa8, D-aa10, D-aa11, D-aa15, D-aa21, D-aa22, D-aa24, D-aa25 and D-aa26 (Figs 1e and 2).

[0070] Unknown NRPS cluster found in Streptomyces griseofuscus strain NRRL B-5429: Locus 034F is predicted to encode a lipopeptide as it is composed of a NRPS multienzymatic system starting with an N-acyl-specific condensation domain that condenses acyl groups to the amino group of amino acids (Figs 1f and 2). This gene cluster contains 10 NRPS modules composed of minimal tripartite modules (C-A-T) and one module present in ORF 034CMP_78 that contains an epimerization domain (Fig. 1f). Clustal sequence alignment analysis of C-domains indicates the presence of four dual condensation/epimerization NRPS domains that are predicted to epimerize and condense amino acid components D-aa5, D-aa6, D-aa8 and D-aa10 and the presence of a C-domain that belongs to the class of C-domains found downstream of epimerization domains (Figs 1f and 2).

[0071] Unknown NRPS cluster found in Kitasatosporia sp. strain ECO-03: 040G is a partial gene cluster contained in an actinomycetes strain present in the Ecopia culture collection and predicted to encode a lipopeptide as judged by the presence of an acyl-specific condensation domain in the starter module of the NRPS system (Figs 1f and 2). The incomplete gene cluster contains at least nine NRPS modules all composed of minimal tripartite modules (C-A-T). Clustal sequence alignment analysis of C-domains indicates the presence of two dual condensation/epimerization NRPS domains that are predicted to epimerize and condense amino acid components D-aa2 and D-aa6 (Figs 1f and 2).

[0072] Unknown NRPS cluster found in Streptomyces sp. strain ECO-38: 084B is a gene cluster contained in an actinomycetes strain present in the Ecopia culture collection and predicted to encode a lipopeptide as judged by the presence of an acyl-specific condensation domain in the starter module of the NRPS system (Figs 1g and 2). This gene cluster contains nine NRPS modules all composed of minimal tripartite modules (C-A-T). The sequence of the first component of the 084B NRPS complex is broken into two portions, an N-terminal fragment (084CBP_46) and a C-terminal fragment (084CBP_47), due to an apparent frameshift in the region corresponding to the second C domain (Fig. 1g). Clustal sequence alignment analysis of C-domains indicates the presence of four dual condensation/epimerization NRPS domains that are predicted to epimerize and condense amino acid components D-aa3, D-aa4, D-aa5 and D-aa6 (Figs 1g and 2).

[0073] Unknown NRPS cluster found in Streptomyces sp. strain ECO-59: 107A is a partial gene cluster contained in an actinomycetes strain present in the Ecopia culture collection and predicted to encode a lipopeptide as judged by the presence of an acyl-specific condensation domain in the starter module of the NRPS system (Figs 1g and 2). The incomplete gene cluster contains so far 5 NRPS modules all composed of minimal tripartite modules (C-A-T). The sequence of the first component of the 107A NRPS complex is broken into three portions: an N-terminal fragment (107CAP_08), a middle fragment (107CAPC02) and a C-terminal fragment (107CAP_12) due to two small sequencing gaps of approximately 100 basepairs or less in the region corresponding to the C-A junctions in modules 1 and 2 (Fig. 1g). Clustal sequence alignment analysis of C-domains indicates the presence of two dual condensation/epimerization NRPS domains that are predicted to epimerize and condense amino acid components D-aa1 and D-aa2 (Figs 1g and 2).

[0074] Unknown NRPS cluster found in Streptomyces viridifaciens strain NRRL ISP-5239: Locus 143F is predicted to encode a lipopeptide as it is composed of a NRPS multienzymatic system starting with an N-acyl-specific condensation domain that condenses acyl groups to the amino group of amino acids (Figs 1h and 2). This gene cluster contains ten NRPS modules all composed of minimal tripartite modules (C-A-T) (Fig. 1h). Clustal sequence alignment analysis of all C-domains indicates the presence of four dual condensation/epimerization NRPS domains that are predicted to epimerize and condense amino acid components D-aa1, D-aa5, D-aa8 and D-aa9 (Figs 1h and 2).

[0075] Unknown NRPS cluster found in Actinomadura sp. strain ATCC 39334: Locus 153A is predicted to encode a lipopeptide as it is composed of a NRPS multienzymatic system starting with an N-acyl-specific condensation domain that was previously shown to condense acyl groups to the amino group of amino acids (USSN 10/329,027; Figs 1h and 2). This gene cluster contains sixteen NRPS modules all composed of minimal tripartite modules (C-A-T) (Fig. 1h). Clustal sequence alignment analysis of C-domains indicates the presence of four dual condensation/epimerization NRPS domains that are predicted to epimerize and condense amino acid components D-aa4, D-aa12, D-aa13 and D-aa14 (Figs 1h and 2).

[0076] Ramoplanin-like NRPS cluster found in Micromonospora chersina strain ATCC 53710: Locus 263B is a partially characterized NRPS locus that is predicted to encode a ramoplanin-like lipopeptide. Analysis of the specificities of the adenylation domains (Challis and Ravel, supra) present in the NRPS complex indicates that the nature and relative order of the amino acid components specified by 263B are identical to those of ramoplanin (Fig. 1i). The incomplete gene cluster contains so far five NRPS modules all composed of minimal tripartite modules (C-A-T) and a partial module comprising a C-domain. The sequence of the second component of the 263B NRPS complex is broken into two portions, an N-terminal fragment (263CRPN44) and a C-terminal fragment (263CRPN18), due to a small sequencing gap of approximately 100 basepairs or less in the region corresponding to the C-A junctions in modules 1 and 2 (Fig. 1i). Clustal sequence alignment analysis of C-domains indicates the presence of four dual condensation/epimerization NRPS domains that are predicted to epimerize and condense amino acid components D-Asparagine (D-Asn), D-Hydroxyphenylglycine (D-HPG), D-Ornithine (D-Orn) and D-Threonine (D-Thr) (Figs 1i and 2). Even though the respective order and nature of the amino acid components of 263B are predicted to be identical to those determined in ramoplanin, the stereochemistry of the compound is predicted to be different, as module 1 of the 263B NRPS complex that activates asparagine precedes a dual condensation/epimerization NRPS domain that will epimerize and condense this amino acid residue to its cognate amino acid residue, L-Hydroxyasparagine (Fig. 1i).

EXAMPLE 5: Exchange of a dual condensation/epimerization NRPS domain and a regular condensation domain in the ramoplanin NRPS system alters the stereochemistry of ramoplanin at two different amino acid components

[0077] The availability of dual condensation/epimerization NRPS domains increases the potential of redesigning (un)natural products by engineered peptide synthetases. Functional hybrid peptide synthetases may be engineered to produce rationally designed peptide products using known molecular biological techniques (see, for example, Mootz et al. (2000), Proc. Natl. Acad. Sci. USA, 97, 5848-5853). Domain swapping, change-of-substrate specificity by mutagenesis, and induced termination to achieve release of a defined shortened product are used to generate a recombinant NRPS system producing antipain, a potent cathepsin inhibitor produced by Streptomyces roseus and whose biosynthetic machinery is unknown (Doekel and Marahiel (2001), Metab. Eng., 3, 64-77). Mootz et al., supra, describes genetic engineering using an NRPS system to produce a peptide product that is not a naturally occurring product, and Doekel and Marahiel, supra, describes engineering an NRPS system to make the known natural product antipain.

[0078] The NRPS biosynthetic machinery of peptide natural product ramoplanin can be modified so as to produce modified versions of ramoplanin having altered stereochemistries.

[0079] Domain organization of the NRPS complex involved in ramoplanin biosynthesis is described in Example 1. Eight dual condensation/epimerization NRPS domains catalyzing the condensation of D-amino acid residues and eight regular C-domains that condense L-amino acid residues are present (Fig. 4a, panel a). Through domain swapping, a regular C-domain is replaced by a dual condensation/epimerization NRPS domain. Preferably, the dual condensation/epimerization NRPS domain located in module 6 of ORF 008CHP_10 involved in epimerization and condensation of L-hydroxyphenylglycine (L-HPG) to L-threonine (L-Thr) is used to replace the C-domain in module 3 of ORF 008CHP_11, as the former domain is naturally specific for condensing L-HPG and L-Thr. The modified ramoplanin locus contains a dual condensation/epimerization NRPS domain in module 3 of ORF 008CHP_11, resulting in a change of stereochemistry of the preceding amino acid residue from L- to D-HPG (Fig. 4a, panel b). Additional changes in the stereochemistry of the modified compound may be accomplished by further exchanging additional C-domains and dual condensation/epimerization NRPS domains from various NRPS systems.
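The effect of the swap described above can be modelled in a few lines. The sketch below is a simplified illustration under stated assumptions: the `Module` record and both helper functions are hypothetical, and the model captures only the rule that a dual C/E domain implies a D-configured upstream residue, not any enzymology.

```python
from collections import namedtuple

# Hypothetical record: ORF name, module number within the ORF, and the
# type of condensation domain ("C" = regular, "C/E" = dual).
Module = namedtuple("Module", ["orf", "number", "c_domain"])

def upstream_configuration(module):
    """Configuration predicted for the residue preceding this module."""
    return "D" if module.c_domain == "C/E" else "L"

def swap_in_dual_domain(module):
    """Return a copy of the module carrying a dual C/E domain."""
    return module._replace(c_domain="C/E")

# Module 3 of ORF 008CHP_11 carries a regular C-domain in the native locus.
native = Module(orf="008CHP_11", number=3, c_domain="C")
engineered = swap_in_dual_domain(native)

print(upstream_configuration(native), upstream_configuration(engineered))  # L D
```

The native record is left unchanged, so the two configurations can be compared side by side, mirroring panels a and b of Fig. 4a.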

[0080] The recombinant NRPS system depicted in Figure 4a is employed either in vivo, using an appropriate recombinant host, or in vitro, using purified enzymes supplemented with the appropriate substrates. Stereochemically modified ramoplanin analogues are generated in vivo using Actinoplanes sp. ATCC 33076, the ramoplanin producer, as the host strain. The C-domain is physically replaced by double recombination (Kieser et al. (2000), Practical Streptomyces Genetics (The John Innes Foundation, Norwich, UK)). Stereochemically modified ramoplanin analogues are generated in vitro using over-expression of the recombinant 008CHP_10 and 008CHP_11 polypeptides in an appropriate host, for example E. coli, followed by the preparation of an extract or purified fraction thereof and use of said preparation together with appropriate substrates as outlined in Mootz et al., supra. In the absence of accessory proteins, the product produced by this in vitro system is not expected to contain modifications such as N-acylation or glycosylation present in the natural ramoplanin molecule.

[0081] In this example, the use of dual condensation/epimerization NRPS domains for modifying the stereochemistry of peptide compounds presents major advantages over insertion of canonical epimerization domains in specific NRPS locations. Insertion of foreign domains in the NRPS complex may interfere with the activities of existing domains. For example, Schauwecker et al. (2000), Chem. Biol., 7, 287-297 demonstrated that replacement of an adenylation domain by an adenylation-methylation domain that is twice as long as the replaced domain, interferes with the activity of the cognate epimerization domain present in the specific module. However, use of dual condensation/epimerization NRPS domains instead of epimerization domains to specifically alter the stereochemistry of a peptide compound is expected to alleviate this problem as both regular condensation domains and dual condensation/epimerization NRPS domains are structurally similar.

EXAMPLE 6: Dual condensation/epimerization NRPS domain from the ramoplanin NRPS system is used to modify the stereochemistry of the complestatin molecule at a specific amino acid component

[0082] The NRPS biosynthetic machinery of peptide natural product complestatin can be modified so as to produce modified versions of complestatin having altered stereochemistries, as shown diagrammatically in Figure 5. Complestatin, a member of the vancomycin group of natural products produced by Streptomyces lavendulae, consists of an alpha-ketoacyl hexapeptide backbone modified by oxidative phenolic couplings and halogenations. The entire complestatin biosynthetic and regulatory gene cluster spanning ca. 50 kb was cloned and sequenced (Chiu et al. (2001), Proc. Natl. Acad. Sci. USA, 98, 8548-8553). It includes four NRPS genes, comA, comB, comC, and comD (Fig. 4b, panel a). The comA gene encodes an NRPS that is composed of a loading module (A-T) and a complete module (C-A-T-E) containing an epimerization domain. Module 2 catalyzes conversion of L- to D-Tryptophan (Trp) and condensation to the preceding amino acid, L-hydroxyphenylglycine (L-HPG). Complestatin is almost entirely composed of D-amino acids with the exception of the first L-HPG residue. The stereochemistry of complestatin is altered by domain swapping between the C-domain of ComA and a dual condensation/epimerization NRPS domain. The dual condensation/epimerization NRPS domain located in module 6 of ORF 008CHP_10 involved in epimerization of L-hydroxyphenylglycine (L-HPG) and condensation of this amino acid to L-threonine (L-Thr) (Fig. 4a, panel a) is used to replace the C-domain of ComA, as the former domain is naturally specific for condensing L-HPG found at this position in the complestatin structure. This modification results in a complestatin derivative having an amino terminal amino acid in the D-configuration (D-HPG) (Fig. 4b, panel b). Additional changes in the stereochemistry of the modified compound may be accomplished by further exchanging additional C-domains and dual condensation/epimerization NRPS domains from various NRPS systems.

[0083] The recombinant NRPS system depicted in Figure 4b is employed either in vivo, using an appropriate recombinant host, or in vitro, using purified enzymes supplemented with the appropriate substrates. Stereochemically modified complestatin analogues are generated in vivo using Streptomyces lavendulae, the complestatin producer, as the host strain. This is accomplished by physically replacing C-domains by way of double recombination (Kieser et al., supra). Stereochemically modified complestatin analogues are generated in vitro by over-expression of the recombinant ComA, ComB, ComC and ComD polypeptides in an appropriate host, for example E. coli, followed by the preparation of an extract or purified fraction thereof and use of said preparation together with appropriate substrates as outlined in Mootz et al., supra. In the absence of accessory proteins, the product produced by this in vitro system is not expected to contain modifications such as the cross-linking of residues that is catalyzed by specific complestatin cytochrome P450 enzymes.

[0084] In this example, the use of dual condensation/epimerization NRPS domains for modifying the stereochemistry of peptide compounds presents major advantages over insertion of canonical epimerization domains in specific NRPS locations. Insertion of foreign domains in the NRPS complex may interfere with the activities of existing domains. For example, Schauwecker et al. (2000), Chem. Biol., 7, 287-297 demonstrated that replacement of an adenylation domain by an adenylation-methylation domain that is twice as long as the replaced domain interferes with the activity of the cognate epimerization domain present in the specific module. However, use of dual condensation/epimerization NRPS domains instead of epimerization domains to specifically alter the stereochemistry of a peptide compound is expected to alleviate this problem as both regular condensation domains and dual condensation/epimerization NRPS domains are structurally similar.

[0085] Although the present invention has been described hereinabove by way of preferred embodiments thereof, it can be modified without departing from the spirit, scope and nature of the subject invention, as defined in the appended claims. All of the published patent applications and issued patents mentioned in this application are hereby incorporated by reference.

Claims

WHAT IS CLAIMED IS:

1. An isolated polynucleotide encoding a dual condensation/epimerization NRPS domain, wherein said polynucleotide encodes a polypeptide having at least 45% sequence identity to SEQ ID NO: 139.

2. An isolated polynucleotide as defined in claim 1 comprising a sequence selected from the group consisting of:

(a) a sequence selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136 and 138;

(b) a sequence that is complementary to (a);

(c) a sequence which hybridizes to said sequence of (a) or (b) under conditions of high stringency; and

(d) a sequence which has at least 60% or higher identity to said sequence of (a), (b), or (c) as measured with BLASTN version 2.0 using the default parameters.

3. An isolated polynucleotide as defined in claim 2 comprising a sequence selected from the group consisting of:

(a) a sequence selected from the group consisting of SEQ ID NOS: 2, 4, 6, 8, 10, 12, 14, 16, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136 and 138;

(b) a sequence that is complementary to (a);

(c) a sequence which hybridizes to said sequence of (a) or (b) under conditions of high stringency; and

(d) a sequence which has at least 60% or higher identity to said sequence of (a), (b), or (c) as measured with BLASTN version 2.0 using the default parameters.

4. An isolated polynucleotide as defined in claim 1 which is derived from an organism of the actinomycetes taxon.

5. An isolated polynucleotide as defined in claim 1, wherein said dual condensation/epimerization NRPS domain resides in a gene locus selected from the group consisting of:

RAMO, SYRI, SYRP, URSO, 023C, 034F, 040G, 084B, 107A, 143F, 153A and 263B.

6. An isolated polynucleotide as defined in claim 5, wherein said dual condensation/epimerization NRPS domain resides in a gene locus selected from the group consisting of: RAMO, SYRI, SYRP and URSO.

7. An isolated polynucleotide as defined in claim 5, wherein said dual condensation/epimerization NRPS domain resides in a gene locus selected from the group consisting of:

023C, 034F, 040G, 084B, 107A, 143F, 153A and 263B.

8. An isolated polynucleotide as defined in claim 7, wherein said dual condensation/epimerization NRPS domain resides in a gene locus selected from the group consisting of:

023C, 034F, 143F, 153A and 263B.

9. An isolated polynucleotide as defined in claim 1 encoding a polypeptide sequence selected from the group consisting of:

(a) a sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135 and 137; and

(b) a sequence which has at least 60% or higher sequence identity or similarity to said sequence of (a) as determined using the BLASTP 2.0.10 algorithm.

10. An isolated polynucleotide as defined in claim 9 encoding a polypeptide sequence selected from the group consisting of:

(a) a sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135 and 137; and

11. An isolated polynucleotide encoding a dual condensation/epimerization NRPS domain that produces an alignment with at least 45% identity with the following consensus sequence using the BLASTP 2.0.10 algorithm, with the filter option -F set to false, the gap opening penalty -G set to 11, the gap extension penalty -E set to 1, and all remaining options set to default values:

ADIYPLAPLQEGILFHHLIadggedDaYVIpavleFDSReRLdaFlgALQ qViDRHDILRTavvWeGLrEPVQVVwRhAeLpVeevtLdpagiaadpvaq

LdaaaglrmDLgrAPLIrlhvAadpgggrWLaLLrfHHLVqDHTALevLI aEiqAfLaGrgdeLPePIPFRnFVAQARIGvsraEHErFFaeLLGDVtEP TAPFGLIDVrGDGsgveearlpldaeLaaRLReqARrLGVSpATIfHLAW ARVLgavSGRdDVVFGTVLfGRMqaGaGADRvpGIFINTLPVRVrlggqg

VldAVramRaqLAeLLeHEHAPLALAQRASGVpaptPLFTsLLNYRHsav aavsaealaawagaeleGirlLssrERTNYPLtVsVDDIGdgFsLtVqAv aplDaerVcallhTAIenLVdALEqaPdtpLsavdVLpaaERrrlLveWN dtaadyvpaatvpeLFeAQVartP.

12. An isolated polynucleotide as defined in any one of claims 1-11, wherein said dual condensation/epimerization NRPS domain is involved in the incorporation of a D-amino acid or non-chiral amino acid into a peptide product.

13. An expression vector comprising an isolated polynucleotide as defined in any one of claims 1-11 operably linked to an expression control sequence.

14. A cultured cell comprising a vector as defined in claim 13.

15. A cultured cell comprising an isolated polynucleotide as defined in any one of claims 1-11 operably linked to an expression control sequence.

16. A cultured cell transfected with a vector as defined in claim 13, or a progeny of said cell, wherein the cell expresses a dual condensation/epimerization NRPS domain.

17. A cultured cell as defined in claim 16 selected from the group consisting of Actinoplanes sp. ATCC 33076 and Streptomyces lavendulae.

18. A method of producing a dual condensation/epimerization NRPS domain, the method comprising culturing a cell as defined in claim 16 under conditions permitting expression of the dual condensation/epimerization NRPS domain.

19. A method of producing a dual condensation/epimerization NRPS domain, the method comprising culturing a cell as defined in claim 16 under conditions permitting expression under the control of the expression control sequence, and purifying the dual condensation/epimerization NRPS domain from the cell or the medium of the cell.

20. Use of a dual condensation/epimerization NRPS domain encoded by an isolated polynucleotide as defined in any one of claims 1-11 for epimerization and condensation of amino acids in E-less NRPS systems.

21. Use of a dual condensation/epimerization NRPS domain encoded by an isolated polynucleotide as defined in any one of claims 1-11 to modify the stereochemistry of a synthesized peptide compound in vivo, using an appropriate recombinant host.

22. A use as defined in claim 21, wherein the host is selected from the group consisting of Actinoplanes sp. ATCC 33076 and Streptomyces lavendulae.

23. Use of a dual condensation/epimerization NRPS domain encoded by an isolated polynucleotide as defined in any one of claims 1-11 to modify the stereochemistry of a synthesized peptide compound in vitro.

24. A use as defined in claim 21, wherein a dual condensation/epimerization NRPS domain is genetically substituted for a regular condensation domain.

25. A use as defined in claim 21 to modify the stereochemistry of ramoplanin.

26. A use as defined in claim 21 to modify the stereochemistry of the complestatin molecule at a specific amino acid component.

27. Use of a dual condensation/epimerization NRPS domain encoded by an isolated polynucleotide as defined in any one of claims 1-11 to incorporate a D-amino acid into a peptide product.

28. An isolated dual condensation/epimerization NRPS domain having at least 45% sequence identity to SEQ ID NO: 139.

29. An isolated domain as defined in claim 28 comprising a sequence selected from the group consisting of:

(a) a sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135 and 137;

(b) a sequence that is complementary to (a);

(d) a sequence which has at least 60% or higher identity or similarity to said sequence of (a), (b), or (c) as measured with BLASTP version 2.0.10 using the default parameters.

30. An isolated domain as defined in claim 29 comprising a sequence selected from the group consisting of:

(a) a sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135 and 137;

(b) a sequence that is complementary to (a);

31. An isolated domain as defined in claim 28 comprising a sequence selected from the group consisting of:

(a) a sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135 and 137;

(b) a sequence that is complementary to (a);

(d) a sequence which has at least 70% or higher identity or similarity to said sequence of (a), (b), or (c) as measured with BLASTP version 2.0.10 using the default parameters.

32. An isolated domain as defined in claim 31 comprising a sequence selected from the group consisting of:

(a) a sequence selected from the group consisting of SEQ ID NOS: 1, 3, 5, 7, 9, 11, 13, 15, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 135 and 137;

(b) a sequence that is complementary to (a);

(c) a sequence which hybridizes to said sequence of (a) or (b) under conditions of high stringency; and

33. An isolated domain as defined in claim 28, wherein said domain is involved in the incorporation of a D-amino acid or non-chiral amino acid into a peptide product.

34. A dual condensation/epimerization NRPS domain that produces an alignment with at least 45% identity with the following consensus sequence using the BLASTP 2.0.10 algorithm, with the filter option -F set to false, the gap opening penalty -G set to 11, the gap extension penalty -E set to 1, and all remaining options set to default values:

ADIYPLAPLQEGILFHHLIadggedDaYVIpavleFDSReRLdaFlgALQ qViDRHDILRTavvWeGLrEPVQVVwRhAeLpVeevtLdpagiaadpvaq LdaaaglrmDLgrAPLIrlhvAadpgggrWLaLLrfHHLVqDHTALevLI aEiqAfLaGrgdeLPePIPFRnFVAQARIGvsraEHErFFaeLLGDVtEP

TAPFGLIDVrGDGsgveearlpldaeLaaRLReqARrLGVSpATIfHLAW ARVLgavSGRdDVVFGTVLfGRMqaGaGADRvpGIFINTLPVRVrlggqg VldAVramRaqLAeLLeHEHAPLALAQRASGVpaptPLFTsLLNYRHsav aavsaealaawagaeleGirlLssrERTNYPLtVsVDDIGdgFsLtVqAv aplDaerVcallhTAIenLVdALEqaPdtpLsavdVLpaaERrrlLveWN dtaadyvpaatvpeLFeAQVartP.

35. A dual condensation/epimerization domain contained in cosmid 008CH having accession number IDAC 190901-3 or cosmid 008CK having accession number IDAC 190901-1.