CA2375097A1 - Compositions and methods for identifying and distinguishing orthosomycin biosynthetic loci - Google Patents



Publication number
CA2375097A1
Authority
CA
Canada
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
CA002375097A
Other languages
French (fr)
Inventor
Chris M. Farnet
Emmanuel Zazopoulos
Alfredo Staffa
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Thallion Pharmaceuticals Inc
Original Assignee
Ecopia Biosciences Inc
Application filed by Ecopia Biosciences Inc
Publication of CA2375097A1


Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00 Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09 Recombinant DNA-technology
    • C12N15/11 DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • C12N15/52 Genes encoding for enzymes or proenzymes
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12P FERMENTATION OR ENZYME-USING PROCESSES TO SYNTHESISE A DESIRED CHEMICAL COMPOUND OR COMPOSITION OR TO SEPARATE OPTICAL ISOMERS FROM A RACEMIC MIXTURE
    • C12P19/00 Preparation of compounds containing saccharide radicals
    • C12P19/44 Preparation of O-glycosides, e.g. glucosides
    • C12P19/445 The saccharide radical is condensed with a heterocyclic radical, e.g. everninomycin, papulacandin

Abstract

The invention provides compositions and methods useful to identify orthosomycin biosynthetic gene clusters. The invention also provides compositions and methods useful to distinguish everninomicin-type orthosomycin gene clusters from avilamycin-type orthosomycin gene clusters. An orthosomycin gene cluster may be identified using compositions of the invention such as hybridization probes or PCR primers derived from the specific protein families responsible for the unique structural features that distinguish orthosomycins, everninomycin-type orthosomycins and avilamycin-type orthosomycins. An orthosomycin gene cluster may also be identified using compositions of the invention such as the sequence code for the reference sequences stored on a computer readable medium.

Description

JUMBO APPLICATIONS / PATENTS
THIS SECTION OF THE APPLICATION / PATENT CONTAINS MORE
THAN ONE VOLUME.
THIS IS VOLUME ~ OF
NOTE: For additional volumes please contact the Canadian Patent Office.

TITLE OF THE INVENTION: Compositions and methods for identifying and distinguishing orthosomycin biosynthetic loci.
FIELD OF INVENTION
The present invention relates to the field of microbiology, and more specifically to genes and organisms involved in the production of orthosomycins.
BACKGROUND:
Orthosomycins are oligosaccharide molecules containing two orthoester saccharide linkages. The general structure of orthosomycins is illustrated below.
The saccharide residues in the above orthosomycin are labeled A-H, and the key features of orthosomycins, the orthoester linkages, are indicated below.
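[Figure: general orthosomycin structure, with saccharide residues labeled A-H and the orthoester linkages indicated]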
Known orthosomycin compounds can broadly be classified into two classes: (1) the everninomicins, which contain an amino- or nitrosugar residue in the terminal position of the oligosaccharide chain, i.e. wherein R is evernitrose in the above molecule; and (2) the avilamycins, curamycins and flambamycins, which do not contain an amino- or nitrosugar residue in the terminal position, i.e. wherein R is hydrogen in the above molecule. Within the second class of orthosomycins, the avilamycins and the curamycins differ only in the nature of the acyl side chain found in ester linkage to the C45-hydroxyl group of sugar residue G. Neither the avilamycins nor the curamycins carry a simple methyl group on this hydroxyl.
In the everninomicin class, this hydroxyl is generally O-methylated. Flambamycins differ from the avilamycins only at position C23 of sugar residue D, which is a methylene carbon in the avilamycins but carries a hydroxyl group in the flambamycins. The everninomicins may or may not carry a hydroxyl at this position.
Many known orthosomycins have antibiotic activity. There is an urgent need for new antimicrobial agents given the emergence of bacteria resistant to conventional antibiotics. The oligosaccharide class of antibiotics has demonstrated a broad spectrum of activity against gram-positive organisms, including methicillin-resistant Staphylococcus aureus, vancomycin-resistant enterococci and penicillin-resistant pneumococci. It is therefore desirable to develop a means to identify new orthosomycin natural products. Because orthosomycin-producing microbes represent an important source of new antibiotics, it is also desirable to develop a means to identify orthosomycin-producing organisms and to distinguish between the classes of orthosomycins produced by such organisms.
Existing screening methods for identifying orthosomycin-producing microbes are laborious and time-consuming, and to date they have not been discriminating enough to detect organisms that produce orthosomycin natural products at low levels.
There is a need for improved tools to detect orthosomycin-producing organisms.
There is also a need for tools capable of detecting organisms that produce orthosomycins at levels not detected by traditional culture tests.
There is also a need for tools that discriminate between the classes of orthosomycin molecules such as avilamycin and everninomicin classes of orthosomycins.
SUMMARY OF THE INVENTION:
The invention provides compositions and methods useful to identify orthosomycin biosynthetic genes. The invention also provides compositions and methods useful to distinguish everninomicin-type orthosomycin gene clusters from avilamycin-type orthosomycin gene clusters. Once target orthosomycin genes are identified, a full-length or partial biosynthetic locus for the orthosomycin compound may be isolated according to standard methods.
In one aspect of the invention, an orthosomycin gene cluster is identified using compositions of the invention such as hybridization probes or PCR primers.
Hybridization probes or PCR primers according to the invention are derived from protein families responsible for the unique structural features that distinguish orthosomycins, everninomycin-type orthosomycins and avilamycin-type orthosomycins. To identify orthosomycin gene clusters, the hybridization probes or PCR primers are derived from the nucleic acid sequences corresponding to the seventeen protein families caFTE, GFTG, GTFH, HOXG, MTFD, MTFE, MTFF, MTLA, MTIA, OXRV, OXRW, OXRW, PHOD, UNAJ, UEVA, UEVB and UNKU.
To identify everninomicin-type orthosomycin gene clusters, the hybridization probes or PCR primers are derived from the nucleic acid sequences corresponding to the nine protein families DACT, DEPF, EPIM, GTFA, MTFG, MTFV, OXBN, OXCO and UNBB. To identify avilamycin-type orthosomycin gene clusters, the hybridization probes or PCR primers are derived from the nucleic acid sequences corresponding to six protein families ABCD, DEPN, MEMD, REBU, UNAI and UNBR.
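The family-based logic above can be sketched as a simple presence/absence check. This is a minimal sketch, not the method of the invention: it assumes each ORF in a candidate locus has already been assigned a four-letter family designation, and the thresholds `min_core` and `min_subtype` are hypothetical values chosen only for illustration. The orthosomycin set uses only the 15 distinct, legible names printed above, because the first family name is garbled in the source and OXRW is listed twice.

```python
# Marker sets taken from the family lists above (garbled/duplicate names omitted).
ORTHOSOMYCIN_FAMILIES = frozenset({
    "GFTG", "GTFH", "HOXG", "MTFD", "MTFE", "MTFF", "MTLA", "MTIA",
    "OXRV", "OXRW", "PHOD", "UNAJ", "UEVA", "UEVB", "UNKU",
})
EVERNINOMICIN_FAMILIES = frozenset({
    "DACT", "DEPF", "EPIM", "GTFA", "MTFG", "MTFV", "OXBN", "OXCO", "UNBB",
})
AVILAMYCIN_FAMILIES = frozenset({"ABCD", "DEPN", "MEMD", "REBU", "UNAI", "UNBR"})


def classify_locus(families, min_core=0.5, min_subtype=0.5):
    """Return a coarse label for a locus from its four-letter family calls.

    min_core and min_subtype are hypothetical fractions of each marker set
    that must be present; the source does not specify any thresholds.
    """
    found = set(families)
    core = len(found & ORTHOSOMYCIN_FAMILIES) / len(ORTHOSOMYCIN_FAMILIES)
    if core < min_core:
        return "not orthosomycin"
    ever = len(found & EVERNINOMICIN_FAMILIES) / len(EVERNINOMICIN_FAMILIES)
    avil = len(found & AVILAMYCIN_FAMILIES) / len(AVILAMYCIN_FAMILIES)
    if ever >= min_subtype and ever > avil:
        return "everninomicin-type orthosomycin"
    if avil >= min_subtype and avil > ever:
        return "avilamycin-type orthosomycin"
    return "orthosomycin (subtype unresolved)"
```

A locus carrying the core families plus most everninomicin markers is labeled everninomicin-type; one carrying only the core families is reported as an orthosomycin of unresolved subtype rather than forced into a class.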
The invention provides compositions for use in identifying orthosomycin biosynthetic genes, orthosomycin gene fragments, orthosomycin gene clusters or orthosomycin-producing organisms. In one aspect, the invention provides an isolated, purified, or enriched nucleic acid comprising one of the sequences of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208 or the sequences complementary thereto. In another aspect, the invention provides the above nucleic acids for use in identifying orthosomycin biosynthetic genes, orthosomycin gene fragments, orthosomycin gene clusters or orthosomycin-producing organisms. The isolated, purified or enriched nucleic acids may comprise DNA, including cDNA, genomic DNA, and synthetic DNA. The DNA may be double stranded or single stranded, and if single stranded may be the coding or non-coding (anti-sense) strand. Alternatively, the isolated, purified or enriched nucleic acids may comprise RNA.
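Candidate probes of a given minimum length can be enumerated directly from a reference sequence and its complementary strand. The sketch below is illustrative only: it assumes plain A/C/G/T input, the function names are hypothetical, and the complementary strand is returned in reverse-complement (5' to 3') orientation.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the complementary strand of a DNA sequence, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def fragments(seq, length, both_strands=True):
    """Yield every window of `length` consecutive bases from seq and, optionally, its complement."""
    strands = [seq, reverse_complement(seq)] if both_strands else [seq]
    for strand in strands:
        for start in range(len(strand) - length + 1):
            yield strand[start:start + length]
```

For example, `list(fragments("ATGCA", 4))` yields the two 4-base windows of the given strand followed by the two windows of its complement.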
The isolated, purified or enriched nucleic acids of one of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208 may be used to prepare one of the polypeptides of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207.
Accordingly, the present invention also provides an isolated, purified or enriched nucleic acid which encodes one of the polypeptides of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207. In another aspect, the invention provides the above nucleic acids for use in detecting orthosomycin biosynthetic genes, orthosomycin gene fragments, orthosomycin gene clusters, or orthosomycin-producing organisms.
The coding sequences of these nucleic acids may be identical to one of the coding sequences of one of the nucleic acids of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208 or a fragment thereof, or may be different coding sequences which, as a result of the redundancy or degeneracy of the genetic code, encode one of the polypeptides of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207, for use in detecting orthosomycin biosynthetic genes or orthosomycin-producing organisms.
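Because of the degeneracy of the genetic code, a PCR primer derived from a conserved peptide of one of these protein families is usually degenerate. A minimal sketch, assuming the standard genetic code and a hypothetical helper name, back-translates a peptide into an IUPAC-coded primer and reports its fold-degeneracy. The position-by-position collapse overcounts for six-codon amino acids (Leu, Ser, Arg), which is also true of the primer mixture that would actually be synthesized.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide ambiguity codes keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_primer(peptide):
    """Back-translate a peptide into an IUPAC primer; return (primer, fold-degeneracy)."""
    primer, degeneracy = [], 1
    for aa in peptide.upper():
        codons = [codon for codon, a in CODON_TABLE.items() if a == aa]
        if not codons:
            raise ValueError(f"unknown amino acid: {aa}")
        for pos in range(3):
            bases = frozenset(codon[pos] for codon in codons)
            primer.append(IUPAC[bases])
            degeneracy *= len(bases)
    return "".join(primer), degeneracy
```

For example, `degenerate_primer("MKF")` gives `("ATGAARTTY", 4)`; in practice peptides rich in Met and Trp are preferred because they keep the degeneracy low.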
The invention provides compositions for use in identifying everninomicin-type orthosomycin biosynthetic genes, everninomicin-type orthosomycin gene fragments, everninomicin-type orthosomycin gene clusters, and everninomicin-type orthosomycin-producing organisms. In one aspect, the invention provides an isolated, purified, or enriched nucleic acid comprising one of the sequences of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244 or the sequences complementary thereto. In another aspect, the invention provides the above nucleic acids for use in identifying everninomicin-type orthosomycin genes, everninomicin-type orthosomycin gene fragments, everninomicin-type orthosomycin gene clusters and everninomicin-type orthosomycin-producing organisms. The isolated, purified or enriched nucleic acids may comprise DNA, including cDNA, genomic DNA, and synthetic DNA. The DNA may be double stranded or single stranded, and if single stranded may be the coding or non-coding (anti-sense) strand. Alternatively, the isolated, purified or enriched nucleic acids may comprise RNA.
The isolated, purified or enriched nucleic acids of one of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244 may be used to prepare one of the polypeptides of SEQ ID NOS: 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243.
Accordingly, the present invention also provides an isolated, purified or enriched nucleic acid which encodes one of the polypeptides of SEQ ID NOS: 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243. In another aspect, the invention provides the above nucleic acids for use in identifying everninomicin-type orthosomycin genes, everninomicin-type orthosomycin gene fragments, everninomicin-type orthosomycin gene clusters, and everninomicin-type orthosomycin-producing organisms. The coding sequences of these nucleic acids may be identical to one of the coding sequences of one of the nucleic acids of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244 or a fragment thereof, or may be different coding sequences which, as a result of the redundancy or degeneracy of the genetic code, encode one of the polypeptides of SEQ ID NOS: 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243.
The invention provides compositions for use in identifying avilamycin-type orthosomycin biosynthetic genes, avilamycin-type orthosomycin gene fragments, avilamycin-type orthosomycin gene clusters, and avilamycin-type orthosomycin-producing organisms. In one aspect, the invention provides an isolated, purified, or enriched nucleic acid comprising one of the sequences of SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175; the sequences complementary thereto; or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175, or the sequences complementary thereto. In another aspect, the invention provides the above nucleic acids for use in identifying avilamycin-type orthosomycin genes and avilamycin-type orthosomycin-producing organisms. The isolated, purified or enriched nucleic acids may comprise DNA, including cDNA, genomic DNA, and synthetic DNA. The DNA may be double stranded or single stranded, and if single stranded may be the coding or non-coding (anti-sense) strand. Alternatively, the isolated, purified or enriched nucleic acids may comprise RNA.
The isolated, purified or enriched nucleic acids of one of SEQ ID NOS: 246, 248, 250, 252, 254, 256 may be used to prepare one of the polypeptides of SEQ ID NOS: 245, 247, 249, 251, 253 and 255 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 245, 247, 249, 251, 253.
Accordingly, the present invention also provides an isolated, purified or enriched nucleic acid which encodes one of the polypeptides of SEQ ID NOS: 245, 247, 249, 251, 253 or GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 245, 247, 249, 251, 253 or GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175. In another aspect, the invention provides the above nucleic acids for use in identifying avilamycin-type orthosomycin genes, avilamycin-type orthosomycin gene fragments, avilamycin-type orthosomycin gene clusters, and avilamycin-type orthosomycin-producing organisms. The coding sequences of these nucleic acids may be identical to one of the coding sequences of one of the nucleic acids of SEQ ID NOS: 246, 248, 250, 252, 254, 256 or a fragment thereof, or may be different coding sequences which, as a result of the redundancy or degeneracy of the genetic code, encode one of the polypeptides of SEQ ID NOS: 245, 247, 249, 251, 253 or GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 245, 247, 249, 251, 253 or GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175.
The isolated, purified or enriched nucleic acid which encodes one of the polypeptides encoded by SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256 may include, but is not limited to: (1) only the coding sequence of one of these SEQ ID NOS; (2) the coding sequence of one of these SEQ ID NOS together with additional coding sequences, such as leader sequences or proprotein sequences; or (3) the coding sequence of one of these SEQ ID NOS together with non-coding sequences, such as introns or non-coding sequences 5' and/or 3' of the coding sequence. Thus, as used herein, the term "polynucleotide encoding a polypeptide" encompasses a polynucleotide which includes only the coding sequence for the polypeptide as well as a polynucleotide which includes additional coding and/or non-coding sequence.
The invention relates to polynucleotides having "silent" nucleotide changes, for example changes which do not alter the amino acid sequence encoded by the polynucleotides of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, for use in detecting orthosomycin biosynthetic genes and orthosomycin-producing organisms. The invention also relates to polynucleotides having nucleotide changes which result in amino acid substitutions, additions, deletions, fusions and truncations of the polypeptides of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 255, for use in identifying orthosomycin biosynthetic genes and orthosomycin-producing organisms.
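Whether a nucleotide change is silent can be checked by translating both sequences and comparing the products. This is a minimal sketch, assuming the standard genetic code, unambiguous A/C/G/T bases, in-frame coding sequences and substitutions only (no insertions or deletions); the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate an in-frame coding sequence, ignoring a trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def classify_change(reference, variant):
    """Label a same-length variant of a coding sequence relative to the reference."""
    if len(reference) != len(variant):
        raise ValueError("this sketch only handles substitutions (equal lengths)")
    if reference.upper() == variant.upper():
        return "identical"
    if translate(reference) == translate(variant):
        return "silent"
    return "amino acid change"
```

For example, CTG to TTG is silent (both encode Leu), whereas AAA to GAA changes Lys to Glu.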
In one aspect, the compositions of the invention are used as probes to identify samples harbouring orthosomycin biosynthetic genes and orthosomycin biosynthetic loci. Samples may be in the form of environmental biomass, pure or mixed microbial cultures, genomic DNA isolated from pure or mixed microbial cultures, or genomic DNA libraries prepared from pure or mixed microbial cultures. The compositions are used in polymerase chain reaction (PCR) and nucleic acid hybridization techniques well known to those skilled in the art.
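When choosing probe or primer candidates for these PCR or hybridization experiments, a common first-pass filter is GC content and an approximate melting temperature. The sketch below uses the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, a rough rule of thumb for short oligonucleotides; it ignores salt, probe concentration and nearest-neighbour effects and does not handle degenerate bases. The window values in `filter_oligos` are assumptions, not conditions taken from the invention.

```python
def gc_content(seq):
    """Fraction of G and C bases in an A/C/G/T sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Approximate melting temperature in degrees C by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def filter_oligos(candidates, tm_range=(55, 65), gc_range=(0.4, 0.6)):
    """Keep candidates whose Wallace Tm and GC fraction fall inside hypothetical windows."""
    return [
        s for s in candidates
        if tm_range[0] <= wallace_tm(s) <= tm_range[1]
        and gc_range[0] <= gc_content(s) <= gc_range[1]
    ]
```

Candidates that survive this coarse filter would still need to be checked with a nearest-neighbour Tm model and against the actual hybridization or annealing conditions.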
In another embodiment, environmental samples that harbour microorganisms with the potential to produce orthosomycins are identified by PCR methods. Nucleic acids contained within the environmental sample are contacted with primers derived from the invention so as to amplify target orthosomycin biosynthetic gene sequences. Environmental samples deemed positive by PCR are then pursued to identify and isolate the orthosomycin gene cluster and the microorganism that contains the target gene sequences. The orthosomycin gene cluster may be identified by generating genomic DNA libraries (for example, cosmid, BAC, etc.) representative of genomic DNA from the population of microorganisms contained within the environmental sample, locating genomic DNA clones that contain the target sequences and possibly overlapping clones (for example, by hybridization techniques or PCR), determining the sequence of the desired genomic DNA clones, and deducing the ORFs of the orthosomycin biosynthetic locus. The microorganism that contains the orthosomycin biosynthetic locus may be identified and isolated, for example, by colony hybridization using nucleic acid probes derived from either the invention or the newly identified orthosomycin biosynthetic locus. The isolated orthosomycin biosynthetic locus may be introduced into an appropriate surrogate host to achieve heterologous production of the orthosomycin compound(s); alternatively, if the microorganism containing the orthosomycin biosynthetic locus is identified and isolated, it may be subjected to fermentation to produce the orthosomycin compound(s).
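The PCR screening step can be previewed in silico against any sequence that is already available, for instance a sequenced clone. This sketch is a deliberate simplification under stated assumptions: it reports only exact primer matches, scans both strands of a linear template, and caps product size with a hypothetical `max_len`; real primers, especially degenerate ones, tolerate mismatches that this code ignores.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_all(text, pattern):
    """All start positions of pattern in text, including overlapping matches."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits


def in_silico_pcr(template, forward, reverse, max_len=5000):
    """Sizes of products predicted from exact matches of a primer pair on a linear template."""
    fwd = forward.upper()
    # Site where the reverse primer anneals, written on the same strand as the forward primer.
    rev_site = reverse_complement(reverse)
    sizes = []
    for strand in (template.upper(), reverse_complement(template)):
        for f in find_all(strand, fwd):
            for r in find_all(strand, rev_site):
                size = r + len(rev_site) - f
                if r >= f and size <= max_len:
                    sizes.append(size)
    return sorted(sizes)
```

An empty result means no exact-match product; it does not rule out amplification from a real sample.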
In another embodiment of the invention, a microorganism that harbours an orthosomycin gene cluster is first identified and isolated as a pure culture, for example, by colony hybridization using nucleic acid probes derived from the invention. Beginning with a pure culture, a genomic DNA library (for example, cosmid, BAC, etc.) representative of genomic DNA from this single species is prepared, genomic DNA clones that contain the target sequences and possibly overlapping clones are located using probes derived from the invention (for example, by hybridization techniques or PCR), the sequence of the desired genomic DNA clones is determined and the ORFs of the orthosomycin biosynthetic locus are deduced. The microorganism containing the orthosomycin biosynthetic locus may be subjected to fermentation to produce the orthosomycin compound(s), or the orthosomycin biosynthetic locus may be introduced into an appropriate surrogate host to achieve heterologous production of the orthosomycin compound(s).
In another aspect of the invention, an orthosomycin gene cluster is identified in silico using one or more sequences selected from the orthosomycin-specific nucleic acid code, everninomicin-specific nucleic acid code, avilamycin-specific nucleic acid code, orthosomycin-specific polypeptide code, everninomicin-specific polypeptide code and avilamycin-specific polypeptide code taught by the invention. A query from a set of query sequences stored on a computer readable medium is read and compared to a subject selected from the reference sequences of the invention. The level of similarity between the subject and the query is determined, and query sequences representing orthosomycin genes are identified.
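The compare-then-assign procedure above can be sketched with a textbook Needleman-Wunsch global alignment. This is a swapped-in illustration, not the comparator or analyzer of Figures 3 and 4: the scoring values, the percent-identity definition (identical columns over all alignment columns), the 70% cut-off and the reference dictionary layout are all hypothetical, and production tools such as BLAST use local alignment with statistical scoring instead.

```python
def percent_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Percent identity of a and b over one optimal global (Needleman-Wunsch) alignment."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical and total alignment columns.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * identical / columns if columns else 0.0


def best_reference(query, references, min_identity=70.0):
    """Return (reference id, family, identity) for the best hit, or None if below the cut-off.

    references maps a hypothetical reference id to (family, sequence).
    """
    best = None
    for ref_id, (family, seq) in references.items():
        pid = percent_identity(query, seq)
        if best is None or pid > best[2]:
            best = (ref_id, family, pid)
    if best is None or best[2] < min_identity:
        return None
    return best
```

The quadratic alignment is only practical for short queries; it is used here because it is compact and easy to verify by hand.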

It is understood that the invention, having provided compositions and methods to identify orthosomycin biosynthetic gene clusters, everninomycin-type biosynthetic gene clusters and avilamycin-type biosynthetic gene clusters, further provides the orthosomycins, everninomicin-type orthosomycins and avilamycin-type orthosomycins produced by the biosynthetic gene clusters so identified.
BRIEF DESCRIPTION OF THE DRAWINGS:
Figure 1 is a block diagram of a computer system which implements and executes software tools for the purpose of comparing a query to a subject, wherein the subject is selected from the reference sequences of the invention.
Figures 2A, 2B, 2C and 2D are flow diagrams of sequence comparison software that can be employed for the purpose of comparing a query to a subject, wherein the subject is selected from the reference sequences of the invention. Figure 2A is the query initialization subprocess of the sequence comparison software, Figure 2B is the subject datasource initialization subprocess, Figure 2C illustrates the comparison subprocess and the analysis subprocess, and Figure 2D is the display/report subprocess.
Figure 3 is a flow diagram of the comparator algorithm (238) of Figure 2C, which is one embodiment of a comparator algorithm that can be used for pairwise determination of similarity between a query/subject pair.
Figure 4 is a flow diagram of the analyzer algorithm (244) of Figure 2C, which is one embodiment of an analyzer algorithm that can be used to assign identity to a query sequence based on similarity to a subject sequence, where the subject sequence is a reference sequence of the invention.
Figure 5 is a schematic representation comparing an avilamycin-type biosynthetic locus from Streptomyces mobaraensis (AVIA) to the avilamycin A biosynthetic locus from Streptomyces viridochromogenes Tu57 (AVIL). ORFs in the loci are identified by a four-letter protein family designation.
Figure 6 illustrates a biosynthetic scheme wherein members of the protein families commonly found in orthosomycin biosynthetic loci, namely KASA (EVEA ORF 17, SEQ ID NO: 84; EVER ORF 14, SEQ ID NO: 83; AVIA ORF 13, SEQ ID NO: 81; and AVIL ORF 15, GenBank accession no. AAK83178), PKSO (EVER ORF 16, SEQ ID NO: 185; EVER ORF 32, SEQ ID NO: 183; AVIA ORF 14, SEQ ID NO: 181; and AVIL ORF 16, GenBank accession no. AAK83194), MTFA (EVEA ORF 44, SEQ ID NO: 97; EVER ORF 11, SEQ ID NO: 95; AVIA ORF 38, SEQ ID NO: 93) and HOMX (EVER ORF 20, SEQ ID NO: 79; EVER ORF 20, SEQ ID NO: 77; AVIA ORF 36, SEQ ID NO: 75), provide for the formation of the dichloroisoeverninic moiety found in ester linkage to sugar residue B of orthosomycin oligosaccharides.
Figure 7 illustrates two alternative biosynthetic routes wherein members of protein families diagnostic of orthosomycin biosynthetic loci, namely OXRW (AVIA ORFs 24 and 33 (SEQ ID NOS: 153 and 159); AVIL GenBank accession no. AAK83187; EVER ORFs 18 and 26 (SEQ ID NOS: 155 and 161); EVER ORFs 11 and 30 (SEQ ID NOS: 157 and 163)) and OXRV (AVIA ORF 19 (SEQ ID NO: 167); EVEA ORF 6 (SEQ ID NO: 173); AVIL GenBank accession no. AAK83181; EVER ORF 31 (SEQ ID NO: 169)), provide for the formation of the orthoester linkages joining residues C and D of orthosomycin oligosaccharides.
Figure 8 illustrates a biosynthetic scheme wherein members of the protein families diagnostic of everninomicin-type orthosomycin gene clusters and everninomicin-type orthosomycin producers, including DATC (EVER ORF 43 (SEQ ID NO: 209); EVEA ORF 37 (SEQ ID NO: 211)), MTFV (EVER ORF 44 (SEQ ID NO: 229); EVEA ORF 38 (SEQ ID NO: 231)), EPIM (EVER ORF 45 (SEQ ID NO: 217); EVEA ORF 39 (SEQ ID NO: 219)), DEPF (EVER ORF 46 (SEQ ID NO: 213); EVEA ORF 40 (SEQ ID NO: 215)) and OXBN (EVER ORF 42 (SEQ ID NO: 233); EVEA ORF 36 (SEQ ID NO: 235)), provide for the formation of the amino- and nitrosugar residues characteristic of everninomicin-type orthosomycins.
Figure 9 is a photograph of a 1% agarose gel stained with ethidium bromide, showing the products of the PCR amplification experiments described in Example 8.
Figure 10 is a schematic representation comparing the everninomicin biosynthetic locus from Micromonospora carbonacea var. aurantiaca (EVER) to the everninomicin biosynthetic locus from Micromonospora carbonacea var. africana (EVEA). ORFs in the loci are identified by a four-letter protein family designation.

DETAILED DESCRIPTION OF THE INVENTION:
The invention provides compositions and methods for identifying orthosomycin gene clusters and orthosomycin-producing organisms. The invention also provides compositions and methods for distinguishing between everninomicin-type and avilamycin-type orthosomycin gene clusters, and for distinguishing between everninomicin-type and avilamycin-type orthosomycin producers. To provide the compositions and methods of the invention, the full-length biosynthetic locus for a member of each of the two classes of orthosomycin compounds was identified, sequenced and annotated. The biosynthetic locus for everninomicin in Micromonospora carbonacea var. aurantiaca (EVER) spans approximately 60 kb and contains 49 ORFs encoding proteins involved in the biosynthesis of everninomicin. The biosynthetic locus for an avilamycin-like compound from Streptomyces mobaraensis (AVIA) spans approximately 50 kb and contains 42 ORFs encoding proteins involved in the biosynthesis of an avilamycin-type compound.
Analysis of EVER and AVIA has revealed seventeen (17) protein families responsible for structural features common to all orthosomycin molecules and indicative of an orthosomycin biosynthetic locus. A member of each of these 17 protein families has been found in EVER, namely EVER ORFs 5, 8, 9, 12, 13, 15, 17 to 19, 24 to 26, 31, 33, 35 and 40 (SEQ ID NOS: 113, 65, 201, 71, 125, 101, 195, 155, 107, 53, 205, 161, 169, 177, 59 and 129, respectively), and also in AVIA, namely AVIA ORFs 1 to 3, 5, 9, 18, 19, 22 to 26, 31 to 34 and 37 (SEQ ID NOS: 123, 203, 127, 57, 199, 165, 167, 99, 105, 153, 111, 193, 51, 63, 159, 175 and 69, respectively). In EVER, two of the protein families are fused together to form ORF 31 (SEQ ID NO: 169). A member of the 17 protein families has also been found in the biosynthetic locus for everninomicin from Micromonospora carbonacea var. africana and in the biosynthetic locus for an avilamycin compound from Streptomyces viridochromogenes Tu57. Sequences from these 17 protein families form the basis for compositions and methods for identifying gene clusters involved in the biosynthesis of orthosomycins and for compositions and methods for identifying orthosomycin-producing organisms.
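The ORF lists above use "a to b" ranges and are paired with SEQ ID NOS "respectively". As a minimal sketch (the helper names `expand_orf_list` and `pair_respectively` are hypothetical), such lists can be expanded and paired programmatically, which also checks that the counts agree: 16 EVER ORFs carry the 17 families because ORF 31 is a fusion.

```python
import re


def expand_orf_list(text: str) -> list[int]:
    """Expand a list such as '1 to 3, 5 and 9' into [1, 2, 3, 5, 9]."""
    numbers: list[int] = []
    # Split on commas and on 'and' (with or without a preceding comma).
    for item in re.split(r",?\s+and\s+|,\s*", text.strip()):
        if not item:
            continue
        span = re.fullmatch(r"(\d+)\s+to\s+(\d+)", item)
        if span:
            numbers.extend(range(int(span.group(1)), int(span.group(2)) + 1))
        else:
            numbers.append(int(item))
    return numbers


def pair_respectively(orf_text: str, seq_ids: list[int]) -> dict[int, int]:
    """Pair expanded ORF numbers with SEQ ID NOs in the order given."""
    orfs = expand_orf_list(orf_text)
    if len(orfs) != len(seq_ids):
        raise ValueError(f"{len(orfs)} ORFs but {len(seq_ids)} SEQ ID NOs")
    return dict(zip(orfs, seq_ids))


ever = pair_respectively(
    "5, 8, 9, 12, 13, 15, 17 to 19, 24 to 26, 31, 33, 35 and 40",
    [113, 65, 201, 71, 125, 101, 195, 155, 107, 53, 205, 161, 169, 177, 59, 129],
)
print(ever[31])  # 169, the fused ORF
```

A length mismatch raises an error rather than silently truncating, since `zip` alone would hide a misread list.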

Analysis of EVER and AVIA has revealed nine (9) protein families that distinguish everninomicin-type orthosomycin biosynthetic loci from avilamycin-type orthosomycin biosynthetic loci. A member of each of these nine protein families has been found in EVER, namely EVER ORFs 3, 4, 21, 42, 43, 44, 45, 46 and 47 (SEQ ID NOS: 225, 237, 221, 233, 209, 229, 217, 213 and 241, respectively). A member of each of the 9 protein families has also been found in the biosynthetic locus for everninomicin from Micromonospora carbonacea var. africana. No members of these nine protein families were found in biosynthetic loci for avilamycin-type orthosomycins, including AVIA and the biosynthetic locus for an avilamycin compound from Streptomyces viridochromogenes Tu57. Sequences from these nine protein families form the basis for compositions and methods for identifying gene clusters involved in the biosynthesis of everninomicin-type orthosomycins and for compositions and methods for identifying everninomicin-type orthosomycin-producing organisms.
Analysis of EVER and AVIA has revealed six (6) protein families that distinguish avilamycin-type orthosomycin biosynthetic loci from everninomicin-type orthosomycin biosynthetic loci. A member of each of these six protein families has been found in AVIA, namely AVIA ORFs 6, 7, 10, 21, 27 and 28 (SEQ ID NOS: 253, 251, 255, 247, 245 and 249, respectively). A member of the 6 protein families has also been found in the biosynthetic locus for an avilamycin compound from Streptomyces viridochromogenes Tu57. No members of these six protein families were found in biosynthetic loci for everninomicin-type orthosomycins, including EVER and the biosynthetic locus for everninomicin from Micromonospora carbonacea var. africana. Sequences from these six protein families form the basis for compositions and methods for identifying gene clusters involved in the biosynthesis of avilamycin-type orthosomycins and for compositions and methods for identifying avilamycin-type orthosomycin-producing organisms.
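Assuming a screen has already reduced a query locus to the set of reference SEQ ID NOS it matched (for example, BLASTP hits above a chosen cutoff), the two marker sets recited above can be applied as sketched below. Only the SEQ ID NOS come from the text; the function name and the "ambiguous"/"undetermined" labels are hypothetical, and for brevity only the EVER and AVIA representatives are listed, not their EVEA or AVIL homologues.

```python
# SEQ ID NOs of the EVER ORFs representing the nine everninomicin-type families.
EVERNINOMICIN_MARKERS = frozenset({225, 237, 221, 233, 209, 229, 217, 213, 241})
# SEQ ID NOs of the AVIA ORFs representing the six avilamycin-type families.
AVILAMYCIN_MARKERS = frozenset({253, 251, 255, 247, 245, 249})


def classify_locus_type(hit_seq_ids: set[int]) -> str:
    """Assign an orthosomycin locus type from the reference SEQ ID NOs it matched."""
    ever_hits = len(hit_seq_ids & EVERNINOMICIN_MARKERS)
    avi_hits = len(hit_seq_ids & AVILAMYCIN_MARKERS)
    if ever_hits and not avi_hits:
        return "everninomicin-type"
    if avi_hits and not ever_hits:
        return "avilamycin-type"
    if ever_hits and avi_hits:
        return "ambiguous"  # markers of both types: not expected from the text
    return "undetermined"  # no type-specific marker found


print(classify_locus_type({209, 213, 51}))  # everninomicin-type
```

The "ambiguous" branch is kept explicit because, per the text, no member of either marker set has been seen in a locus of the other type.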
The compositions and methods of the invention can be used to detect the presence of virtually any organism that contains DNA for the production of orthosomycins (both everninomicin-type orthosomycins and avilamycin-type orthosomycins) regardless of the level at which genes for orthosomycin production are expressed by the organism or the amount of orthosomycin produced by the organism. Detection of nucleic acid sequences or amino acid sequences involved in the production of orthosomycins allows for the detection of new orthosomycin natural products, which natural products may not be produced by the organism under standard laboratory conditions or under the typical environmental conditions in which the organism is found in nature. Detection of the nucleic acid sequences or amino acid sequences involved in the production of orthosomycins allows for the detection of new orthosomycins which are produced at levels too low for detection by culture tests. Detection of nucleic acid sequences or amino acid sequences involved in the production of orthosomycins allows for the detection of new orthosomycin producers (both everninomicin-type orthosomycin producers and avilamycin-type orthosomycin producers) representing a source of new orthosomycin natural products.
Detection of the presence or absence of open reading frames necessary for orthosomycin production can be accomplished by hybridization probes or PCR primers based upon the compositions and teachings of the invention. Screening with a probe can be done either in silico or by traditional hybridization screening techniques.
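As one illustration of in silico screening, the sketch below looks for binding sites of a PCR primer on both strands of a DNA sequence. It assumes exact matches only (practical primer screening usually tolerates mismatches and degenerate positions), and the sequences used are arbitrary placeholders, not sequences of the invention.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string containing A, C, G, T or N."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_primer_sites(template: str, primer: str) -> list[tuple[int, str]]:
    """Return (start, strand) for every exact primer match on either strand.

    Positions are 0-based on the given strand. '+' means the primer itself
    occurs in the template; '-' means its reverse complement occurs, i.e.
    the primer matches the opposite strand. Overlapping matches are kept.
    """
    template = template.upper()
    targets = {"+": primer.upper(), "-": reverse_complement(primer)}
    sites = []
    for strand, target in targets.items():
        start = template.find(target)
        while start != -1:
            sites.append((start, strand))
            start = template.find(target, start + 1)
    return sorted(sites)


print(find_primer_sites("AAACCCGGGTTTAGGCAT", "ACCC"))  # [(2, '+'), (6, '-')]
```

For a palindromic primer both strands report the same position; a real screen would deduplicate such hits.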
Throughout the description and the figures, the biosynthetic locus for everninomicin from Micromonospora carbonacea var. aurantiaca NRRL 2997 is sometimes referred to as EVER, the biosynthetic locus for everninomicin from Micromonospora carbonacea var. africana (ATCC 39149, SCC 1413) is sometimes referred to as EVEA, the biosynthetic locus for an avilamycin-like compound from Streptomyces mobarensis is sometimes referred to as AVIA, and the biosynthetic locus for an avilamycin compound from Streptomyces viridochromogenes Tu57 is sometimes referred to as AVIL.
The ORFs in EVER, EVEA, AVIA and AVIL are assigned a putative function and grouped together in families based on homology to known proteins, or lack of homology to any known proteins. To correlate structure and function, the protein families are given a four-letter designation used throughout the description and figures, as indicated in Table 1.

CA 02375097 2002-03-28

Table 1: Family descriptions
Family: Proposed function
ABCD: ABC transporter; ATP-binding cassette transmembrane transporter; includes proteins with similarity to Mdr proteins of mammalian tumor cells that confer resistance to structurally unrelated chemotherapeutic agents; resistance determinant
DATC: dehydratase/aminotransferase; SMAT family (secondary metabolism aminotransferase); transaminase
DEPA: dehydratase/epimerase; dTDP-glucose 4,6-dehydratases, catalyze the second step in 6-deoxyhexose biosynthesis; includes AveBII, StrE, OleE, DesIV, UrdH, S cJ
DEPD: dehydratase/epimerase; may be specific for dGDP-mannose; similar to DEPE
DEPE: dehydratase/epimerase, NAD-dependent; includes enzymes that may be specific for dGDP-mannose; similar to DEPD
DEPF: dehydratase/epimerase; includes NDP-hexose 4-ketoreductase TylCIV, AveBIV, EryBIV, MegDV
DEPG: dehydratase/epimerase; includes the dTDP-ketodeoxyhexose reductases Snot, LanZ3, DnmV, AknM
DEPH: dehydratase/epimerase/ketoreductase; similar to glucose 4-epimerases involved in primary metabolism
DEPI: dehydratase/epimerase, NAD-dependent; 4-ketoreductase; most similar to NDP-glucose 4-epimerases; includes S c1
DEPJ: dehydratase/epimerase/ketoreductase; most similar to UDP-glucose-4-epimerases
DEPN: dehydratase/epimerase; similar to many plant putative dTDP-glucose-4,6-dehydratases
DHYA: dehydratase, deoxysugar; 2,3-dehydratases; 2 similar (repeated) substrate or cofactor binding motifs in this class; includes EryBVI, DnmT, LanS
EFFA: efflux; transmembrane transporter
ENGA: endoglucanase; hydrolysis of 1,4-beta-D-glucosidic linkage; likely resistance determinant; believed to be a secreted protein
EPIM: epimerase; NDP-hexose epimerase; TDP-4-ketohexose-3,5-epimerases, convert TDP-4-keto-6-deoxy-D-glucose to TDP-4-keto-6-deoxy-L-mannose (TDP-4-keto-L-rhamnose); includes EryBVII, DnmU, AknL, OleL, LanZ1
GTFA: glycosyltransferase; includes EryBV, EryCIII, DesVII, TylMII, Dau/DnrH, MtmGI-IV, LanGT1-4
GTFE: glycosyltransferase; no homology to other GTFs
GTFG: glycosyltransferase; no homology to other GTFs
GTFH: glycosyltransferase; no homology to other GTFs
HOXG: oxidase domain homology
HOXM: hydroxylase/halogenase; strong similarity to non-heme halogenases/oxygenases/hydroxylases
HYDH: haloacid dehalogenase-like hydrolase; similarity to haloalkanoic acid dehydrogenases, convert 2-halo acids to 2-hydroxy acids plus halide
KASA: ketoacyl synthase; homology to 3-oxoacyl-[acyl-carrier-protein] synthase III (KAS III, acetyl-CoA:ACP transacylase), the principal enzyme responsible for the initiation of branched-chain fatty acid biosynthesis
KINB: kinase; similar to glucose kinase
MEMD: membrane protein; includes DrrB daunorubicin resistance transmembrane protein; resistance determinant
MEMK: membrane protein; Na/proton antiporter-like
MTBA: O-methyltransferase; includes TylF macrocin O-methyltransferase, spinosyn SpnH rhamnose O-methyltransferase
MTFA: methyltransferase, SAM-dependent; includes O-methyltransferases, N,N-dimethyltransferases (e.g. spinosyn SpnS N-dimethyltransferase), C-methyltransferases
MTFD: methyltransferase
MTFE: methyltransferase
MTFF: methyltransferase domain homology
MTFG: methyltransferase domain homology
MTFH: methyltransferase; includes Snot, OleY, TylE, SpnI rhamnose O-methyltransferase
MTFV: C-methyltransferase, SAM-dependent; includes TylCIII NDP-hexose 3-C-methyltransferase, NovU C-methyltransferase, EryBIII, DnrX
MTIA: resistance methyltransferase; similarity to SpoU family of 23S rRNA methyltransferases
MTLA: rRNA methyltransferase; includes avilamycin AviRa; resistance determinant
NUTA: nucleotidyltransferase; dNDP-glucose synthase; alpha-D-glucose-1-phosphate thymidylyltransferase; catalyze the first step in 6-deoxyhexose biosynthesis
OXBN: flavin-dependent oxidoreductase; shows strong homology to eukaryotic acyl CoA dehydrogenases; includes DnmZ
OXCO: oxidoreductase; blue copper oxidoreductase; similar to bilirubin oxidase and B. subtilis outer spore coat protein involved in brown pigmentation during sporogenesis; weak similarity to phenoxazinone synthase and yeast laccases
OXRA: oxidoreductase; NAD(H)-dependent dehydrogenases; includes OleW putative 3-ketoreductase, RdmF
OXRF: oxidoreductase; includes StrT, EryBII, TylCII NDP-hexose 2,3-enoyl reductase
OXRT: oxidoreductase; dehydrogenase E1 component beta subunit; strongest homology to acetoin:DCPIP oxidoreductases from a variety of organisms
OXRU: oxidoreductase; dehydrogenase E1 component alpha subunit; strongest homology to thiamine pyrophosphate-dependent acetoin dehydrogenases from a variety of organisms
OXRV: oxidoreductase; dioxygenase; has domain homology to Pseudomonas putative alpha-ketoglutarate-dependent hypophosphite dioxygenase, a family of alpha-ketoglutarate-dependent dioxygenases that catalyze the oxidation of their respective substrates by using molecular oxygen as the immediate electron acceptor
OXRW: oxidoreductase; dioxygenase; includes SnoN, SnoK; note that OXRX=UNAJ+OXRW; shows homology to Pseudomonas putative alpha-ketoglutarate-dependent hypophosphite dioxygenase, a family of alpha-ketoglutarate-dependent dioxygenases that catalyze the oxidation of their respective substrates by using molecular oxygen as the immediate electron acceptor; also includes proteins with homology to proline 4-hydroxylase, mammalian phytanoyl-CoA alpha hydroxylase
OXRX: oxidoreductase; dioxygenase; fusion of UNAJ and OXRW (OXRX=UNAJ+OXRW); the UNAJ portion may contain SAM-binding (methyltransferase) motifs; has domain homology to Pseudomonas putative alpha-ketoglutarate-dependent hypophosphite dioxygenase, a family of alpha-ketoglutarate-dependent dioxygenases that catalyze the oxidation of their respective substrates by using molecular oxygen as the immediate electron acceptor
PHOD: phosphatase; domain homology to phosphoglycolate phosphatase and other hydrolases
PKSO: iterative type I polyketide synthase; includes orsellinic acid synthase AviM
REBU: regulator, putative; small proteins that consist mostly of a HTH domain of the LuxR-type; strong homology to the C-terminal (LuxR-type) HTH domains of AbsA2-type response regulators
REBV: regulator, putative; small proteins that contain a C-terminal HTH domain having some homology to the C-terminal (LuxR-type) HTH domains of AbsA2-type response regulators
REGL: regulator, similar to LacI family transcriptional regulators (generally repressors that lose high-affinity DNA binding in the presence of small molecule effectors)
UEVA: contains domain found in MoaA/NifB/PqqA molybdenum cofactor biosynthesis proteins, coenzyme PQQ synthesis protein and NirJ heme biosynthesis protein
UEVB: contains central double-stranded beta helix domain found in a variety of proteins and proposed to be involved in carbohydrate binding or protein-protein interactions, depending on the context in which it is found; this domain is found in e.g. mannose-1-phosphate guanylyltransferases (GDP), dTDP-4-keto-6-deoxyglucose-3,5-epimerase (dTDP-L-rhamnose synthase), tetracenomycin TcmJ putative b-ring cyclase, elloramycin ElmJ, Pseudomonas WbjC putative nucleotide-binding protein involved in O-antigen biosynthesis
UNAI: unknown; homolog of hypothetical S. coelicolor protein
UNAJ: unknown; contains domain found in many bacterial proteins, including hypothetical proteins and a variety of methyltransferases (a portion of this domain is also found in eukaryotic RNA helicases); also occurs as a domain fused to an everninomicin oxidoreductase (OXRX=UNAJ+OXRW)
UNBB: unknown
UNBR: unknown; N-terminal domain homology to some sugar dehydratase/epimerase/ketoreductases
UNKU: unknown

"Isolated" means that the material is removed from its original environment, e.g. the natural environment if it is naturally occurring. For example, a naturally-occurring polynucleotide or polypeptide present in a living organism is not isolated, but the same polynucleotide or polypeptide, separated from some or all of the coexisting materials in the natural system, is isolated. Such polynucleotides could be part of a vector and/or such polynucleotides or polypeptides could be part of a composition, and still be isolated in that such vector or composition is not part of its natural environment.
The term "purified" does not require absolute purity; rather, it is intended as a relative definition. Individual nucleic acids obtained from a library have been conventionally purified to electrophoretic homogeneity. The sequences obtained from these clones could not be obtained directly from a large insert library, such as a cosmid library, or from total organism DNA. The purified nucleic acids of the present invention have been purified from the remainder of the genomic DNA in the organism by at least 10^4- to 10^6-fold. However, the term "purified" also includes nucleic acids which have been purified from the remainder of the genomic DNA or from other sequences in a library or other environment by at least one order of magnitude, preferably two or three orders of magnitude, and more preferably four or five orders of magnitude.
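The "orders of magnitude" of purification are simply the base-10 logarithm of the fold enrichment. A minimal sketch, assuming purity is expressed as the fraction of a sample's DNA that is the sequence of interest (a measure chosen here for illustration; the function name and the locus and genome sizes in the example are hypothetical):

```python
import math


def purification_orders(fraction_before: float, fraction_after: float) -> float:
    """Return the fold purification in orders of magnitude (log10 of the ratio).

    Each argument is the fraction of DNA in a sample that is the sequence of
    interest, measured before and after purification.
    """
    if not (0 < fraction_before <= 1 and 0 < fraction_after <= 1):
        raise ValueError("fractions must lie in (0, 1]")
    return math.log10(fraction_after / fraction_before)


# Hypothetical example: a 40 kb locus in an 8 Mb genome, purified to homogeneity.
print(round(purification_orders(40_000 / 8_000_000, 1.0), 1))  # 2.3 (200-fold)
```

On this measure, 10^4- to 10^6-fold purification corresponds to four to six orders of magnitude.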

"Recombinant" means that the nucleic acid is adjacent to "backbone" nucleic acid to which it is not adjacent in its natural environment. "Enriched" nucleic acids represent 5% or more of the number of nucleic acid inserts in a population of nucleic acid backbone molecules. "Backbone" molecules include nucleic acids such as expression vectors, self-replicating nucleic acids, viruses, integrating nucleic acids, and other vectors or nucleic acids used to maintain or manipulate a nucleic acid of interest. Preferably, the enriched nucleic acids represent 15% or more, more preferably 50% or more, and most preferably 90% or more, of the number of nucleic acid inserts in the population of recombinant backbone molecules.
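The enrichment levels above (5%, 15%, 50% and 90% of inserts) amount to a ratio test. A minimal sketch, assuming the number of inserts of interest and the total number of inserts are already known, for example from colony hybridization; the function name and tier labels are hypothetical:

```python
# Thresholds from the text, as percentages of inserts, checked highest first.
ENRICHMENT_TIERS = (
    (90, "most preferably enriched"),
    (50, "more preferably enriched"),
    (15, "preferably enriched"),
    (5, "enriched"),
)


def enrichment_tier(target_inserts: int, total_inserts: int) -> str:
    """Classify a backbone population by the share of inserts that are of interest."""
    if total_inserts <= 0 or not 0 <= target_inserts <= total_inserts:
        raise ValueError("need 0 <= target_inserts <= total_inserts and total_inserts > 0")
    for percent, label in ENRICHMENT_TIERS:
        # Integer comparison avoids floating-point rounding at the boundaries.
        if 100 * target_inserts >= percent * total_inserts:
            return label
    return "not enriched"


print(enrichment_tier(12, 200))  # enriched (6% of inserts)
```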
"Recombinant" polypeptides or proteins refer to polypeptides or proteins produced by recombinant DNA techniques, i.e. produced from cells transformed by an exogenous DNA construct encoding the desired polypeptide or protein.
"Synthetic" polypeptides or proteins are those prepared by chemical synthesis.
The term "gene" means the segment of DNA involved in producing a polypeptide chain; it includes regions preceding and following the coding region (leader and trailer) as well as, where applicable, intervening regions (introns) between individual coding segments (exons).
A DNA "coding sequence" or "nucleotide sequence encoding" a particular polypeptide or protein is a DNA sequence which is transcribed and translated into a polypeptide or protein when placed under the control of appropriate regulatory sequences.
"Oligonucleotide" refers to a nucleic acid, generally of at least 10, preferably at least 15 and more preferably at least 20 nucleotides, and preferably no more than 100 nucleotides, that is hybridizable to a genomic DNA molecule, a cDNA molecule, or an mRNA molecule encoding a gene, mRNA, cDNA or other nucleic acid of interest.
"Orthosomycin producer" or "orthosomycin-producing organism" refers to a microorganism which carries the genetic information necessary to produce an orthosomycin compound, whether or not the organism is known to produce an orthosomycin product. The terms apply equally to organisms in which the genetic information to produce an orthosomycin compound is found in the organism as it exists in its natural environment, and to organisms in which the genetic information is introduced by recombinant techniques. Orthosomycin producers include organisms of the family Micromonosporaceae, of which preferred genera include Micromonospora, Actinoplanes and Dactylosporangium; the family Streptomycetaceae, of which preferred genera include Streptomyces and Kitasatospora; and the family Pseudonocardiaceae, of which preferred genera are Amycolatopsis and Saccharopolyspora.
Deposits: Three deposits of an E. coli DH10B strain, each harboring a cosmid clone, which together span the everninomicin biosynthetic locus from Micromonospora carbonacea var. aurantiaca were made on January 24, 2001 with the International Depositary Authority of Canada (IDAC), 1015 Arlington Street, Winnipeg, Manitoba, Canada R3E 3R2. The deposits were assigned accession nos. IDAC 240101-1, IDAC 240101-2 and IDAC 240101-3. Two deposits of an E. coli DH10B strain, each harboring a cosmid clone, which together span the avilamycin-like biosynthetic locus from Streptomyces mobarensis were made on February 27, 2001 with the International Depositary Authority of Canada (IDAC), 1015 Arlington Street, Winnipeg, Manitoba, Canada R3E 3R2. The deposits were assigned accession nos. IDAC 270201-1 and IDAC 270201-2. The E. coli strain deposits are referred to herein as "the deposited strains".
The deposited strains together comprise the complete biosynthetic loci for everninomicin from Micromonospora carbonacea var. aurantiaca and for the avilamycin-type compound from Streptomyces mobarensis. The sequences of the polynucleotides comprised in the deposited strains, as well as the amino acid sequences of any polypeptides encoded thereby, are controlling in the event of any conflict with any description of sequences herein.
The deposits of the deposited strains have been made under the terms of the Budapest Treaty on the International Recognition of the Deposit of Micro-organisms for Purposes of Patent Procedure. The deposited strains will be irrevocably and without restriction or condition released to the public upon the issuance of a patent. The deposited strains are provided merely as a convenience to those skilled in the art and are not an admission that a deposit is required for enablement, such as that required under 35 U.S.C. §112. A license may be required to make, use or sell the deposited strains, and compounds derived therefrom, and no such license is hereby granted.
Structural features common to all orthosomycins require one or more proteins selected from a group of 17 specific protein families, namely GTFE, GTFG, GTFH, HOXG, MTFD, MTFE, MTFF, MTLA, MTIA, OXRV, OXRW, OXRW, PHOD, UNAJ, UEVA, UEVB and UNKU. These 17 protein families include two OXRW families, although in EVER the second OXRW family is designated OXRX as it is a fusion of proteins from the UNAJ and OXRW families. A polypeptide representing a member of any one of these 17 protein families or a polynucleotide encoding a polypeptide representing a member of any one of these 17 protein families is considered diagnostic of an orthosomycin gene cluster and an orthosomycin-producing organism.
It is not expected that an orthosomycin biosynthetic locus will contain a member of each of the 17 protein families considered diagnostic of orthosomycin biosynthetic loci. For example, the UEVB and MTIA protein families are not found in the EVEA locus. Nonetheless, the UEVB and MTIA protein families are considered to be indicative of an orthosomycin locus, as they are found in the AVIA, AVIL and EVER loci and no other homologues have been found to date. The presence of at least one, preferably 2, more preferably 3, still more preferably 4, still more preferably 5, still more preferably 6, still more preferably 8, still more preferably 10 or more of the seventeen protein families GTFE, GTFG, GTFH, HOXG, MTFD, MTFE, MTFF, MTLA, MTIA, OXRV, OXRW, OXRW, PHOD, UNAJ, UEVA, UEVB and UNKU indicates the presence of an orthosomycin biosynthetic locus and an orthosomycin-producing organism.
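The counting rule above can be sketched as a set intersection over per-ORF family annotations. This is a minimal illustration, assuming each ORF has already been assigned zero or more family designations; because the text lists OXRW twice, the two OXRW families are kept apart here under the hypothetical labels OXRW_1 and OXRW_2, and a fused ORF such as EVER ORF 31 simply carries two labels. The function names and ORF keys are hypothetical.

```python
# The seventeen families treated as diagnostic of an orthosomycin locus.
# OXRW_1 and OXRW_2 are hypothetical labels for the two OXRW families.
ORTHOSOMYCIN_FAMILIES = frozenset({
    "GTFE", "GTFG", "GTFH", "HOXG", "MTFD", "MTFE", "MTFF", "MTLA", "MTIA",
    "OXRV", "OXRW_1", "OXRW_2", "PHOD", "UNAJ", "UEVA", "UEVB", "UNKU",
})


def families_in_locus(orf_annotations: dict[str, set[str]]) -> set[str]:
    """Collect diagnostic families from per-ORF annotations (a fused ORF may carry two)."""
    found: set[str] = set()
    for families in orf_annotations.values():
        found |= families & ORTHOSOMYCIN_FAMILIES
    return found


def indicates_orthosomycin_locus(found: set[str], minimum: int = 1) -> bool:
    """True if at least `minimum` of the seventeen diagnostic families are present."""
    return len(found & ORTHOSOMYCIN_FAMILIES) >= minimum


locus = {"ORF31": {"UNAJ", "OXRW_2"}, "ORF24": {"GTFE"}, "ORF99": {"ABCD"}}
print(sorted(families_in_locus(locus)))  # ['GTFE', 'OXRW_2', 'UNAJ']
```

The default `minimum=1` matches the broadest criterion; the preferred levels in the text (2, 3, 4, 5, 6, 8, 10) are passed as `minimum`.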
Members of protein family GTFE include polypeptides selected from AVIA ORF 31 (SEQ ID NO: 51), AVIL GenBank accession no. AAK83192, EVER ORF 24 (SEQ ID NO: 53), EVEA ORF 33 (SEQ ID NO: 55) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 51, 53, 55 or AVIL GenBank accession no. AAK83192 as determined using the BLASTP algorithm with the default parameters.
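The graded homology thresholds recited for each family can be applied to BLASTP results programmatically. A minimal sketch, assuming BLASTP (BLAST+) was run with tabular output (`-outfmt 6`), whose first three columns are `qseqid`, `sseqid` and `pident`, and treating percent identity as the "homology" value; the text does not say which BLASTP statistic is meant, so that choice is an assumption. The function names and hit lines are hypothetical.

```python
from typing import Dict, Optional

# Homology thresholds recited in the text, checked from highest to lowest.
TIERS = (99.0, 95.0, 90.0, 85.0, 80.0, 70.0, 60.0)


def best_tier(pident: float) -> Optional[float]:
    """Return the highest recited threshold met by a percent value, or None below 60%."""
    for tier in TIERS:
        if pident >= tier:
            return tier
    return None


def tier_hits(blast_tab: str) -> Dict[str, float]:
    """Map each subject ID to the best tier reached by any of its BLASTP hits.

    Expects BLAST+ tabular output (-outfmt 6): tab-separated lines whose first
    three fields are qseqid, sseqid and pident.
    """
    best: Dict[str, float] = {}
    for line in blast_tab.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject, pident = fields[1], float(fields[2])
        tier = best_tier(pident)
        if tier is not None and tier > best.get(subject, 0.0):
            best[subject] = tier
    return best
```

For example, with reference GTFE sequences as queries and candidate proteins as subjects, a candidate whose best hit has 72.5% identity would be reported at the 70% tier, and one at 55% would not be reported.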

Members of protein family GTFG include polypeptides selected from AVIA ORF 5 (SEQ ID NO: 57), AVIL GenBank accession no. AAK83170, EVER ORF 35 (SEQ ID NO: 59), EVEA ORF 27 (SEQ ID NO: 61) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 57, 59, 61 or AVIL GenBank accession no. AAK83170 as determined using the BLASTP algorithm with the default parameters.
Members of protein family GTFH include polypeptides selected from AVIA ORF 32 (SEQ ID NO: 63), AVIL GenBank accession no. AAK83193, EVER ORF 8 (SEQ ID NO: 65), EVEA ORF 31 (SEQ ID NO: 67), and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 63, 65, 67 or AVIL GenBank accession no. AAK83193 as determined using the BLASTP algorithm with the default parameters.
Members of protein family HOXG include polypeptides selected from AVIA ORF 37 (SEQ ID NO: 69), EVER ORF 12 (SEQ ID NO: 71), EVEA ORF 43 (SEQ ID NO: 73), and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 69, 71 or 73 as determined using the BLASTP algorithm with the default parameters.
Members of protein family MTFD include polypeptides selected from AVIA ORF 22 (SEQ ID NO: 99), AVIL GenBank accession no. AAK83184, EVER ORF 15 (SEQ ID NO: 101), EVEA ORF 8 (SEQ ID NO: 103), and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 99, 101, 103 or AVIL GenBank accession no. AAK83184 as determined using the BLASTP algorithm with the default parameters.
Members of protein family MTFE include polypeptides selected from AVIA ORF 23 (SEQ ID NO: 105), AVIL GenBank accession no. AAK83186, EVER ORF 19 (SEQ ID NO: 107), EVEA ORF 10 (SEQ ID NO: 109), and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 105, 107, 109 or AVIL GenBank accession no. AAK83186 as determined using the BLASTP algorithm with the default parameters.
Members of protein family MTFF include polypeptides selected from AVIA ORF 25 (SEQ ID NO: 111), AVIL GenBank accession no. AAK83188, EVER ORF (SEQ ID NO: 113), EVEA ORF 12 (SEQ ID NO: 115) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 111, 113, 115 or AVIL GenBank accession no. AAK83188 as determined using the BLASTP algorithm with the default parameters.
Members of protein family MTLA include polypeptides selected from AVIA ORF 3 (SEQ ID NO: 127), AVIL GenBank accession no. AAG32067, EVER ORF 40 (SEQ ID NO: 129), EVEA ORF 45 (SEQ ID NO: 131) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 127, 129, 131 or AVIL GenBank accession no. AAG32067 as determined using the BLASTP algorithm with the default parameters.
Members of protein family MTIA include polypeptides selected from AVIA ORF 1 (SEQ ID NO: 123), AVIL GenBank accession no. AAG32066, EVER ORF 13 (SEQ ID NO: 125) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 123, 125 or AVIL GenBank accession no. AAG32066 as determined using the BLASTP algorithm with the default parameters.
Members of protein family OXRV include polypeptides selected from AVIA ORF 24 (SEQ ID NO: 153), AVIL GenBank accession no. AAK83187, EVER ORF 18 (SEQ ID NO: 155), EVEA ORF 11 (SEQ ID NO: 157) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 153, 155, 157 or AVIL GenBank accession no. AAK83187 as determined using the BLASTP algorithm with the default parameters.
Members of protein family OXRW include polypeptides selected from AVIA ORF 33 (SEQ ID NO: 159), EVER ORF 26 (SEQ ID NO: 161), EVEA ORF 30 (SEQ ID NO: 163) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 159, 161 or 163 as determined using the BLASTP algorithm with the default parameters.
Members of the second protein family OXRW (designated OXRX in EVER) include polypeptides selected from AVIA ORF 19 (SEQ ID NO: 167), EVEA ORF 6 (SEQ ID NO: 173), AVIL GenBank accession no. AAK83181, EVER ORF 31 (SEQ ID NO: 169) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 167, 169, 173 or AVIL GenBank accession no. AAK83181 as determined using the BLASTP algorithm with the default parameters.
Members of protein family PHOD include polypeptides selected from AVIA ORF 34 (SEQ ID NO: 175), EVER ORF 33 (SEQ ID NO: 177), EVEA ORF 29 (SEQ ID NO: 179) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 175, 177 or 179 as determined using the BLASTP algorithm with the default parameters.
Members of protein family UNAJ include polypeptides selected from AVIA ORF 18 (SEQ ID NO: 165), EVEA ORF 5 (SEQ ID NO: 171), EVER ORF 31 (SEQ ID NO: 169) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 165, 169 or 171 as determined using the BLASTP algorithm with the default parameters.
Members of protein family UEVA include polypeptides selected from AVIA ORF 26 (SEQ ID NO: 193), AVIL GenBank accession no. AAK83189, EVER ORF 17 (SEQ ID NO: 195), EVEA ORF 14 (SEQ ID NO: 197) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 193, 195, 197 or AVIL GenBank accession no. AAK83189 as determined using the BLASTP algorithm with the default parameters.
Members of protein family UEVB include polypeptides selected from AVIA ORF 9 (SEQ ID NO: 199), AVIL GenBank accession no. AAK83174, EVER ORF 9 (SEQ ID NO: 201), and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 199, 201 or AVIL GenBank accession no. AAK83174 as determined using the BLASTP algorithm with the default parameters.
Members of protein family UNKU include polypeptides selected from AVIA ORF 2 (SEQ ID NO: 203), EVER ORF 25 (SEQ ID NO: 205), EVEA ORF 32 (SEQ ID NO: 207) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide having the sequence of SEQ ID NOS: 203, 205 or 207 as determined using the BLASTP algorithm with the default parameters.
Structural features that distinguish everninomicin-type orthosomycins from other orthosomycins require one or more proteins selected from a group of nine protein families, namely DATC, DEPF, EPIM, GTFA, MTFG, MTFV, OXBN, OXCO and UNBB. A polypeptide representing a member of any one of these nine protein families or a polynucleotide encoding a polypeptide representing a member of any one of these nine protein families is considered diagnostic of an everninomicin-type orthosomycin gene cluster and an everninomicin-type orthosomycin-producing organism. In a preferred embodiment, a polypeptide representing a member of any one of these nine protein families, i.e. DATC, DEPF, EPIM, GTFA, MTFG, MTFV, OXBN, OXCO and UNBB, or a polynucleotide encoding a polypeptide representing a member of these nine protein families is detected together with one or more polypeptides representing a member of any one of the seventeen protein families diagnostic of an orthosomycin biosynthetic gene cluster, i.e. GTFE, GTFG, GTFH, HOXG, MTFD, MTFE, MTFF, MTLA, MTIA, OXRV, OXRW, OXRW, PHOD, UNAJ, UEVA, UEVB and UNKU, or one or more polynucleotides encoding a polypeptide representing a member of these seventeen protein families.
It is not expected that an everninomicin-type orthosomycin biosynthetic locus will contain a member of each of the nine protein families considered diagnostic of everninomicin-type orthosomycin biosynthetic loci. Rather, the presence of at least one, preferably two, more preferably three, still more preferably four, and most preferably six or more of the nine protein families DATC, DEPF, EPIM, GTFA, MTFG, MTFV, OXBN, OXCO and UNBB indicates the presence of an everninomicin-type orthosomycin biosynthetic locus and an everninomicin-type orthosomycin-producing organism. In a preferred embodiment, the presence of at least one, preferably two, more preferably three, still more preferably four, and most preferably six or more of the nine protein families DATC, DEPF, EPIM, GTFA, MTFG, MTFV, OXBN, OXCO and UNBB, detected together with the presence of at least one, preferably 2, more preferably 4, still more preferably 6, still more preferably 8, still more preferably 10 or more of the seventeen protein families diagnostic of an orthosomycin biosynthetic gene cluster, i.e. GTFE, GTFG, GTFH, HOXG, MTFD, MTFE, MTFF, MTLA, MTIA, OXRV, OXRW, OXRW, PHOD, UNAJ, UEVA, UEVB and UNKU, indicates the presence of an everninomicin-type orthosomycin biosynthetic locus and an everninomicin-type orthosomycin-producing organism.
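The everninomicin-type criterion, alone or combined with co-detection of the seventeen orthosomycin families, can be sketched as two set counts with separate minimums. This assumes family designations have already been assigned to the ORFs of a locus; the function name is hypothetical, and the two OXRW families again carry the hypothetical labels OXRW_1 and OXRW_2.

```python
EVERNINOMICIN_FAMILIES = frozenset(
    {"DATC", "DEPF", "EPIM", "GTFA", "MTFG", "MTFV", "OXBN", "OXCO", "UNBB"}
)
# Seventeen orthosomycin families; OXRW_1/OXRW_2 are hypothetical labels.
ORTHOSOMYCIN_FAMILIES = frozenset({
    "GTFE", "GTFG", "GTFH", "HOXG", "MTFD", "MTFE", "MTFF", "MTLA", "MTIA",
    "OXRV", "OXRW_1", "OXRW_2", "PHOD", "UNAJ", "UEVA", "UEVB", "UNKU",
})


def indicates_everninomicin_type(found: set[str], min_specific: int = 1,
                                 min_general: int = 0) -> bool:
    """Apply the nine-family test, optionally requiring orthosomycin families too.

    min_general=0 reproduces the basic test; min_general >= 1 models the
    preferred embodiment in which the seventeen families are co-detected.
    """
    specific = len(found & EVERNINOMICIN_FAMILIES)
    general = len(found & ORTHOSOMYCIN_FAMILIES)
    return specific >= min_specific and general >= min_general


print(indicates_everninomicin_type({"DATC", "EPIM", "GTFE", "PHOD"}, 2, 2))  # True
```

Keeping the two minimums as independent parameters lets each preferred level in the text (for example, four of the nine together with six of the seventeen) be expressed without changing the function.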
Members of the protein family DATC include polypeptides selected from EVER ORF 43 (SEQ ID NO: 209), EVEA ORF 37 (SEQ ID NO: 211) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide of EVER ORF 43 (SEQ ID NO: 209) or EVEA ORF 37 (SEQ ID NO: 211) as determined using the BLASTP algorithm with the default parameters.
Members of the protein family DEPF include polypeptides selected from EVER ORF 46 (SEQ ID NO: 213), EVEA ORF 40 (SEQ ID NO: 215) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide of EVER ORF 46 (SEQ ID NO: 213) or EVEA ORF 40 (SEQ ID NO: 215) as determined using the BLASTP algorithm with the default parameters.
Members of the protein family EPIM include polypeptides selected from EVER ORF 45 (SEQ ID NO: 217), EVEA ORF 39 (SEQ ID NO: 219) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide of EVER ORF 45 (SEQ ID NO: 217) or EVEA ORF 39 (SEQ ID NO: 219) as determined using the BLASTP algorithm with the default parameters.

Members of the protein family GTFA include polypeptides selected from EVER ORF 21 (SEQ ID NO: 221), EVEA ORF 35 (SEQ ID NO: 223) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide of EVER ORF 21 (SEQ ID NO: 221) or EVEA ORF 35 (SEQ ID NO: 223) as determined using the BLASTP algorithm with the default parameters.
Members of protein family MTFG include polypeptides selected from EVER ORF 3 (SEQ ID NO: 225), EVEA ORF 18 (SEQ ID NO: 227), and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide of EVER ORF 3 (SEQ ID NO: 225) or EVEA ORF 18 (SEQ ID NO: 227) as determined using the BLASTP algorithm with the default parameters.
Members of protein family MTFV include polypeptides selected from EVER ORF 44 (SEQ ID NO: 229), EVEA ORF 38 (SEQ ID NO: 231) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide of EVER ORF 44 (SEQ ID NO: 229) or EVEA ORF 38 (SEQ ID NO: 231) as determined using the BLASTP algorithm with the default parameters.
Members of protein family OXBN include polypeptides selected from EVER ORF 42 (SEQ ID NO: 233), EVEA ORF 36 (SEQ ID NO: 235) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide of EVER ORF 42 (SEQ ID NO: 233) or EVEA ORF 36 (SEQ ID NO: 235) as determined using the BLASTP algorithm with the default parameters.
Members of protein family OXCO include polypeptides selected from EVER ORF 4 (SEQ ID NO: 237), EVEA ORF 19 (SEQ ID NO: 239) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide of EVER ORF 4 (SEQ ID NO: 237) or EVEA ORF 19 (SEQ ID NO: 239) as determined using the BLASTP algorithm with the default parameters.
Members of protein family UNBB include polypeptides selected from EVER ORF 47 (SEQ ID NO: 241), EVEA ORF 41 (SEQ ID NO: 243) and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide of EVER ORF 47 (SEQ ID NO: 241) or EVEA ORF 41 (SEQ ID NO: 243) as determined using the BLASTP algorithm with the default parameters.
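Family membership in these definitions is framed as a homology threshold against reference ORFs, as determined by BLASTP with default parameters. The following Python sketch shows one way such an assignment could be automated from BLAST+ tabular output (`-outfmt 6`, twelve default columns). It assumes, as simplifications not stated in the text, that percent identity (the third column) stands in for "homology", that each reference sequence in the search database carries a hypothetical identifier of the form `FAMILY|source` (for example `DATC|EVER_ORF43`), and that a query is assigned to the family of its highest-scoring hit that clears the chosen threshold.

```python
from typing import Dict, Iterable, Tuple


def assign_families(blast_lines: Iterable[str], min_identity: float = 60.0) -> Dict[str, str]:
    """Map each query ID to the family of its best-scoring hit at or above min_identity.

    Expects BLAST+ tabular lines (-outfmt 6):
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    best: Dict[str, Tuple[float, str]] = {}  # query -> (bitscore, family)
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        pident, bitscore = float(cols[2]), float(cols[11])
        if pident < min_identity:
            continue  # hit falls below the chosen identity tier
        # Hypothetical naming convention: the family code precedes the first "|".
        family = subject.split("|", 1)[0]
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, family)
    return {query: family for query, (_, family) in best.items()}
```

Raising `min_identity` from 60 toward 99 mirrors the progressively narrower tiers (60%, 70%, 80%, 85%, 90%, 95%, 99%) recited in each family definition.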
Structural features that distinguish avilamycin-type orthosomycins from other orthosomycins involve one or more proteins selected from a group of six protein families, namely ABCD, DEPN, MEMD, REBU, UNAI and UNBR. A polypeptide representing a member of any one of these six protein families or a polynucleotide encoding a polypeptide representing a member of any one of these six protein families is considered diagnostic of an avilamycin-type orthosomycin gene cluster and an avilamycin-type orthosomycin producing organism. In a preferred embodiment, a polypeptide representing a member of any one of these six protein families, i.e. ABCD, DEPN, MEMD, REBU, UNAI and UNBR, or a polynucleotide encoding a polypeptide representing a member of these six protein families is detected together with one or more polypeptides representing a member of any one of the seventeen protein families diagnostic of an orthosomycin biosynthetic gene cluster, i.e. GTFE, GTFG, GTFH, HOXG, MTFD, MTFE, MTFF, MTLA, MTIA, OXRV, OXRW, OXRW, PHOD, UNAJ, UEVA, UEVB and UNKU, or one or more polynucleotides encoding a polypeptide representing a member of these seventeen protein families.
It is not expected that an avilamycin-type orthosomycin biosynthetic locus will contain a member of each of the six protein families considered diagnostic of avilamycin-type orthosomycin biosynthetic loci. Rather, the presence of at least one, preferably two, more preferably three, still more preferably four, and most preferably five or six of the protein families ABCD, DEPN, MEMD, REBU, UNAI and UNBR indicates the presence of an avilamycin-type orthosomycin biosynthetic locus and an avilamycin-type orthosomycin producing organism. In a preferred embodiment, the presence of at least one, preferably two, more preferably three, still more preferably four, and most preferably five or six of the protein families ABCD, DEPN, MEMD, REBU, UNAI and UNBR, detected together with the presence of at least one, preferably 2, more preferably 4, still more preferably 6, still more preferably 8, still more preferably 10 or more of the seventeen protein families diagnostic of an orthosomycin biosynthetic gene cluster, i.e. GTFE, GTFG, GTFH, HOXG, MTFD, MTFE, MTFF, MTLA, MTIA, OXRV, OXRW, OXRW, PHOD, UNAJ, UEVA, UEVB and UNKU, indicates the presence of an avilamycin-type orthosomycin biosynthetic locus and an avilamycin-type orthosomycin producing organism.
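The counting rules for everninomicin-type and avilamycin-type loci can be expressed as a small decision function. The Python sketch below is illustrative only: the family codes are taken from the text, while the function name, its parameters and the default thresholds (the broadest "at least one" tier, with no core-family requirement) are assumptions. Because the list of seventeen core families names OXRW twice, the sketch takes the number of core families detected as a count rather than as a set of names.

```python
from typing import Iterable, List

# Type-specific family codes as listed in the text.
EVERNINOMICIN_FAMILIES = frozenset(
    {"DATC", "DEPF", "EPIM", "GTFA", "MTFG", "MTFV", "OXBN", "OXCO", "UNBB"}
)
AVILAMYCIN_FAMILIES = frozenset({"ABCD", "DEPN", "MEMD", "REBU", "UNAI", "UNBR"})


def classify_locus(
    detected_families: Iterable[str],
    n_core_detected: int = 0,
    min_specific: int = 1,
    min_core: int = 0,
) -> List[str]:
    """Return the orthosomycin subtypes supported by the detected families.

    detected_families: type-specific family codes found in the sample.
    n_core_detected: how many of the seventeen core families were found.
    min_core > 0 models the preferred embodiments that also require core families.
    """
    if n_core_detected < min_core:
        return []
    detected = set(detected_families)
    calls = []
    for label, families in (
        ("everninomicin-type", EVERNINOMICIN_FAMILIES),
        ("avilamycin-type", AVILAMYCIN_FAMILIES),
    ):
        if len(detected & families) >= min_specific:
            calls.append(label)
    return calls
```

For example, `classify_locus({"DATC", "EPIM", "OXBN"}, n_core_detected=8, min_specific=3, min_core=6)` returns `["everninomicin-type"]`. Returning a list rather than a single label leaves room for a sample, such as a mixed environmental sample, in which families of both subtypes are detected.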
Members of protein family ABCD include polypeptides selected from AVIA ORF 27 (SEQ ID NO: 245), AVIL GenBank accession no. AAG32068 and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide of AVIA ORF 27 (SEQ ID NO: 245) or AVIL GenBank accession no. AAG32068 as determined using the BLASTP algorithm with the default parameters.
Members of protein family DEPN include polypeptides selected from AVIA ORF 21 (SEQ ID NO: 247), AVIL GenBank accession no. AAK83183, and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide of AVIA ORF 21 (SEQ ID NO: 247) or AVIL GenBank accession no. AAK83183 as determined using the BLASTP algorithm with the default parameters.
Members of the protein family MEMD include polypeptides selected from AVIA ORF 28 (SEQ ID NO: 249), AVIL GenBank accession no. AAG32069, and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide of AVIA ORF 28 (SEQ ID NO: 249) or AVIL GenBank accession no. AAG32069 as determined using the BLASTP algorithm with the default parameters.
Members of the protein family REBU include polypeptides selected from AVIA ORF 7 (SEQ ID NO: 251), AVIL GenBank accession no. AAK83172, and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide of AVIA ORF 7 (SEQ ID NO: 251) or AVIL GenBank accession no. AAK83172 as determined using the BLASTP algorithm with the default parameters.
Members of the protein family UNAI include polypeptides selected from AVIA ORF 6 (SEQ ID NO: 253), AVIL GenBank accession no. AAK83171 and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide of AVIA ORF 6 (SEQ ID NO: 253) or AVIL GenBank accession no. AAK83171 as determined using the BLASTP algorithm with the default parameters.
Members of the protein family UNBR include polypeptides selected from AVIA ORF 10 (SEQ ID NO: 255), AVIL GenBank accession no. AAK83175, and polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, at least 70% or at least 60% homology to a polypeptide of AVIA ORF 10 (SEQ ID NO: 255) or AVIL GenBank accession no. AAK83175 as determined using the BLASTP algorithm with the default parameters.
Hybridization Probes and PCR Primers:
To identify an orthosomycin-producing organism or an orthosomycin biosynthetic locus, nucleic acids from cultivated microorganisms or from an environmental sample, e.g. soil, potentially harboring an organism having the genetic capacity to produce an orthosomycin compound may be contacted with a probe based on nucleotide sequences encoding a member of the 17 protein families associated with biosynthesis of the structural features common to orthosomycins.
Useful probes may be designed based on a nucleic acid or a combination of nucleic acids selected from the group consisting of (1) a nucleic acid sequence encoding a polypeptide of the GTFE family, for example a nucleic acid of SEQ ID NOS: 52, 54, 56 (the nucleic acid sequences coding for the GTFE protein in AVIA ORF 31, EVER ORF 24 and EVEA ORF 33 respectively) or the nucleic acid sequence coding for AVIL GenBank accession no. AAK83192; (2) a nucleic acid sequence encoding a polypeptide of the GTFG family, for example a nucleic acid of SEQ ID NOS: 58, 60, 62 (the nucleic acid sequences coding for the GTFG protein in AVIA ORF 5, EVER ORF 35 and EVEA ORF 27 respectively) or the nucleic acid sequence coding for AVIL GenBank accession no. AAK83170; (3) a nucleic acid sequence encoding a polypeptide of the GTFH family, for example a nucleic acid of SEQ ID NOS: 64, 66, 68 (the nucleic acid sequences coding for the GTFH protein in AVIA ORF 32, EVER ORF 8 and EVEA ORF 31 respectively) or the nucleic acid sequence coding for AVIL GenBank accession no. AAK83193; (4) a nucleic acid sequence encoding a polypeptide of the HOXG family, for example a nucleic acid of SEQ ID NOS: 70, 72, 74 (the nucleic acid sequences coding for the HOXG protein in AVIA ORF 37, EVER ORF 12 and EVEA ORF 43 respectively); (5) a nucleic acid sequence encoding a polypeptide of the MTFD family, for example a nucleic acid of SEQ ID NOS: 100, 102, 104 (the nucleic acid sequences coding for the MTFD protein in AVIA ORF 22, EVER ORF 15 and EVEA ORF 8 respectively) or the nucleic acid sequence coding for AVIL GenBank accession no. AAK83184;
(6) a nucleic acid sequence encoding a polypeptide of the MTFE family, for example a nucleic acid of SEQ ID NOS: 106, 108, 110 (the nucleic acid sequences coding for the MTFE protein in AVIA ORF 23, EVER ORF 19 and EVEA ORF 10 respectively) or the nucleic acid sequence coding for AVIL GenBank accession no. AAK83186; (7) a nucleic acid sequence encoding a polypeptide of the MTFF family, for example a nucleic acid of SEQ ID NOS: 112, 114, 116 (the nucleic acid sequences coding for the MTFF protein in AVIA ORF 25, EVER ORF 5 and EVEA ORF 12 respectively) or the nucleic acid sequence coding for AVIL GenBank accession no. AAK83188; (8) a nucleic acid sequence encoding a polypeptide of the MTLA family, for example a nucleic acid of SEQ ID NOS: 128, 130, 132 (the nucleic acid sequences coding for the MTLA protein in AVIA ORF 3, EVER ORF 40 and EVEA ORF 45 respectively) or the nucleic acid sequence coding for AVIL GenBank accession no. AAG32067; (9) a nucleic acid sequence encoding a polypeptide of the MTIA family, for example a nucleic acid of SEQ ID NOS: 124, 126 (the nucleic acid sequences coding for the MTIA protein in AVIA ORF 1 and EVER ORF 13 respectively) or the nucleic acid sequence coding for AVIL GenBank accession no. AAG32066; (10) a nucleic acid sequence encoding a polypeptide of the OXRV family, for example a nucleic acid of SEQ ID NOS: 154, 156, 158 (the nucleic acid sequences coding for the OXRV protein in AVIA ORF 24, EVER ORF 18 and EVEA ORF 11 respectively) or the nucleic acid sequence coding for AVIL GenBank accession no. AAK83187; (11) a nucleic acid sequence encoding a polypeptide of the OXRW family, for example a nucleic acid of SEQ ID NOS: 160, 162 and 164 (the nucleic acid sequences coding for the OXRW protein in AVIA ORF 33, EVER ORF 26 and EVEA ORF 30 respectively); (12) a nucleic acid sequence encoding a polypeptide of the OXRW/OXRX family, for example the nucleic acid sequences coding for the second OXRW protein in AVIA ORF 19 (SEQ ID NO: 167) and in EVEA ORF 6 (SEQ ID NO: 173), the nucleic acid of SEQ ID NO: 170 (coding for the OXRX protein in EVER ORF 31), or the nucleic acid sequence coding for AVIL GenBank accession no. AAK83181; (13) a nucleic acid sequence encoding a polypeptide of the PHOD family, for example a nucleic acid of SEQ ID NOS: 176, 178 and 180 (the nucleic acid sequences coding for the PHOD protein in AVIA ORF 34, EVER ORF 33 and EVEA ORF 29 respectively); (14) a nucleic acid sequence encoding a polypeptide of the UNAJ/OXRX family, for example the nucleic acid sequences coding for the UNAJ protein in AVIA ORF 18 (SEQ ID NO: 165) and in EVEA ORF 5 (SEQ ID NO: 171), or the nucleic acid of SEQ ID NO: 170 (coding for the OXRX protein in EVER ORF 31); (15) a nucleic acid sequence encoding a polypeptide of the UEVA family, for example a nucleic acid of SEQ ID NOS: 194, 196 and 198 (the nucleic acid sequences coding for the UEVA protein in AVIA ORF 26, EVER ORF 17 and EVEA ORF 14 respectively) or the nucleic acid sequence coding for AVIL GenBank accession no. AAK83189; (16) a nucleic acid sequence encoding a polypeptide of the UEVB family, for example a nucleic acid of SEQ ID NOS: 200 and 202 (the nucleic acid sequences coding for the UEVB protein in AVIA ORF 9 and EVER ORF 9 respectively) or the nucleic acid sequence coding for AVIL GenBank accession no. AAK83174; (17) a nucleic acid sequence encoding a polypeptide of the UNKU family, for example a nucleic acid of SEQ ID NOS: 204, 206, 208 (the nucleic acid sequences coding for the UNKU protein in AVIA ORF 2, EVER ORF 25 and EVEA ORF 32 respectively). Preferred probes are isolated, purified or enriched nucleic acids derived from SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208 and the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208 or the sequences complementary thereto.
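The preferred probes are full-length sequences, their complements, or runs of at least 10 to 500 consecutive bases. A minimal Python sketch of enumerating such fragments from a coding sequence follows; it assumes that "the sequences complementary thereto" means the reverse complement, and the function names and the optional `step` parameter are hypothetical.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def probe_fragments(
    seq: str, length: int, step: int = 1, include_complement: bool = True
) -> List[str]:
    """Enumerate windows of `length` consecutive bases, optionally adding their reverse complements."""
    seq = seq.upper()
    if not 1 <= length <= len(seq):
        raise ValueError("fragment length must be between 1 and the sequence length")
    # Slide a window of fixed length along the sequence, advancing by `step` bases.
    fragments = [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]
    if include_complement:
        fragments += [reverse_complement(f) for f in fragments]
    return fragments
```

For example, `probe_fragments("ATGCGT", 4)` returns `["ATGC", "TGCG", "GCGT", "GCAT", "CGCA", "ACGC"]`. With a real coding sequence, `length` would be set to one of the recited sizes (10, 15, 20, and so on) and `step` increased to thin out overlapping candidates.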

In such procedures, nucleic acids are obtained from cultivated microorganisms or from an environmental sample potentially harboring an organism having the genetic capacity to produce an orthosomycin compound. The nucleic acids are contacted with probes designed based on the teachings and compositions of the invention under conditions which permit the probe to specifically hybridize to any complementary sequences indicative of the presence of an orthosomycin-specific protein family. The presence of at least two, preferably three, more preferably four, more preferably five, more preferably six, more preferably 8, still more preferably 10 or more of the seventeen orthosomycin specific protein families indicates the presence of an orthosomycin biosynthetic locus or an orthosomycin producing organism.
Diagnostic nucleic acid sequences for identifying orthosomycin genes, biosynthetic loci, and microorganisms that harbor such genes or loci may be employed on complex mixtures of microorganisms such as those from environmental samples (e.g., soil). A mixture of microorganisms refers to a heterogeneous population of microorganisms consisting of more than one species or strain. In the absence of amplification outside of its natural habitat, such a mixture of microorganisms is said to be uncultured. A cultured mixture of microorganisms may be obtained by amplification or propagation outside of its natural habitat by in vitro culture using various growth media that provide essential nutrients. However, depending on the growth medium used, the amplification may preferentially enrich a sub-population of the mixture and hence may not always be desirable. If desired, a pure culture representing a single species or strain may be obtained from either a cultured or uncultured mixture of microorganisms by established microbiological techniques such as serial dilution followed by growth on solid media so as to isolate individual colony forming units.
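The serial-dilution step can be planned with simple arithmetic. The Python sketch below is a hedged illustration: the tenfold series, the 0.1 ml plating volume and the ceiling of 300 colonies per plate are common laboratory conventions assumed here, not values given in the text, and the function names are hypothetical.

```python
from typing import Optional


def cfu_per_ml(colonies: int, dilution: float, volume_plated_ml: float) -> float:
    """Back-calculate the titre (CFU/ml) of the undiluted sample from a colony count."""
    return colonies / (dilution * volume_plated_ml)


def pick_dilution(
    cfu_ml: float,
    volume_plated_ml: float = 0.1,
    max_colonies: int = 300,
    fold: int = 10,
    steps: int = 10,
) -> Optional[float]:
    """Return the least dilute step of a `fold`-fold series expected to give at most
    `max_colonies` colonies per plate, or None if no step within `steps` does."""
    for k in range(steps + 1):
        dilution = float(fold) ** -k
        # Expected colonies = titre x dilution x volume plated.
        if cfu_ml * dilution * volume_plated_ml <= max_colonies:
            return dilution
    return None
```

For example, 150 colonies counted after plating 0.1 ml of a 10⁻⁶ dilution gives `cfu_per_ml(150, 1e-6, 0.1)` of about 1.5 × 10⁹ CFU/ml, and `pick_dilution(1.5e9)` then suggests plating the 10⁻⁶ dilution to obtain well-separated colonies.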
Orthosomycin genes and/or orthosomycin biosynthetic loci may be identified from either a pure culture or cultured or uncultured mixtures of microorganisms employing the diagnostic nucleic acid sequences disclosed in this invention by experimental techniques such as PCR, hybridization, or shotgun sequencing followed by bioinformatic analysis of the sequence data. The identification of orthosomycin genes and/or an orthosomycin biosynthetic locus in a pure culture of a single organism directly identifies that organism as having the genetic potential to produce one or more natural compounds belonging to the orthosomycin class. The identification of orthosomycin genes and/or orthosomycin biosynthetic loci in a cultured or uncultured mixture of microorganisms requires further steps to identify and isolate the microorganism(s) that harbor(s) them so as to obtain pure cultures of such microorganisms. One general method that might be employed to identify microorganisms that harbor orthosomycin genes and/or orthosomycin biosynthetic loci from a cultured mixture of microorganisms is the colony lift technique (Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. 1997; and Sambrook et al., Molecular Cloning: A Laboratory Manual 2d Ed., Cold Spring Harbor Laboratory Press, 1989), in which the mixture is grown on an appropriate solid medium, the resulting colony forming units are replicated on a solid matrix such as a nylon membrane, the membrane is contacted with detectable diagnostic nucleic acid sequences, the positive colony forming units are identified, and the corresponding colony forming units on the original medium are identified, purified, and amplified.
The orthosomycin diagnostic nucleic acids may be used to survey a number of environmental samples for the presence of organisms that have the potential to produce orthosomycin compounds, i.e., those organisms that contain orthosomycin genes and/or orthosomycin biosynthetic loci. One protocol for use of a survey to identify a polypeptide from DNA isolated from uncultured mixtures of microorganisms is outlined in Seow et al. (1997) J. Bacteriol. Vol. 179 pp. 7368.
To identify an everninomicin-type orthosomycin producer or an everninomicin-type orthosomycin biosynthetic gene cluster, nucleic acids from an environmental sample, e.g. soil, potentially harboring an organism having the genetic capacity to produce an everninomicin-type orthosomycin compound may further be contacted with a probe constructed based on a nucleotide sequence corresponding to the protein families associated with the structural features unique to everninomicin-type orthosomycins. Useful probes may be designed based on a nucleic acid selected from the group consisting of (1) a nucleic acid sequence encoding a polypeptide of the DATC family, for example a nucleic acid of SEQ ID NOS: 210, 212 (the nucleic acid sequences coding for the DATC protein in EVER ORF 43 and EVEA ORF 37 respectively); (2) a nucleic acid sequence encoding a polypeptide of the DEPF family, for example a nucleic acid of SEQ ID NOS: 214, 216 (the nucleic acid sequences coding for the DEPF protein in EVER ORF 46 and EVEA ORF 40 respectively); (3) a nucleic acid sequence encoding a polypeptide of the EPIM family, for example a nucleic acid of SEQ ID NOS: 218 and 220 (the nucleic acid sequences coding for the EPIM protein in EVER ORF 45 and EVEA ORF 39 respectively); (4) a nucleic acid sequence encoding a polypeptide of the GTFA family, for example a nucleic acid of SEQ ID NOS: 222 and 224 (the nucleic acid sequences coding for the GTFA protein in EVER ORF 21 and EVEA ORF 35 respectively); (5) a nucleic acid sequence encoding a polypeptide of the MTFG family, for example a nucleic acid of SEQ ID NOS: 226, 228 (the nucleic acid sequences coding for the MTFG protein in EVER ORF 3 and EVEA ORF 18 respectively); (6) a nucleic acid sequence encoding a polypeptide of the MTFV family, for example a nucleic acid of SEQ ID NOS: 230, 232 (the nucleic acid sequences coding for the MTFV protein in EVER ORF 44 and EVEA ORF 38 respectively); (7) a nucleic acid sequence encoding a polypeptide of the OXBN family, for example a nucleic acid of SEQ ID NOS: 234 and 236 (the nucleic acid sequences coding for the OXBN protein in EVER ORF 42 and EVEA ORF 36 respectively); (8) a nucleic acid sequence encoding a polypeptide of the OXCO family, for example a nucleic acid of SEQ ID NOS: 238, 240 (the nucleic acid sequences coding for the OXCO protein in EVER ORF 4 and EVEA ORF 19 respectively); and (9) a nucleic acid sequence encoding a polypeptide of the UNBB family, for example a nucleic acid of SEQ ID NOS: 242, 244 (the nucleic acid sequences coding for the UNBB protein in EVER ORF 47 and EVEA ORF 41 respectively). Preferred probes are isolated, purified or enriched nucleic acids derived from SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, and the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244 and the sequences complementary thereto.

In such procedures, nucleic acids are obtained from cultivated microorganisms or from an environmental sample potentially harboring an organism having the genetic capacity to produce an everninomicin-type orthosomycin compound. The environmental sample may be a mixture of microorganisms or a pure culture of a single microorganism. The nucleic acids are contacted with probes designed based on the teachings and compositions of the invention under conditions which permit the probe to specifically hybridize to any complementary sequences indicative of the presence of an everninomicin-type orthosomycin-specific protein family. The presence of at least one, preferably 2, more preferably 4, still more preferably 6 or more of the nine everninomicin-type orthosomycin specific protein families indicates the presence of an everninomicin-type orthosomycin biosynthetic locus and an everninomicin-type orthosomycin producing organism.
To identify an avilamycin-type orthosomycin producer or an avilamycin-type orthosomycin biosynthetic locus, nucleic acids from cultivated microorganisms or from an environmental sample, e.g. soil, potentially harboring an organism having the genetic capacity to produce an avilamycin-type orthosomycin compound are further contacted with a probe corresponding to a member of the six protein families associated with biosynthesis of the structural features common to avilamycin-type orthosomycins. Useful probes may be constructed from a nucleic acid selected from the group consisting of (1) a nucleic acid sequence encoding a polypeptide of the ABCD family, for example SEQ ID NO: 246 (AVIA ORF 27) or AVIL GenBank accession no. AAG32068; (2) a nucleic acid sequence encoding a polypeptide of the DEPN family, for example SEQ ID NO: 248 (AVIA ORF 21) or AVIL GenBank accession no. AAK83183; (3) a nucleic acid sequence encoding a polypeptide of the MEMD family, for example SEQ ID NO: 250 (AVIA ORF 28) or AVIL GenBank accession no. AAG32069; (4) a nucleic acid sequence encoding a polypeptide of the REBU family, for example SEQ ID NO: 252 (AVIA ORF 7) or AVIL GenBank accession no. AAK83172; (5) a nucleic acid sequence encoding a polypeptide of the UNAI family, for example SEQ ID NO: 254 (AVIA ORF 6) or AVIL GenBank accession no. AAK83171; and (6) a nucleic acid sequence encoding a polypeptide of the UNBR family, for example SEQ ID NO: 256 (AVIA ORF 10) or AVIL GenBank accession no. AAK83175. Preferred probes are isolated, purified or enriched nucleic acids derived from SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175, the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the sequences complementary thereto.
In such procedures, nucleic acids are obtained from cultivated microorganisms or from an environmental sample potentially harboring an organism having the genetic capacity to produce an avilamycin-type orthosomycin compound. The environmental sample may be a mixture of microorganisms or a pure culture of a single microorganism. The nucleic acids are contacted with probes designed based on the teachings and compositions of the invention under conditions which permit the probe to specifically hybridize to any complementary sequences indicative of the presence of an avilamycin-type orthosomycin-specific protein family. The presence of at least one, preferably 2, more preferably 3, still more preferably 4 or more of the six avilamycin-type orthosomycin specific protein families indicates the presence of an avilamycin-type orthosomycin biosynthetic locus and an avilamycin-type orthosomycin producing organism.
Where necessary, conditions which permit the probe to specifically hybridize to complementary sequences from an orthosomycin-producer may be determined by placing the probe in contact with complementary sequences obtained from an orthosomycin-producer as well as with control sequences which are not from an orthosomycin-producer. In some analyses, the control sequences may be from organisms related to orthosomycin-producers. Alternatively, the control sequences may be from organisms unrelated to orthosomycin-producers. Hybridization conditions, such as the salt concentration of the hybridization buffer, the formamide concentration of the hybridization buffer, or the hybridization temperature, may be varied to identify conditions which allow the probe to hybridize specifically to nucleic acids from orthosomycin-producers.
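When varying the hybridization temperature in this way, an estimated melting temperature (Tm) of an oligonucleotide probe gives a starting point. The Python sketch below uses the Wallace rule, Tm ≈ 2 °C × (A + T) + 4 °C × (G + C), an empirical approximation generally applied to short oligonucleotides (roughly 14 to 20 nt) in high-salt buffer. It is a swapped-in technique, not one specified in the text, and it ignores salt and formamide concentration; the temperature series below Tm is likewise a hypothetical screening plan.

```python
from typing import List


def wallace_tm(oligo: str) -> int:
    """Estimate Tm (°C) of a short DNA oligonucleotide with the Wallace rule."""
    seq = oligo.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("expected a non-empty sequence of A, C, G, T")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def temperature_series(
    tm: float, start_offset: float = 5.0, step: float = 5.0, n: int = 4
) -> List[float]:
    """Hypothetical screening plan: n hybridization temperatures stepping down from Tm - start_offset."""
    return [tm - start_offset - i * step for i in range(n)]
```

For a 20-mer with ten A/T and ten G/C bases, `wallace_tm` gives 60 °C and `temperature_series(60)` gives `[55.0, 50.0, 45.0, 40.0]`. Each temperature would then be tested against both producer DNA and the control sequences, keeping the most stringent condition at which the probe still hybridizes to producer DNA but not to the controls.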

If the sample contains nucleic acids from orthosomycin-producers, specific hybridization of the probe to the nucleic acids from the orthosomycin-producer is then detected. Hybridization may be detected by labeling the probe with a detectable agent such as a radioactive isotope, a fluorescent dye or an enzyme capable of catalyzing the formation of a detectable product.
Many methods of using the labeled probes to detect the presence of nucleic acids from an orthosomycin-producer in a sample are familiar to those skilled in the art. These include Southern blots, Northern blots, colony hybridization procedures, and dot blots. Protocols for each of these procedures are provided in Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc. 1997; and Sambrook et al., Molecular Cloning: A Laboratory Manual 2d Ed., Cold Spring Harbor Laboratory Press, 1989.
Alternatively, more than one probe designed based on the teachings and compositions of the invention may be used in an amplification reaction to determine whether the nucleic acid sample contains nucleic acids from an orthosomycin-producer. Preferably the probes comprise oligonucleotides. In one embodiment, the amplification reaction may comprise a polymerase chain reaction (PCR). PCR protocols are described in Ausubel and Sambrook, supra. In such procedures, the nucleic acids in the sample are contacted with the probes, the amplification reaction is performed, and any amplification product is detected. The amplification product may be detected by performing gel electrophoresis on the reaction products and staining the gel with an intercalator such as ethidium bromide. Alternatively, one or more of the probes may be labeled with a radioactive isotope and the presence of a radioactive amplification product may be detected by autoradiography after gel electrophoresis.
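Before running the amplification, the expected product size on a known template can be predicted so that a band of the right length can be recognized on the gel. The following Python sketch is a simplified in silico PCR under stated assumptions: exact primer matches only, the forward primer read on the given strand, the reverse primer annealing to the opposite strand, and hypothetical function names.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def expected_amplicon(template: str, forward: str, reverse: str) -> Optional[int]:
    """Return the predicted product length in bp, or None if either primer site is missing.

    The reverse primer anneals to the opposite strand, so its site on the given
    strand is its reverse complement, searched downstream of the forward site.
    """
    t = template.upper()
    fwd = forward.upper()
    rev_site = reverse_complement(reverse.upper())
    start = t.find(fwd)
    if start == -1:
        return None
    end = t.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start
```

For the toy template `"TTATGCAAACCCGGTACCTT"` with primers `"ATGCA"` and `"GTACC"`, the function predicts a 15 bp product spanning both primer sites; real primers would of course be longer and drawn from the sequences listed above.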
The isolated, purified or enriched nucleic acids of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, or the sequences complementary thereto may be used as probes to identify and isolate DNAs encoding the polypeptides of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 255 respectively. In such procedures, a genomic DNA library is constructed from a sample containing an orthosomycin producer.
The genomic DNA library is then contacted with a probe comprising a coding sequence, or a fragment of the coding sequence, encoding one of the polypeptides of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 255, or a fragment thereof, under conditions which permit the probe to specifically hybridize to sequences complementary thereto. In a preferred embodiment, the probe is an oligonucleotide of about 10 to about 30 nucleotides in length designed based on a nucleic acid of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256. Genomic DNA clones which hybridize to the probe are then detected and isolated.
Procedures for preparing and identifying DNA clones of interest are disclosed in Ausubel et al., Current Protocols in Molecular Biology, John Wiley & Sons, Inc., 1997, and Sambrook et al., Molecular Cloning: A Laboratory Manual, 2d Ed., Cold Spring Harbor Laboratory Press, 1989.
The isolated, purified or enriched nucleic acids of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, or the sequences complementary thereto may be used as probes to identify and isolate related nucleic acids. In some embodiments, the related nucleic acids may be genomic DNAs (or cDNAs) from potential orthosomycin producers. In such procedures, a nucleic acid sample containing nucleic acids from a potential orthosomycin producer is contacted with the probe under conditions which permit the probe to specifically hybridize to related sequences. The nucleic acid sample may be a genomic DNA (or cDNA) library from the potential orthosomycin producer. Hybridization of the probe to nucleic acids is then detected using any of the methods described above.
Hybridization may be carried out under conditions of low stringency, moderate stringency or high stringency. As an example of nucleic acid hybridization, a polymer membrane containing immobilized denatured nucleic acids is first prehybridized for 30 minutes at 45 °C in a solution consisting of 0.9 M NaCl, 50 mM NaH2PO4, pH 7.0, 5.0 mM Na2EDTA, 0.5% SDS, 10X Denhardt's, and 0.5 mg/ml polyriboadenylic acid. Approximately 2 x 10^7 cpm (specific activity 4-9 x 10^8 cpm/µg) of 32P end-labeled oligonucleotide probe are then added to the solution.
After 12-16 hours of incubation, the membrane is washed for 30 minutes at room temperature in 1X SET (150 mM NaCl, 20 mM Tris hydrochloride, pH 7.8, 1 mM Na2EDTA) containing 0.5% SDS, followed by a 30-minute wash in fresh 1X SET at Tm - 10 °C for the oligonucleotide probe, where Tm is the melting temperature. The membrane is then exposed to autoradiographic film for detection of hybridization signals.
By varying the stringency of the hybridization conditions used to identify nucleic acids, such as genomic DNAs or cDNAs, which hybridize to the detectable probe, nucleic acids having different levels of homology to the probe can be identified and isolated. Stringency may be varied by conducting the hybridization at varying temperatures below the melting temperatures of the probes. The melting temperature of the probe may be calculated using the following formulas:
For oligonucleotide probes between 14 and 70 nucleotides in length, the melting temperature (Tm) in degrees Celsius may be calculated using the formula:
Tm = 81.5 + 16.6(log10 [Na+]) + 0.41(%G+C) - (600/N), where N is the length of the oligonucleotide.
If the hybridization is carried out in a solution containing formamide, the melting temperature may be calculated using the equation Tm = 81.5 + 16.6(log10 [Na+]) + 0.41(%G+C) - 0.63(% formamide) - (600/N), where N is the length of the probe.
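The two formulas can be evaluated directly. The following is a hypothetical helper built on several assumptions: [Na+] is given in mol/L, the logarithm is base 10, and G+C content and formamide concentration are entered as percentages (0-100). That is the form in which the 0.41 and 0.63 coefficients are normally applied (Sambrook et al.). The result is an estimate only, and empirically determined conditions take precedence.

```python
import math


def oligo_tm(seq: str, na_molar: float, formamide_percent: float = 0.0) -> float:
    """Estimate the melting temperature (deg C) of a 14-70 nt oligonucleotide probe.

    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%G+C) - 0.63*(% formamide) - 600/N
    With formamide_percent = 0 this reduces to the formamide-free formula.
    """
    seq = seq.upper()
    n = len(seq)
    if not 14 <= n <= 70:
        raise ValueError("formula is stated for probes of 14-70 nucleotides")
    if na_molar <= 0:
        raise ValueError("[Na+] must be positive")
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.63 * formamide_percent
            - 600.0 / n)


if __name__ == "__main__":
    probe = "ACGT" * 5                          # illustrative 20-mer, 50% G+C
    print(round(oligo_tm(probe, 1.0), 1))        # prints 72.0 (about 1 M Na+, no formamide)
    print(round(oligo_tm(probe, 1.0, 50.0), 1))  # prints 40.5 (50% formamide)
```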
Prehybridization may be carried out in 6X SSC, 5X Denhardt's reagent, 0.5% SDS, 0.1 mg/ml denatured fragmented salmon sperm DNA, or in 6X SSC, 5X Denhardt's reagent, 0.5% SDS, 0.1 mg/ml denatured fragmented salmon sperm DNA, 50% formamide. The compositions of the SSC and Denhardt's solutions are listed in Sambrook et al., supra.
Hybridization is conducted by adding the detectable probe to the hybridization solutions listed above. Where the probe comprises double-stranded DNA, it is denatured by incubating at elevated temperature and quickly cooling before addition to the hybridization solution. It may also be desirable to similarly denature single-stranded probes to eliminate or diminish formation of secondary structures or oligomerization. The filter is contacted with the hybridization solution for a sufficient period of time to allow the probe to hybridize to cDNAs or genomic DNAs containing sequences complementary or homologous thereto. For probes over 200 nucleotides in length, the hybridization may be carried out at 25 °C below the Tm. For shorter probes, such as oligonucleotide probes, the hybridization may be conducted at 5-10 °C below the Tm. Preferably, the hybridization is conducted in 6X SSC for shorter probes and in solutions containing 50% formamide for longer probes.
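The length-dependent rule above can be expressed as a small lookup. This is a hypothetical sketch. It takes a Tm that has already been estimated and returns the hybridization temperature window together with the preferred buffer. It assumes every probe of 200 nucleotides or fewer counts as "shorter".

```python
from typing import Tuple


def hybridization_conditions(tm_celsius: float, probe_length: int) -> Tuple[float, float, str]:
    """Return (lowest temp, highest temp, preferred buffer) for a probe.

    Probes longer than 200 nt: hybridize about 25 deg C below Tm in 50% formamide.
    Shorter probes (e.g. oligonucleotides): 5-10 deg C below Tm in 6X SSC.
    """
    if probe_length <= 0:
        raise ValueError("probe length must be positive")
    if probe_length > 200:
        t = tm_celsius - 25.0
        return t, t, "50% formamide"
    return tm_celsius - 10.0, tm_celsius - 5.0, "6X SSC"
```

For example, a 20-mer with an estimated Tm of 72 °C would be hybridized between 62 °C and 67 °C in 6X SSC.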
All the foregoing hybridizations would be considered to be examples of hybridization performed under conditions of high stringency.
Following hybridization, the filter is washed for at least 15 minutes in 2X SSC, 0.1% SDS at room temperature or higher, depending on the desired stringency. The filter is then washed with 0.1X SSC, 0.5% SDS at room temperature (again) for 30 minutes to 1 hour.
Nucleic acids which have hybridized to the probe are identified by autoradiography or other conventional techniques.
The above procedure may be modified to identify nucleic acids having decreasing levels of homology to the probe sequence. For example, to obtain nucleic acids of decreasing homology to the detectable probe, less stringent conditions may be used: the hybridization temperature may be decreased in increments of 5 °C from 68 °C to 42 °C in a hybridization buffer having a Na+ concentration of approximately 1 M. Following hybridization, the filter may be washed with 2X SSC, 0.5% SDS at the temperature of hybridization.
These conditions are considered to be "moderate stringency" conditions above 50 °C and "low stringency" conditions below 50 °C. A specific example of "moderate stringency" hybridization conditions is when the above hybridization is conducted at 55 °C. A specific example of "low stringency" hybridization conditions is when the above hybridization is conducted at 45 °C.
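The temperature series and its classification can be written out explicitly. This is a hypothetical sketch with two assumptions. The boundary between "moderate" and "low" stringency is taken to be 50 °C, the threshold used in the low-stringency definition, and the buffer is about 1 M Na+. Stepping down from 68 °C in 5 °C increments stops at 43 °C, the last value not below 42 °C.

```python
from typing import List


def temperature_series(start: int = 68, stop: int = 42, step: int = 5) -> List[int]:
    """Hybridization temperatures (deg C) lowered in `step` increments, not going below `stop`."""
    return list(range(start, stop - 1, -step))


def stringency_by_temperature(temp_c: float, boundary_c: float = 50.0) -> str:
    """Label ~1 M Na+ hybridization as moderate (above boundary) or low (below)."""
    if temp_c > boundary_c:
        return "moderate"
    if temp_c < boundary_c:
        return "low"
    return "boundary"  # the text does not assign exactly 50 deg C to either class


if __name__ == "__main__":
    for t in temperature_series():
        print(t, stringency_by_temperature(t))
```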
Alternatively, the hybridization may be carried out in buffers, such as 6X SSC, containing formamide at a temperature of 42 °C. In this case, the concentration of formamide in the hybridization buffer may be reduced in 5% increments from 50% to 0% to identify clones having decreasing levels of homology to the probe. Following hybridization, the filter may be washed with SSC, 0.5% SDS at 50 °C. These conditions are considered to be "moderate stringency" conditions above 25% formamide and "low stringency" conditions below 25% formamide. A specific example of "moderate stringency" hybridization conditions is when the above hybridization is conducted at 30% formamide. A specific example of "low stringency" hybridization conditions is when the above hybridization is conducted at 10% formamide.
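The formamide series works the same way. In this hypothetical sketch, concentrations run from 50% down to 0% in 5% steps, and each is labeled using the 25% boundary stated above. Exactly 25% is reported separately because the text does not place it in either class.

```python
from typing import List


def formamide_series(start: int = 50, stop: int = 0, step: int = 5) -> List[int]:
    """Formamide concentrations (%) reduced in `step` increments from start to stop."""
    return list(range(start, stop - 1, -step))


def stringency_by_formamide(formamide_percent: float, boundary: float = 25.0) -> str:
    """Label 42 deg C hybridization in formamide-containing buffer as moderate or low."""
    if formamide_percent > boundary:
        return "moderate"
    if formamide_percent < boundary:
        return "low"
    return "boundary"  # exactly 25% is not assigned to either class in the text


if __name__ == "__main__":
    for f in formamide_series():
        print(f, stringency_by_formamide(f))
```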
Nucleic acids which have hybridized to the probe are identified by autoradiography or other conventional techniques.
For example, the preceding methods may be used to isolate nucleic acids having a sequence with at least 97%, at least 95%, at least 90%, at least 85%, at least 80%, or at least 70% homology to a nucleic acid sequence selected from the group consisting of the sequences of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, fragments comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive bases thereof, and the sequences complementary thereto. Homology may be measured using BLASTN version 2.0 with the default parameters. For example, the homologous polynucleotides may have a coding sequence which is a naturally occurring allelic variant of one of the coding sequences described herein. Such allelic variants may have a substitution, deletion or addition of one or more nucleotides when compared to the nucleic acids of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, or the sequences complementary thereto.
Additionally, the above procedures may be used to isolate nucleic acids which encode polypeptides having at least 99%, at least 95%, at least 90%, at least 85%, at least 80%, or at least 70% homology to a polypeptide having the sequence of one of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 255, or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof, as determined using the BLASTP version 2.2.2 algorithm with default parameters.
Bioinformatics:
As used herein, the term "orthosomycin-specific nucleic acid codes" encompasses the nucleotide sequences of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, fragments of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, nucleotide sequences homologous to SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, or homologous to fragments of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, and sequences complementary to all of the preceding sequences. The fragments include portions of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208 comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive nucleotides of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208.
Preferably, the fragments are novel fragments. Homologous sequences and fragments of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208 refer to a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 80%, 75% or 70% homology to these sequences. Homology may be determined using any of the computer programs and parameters described herein, including BLASTN and TBLASTX with the default parameters. Homologous sequences also include RNA sequences in which uridines replace the thymines in the nucleic acid codes of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208. The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error. It will be appreciated that the nucleic acid codes of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208 can be represented in the traditional single character format in which G, A, T and C denote the guanine, adenine, thymine and cytosine bases of the deoxyribonucleic acid (DNA) sequence respectively, or in which G, A, U and C denote the guanine, adenine, uracil and cytosine bases of the ribonucleic acid (RNA) sequence (see the inside back cover of Stryer, Biochemistry, 3rd edition, W. H. Freeman & Co., New York), or in any other format which records the identity of the nucleotides in a sequence.
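The DNA and RNA single-character representations, and the "at least N consecutive bases" fragments, can be illustrated with a short sketch. The fragment lengths are those recited above. The function names and the example sequence are hypothetical and do not come from the sequence listing.

```python
from typing import List

# Fragment sizes (consecutive bases) recited for the nucleic acid codes.
FRAGMENT_LENGTHS = (10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, 500)


def dna_to_rna(seq: str) -> str:
    """Single-character DNA code -> RNA code (U in place of T)."""
    return seq.upper().replace("T", "U")


def consecutive_fragments(seq: str, k: int) -> List[str]:
    """Every fragment of exactly k consecutive bases, in order of start position."""
    if k < 1:
        raise ValueError("fragment length must be at least 1")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def fragment_count(seq_length: int, k: int) -> int:
    """Number of k-base fragments in a sequence of seq_length bases."""
    return max(seq_length - k + 1, 0)


if __name__ == "__main__":
    example = "ATGACCGGCTTCGAGCTG"  # illustrative only, not a SEQ ID
    print(dna_to_rna(example))
    for k in FRAGMENT_LENGTHS:
        print(k, fragment_count(len(example), k))
```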
The term "everninomicin-specific nucleic acid codes" encompasses the nucleotide sequences of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, fragments of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, nucleotide sequences homologous to SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, or homologous to fragments of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, and sequences complementary to all of the preceding sequences. The fragments include portions of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244 comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive nucleotides of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244. Preferably, the fragments are novel fragments. Homologous sequences and fragments of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244 refer to a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 80%, 75% or 70% homology to these sequences. Homology may be determined using any of the computer programs and parameters described herein, including BLASTN and TBLASTX with the default parameters. Homologous sequences also include RNA sequences in which uridines replace the thymines in the nucleic acid codes of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244. The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error. It will be appreciated that the nucleic acid codes of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244 can be represented in the traditional single character format in which G, A, T and C denote the guanine, adenine, thymine and cytosine bases of the deoxyribonucleic acid (DNA) sequence respectively, or in which G, A, U and C denote the guanine, adenine, uracil and cytosine bases of the ribonucleic acid (RNA) sequence (see the inside back cover of Stryer, Biochemistry, 3rd edition, W. H. Freeman & Co., New York), or in any other format which records the identity of the nucleotides in a sequence.
The term "avilamycin-specific nucleic acid codes" encompasses the nucleotide sequences of SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175; fragments of SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175; nucleotide sequences homologous to SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175, or homologous to fragments of SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175; and sequences complementary to all of the preceding sequences. The fragments include portions of SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175 comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive nucleotides of SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175. Preferably, the fragments are novel fragments. Homologous sequences and fragments of SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175 refer to a sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 80%, 75% or 70% homology to these sequences. Homology may be determined using any of the computer programs and parameters described herein, including BLASTN and TBLASTX with the default parameters. Homologous sequences also include RNA sequences in which uridines replace the thymines in the nucleic acid codes of SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175. The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error.
It will be appreciated that the nucleic acid codes of SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175 can be represented in the traditional single character format in which G, A, T and C denote the guanine, adenine, thymine and cytosine bases of the deoxyribonucleic acid (DNA) sequence respectively, or in which G, A, U and C denote the guanine, adenine, uracil and cytosine bases of the ribonucleic acid (RNA) sequence (see the inside back cover of Stryer, Biochemistry, 3rd edition, W. H. Freeman & Co., New York), or in any other format which records the identity of the nucleotides in a sequence.
"Orthosomycin-specific polypeptide codes" encompass the polypeptide sequences of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207, which are encoded by the cDNAs of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208; polypeptide sequences homologous to the polypeptides of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207; or fragments of any of the preceding sequences. Homologous polypeptide sequences refer to a polypeptide sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, 75% or 70% homology to one of the polypeptide sequences of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207. Polypeptide sequence homology may be determined using any of the computer programs and parameters described herein, including BLASTP version 2.2.2 with the default parameters or with any user-specified parameters. The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error.
The polypeptide fragments comprise at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of the polypeptides of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207. Preferably, the fragments are novel fragments. It will be appreciated that the polypeptide codes of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207 can be represented in the traditional single character format or three letter format (see the inside back cover of Stryer, Biochemistry, 3rd edition, W.H. Freeman & Co., New York) or in any other format which relates the identity of the amino acids in a sequence.
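Converting between the single-character and three-letter formats is a simple table lookup. This sketch covers only the 20 standard amino acids with their conventional IUPAC abbreviations. Handling of nonstandard letters (for example X, B or Z) is deliberately left out, and such letters raise KeyError. The function names are hypothetical.

```python
from typing import Dict

# Standard IUPAC one-letter -> three-letter amino acid codes.
THREE_LETTER: Dict[str, str] = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
# Reverse lookup keyed on upper-case three-letter codes.
ONE_LETTER: Dict[str, str] = {three.upper(): one for one, three in THREE_LETTER.items()}


def to_three_letter(seq: str, sep: str = "-") -> str:
    """'MKV' -> 'Met-Lys-Val'; raises KeyError for nonstandard residues."""
    return sep.join(THREE_LETTER[aa] for aa in seq.upper())


def to_one_letter(seq: str, sep: str = "-") -> str:
    """'Met-Lys-Val' -> 'MKV'; empty tokens are ignored."""
    return "".join(ONE_LETTER[res.upper()] for res in seq.split(sep) if res)
```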
"Everninomicin-specific polypeptide codes" encompass the polypeptide sequences of SEQ ID NOS: 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241 and 243, which are encoded by the cDNAs of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242 and 244; polypeptide sequences homologous to the polypeptides of SEQ ID NOS: 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241 and 243; or fragments of any of the preceding sequences. Homologous polypeptide sequences refer to a polypeptide sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, 75% or 70% homology to one of the polypeptide sequences of SEQ ID NOS: 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241 and 243. Polypeptide sequence homology may be determined using any of the computer programs and parameters described herein, including BLASTP version 2.2.2 with the default parameters or with any user-specified parameters. The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error. The polypeptide fragments comprise at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of the polypeptides of SEQ ID NOS: 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241 and 243. Preferably, the fragments are novel fragments. It will be appreciated that the polypeptide codes of SEQ ID NOS: 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241 and 243 can be represented in the traditional single character format or three letter format (see the inside back cover of Stryer, Biochemistry, 3rd edition, W.H. Freeman & Co., New York) or in any other format which relates the identity of the amino acids in a sequence.
"Avilamycin-specific polypeptide codes" encompass the polypeptide sequences of SEQ ID NOS: 245, 247, 249, 251, 253, 255 (encoded by the cDNAs of SEQ ID NOS: 246, 248, 250, 252, 254, 256) and the polypeptide sequences of GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175; polypeptide sequences homologous to the polypeptides of SEQ ID NOS: 245, 247, 249, 251, 253, 255 and to GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175; or fragments of any of the preceding sequences. Homologous polypeptide sequences refer to a polypeptide sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, 75% or 70% homology to one of the polypeptide sequences of SEQ ID NOS: 245, 247, 249, 251, 253, 255 or to the polypeptides of GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175. Polypeptide sequence homology may be determined using any of the computer programs and parameters described herein, including BLASTP version 2.2.2 with the default parameters or with any user-specified parameters. The homologous sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing error. The polypeptide fragments comprise at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of the polypeptides of SEQ ID NOS: 245, 247, 249, 251, 253, 255 or of the polypeptides of GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175. Preferably, the fragments are novel fragments. It will be appreciated that the polypeptide codes of SEQ ID NOS: 245, 247, 249, 251, 253, 255 and GenBank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175 can be represented in the traditional single character format or three letter format (see the inside back cover of Stryer, Biochemistry, 3rd edition, W.H. Freeman & Co., New York) or in any other format which relates the identity of the amino acids in a sequence.
For ease of comprehension, the orthosomycin-specific nucleic acid codes, the everninomicin-specific nucleic acid codes, the avilamycin-specific nucleic acid codes, the orthosomycin-specific polypeptide codes, the everninomicin-specific polypeptide codes and the avilamycin-specific polypeptide codes, or a subset thereof, are sometimes collectively referred to as "the reference sequences".
It will be readily appreciated by those skilled in the art that the reference sequences can be stored, recorded and manipulated on any medium which can be read and accessed by a computer. As used herein, the words "recorded" and "stored" refer to a process for storing information on a computer medium. A skilled artisan can readily adopt any of the presently known methods for recording information on a computer readable medium to generate manufactures comprising one or more of the orthosomycin-specific nucleic acid codes, the everninomicin-specific nucleic acid codes, the avilamycin-specific nucleic acid codes, the orthosomycin-specific polypeptide codes, the everninomicin-specific polypeptide codes and the avilamycin-specific polypeptide codes.
Computer readable media include magnetically readable media, optically readable media, electronically readable media and magnetic/optical media. For example, the computer readable media may be a hard disk, a floppy disk, a magnetic tape, CD-ROM, Digital Versatile Disk (DVD), Random Access Memory (RAM), or Read Only Memory (ROM), as well as other types of media known to those skilled in the art.
The orthosomycin-specific nucleic acid codes, the everninomicin-specific nucleic acid codes, the avilamycin-specific nucleic acid codes, the orthosomycin-specific polypeptide codes, the everninomicin-specific polypeptide codes, and the avilamycin-specific polypeptide codes may be stored and manipulated in a variety of data processor programs in a variety of formats. For example, these codes may be stored as ASCII or text in a word processing file, such as Microsoft WORD or WORDPERFECT, or in a variety of database programs familiar to those of skill in the art, such as DB2 or ORACLE. In addition, many computer programs and databases may be used as sequence comparers, identifiers or sources of query nucleotide sequences or query polypeptide sequences to be compared to the orthosomycin-specific nucleic acid codes, the everninomicin-specific nucleic acid codes, the avilamycin-specific nucleic acid codes, the orthosomycin-specific polypeptide codes, the everninomicin-specific polypeptide codes, and the avilamycin-specific polypeptide codes.
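One common plain-text layout for storing sequences as ASCII is the FASTA format. It is used here as a swapped-in example; the text above does not prescribe any particular format. The following sketch writes named sequences to a text file and reads them back. The record names, the 60-character line width and the file name are hypothetical choices.

```python
import os
import tempfile
from typing import Dict, List, Optional


def format_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Render {name: sequence} as FASTA text, wrapping sequences at `width` characters."""
    lines: List[str] = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text back into {name: sequence}, joining wrapped lines."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def save_and_reload(records: Dict[str, str], width: int = 60) -> Dict[str, str]:
    """Write records to a temporary ASCII text file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "reference_sequences.fasta")
        with open(path, "w", encoding="ascii") as fh:
            fh.write(format_fasta(records, width))
        with open(path, encoding="ascii") as fh:
            return parse_fasta(fh.read())
```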
The following list is intended not to limit the invention but to provide guidance to programs and databases which are useful with the orthosomycin-specific nucleic acid codes, the everninomicin-specific nucleic acid codes, the avilamycin-specific nucleic acid codes, the orthosomycin-specific polypeptide codes, the everninomicin-specific polypeptide codes, and the avilamycin-specific polypeptide codes of the invention. The programs and databases which may be used include, but are not limited to: MacPattern (EMBL), DiscoveryBase (Molecular Applications Group), GeneMine (Molecular Applications Group), Look (Molecular Applications Group), MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI), BLASTN and BLASTX (Altschul et al., J. Mol. Biol. 215:403 (1990)), FASTA (Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85:2444 (1988)), FASTDB (Brutlag et al., Comp. App. Biosci. 6:237-245, 1990), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular Simulations Inc.), Cerius2.DBAccess (Molecular Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight II (Molecular Simulations Inc.), Discover (Molecular Simulations Inc.), CHARMm (Molecular Simulations Inc.), Felix (Molecular Simulations Inc.), DelPhi (Molecular Simulations Inc.), QuanteMM (Molecular Simulations Inc.), Homology (Molecular Simulations Inc.), Modeler (Molecular Simulations Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular Simulations Inc.), WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular Simulations Inc.), Gene Explorer (Molecular Simulations Inc.), SeqFold (Molecular Simulations Inc.), the MDL Available Chemicals Directory database, the MDL Drug Data Report database, the Comprehensive Medicinal Chemistry database, Derwent's World Drug Index database, the BioByteMasterFile database, the Genbank database, and the Genseqn database. Many other programs and databases would be apparent to one of skill in the art given the present disclosure.
Embodiments of the present invention include systems, particularly computer systems, that store and manipulate the sequence information described herein. As used herein, "a computer system" refers to the hardware components, software components, and data storage components used to analyze the reference sequences.
Preferably, the computer system is a general-purpose system that comprises a processor, one or more internal data storage components for storing data, and one or more data retrieving devices for retrieving the data stored on the data storage components. A skilled artisan can readily appreciate that any one of the currently available computer systems is suitable.
One example of a computer system is illustrated in Figure 4. The computer system of Figure 4 includes a number of components connected to a central system bus 116, including a central processing unit 118 with internal 118 and external cache memory 120, system memory 122, display adapter 102 connected to a monitor 100, network adapter 126 (which may also be referred to as a network interface), internal modem 124, sound adapter 128, and I/O controller 132, to which may be connected a keyboard 140 and mouse 138 or other suitable input device such as a trackball or tablet, as well as external printer 134 and/or any number of external devices, including but not limited to external modems, tape storage drives, or disk drives. One skilled in the art will readily appreciate that not all components illustrated in Figure 4 are required to practice the invention and, likewise, additional components not illustrated in Figure 4 may be present in a computer system contemplated for use with the invention.
One or more host bus adapters 114 may be connected to the system bus 116. One or more storage devices may optionally be connected to host bus adapter 114, such as one or more disk drives 112 (removable or fixed), floppy drives 110, tape drives 108, digital versatile disk (DVD) drives 106, and compact disk read-only memory (CD-ROM) drives 104. The storage devices may operate in read-only mode and/or in read-write mode. Optical storage such as DVD 106 or CD-ROM 104 is more commonly used in read-only mode, and fixed disk drives 112 are more likely to operate in read-write mode. Some computer systems may store datasets that are larger than an individual disk drive 112, in which case specialized software can be used to allow data to span multiple disks. Examples of such software include but are not limited to Sun Microsystems' Solstice DiskSuite, or Sun Microsystems' RAID (redundant array of inexpensive disks) Manager. The computer system may be enclosed in an enclosure or case. The computer system may optionally include multiple central processing units 118, or multiple banks of memory 122.
Arrows 142 in Figure 1 indicate the interconnection of internal components of the computer system. The arrows are illustrative only and do not specify exact connection architecture. Some vendors may connect one or more central processing units to CPU/memory boards which then connect to the system bus.
Software for accessing and processing the reference sequences (such as sequence comparison software, analysis software, search tools, annotation tools, and modeling tools) may reside in main memory 122 during execution.
In a preferred embodiment, the computer system further comprises sequence comparison software for comparing the nucleic acid codes of a query sequence stored on a computer readable medium to a subject sequence selected from an orthosomycin-specific nucleic acid code, an everninomicin-specific nucleic acid code, or an avilamycin-specific nucleic acid code, which is also stored on a computer readable medium; or for comparing the polypeptide code of a query sequence stored on a computer readable medium to a subject sequence selected from an orthosomycin-specific polypeptide code, an everninomicin-specific polypeptide code, or an avilamycin-specific polypeptide code, which is also stored on a computer readable medium. "Sequence comparison software" refers to one or more programs that are implemented on the computer system to compare nucleotide or polypeptide sequences with other sequences stored within the data storage means. The design of one example of sequence comparison software is provided in Figure 2.
The sequence comparison software will typically employ one or more specialized comparator algorithms. Protein and/or nucleic acid sequence similarities may be evaluated using any of the variety of sequence comparator algorithms and programs known in the art. Such algorithms and programs include, but are in no way limited to, TBLASTN, BLASTN, BLASTP, FASTA, TFASTA, CLUSTAL, HMMER, MAST, or other suitable algorithms known to those skilled in the art (Pearson and Lipman, 1988, Proc. Natl. Acad. Sci. USA 85(8):2444-2448; Altschul et al., 1990, J. Mol. Biol. 215(3):403-410; Thompson et al., 1994, Nucleic Acids Res. 22(22):4673-4680; Higgins et al., 1996, Methods Enzymol. 266:383-402; Altschul et al., 1993, Nature Genetics 3:266-272; Eddy, 1998, Bioinformatics 14:755-763; Bailey et al., 1997, J. Steroid Biochem. Mol. Biol. 62(1):29-44). One example of a comparator algorithm is illustrated in Figure 3. Sequence comparator algorithms identified in this specification are particularly contemplated for use in this aspect of the invention.
The sequence comparison software will typically also employ one or more specialized analyzer algorithms. One example of an analyzer algorithm is illustrated in Figure 4. Any appropriate analyzer algorithm can be used to evaluate the similarities between query/subject pairs determined by the comparator algorithm and, based on context-specific rules, to assign the annotation of a subject to the query. A skilled artisan can readily select an appropriate analyzer algorithm and appropriate context-specific rules. Analyzer algorithms identified elsewhere in this specification are particularly contemplated for use in this aspect of the invention.
Figure 2 is a flowchart of one example of sequence comparison software for comparing query sequences to a subject sequence. The subject sequence may be selected from the reference sequences, in which case the software determines whether a gene or set of genes, represented by its nucleotide sequence, polypeptide sequence or other representation, is significantly similar to the orthosomycin-specific nucleic acid codes, the everninomicin-specific nucleic acid codes, the avilamycin-specific nucleic acid codes, the orthosomycin-specific polypeptide codes, the everninomicin-specific polypeptide codes or the avilamycin-specific polypeptide codes of the invention. The software may be implemented in the C or C++ programming language, Java, Perl or another suitable programming language known to a person skilled in the art.
Referring to Figure 2, the query sequence(s) may be accessed by the program by means of input from the user 210, accessing a database 208 or opening a text file 206. The "query initialization process" allows a query sequence to be accessed and loaded into computer memory 122, or, under control of the program, stored on a disk drive 112 or other storage device in the form of a query sequence array 216. The query array 216 is one or more query nucleotide or polypeptide sequences accompanied by some appropriate identifiers. A dataset is accessed by the program by means of input from the user 228, accessing a database 226, or opening a text file 224. The "subject data source initialization process" of Figure 2 refers to the method by which a reference dataset containing one or more sequences selected from the orthosomycin-specific nucleic acid code, the everninomicin-specific nucleic acid code, the avilamycin-specific nucleic acid code, the orthosomycin-specific polypeptide code, the everninomicin-specific polypeptide code, and the avilamycin-specific polypeptide code is loaded into computer memory 122, or, under control of the program, stored on a disk drive or other storage device in the form of a subject array 234. The subject array comprises one or more subject nucleotide or polypeptide sequences accompanied by some appropriate identifiers.
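The query array 216 and subject array 234 are described only as sequences "accompanied by some appropriate identifiers". As a minimal sketch, assuming the text file 206 or 224 is in FASTA format (an assumption; the text does not name a format), the initialization step could parse each record into an (identifier, sequence) pair. The function name `parse_fasta` is hypothetical.

```python
from typing import List, Optional, Tuple

def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA-formatted text into a list of (identifier, sequence) pairs."""
    records: List[Tuple[str, str]] = []
    identifier: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if identifier is not None:    # close the previous record
                records.append((identifier, "".join(chunks).upper()))
            # Identifier = first whitespace-delimited token of the header line.
            fields = line[1:].split()
            identifier = fields[0] if fields else ""
            chunks = []
        elif identifier is not None:      # sequence lines before any header are ignored
            chunks.append(line)
    if identifier is not None:            # close the last record
        records.append((identifier, "".join(chunks).upper()))
    return records
```

For example, `parse_fasta(">orf1 hypothetical protein\nMKV\nLLA\n")` yields `[("orf1", "MKVLLA")]`; the same function can load either the query array or the subject array.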
The "comparison subprocess" of Figure 2 is the process by which the comparator algorithm 238 is invoked by the software for pairwise comparisons between query elements in the query sequence array 216 and subject elements in the subject array 234. The "comparator algorithm" of Figure 2 refers to the pairwise comparison of a query and subject pair drawn from their respective arrays 216, 234. Comparator algorithm 238 may be any algorithm that acts on a query/subject pair, including but not limited to homology algorithms such as BLAST, Smith-Waterman or FASTA, statistical representation/probabilistic algorithms such as Markov models exemplified by HMMER, or other suitable algorithms known to one skilled in the art. Suitable algorithms generally take a query/subject pair as input and return a score (an indication of likeness between the query and subject), usually through the use of appropriate statistical methods such as the Karlin-Altschul statistics used in BLAST, the Forward or Viterbi algorithms used in Markov models, or other suitable statistics known to those skilled in the art.
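Comparator algorithm 238 is defined here by its contract: it takes a query/subject pair and returns a score. As one illustration of that contract, the sketch below computes a Smith-Waterman local alignment score, one of the algorithms named above. The scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions, and unlike BLAST the sketch computes no Karlin-Altschul statistics (no E-values); it returns a raw score only.

```python
def smith_waterman_score(query: str, subject: str,
                         match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score of query against subject.

    Plain Smith-Waterman with a linear gap penalty; only two rows of the
    dynamic-programming matrix are kept in memory.
    """
    prev = [0] * (len(subject) + 1)  # row i-1 of the score matrix
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(subject) + 1)  # row i; column 0 stays 0
        for j in range(1, len(subject) + 1):
            pair = match if query[i - 1] == subject[j - 1] else mismatch
            diag = prev[j - 1] + pair   # align query[i-1] with subject[j-1]
            up = prev[j] + gap          # query[i-1] aligned to a gap
            left = curr[j - 1] + gap    # subject[j-1] aligned to a gap
            # Local alignment: a negative running score is reset to zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because the function has the signature "pair in, score out", any other comparator with the same signature could be substituted without changing the surrounding software.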
The sequence comparison software of Figure 2 also comprises a means of analyzing the results of the pairwise comparisons performed by the comparator algorithm 238. The "analysis subprocess" of Figure 2 is a process by which the analyzer algorithm 244 is invoked by the software. The "analyzer algorithm" refers to a process by which the annotation of a subject is assigned to the query based on query/subject similarity, as determined by the comparator algorithm 238, according to context-specific rules coded into the program or dynamically loaded at runtime.
Context-specific rules are what the program uses to determine if the annotation of the subject can be assigned to the query given the context of the comparison.

These rules allow the software to qualify the overall meaning of the results of the comparator algorithm 238. In one embodiment, context-specific rules may state that, for a set of query sequences to be considered representative of an orthosomycin locus, the comparator algorithm 238 must determine that the set of query sequences contains at least one query sequence that shows a statistical similarity to reference sequences corresponding to a nucleic acid sequence code for a polypeptide from two of the groups consisting of:
(1) SEQ ID NO: 51, Genbank accession no. AAK83192, SEQ ID NO: 53, SEQ ID NO: 55, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 51, 53, 55 or Genbank accession no. AAK83192;
(2) SEQ ID NO: 57, Genbank accession no. AAK83170, SEQ ID NO: 59, SEQ ID NO: 61, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 57, 59, 61 or Genbank accession no. AAK83170;
(3) SEQ ID NO: 63, Genbank accession no. AAK83193, SEQ ID NO: 65, SEQ ID NO: 67, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 63, 65, 67 or Genbank accession no. AAK83193;
(4) SEQ ID NO: 69, SEQ ID NO: 71, SEQ ID NO: 73, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 69, 71 or 73;
(5) SEQ ID NO: 99, Genbank accession no. AAK83184, SEQ ID NO: 101, SEQ ID NO: 103, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 99, 101, 103 or Genbank accession no. AAK83184;
(6) SEQ ID NO: 105, Genbank accession no. AAK83186, SEQ ID NO: 107, SEQ ID NO: 109, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 105, 107, 109 or Genbank accession no. AAK83186;
(7) SEQ ID NO: 111, Genbank accession no. AAK83188, SEQ ID NO: 113, SEQ ID NO: 115, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 111, 113, 115 or Genbank accession no. AAK83188;
(8) SEQ ID NO: 127, Genbank accession no. AAG32067, SEQ ID NO: 129, SEQ ID NO: 131, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 127, 129, 131 or Genbank accession no. AAG32067;
(9) SEQ ID NO: 123, Genbank accession no. AAG32066, SEQ ID NO: 125, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 123, 125 or Genbank accession no. AAG32066;
(10) SEQ ID NO: 153, Genbank accession no. AAK83187, SEQ ID NO: 155, SEQ ID NO: 157, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 153, 155, 157 or Genbank accession no. AAK83187;
(11) SEQ ID NO: 159, SEQ ID NO: 161, SEQ ID NO: 163, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 159, 161 or 163;
(12) SEQ ID NO: 167, SEQ ID NO: 173, Genbank accession no. AAK83181, SEQ ID NO: 169, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 167, 169, 173 or Genbank accession no. AAK83181;
(13) SEQ ID NO: 175, SEQ ID NO: 177, SEQ ID NO: 179, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 175, 177 or 179;
(14) SEQ ID NO: 165, SEQ ID NO: 171, SEQ ID NO: 169, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 165, 169 or 171;
(15) SEQ ID NO: 193, Genbank accession no. AAK83189, SEQ ID NO: 195, SEQ ID NO: 197, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 193, 195, 197 or Genbank accession no. AAK83189;
(16) SEQ ID NO: 199, Genbank accession no. AAK83174, SEQ ID NO: 201, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 199, 201 or Genbank accession no. AAK83174; and
(17) SEQ ID NO: 203, SEQ ID NO: 205, SEQ ID NO: 207, and polypeptides having at least 70% homology to a polypeptide having the sequence of SEQ ID NOS: 203, 205 or 207.
Of course, preferred context-specific rules may specify a wide variety of thresholds for identifying orthosomycin biosynthetic genes or orthosomycin-producing organisms without departing from the scope of the invention. Some preferred thresholds contemplated are that at least one query sequence in the set of query sequences shows a statistical similarity to the nucleic acid code corresponding to 3 or 4 or 5 or 6 or 7 or 8 or 10 or more of the above 17 groups of polypeptides diagnostic of orthosomycin biosynthetic genes. Other preferred context-specific rules may set the level of homology required in each of the groups at 70%, 75%, 80%, 85%, 90%, 95% or 98% with regard to any one or more of the reference sequences.
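The group-based rule above amounts to counting how many distinct diagnostic groups are matched by the query set. The following minimal sketch assumes that the comparator reports a percent identity for each hit and uses it as a stand-in for "% homology"; the default of two groups follows the first embodiment, and both thresholds are parameters because the text contemplates several values. Function names and the `SEQ_ID_NO_*` identifiers are hypothetical; the accession numbers are taken from groups (1) to (3).

```python
from typing import Dict, Iterable, Set, Tuple

# A hit is (query_id, subject_id, percent_identity). Percent identity stands in
# for the "% homology" of the text; computing it is left to the comparator.
Hit = Tuple[str, str, float]

# Excerpt of groups (1)-(3). The "SEQ_ID_NO_*" strings are hypothetical
# identifiers for the reference sequences; the accession numbers are from the text.
ORTHO_GROUPS_EXCERPT: Dict[str, int] = {
    "SEQ_ID_NO_51": 1, "AAK83192": 1, "SEQ_ID_NO_53": 1, "SEQ_ID_NO_55": 1,
    "SEQ_ID_NO_57": 2, "AAK83170": 2, "SEQ_ID_NO_59": 2, "SEQ_ID_NO_61": 2,
    "SEQ_ID_NO_63": 3, "AAK83193": 3, "SEQ_ID_NO_65": 3, "SEQ_ID_NO_67": 3,
}

def groups_hit(hits: Iterable[Hit], subject_group: Dict[str, int],
               min_identity: float = 70.0) -> Set[int]:
    """Return the groups matched by at least one query at or above min_identity."""
    found: Set[int] = set()
    for _query_id, subject_id, identity in hits:
        group = subject_group.get(subject_id)  # None for non-diagnostic subjects
        if group is not None and identity >= min_identity:
            found.add(group)
    return found

def is_orthosomycin_locus(hits: Iterable[Hit], subject_group: Dict[str, int],
                          min_groups: int = 2, min_identity: float = 70.0) -> bool:
    """Context-specific rule: the query set must cover at least min_groups groups."""
    return len(groups_hit(hits, subject_group, min_identity)) >= min_groups
```

Raising `min_groups` to 3 or more, or `min_identity` to 75 to 98, expresses the stricter thresholds listed above without changing the code.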
In another embodiment, context-specific rules may state that, for a set of query sequences to be considered representative of an everninomicin-type orthosomycin locus, the comparator algorithm 238 must determine that at least one of the query sequences in the set of query sequences shows a statistical similarity to reference sequences corresponding to a nucleic acid sequence code for a polypeptide from one of the groups consisting of:
(1) SEQ ID NO: 209, SEQ ID NO: 211 and polypeptides having at least 70% homology to a polypeptide of SEQ ID NO: 209 or SEQ ID NO: 211;
(2) SEQ ID NO: 213, SEQ ID NO: 215 and polypeptides having at least 70% homology to a polypeptide of SEQ ID NO: 213 or SEQ ID NO: 215;
(3) SEQ ID NO: 217, SEQ ID NO: 219 and polypeptides having at least 70% homology to a polypeptide of SEQ ID NO: 217 or SEQ ID NO: 219;
(4) SEQ ID NO: 221, SEQ ID NO: 223 and polypeptides having at least 70% homology to a polypeptide of SEQ ID NO: 221 or SEQ ID NO: 223;
(5) SEQ ID NO: 225, SEQ ID NO: 227 and polypeptides having at least 70% homology to a polypeptide of SEQ ID NO: 225 or SEQ ID NO: 227;
(6) SEQ ID NO: 229, SEQ ID NO: 231 and polypeptides having at least 70% homology to a polypeptide of SEQ ID NO: 229 or SEQ ID NO: 231;
(7) SEQ ID NO: 233, SEQ ID NO: 235 and polypeptides having at least 70% homology to a polypeptide of SEQ ID NO: 233 or SEQ ID NO: 235;
(8) SEQ ID NO: 237, SEQ ID NO: 239 and polypeptides having at least 70% homology to a polypeptide of SEQ ID NO: 237 or SEQ ID NO: 239; and
(9) SEQ ID NO: 241, SEQ ID NO: 243 and polypeptides having at least 70% homology to a polypeptide of SEQ ID NO: 241 or SEQ ID NO: 243.
Of course, preferred context-specific rules may specify a wide variety of thresholds for identifying everninomicin-type orthosomycin biosynthetic genes or everninomicin-type orthosomycin-producing organisms without departing from the scope of the invention. Some preferred thresholds contemplated are that at least one query sequence in the set of query sequences shows a statistical similarity to the nucleic acid code corresponding to 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 of the above 9 groups of polypeptides diagnostic of everninomicin-type orthosomycin biosynthetic genes. In a highly preferred embodiment, the set of query sequences would contain at least one query sequence showing a statistical similarity to the nucleic acid code corresponding to 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9 of the 9 groups of polypeptides diagnostic of the everninomicin biosynthetic gene cluster, together with at least one query sequence in the set of query sequences showing a statistical similarity to the nucleic acid code corresponding to 3 or 4 or 5 or 6 or 7 or 8 or 10 or more of the above 17 groups of polypeptides diagnostic of orthosomycin biosynthetic genes. Other preferred context-specific rules may set the level of homology required in each of the groups at 70%, 75%, 80%, 85%, 90%, 95% or 98% with regard to any one or more of the reference sequences.
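In the highly preferred embodiment, two independent criteria must hold at once. A minimal sketch, assuming the matched group numbers have already been collected for each reference set (for instance by counting hits per group), combines them with a logical AND. The function and constant names are hypothetical, and the defaults are simply the lowest thresholds listed in the text; the same pattern would apply to the avilamycin-type rule with its six groups.

```python
from typing import Set

EVR_GROUPS = set(range(1, 10))    # everninomicin-type groups (1)-(9)
ORTHO_GROUPS = set(range(1, 18))  # orthosomycin groups (1)-(17)

def is_everninomicin_type_locus(evr_hit: Set[int], ortho_hit: Set[int],
                                min_evr: int = 2, min_ortho: int = 3) -> bool:
    """Require class-specific AND general orthosomycin evidence.

    evr_hit and ortho_hit hold the group numbers matched by at least one
    query sequence in the set of query sequences.
    """
    # Reject group numbers that do not exist in the respective group lists.
    if not evr_hit <= EVR_GROUPS or not ortho_hit <= ORTHO_GROUPS:
        raise ValueError("group number outside the defined ranges")
    return len(evr_hit) >= min_evr and len(ortho_hit) >= min_ortho
```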
In another embodiment, context-specific rules may state that, for a set of query sequences to be considered representative of an avilamycin-type orthosomycin locus, the comparator algorithm 238 must determine that the set of query sequences contains at least one query sequence that shows a statistical similarity to reference sequences corresponding to a nucleic acid sequence code for a polypeptide from one of the groups consisting of:
(1) SEQ ID NO: 245, Genbank accession no. AAG32068 and polypeptides having at least 70% homology to a polypeptide of SEQ ID NO: 245 or Genbank accession no. AAG32068;
(2) SEQ ID NO: 247, Genbank accession no. AAK83183 and polypeptides having at least 70% homology to a polypeptide of SEQ ID NO: 247 or Genbank accession no. AAK83183;
(3) SEQ ID NO: 249, Genbank accession no. AAG32069 and polypeptides having at least 70% homology to a polypeptide of SEQ ID NO: 249 or Genbank accession no. AAG32069;
(4) SEQ ID NO: 251, Genbank accession no. AAK83172 and polypeptides having at least 70% homology to a polypeptide of SEQ ID NO: 251 or Genbank accession no. AAK83172;
(5) SEQ ID NO: 253, Genbank accession no. AAK83171 and polypeptides having at least 70% homology to a polypeptide of SEQ ID NO: 253 or Genbank accession no. AAK83171; and
(6) SEQ ID NO: 255, Genbank accession no. AAK83175 and polypeptides having at least 70% homology to a polypeptide of SEQ ID NO: 255 or Genbank accession no. AAK83175.
Of course, preferred context-specific rules may specify a wide variety of thresholds for identifying avilamycin-type orthosomycin biosynthetic genes or avilamycin-type orthosomycin-producing organisms without departing from the scope of the invention. Some preferred thresholds contemplated are that at least one query sequence in the set of query sequences shows a statistical similarity to the nucleic acid code corresponding to 2 or 3 or 4 or 5 or 6 of the above 6 groups of polypeptides diagnostic of avilamycin-type orthosomycin biosynthetic genes. In a highly preferred embodiment, the set of query sequences would contain at least one query sequence showing a statistical similarity to the nucleic acid code corresponding to 2 or 3 or 4 or 5 or 6 of the groups of polypeptides diagnostic of the avilamycin-type biosynthetic gene cluster, together with at least one query sequence in the set of query sequences showing a statistical similarity to the nucleic acid code corresponding to 3 or 4 or 5 or 6 or 7 or 8 or 10 or more of the above 17 groups of polypeptides diagnostic of orthosomycin biosynthetic genes. Other preferred context-specific rules may set the level of homology required in each of the groups at 70%, 75%, 80%, 85%, 90%, 95% or 98% with regard to any one or more of the reference sequences.
Thus, the analysis subprocess may be employed in conjunction with any other context specific rules and may be adapted to suit different embodiments.
The principal function of the analyzer algorithm 244 is to assign meaning or a diagnosis to a query or set of queries based on context-specific rules that are application specific and may be changed without altering the overall role of the analyzer algorithm 244.
Finally, the sequence comparison software of Figure 2 comprises a means of returning the results of the comparisons by the comparator algorithm 238, as analyzed by the analyzer algorithm 244, to the user or process that requested the comparison or comparisons. The "display / report subprocess" of Figure 2 is the process by which the results of the comparisons by the comparator algorithm 238 and the analyses by the analyzer algorithm 244 are returned to the user or process that requested them. The results 240, 246 may be written to a file 252, displayed in a user interface such as a console, custom graphical interface, web interface, or other suitable implementation-specific interface, or uploaded to a database such as a relational database or other suitable implementation-specific database.

Once the results have been returned to the user or process that requested the comparison or comparisons the program exits.
The principle of the sequence comparison software of Figure 2 is to receive or load a query or queries, receive or load a reference dataset, run a pairwise comparison by means of the comparator algorithm 238, evaluate the results using the analyzer algorithm 244 to determine whether the query or queries bear significant similarity to the reference sequences, and finally return the results to the user or calling program or process.
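The four stages summarized above (load, compare, analyze, report) compose naturally as functions. The sketch below is hypothetical throughout: it uses a toy shared-k-mer count as the comparator in place of BLAST or HMMER, a single score cutoff as the context-specific rule, and a tab-separated text report as one of the output options mentioned for the display / report subprocess.

```python
import io
from typing import Callable, Dict, List, TextIO, Tuple

Seq = Tuple[str, str]            # (identifier, sequence)
Match = Tuple[str, str, float]   # (query_id, subject_id, score)

def shared_kmers(a: str, b: str, k: int = 3) -> float:
    """Toy comparator: number of distinct length-k words present in both sequences."""
    words_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    words_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return float(len(words_a & words_b))

def compare_all(queries: List[Seq], subjects: List[Seq],
                comparator: Callable[[str, str], float]) -> List[Match]:
    """Comparison subprocess: score every query/subject pair."""
    return [(q_id, s_id, comparator(q_seq, s_seq))
            for q_id, q_seq in queries for s_id, s_seq in subjects]

def analyze(pairs: List[Match], min_score: float) -> List[Match]:
    """Analysis subprocess (simplest possible rule): keep pairs at or above min_score."""
    return [p for p in pairs if p[2] >= min_score]

def report(pairs: List[Match], annotation: Dict[str, str], out: TextIO) -> None:
    """Report subprocess: write tab-separated results with the subject's annotation."""
    out.write("query\tsubject\tscore\tannotation\n")
    for q_id, s_id, score in pairs:
        out.write(f"{q_id}\t{s_id}\t{score:g}\t{annotation.get(s_id, '')}\n")

def run(queries: List[Seq], subjects: List[Seq], annotation: Dict[str, str],
        min_score: float = 2.0) -> str:
    """Load -> compare -> analyze -> report, returning the report as a string."""
    buf = io.StringIO()
    report(analyze(compare_all(queries, subjects, shared_kmers), min_score), annotation, buf)
    return buf.getvalue()
```

Keeping each stage as a separate function mirrors the subprocesses of Figure 2: the comparator or the rule can be replaced without touching the other stages.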
Figure 3 is a flow diagram illustrating one embodiment of a comparator algorithm 238 process in a computer for determining whether two sequences are homologous. The comparator algorithm receives a query / subject pair for comparison, performs an appropriate comparison, and returns the pair along with a calculated degree of similarity.
Referring to Figure 3, the comparison is initiated at the beginning of the sequences 304. A match of (x) characters is attempted 306, where (x) is a user-specified number. If a match is not found, the query sequence is advanced 316 by one amino acid residue with respect to the subject, and if the end of the query has not been reached 318, another match of (x) characters is attempted 306. Thus, if no match has been found, the query is incrementally advanced in its entirety past the initial position of the subject; once the end of the query is reached 318, the subject pointer is advanced by one residue and the query pointer is set to the beginning of the query 318. If the end of the subject has been reached and still no matches have been found, a null homology result score is assigned 324 and the algorithm returns the pair of sequences along with a null score to the calling process or program. The algorithm then exits 326. If instead a match is found 308, an extension of the matched region is attempted 310 and the match is analyzed statistically 312. The extension may be unidirectional or bidirectional. The algorithm continues in a loop, extending the matched region and computing the homology score, giving penalties for mismatches while taking into consideration that, given the chemical properties of the polypeptide side chains, not all mismatches are equal.
For example, a mismatch between lysine and arginine, both of which have basic side chains, receives a lesser penalty than a mismatch between lysine and glutamate, which has an acidic side chain. The extension loop stops once the accumulated penalty exceeds some user-specified value, or once the end of either sequence is reached 312. The maximal score is stored 314, the query sequence is advanced 316 by one residue with respect to the subject, and if the end of the query has not been reached 318, another match of (x) characters is attempted 306.
The process continues until the entire length of the subject has been evaluated for matches to the entire length of the query. All individual scores and alignments are stored 314 by the algorithm and an overall score is computed 324 and stored.
The algorithm returns the pair of sequences along with local and global scores to the calling process or program. The algorithm then exits 326.
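The seed-and-extend procedure of Figure 3 can be sketched as follows, under several stated assumptions: extension is unidirectional (to the right only), the residue classes and the scores +2 / -1 / -3 are illustrative values chosen so that a lysine/arginine mismatch costs less than a lysine/glutamate mismatch, and "accumulated penalty" is read as the running sum of mismatch penalties. All names are hypothetical; practical tools would use a substitution matrix such as BLOSUM62 instead.

```python
from typing import Dict

# Hypothetical, simplified residue classes used only to grade mismatch penalties.
_CLASSES = {"basic": "KRH", "acidic": "DE", "polar": "STNQCY",
            "hydrophobic": "AVLIMFW", "special": "GP"}
RESIDUE_CLASS: Dict[str, str] = {r: c for c, rs in _CLASSES.items() for r in rs}

MATCH, SIMILAR, DISSIMILAR = 2, -1, -3  # assumed scores, not from the text

def pair_score(a: str, b: str) -> int:
    """Identical residues score MATCH; same-class mismatches cost less than cross-class ones."""
    if a == b:
        return MATCH
    cls = RESIDUE_CLASS.get(a)
    return SIMILAR if cls is not None and cls == RESIDUE_CLASS.get(b) else DISSIMILAR

def extend_right(q: str, s: str, qi: int, si: int, max_penalty: int) -> int:
    """Extend rightward from q[qi], s[si]; return the best score gained.

    Stops at the end of either sequence or once the accumulated mismatch
    penalty exceeds max_penalty.
    """
    score = best = penalty = 0
    while qi < len(q) and si < len(s):
        step = pair_score(q[qi], s[si])
        score += step
        if step < 0:
            penalty -= step          # accumulate the magnitude of each penalty
            if penalty > max_penalty:
                break
        best = max(best, score)      # keep the maximal score seen so far
        qi += 1
        si += 1
    return best

def seed_and_extend(q: str, s: str, x: int = 3, max_penalty: int = 4) -> int:
    """Best score over all exact seeds of length x, each extended to the right.

    Returns 0 (a null score) if no seed matches.
    """
    best = 0
    for i in range(len(s) - x + 1):       # subject pointer
        for j in range(len(q) - x + 1):   # query pointer
            if q[j:j + x] == s[i:i + x]:
                best = max(best, MATCH * x + extend_right(q, s, j + x, i + x, max_penalty))
    return best
```

With these values, `seed_and_extend("MKVRLL", "MKVKLL")` scores higher than `seed_and_extend("MKVELL", "MKVKLL")`, reflecting the graded K/R versus K/E penalty described in the text.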
Comparator algorithm 238 may be represented in pseudocode as follows:
INPUT:
    Q[m]: query, where m is the length of the query
    S[n]: subject, where n is the length of the subject
    x: size of the segment to be matched
START:
    for each i in [1,n] do
        for each j in [1,m] do
            if (j + x - 1) <= m and (i + x - 1) <= n then
                if Q(j, j+x-1) = S(i, i+x-1) then
                    k = 1;
                    while (j+x-1+k) <= m and (i+x-1+k) <= n and Q(j, j+x-1+k) = S(i, i+x-1+k) do
                        k++;
                    Store highest local homology
    Compute overall homology score
    Return local and overall homology scores
END.
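A runnable rendering of the pseudocode is sketched below, using 0-based positions. Because the pseudocode does not define how the local and overall homology scores are computed, two assumptions are made: the local score is simply the length of the exact match, and the overall score is the fraction of query positions covered by at least one local match. Both choices, and the function name, are hypothetical.

```python
from typing import List, Tuple

def exact_seed_extend(query: str, subject: str,
                      x: int) -> Tuple[int, float, List[Tuple[int, int, int]]]:
    """Return (highest local homology, overall score, local matches).

    Each local match is (subject_pos, query_pos, length), 0-based.
    """
    m, n = len(query), len(subject)
    local: List[Tuple[int, int, int]] = []
    best = 0
    for i in range(n):                     # position in subject S
        for j in range(m):                 # position in query Q
            if j + x <= m and i + x <= n and query[j:j + x] == subject[i:i + x]:
                length = x
                # Extend while both sequences continue and still agree (the k++ loop).
                while j + length < m and i + length < n and query[j + length] == subject[i + length]:
                    length += 1
                local.append((i, j, length))
                best = max(best, length)   # "store highest local homology"
    # Hypothetical overall score: fraction of query positions covered by any local match.
    covered = {p for _, j, length in local for p in range(j, j + length)}
    overall = len(covered) / m if m else 0.0
    return best, overall, local
```

Like the pseudocode, this records every seed hit, including seeds that fall inside a longer match already found; a production implementation would usually skip such redundant seeds.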
The comparator algorithm 238 may be written for use on nucleotide sequences, in which case the scoring scheme would be implemented so as to calculate scores and apply penalties based on the chemical nature of nucleotides.
The comparator algorithm 238 may also provide for the presence of gaps in the scoring method for nucleotide or polypeptide sequences.
BLAST is one implementation of the comparator algorithm 238. HMMER is another implementation of the comparator algorithm 238, based on Markov model analysis. In an HMMER implementation, a query sequence would be compared to a mathematical model representative of a subject sequence or sequences rather than directly to an individual subject sequence.
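HMMER compares a query with a profile hidden Markov model built from related sequences. Writing a profile HMM is beyond a short example, so the sketch below substitutes a simpler relative, a gapless position-specific scoring matrix (PSSM), to show the idea of scoring a query against a model rather than against one subject sequence. It has no insert or delete states and is not how HMMER works internally; the uniform background frequency, the pseudocount and the DNA alphabet are assumptions, and the function names are hypothetical.

```python
import math
from typing import Dict, List

def build_pssm(aligned: List[str], alphabet: str = "ACGT",
               pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build a log-odds position-specific scoring matrix from equal-length aligned sequences."""
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    background = 1.0 / len(alphabet)  # uniform background frequency (assumption)
    pssm = []
    for pos in range(width):
        column = [seq[pos] for seq in aligned]
        total = len(column) + pseudocount * len(alphabet)
        # log2(observed frequency / background frequency) for each symbol
        pssm.append({a: math.log2((column.count(a) + pseudocount) / total / background)
                     for a in alphabet})
    return pssm

def best_window_score(pssm: List[Dict[str, float]], sequence: str) -> float:
    """Highest gapless score of any window of `sequence` (model alphabet only)."""
    width = len(pssm)
    best = float("-inf")              # stays -inf if the sequence is shorter than the model
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[k][sequence[start + k]] for k in range(width))
        best = max(best, score)
    return best
```

A positive window score means the window resembles the modeled family more than random background sequence; HMMER adds insertions, deletions and calibrated E-values on top of this basic log-odds idea.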

Figure 4 is a flow diagram illustrating an analyzer algorithm 244 process for detecting the presence of an orthosomycin biosynthetic locus, an everninomicin-type orthosomycin biosynthetic locus or an avilamycin-type orthosomycin biosynthetic locus. The analyzer algorithm of Figure 4 may be used in the process by which the annotation of a subject is assigned to the query based on their similarity as determined by the comparator algorithm 238 and according to context-specific rules coded into the program or dynamically loaded at runtime.
Context-specific rules determine whether the annotation of the subject can be assigned to the query given the context of the comparison. They set the thresholds for the level and quality of similarity that will be accepted when evaluating matched pairs.
The analyzer algorithm 244 receives as its input an array of pairs that have been matched by the comparator algorithm 238. The array consists of at least a query identifier, a subject identifier and the associated measure of their similarity. To determine whether a group of query sequences includes sequences diagnostic of an avilamycin-type orthosomycin biosynthetic gene cluster, a reference or diagnostic array 406 is generated by accessing a data source and retrieving avilamycin-specific information 404 relating to avilamycin-specific nucleic acid codes and avilamycin-specific polypeptide codes. Diagnostic array 406 consists of at least subject identifiers and their associated annotation.
Annotation may include reference to the six protein families diagnostic of avilamycin-type biosynthetic gene clusters, i.e. ABCD, DEPN, MEMD, REBU, UNAI and UNBR.
Annotation may also include information regarding exclusive presence in loci of a specific structural class or may include previously computed matches to other databases, for example databases of motifs. Once the algorithm has successfully generated or received the two necessary arrays 402, 406, and holds in memory any context specific rules, each matched pair as determined by the comparator algorithm 238 can be evaluated. The algorithm will perform an evaluation 408 of each matched pair and based on the context specific rules confirm or fail to confirm the match as valid 410. In cases of successful confirmation of the match 410 the annotation of the subject is assigned to the query. Results of each comparison are stored 412. The loop ends when the end of the query / subject array is reached.

Once all query/subject pairs have been evaluated against avilamycin-specific nucleic acid codes and avilamycin-specific polypeptide codes, a final determination can be made as to whether the query set of ORFs represents an avilamycin locus 416.
The algorithm then returns the overall diagnosis and an array of characterized query / subject pairs along with supporting evidence to the calling program or process and then terminates 418.
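The loop of Figure 4 can be sketched as follows. The sketch assumes the comparator supplies a numeric similarity for each matched pair and that the context-specific rule is a similarity cutoff plus a minimum number of distinct families. The `Diagnosis` type, the function name and the subject identifiers are hypothetical; the family codes (ABCD, DEPN, MEMD) are those named in the annotation description, but their assignment to particular subjects here is illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Match = Tuple[str, str, float]  # (query_id, subject_id, similarity) from the comparator

@dataclass
class Diagnosis:
    is_locus: bool
    families_found: List[str]
    # Supporting evidence: (query_id, subject_id, similarity, assigned annotation)
    evidence: List[Tuple[str, str, float, str]] = field(default_factory=list)

def analyze(pairs: List[Match], diagnostic: Dict[str, str],
            min_similarity: float, min_families: int) -> Diagnosis:
    """Evaluate each matched pair against a diagnostic array, then diagnose the query set.

    diagnostic maps subject_id -> family annotation. A pair is confirmed when its
    subject is diagnostic and its similarity meets min_similarity; the subject's
    annotation is then assigned to the query.
    """
    evidence = []
    for query_id, subject_id, similarity in pairs:
        family = diagnostic.get(subject_id)
        if family is not None and similarity >= min_similarity:  # confirm the match
            evidence.append((query_id, subject_id, similarity, family))
    families = sorted({item[3] for item in evidence})
    return Diagnosis(len(families) >= min_families, families, evidence)
```

The returned object carries both the overall diagnosis and the per-pair evidence, matching the two outputs the analyzer is described as returning to the calling program.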
The analyzer algorithm 244 may be configured to dynamically load different diagnostic arrays and context specific rules. It may be used for example in the comparison of query / subject pairs with diagnostic subjects for other biosynthetic pathways, such as everninomicin-specific nucleic acid codes or everninomicin-specific polypeptide codes, or other sets of annotated subjects.
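Dynamic loading of context-specific rules can be sketched by keeping the rules as data. In the hypothetical example below, each locus type (orthosomycin, everninomicin-type, avilamycin-type) lists the diagnostic group sets it needs and a minimum number of matched groups per set. The JSON layout is an assumption, and the thresholds are picked from the lowest values the text lists; swapping in a different JSON document changes the diagnosis without changing the code.

```python
import json
from typing import Dict, Set

# Rule sets as they might be stored outside the program and loaded at runtime.
# Each rule maps a diagnostic set name to the minimum number of matched groups.
RULES_JSON = """
{
  "orthosomycin":  {"orthosomycin": 3},
  "everninomicin": {"everninomicin": 2, "orthosomycin": 3},
  "avilamycin":    {"avilamycin": 2, "orthosomycin": 3}
}
"""

def load_rules(text: str) -> Dict[str, Dict[str, int]]:
    """Parse rules of the form {locus type: {diagnostic set: minimum groups}}."""
    return {name: {k: int(v) for k, v in req.items()}
            for name, req in json.loads(text).items()}

def classify(groups_hit: Dict[str, Set[int]],
             rules: Dict[str, Dict[str, int]]) -> Dict[str, bool]:
    """Apply every loaded rule to the groups matched in each diagnostic set."""
    return {name: all(len(groups_hit.get(ds, set())) >= need for ds, need in req.items())
            for name, req in rules.items()}
```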
The present invention will be further described with reference to the following examples; however, it is to be understood that the present invention is not limited to such examples.
Example 1: Identification of the everninomicin biosynthetic locus in Micromonospora carbonacea var. aurantiaca:
The microorganism Micromonospora carbonacea var. aurantiaca NRRL 2997 was obtained from the Agricultural Research Service Culture Collection of the United States Department of Agriculture, 1815 N. University Street, Peoria, IL 61604. The everninomicin compound produced by strain NRRL 2997 is described in US Patent 3,499,078. The biosynthetic locus for everninomicin (EVER) was identified from strain NRRL 2997 according to the method described in Canadian patent application CA 2,352,451. The sequences obtained from cosmids containing overlapping genomic inserts spanning the biosynthetic locus for everninomicin were identified. Within the sequences of the cosmid inserts, numerous ORFs encoding polypeptides having homology to known proteins were identified. Homology was determined using the program BLASTP version 2.2.2 with the default parameters. Contiguous nucleotide sequences and deduced amino acid sequences of EVER are provided. EVER is formed of three contiguous DNA sequences (SEQ ID NOS: 280, 281 and 282), arranged such that, within EVER, the 3' end of DNA contig 1 (SEQ ID NO: 280) is adjacent to the 5' end of DNA contig 2 (SEQ ID NO: 281), which in turn is adjacent to the 5' end of DNA contig 3 (SEQ ID NO: 282).
The ORFs present in EVER encode 50 polypeptides, the sequences of which are provided as follows: The amino acid sequence of ORF 1 (SEQ ID NO 263) is deduced from the nucleic acid sequence of SEQ ID NO 264 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 2 (SEQ ID NO 89) is deduced from the nucleic acid sequence of SEQ ID NO 90 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 3 (SEQ ID NO 225) is deduced from the nucleic acid sequence of SEQ ID NO 226 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 4 (SEQ ID NO 237) is deduced from the nucleic acid sequence of SEQ ID NO 238 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 5 (SEQ ID NO 113) is deduced from the nucleic acid sequence of SEQ ID NO 114 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 6 (SEQ ID NO 119) is deduced from the nucleic acid sequence of SEQ ID NO 120 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 7 (SEQ ID NO 49) is deduced from the nucleic acid sequence of SEQ ID NO 50 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 8 (SEQ ID NO 65) is deduced from the nucleic acid sequence of SEQ ID NO 66 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 9 (SEQ ID NO 201) is deduced from the nucleic acid sequence of SEQ ID NO 202 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 10 (SEQ ID NO 15) is deduced from the nucleic acid sequence of SEQ ID NO 16 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 11 (SEQ ID NO 95) is deduced from the nucleic acid sequence of SEQ ID NO 96 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 12 (SEQ ID NO 71) is deduced from the nucleic acid sequence of SEQ ID NO 72 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 13 (SEQ ID NO 125) is deduced from the nucleic acid sequence of SEQ ID NO 126 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 14 (SEQ ID NO 83) is deduced from the nucleic acid sequence of SEQ ID NO 84 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 15 (SEQ ID NO 101) is deduced from the nucleic acid sequence of SEQ ID NO 102 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 16 (SEQ ID NO 47) is deduced from the nucleic acid sequence of SEQ ID NO 48 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 17 (SEQ ID NO 195) is deduced from the nucleic acid sequence of SEQ ID NO 196 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 18 (SEQ ID NO 155) is deduced from the nucleic acid sequence of SEQ ID NO 156 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 19 (SEQ ID NO 107) is deduced from the nucleic acid sequence of SEQ ID NO 108 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 20 (SEQ ID NO 77) is deduced from the nucleic acid sequence of SEQ ID NO 78 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 21 (SEQ ID NO 221) is deduced from the nucleic acid sequence of SEQ ID NO 222 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 22 (SEQ ID NO 151) is deduced from the nucleic acid sequence of SEQ ID NO 152 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 23 (SEQ ID NO 143) is deduced from the nucleic acid sequence of SEQ ID NO 144 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 24 (SEQ ID NO 53) is deduced from the nucleic acid sequence of SEQ ID NO 54 drawn from contig 1 (SEQ ID NO 280).
The amino acid sequence of ORF 25 (SEQ ID NO 205) is deduced from the nucleic acid sequence of SEQ ID NO 206 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 26 (SEQ ID NO 161) is deduced from the nucleic acid sequence of SEQ ID NO 162 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 27 (SEQ ID NO 257) is deduced from the nucleic acid sequence of SEQ ID NO 258 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 28 (SEQ ID NO 135) is deduced from the nucleic acid sequence of SEQ ID NO 136 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 29 (SEQ ID NO 3) is deduced from the nucleic acid sequence of SEQ ID NO 4 drawn from contig 1 (SEQ ID NO
280). The amino acid sequence of ORF 30 (SEQ ID NO 35) is deduced from the nucleic acid sequence of SEQ ID NO 36 drawn from contig 1 (SEQ ID NO 280).
The amino acid sequence of ORF 31 (SEGO ID NO 169) is deduced from the nucleic acid sequence of SEQ ID NO 170 drawn from contig 1 (SEO ID NO 280). The amino acid sequence of ORF 32 (SEQ ID NO 183) is deduced from the nucleic acid sequence of SEQ ID NO 184 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 33 (SEQ ID NO 177) is deduced from the nucleic acid sequence of SEQ ID NO 178 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 34 (SEQ ID NO 29) is deduced from the nucleic acid sequence of SEQ ID NO 30 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 35 (SEO ID NO 59) is deduced from the nucleic acid sequence of SEQ ID NO 60 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 36 (SEQ ID NO 189) is deduced from the nucleic acid sequence of SEQ ID
NO 190 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF
37 (SEQ ID NO 141) is deduced from the nucleic acid sequence of SEQ ID NO 142 drawn from contig 1 (SEO ID NO 280). The amino acid sequence of ORF 38 (SEQ
ID NO 41) is deduced from the nucleic acid sequence of SEQ IID NO 42 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 39 (SEQ ID NO
9) is deduced from the nucleic acid sequence of SEQ ID NO 10 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 40 (SEO ID NO 129) is deduced from the nucleic acid sequence of SEQ ID NO 130 drawn from contig 1 (SEO ID NO 280). As indicated in Table II-B, the sequence of (JRF 41 provided herein contains a gap. The amino acid sequence of ORF 41, C-terminus (SEO ID
2o NO 23) is deduced from the nucleic acid sequence of SEO ID NO 24 drawn from contig 1 (SEQ ID NO 280). The amino acid sequence of ORF 41, N-terminus (SEQ
ID NO 21 ) is deduced from the nucleic acid sequence of SEQ ID NO 22 drawn from contig 2 (SEQ ID NO 281 ). The amino acid sequence of URF 42, C-terminus only (SEQ ID NO 233) is deduced from the nucleic acid sequence of SEQ ID NO
234 drawn from contig 3 (SECT ID NO 282). The amino acid sequence of ORF 43 (SEQ ID NO 209) is deduced from the nucleic acid sequence of SEQ ID NO 210 drawn from contig 3 (SEQ ID NO 282). The amino acid sequence of ORF 44 (SEO
ID NO 229) is deduced from the nucleic acid sequence of SEQ ID NO 230 drawn from contig 3 (SEQ ID NO 282). The amino acid sequence of GRF 45 (SEO ID NO
30 217) is deduced from the nucleic acid sequence of SEQ ID NO 218 drawn from contig 3 (SEQ ID NO 282). The amino acid sequence of ORF 46 (SEQ ID NO 213) is deduced from the nucleic acid sequence of SEQ ID NO 214 drawn from contig 3 (SEQ ID NO 282). The amino acid sequence of ORF 47 (SEQ ID NO 241) is deduced from the nucleic acid sequence of SEQ ID NO 242 drawn from contig 3 (SEQ ID NO 282). The amino acid sequence of ORF 48 (SEQ ID NO 259) is deduced from the nucleic acid sequence of SEQ ID NO 260 drawn from contig 3 (SEQ ID NO 282). The amino acid sequence of ORF 49 (SEQ ID NO 267) is deduced from the nucleic acid sequence of SEQ ID NO 268 drawn from contig 3 (SEQ ID NO 282). The amino acid sequence of ORF 50 (SEQ ID NO 261 ) is deduced from the nucleic acid sequence of SEQ ID NO 262 drawn from contig 3 (SEQ ID NO 282).
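How an amino acid sequence is "deduced" from a nucleic acid sequence drawn from a contig can be illustrated with the minimal Python sketch below. It is an illustration under stated assumptions, not the procedure used to prepare the sequence listing: coordinates are assumed to be 1-based and inclusive in the START...END form of Table II-B, with START greater than END meaning the ORF lies on the complementary strand, and translation is assumed to use the bacterial genetic code (NCBI translation table 11), in which alternative initiation codons such as GTG and TTG are read as methionine at the first position. The function names and the toy contig are hypothetical.

```python
# Minimal sketch: extract an ORF from a contig by its START...END coordinates
# and deduce its amino acid sequence. Assumes 1-based inclusive coordinates and
# NCBI translation table 11 (bacterial code); all names here are hypothetical.

BASES = "TCAG"
# Standard amino acid assignments in TCAG x TCAG x TCAG order ("*" = stop);
# table 11 uses the same assignments and differs only in its initiation codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Initiation codons of translation table 11; read as Met when first in an ORF.
START_CODONS = {"ATG", "GTG", "TTG", "CTG", "ATT", "ATC", "ATA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_orf(contig: str, start: int, end: int) -> str:
    """Return the coding strand of an ORF given START...END coordinates.

    START > END denotes an ORF on the complementary (NEGATIVE) strand.
    """
    if start <= end:
        return contig[start - 1:end]
    return reverse_complement(contig[end - 1:start])


def translate(orf: str) -> str:
    """Translate an ORF codon by codon, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(orf) - len(orf) % 3, 3):
        codon = orf[pos:pos + 3].upper()
        if pos == 0 and codon in START_CODONS:
            protein.append("M")  # alternative initiators are read as Met
            continue
        residue = CODON_TABLE.get(codon, "X")  # "X" for ambiguous bases
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)
```

For example, `translate(extract_orf("GTGAAACCCTAG", 1, 12))` and its complementary-strand counterpart `translate(extract_orf("CTAGGGTTTCAC", 12, 1))` both give `"MKP"`: the GTG initiator is read as methionine and translation ends at the TAG stop codon.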
Each EVER ORF has been assigned a putative function and protein family designation based on homology to known proteins, as indicated in Table II-A.
The position, length and orientation of each EVER ORF within SEQ ID NOS: 280, 281 and 282 are provided in Table II-B.

Table II-A (putative function and protein family designation of each EVER ORF; the table is not legible in the scanned source)

Table II-B

EVER ORF  START...END (bp)  LENGTH (aa)  ORIENTATION  INTEGRITY

Contig 1 (SEQ ID NO 280)
1         753...1           250          NEGATIVE     COMPLETE
2         1533...2861       442          POSITIVE     COMPLETE
3         2967...4055       362          POSITIVE     COMPLETE
4         5529...4060       489          NEGATIVE     COMPLETE
5         6174...6998       274          POSITIVE     COMPLETE
6         6995...8284       429          POSITIVE     COMPLETE
7         8281...9465       394          POSITIVE     COMPLETE
8         9472...10491      339          POSITIVE     COMPLETE
9         10607...11020     137          POSITIVE     COMPLETE
10        12020...11076     314          NEGATIVE     COMPLETE
11        13089...12061     342          NEGATIVE     COMPLETE
12        13912...13082     276          NEGATIVE     COMPLETE
13        14812...14015     265          NEGATIVE     COMPLETE
14        15177...16211     344          POSITIVE     COMPLETE
15        17099...16377     240          NEGATIVE     COMPLETE
16        18415...17099     438          NEGATIVE     COMPLETE
17        19900...18683     405          NEGATIVE     COMPLETE
18        20858...19917     313          NEGATIVE     COMPLETE
19        21589...20858     243          NEGATIVE     COMPLETE
20        23031...21586     481          NEGATIVE     COMPLETE
21        23345...24487     380          POSITIVE     COMPLETE
22        24565...25542     325          POSITIVE     COMPLETE
23        25547...26509     320          POSITIVE     COMPLETE
24        26557...27570     337          POSITIVE     COMPLETE
25        27567...28619     350          POSITIVE     COMPLETE
26        29397...28639     252          NEGATIVE     COMPLETE
27        29752...30681     309          POSITIVE     COMPLETE
28        30879...31946     355          POSITIVE     COMPLETE
29        31946...32935     329          POSITIVE     COMPLETE
30        32990...34018     342          POSITIVE     COMPLETE
31        34073...35425     450          POSITIVE     COMPLETE
32        39383...35580     1267         NEGATIVE     COMPLETE
33        39863...40612     249          POSITIVE     COMPLETE
34        40609...41532     307          POSITIVE     COMPLETE
35        42398...41511     295          NEGATIVE     COMPLETE
36        42708...42460     82           NEGATIVE     COMPLETE
37        44557...43532     341          NEGATIVE     COMPLETE
38        45966...44554     470          NEGATIVE     COMPLETE
39        47003...45963     346          NEGATIVE     COMPLETE
40        47971...47207     254          NEGATIVE     COMPLETE
41        48221...48070     49           NEGATIVE     C-TERMINUS ONLY

Contig 2 (SEQ ID NO 281)
41        480...1           160          NEGATIVE     N-TERMINUS

Contig 3 (SEQ ID NO 282)
42        1...1203          400          POSITIVE     C-TERMINUS
43        1200...2327       375          POSITIVE     COMPLETE
44        2357...3607       416          POSITIVE     COMPLETE
45        3616...4239       207          POSITIVE     COMPLETE
46        5169...4060       369          NEGATIVE     COMPLETE
47        6086...5166       306          NEGATIVE     COMPLETE
48        7811...6267       517          NEGATIVE     COMPLETE
49        8746...7889       286          NEGATIVE     COMPLETE
50        10035...8764      423          NEGATIVE     COMPLETE
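The ORIENTATION and LENGTH (aa) columns of Table II-B follow from the coordinates. For a row whose coordinates end at a stop codon, the span |START - END| + 1 covers the coding codons plus the stop codon, so the protein length is the span divided by three, minus one. For example, ORF 1 of contig 1 (753...1) spans 753 bp, or 251 codons, giving 250 amino acids; ORF 32 (39383...35580) spans 3804 bp, giving 1267 amino acids. The hypothetical helpers below restate this arithmetic, assuming 1-based inclusive coordinates; the stop-codon subtraction does not apply to N-terminal fragments such as ORF 41 on contig 2 (480...1, 160 amino acids), which end without a stop codon.

```python
# Sketch: derive orientation and protein length from Table II-B coordinates.
# Assumes 1-based inclusive coordinates that include the terminal stop codon.


def orientation(start: int, end: int) -> str:
    """START > END indicates an ORF read on the complementary strand."""
    return "NEGATIVE" if start > end else "POSITIVE"


def protein_length(start: int, end: int) -> int:
    """Number of amino acids encoded, excluding the stop codon."""
    span = abs(start - end) + 1
    if span % 3:
        raise ValueError(f"span of {span} bp is not a whole number of codons")
    return span // 3 - 1


# Example rows (EVER ORF, START, END, LENGTH) taken from contig 1 of Table II-B.
for orf, start, end, expected in [(1, 753, 1, 250), (2, 1533, 2861, 442),
                                  (32, 39383, 35580, 1267)]:
    print(orf, orientation(start, end), protein_length(start, end), expected)
```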

Example 2: Identification of a biosynthetic locus for an avilamycin-type compound from Streptomyces mobaraensis:
The microorganism Streptomyces mobaraensis strain NRRL B-3729 was obtained from the Agricultural Research Service Culture Collection of the United States Department of Agriculture. Streptomyces mobaraensis had not previously been reported to produce an avilamycin-type compound, or any orthosomycin.
A biosynthetic locus for an avilamycin-type compound in Streptomyces mobaraensis (AVIA) was identified using the method described in Canadian patent application CA 2,352,451. Sequences were obtained from cosmids containing overlapping genomic inserts spanning the biosynthetic locus for the avilamycin-type compound.
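Confirming that a set of overlapping cosmid inserts spans a locus without gaps amounts to merging intervals and looking for uncovered stretches. The sketch below is illustrative only: it is not the method of CA 2,352,451, the function names are hypothetical, and the insert coordinates are invented, assuming each insert is mapped as a 1-based inclusive interval on the genome.

```python
# Sketch: merge overlapping cosmid inserts (1-based inclusive genomic
# intervals) and report any gaps within a target region. The coordinates
# used in the example are hypothetical.
from typing import List, Tuple

Interval = Tuple[int, int]


def merge_inserts(inserts: List[Interval]) -> List[Interval]:
    """Merge overlapping or abutting intervals into contiguous blocks."""
    merged: List[Interval] = []
    for start, end in sorted(inserts):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_gaps(inserts: List[Interval], region: Interval) -> List[Interval]:
    """Return the sub-intervals of `region` not covered by any insert."""
    gaps: List[Interval] = []
    cursor = region[0]  # first position not yet known to be covered
    for start, end in merge_inserts(inserts):
        if end < cursor:
            continue  # block lies entirely before the uncovered part
        if start > region[1]:
            break  # block lies entirely past the region
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= region[1]:
        gaps.append((cursor, region[1]))
    return gaps
```

With hypothetical inserts (1, 40000), (35000, 75000) and (80000, 120000), `coverage_gaps` for the region (1, 110000) reports the single uncovered stretch (75001, 79999). A gap of this kind is what leaves a locus split across separate contigs.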
Within the sequences of the cosmid inserts, numerous ORFs encoding polypeptides having homology to known proteins were identified. Homology was determined using the program BLASTP version 2.2.2 with the default parameters.
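Assigning a putative function from BLASTP homology, as recorded in Tables II-A and III-A, usually starts from the highest-scoring hit for each query ORF. The sketch below assumes BLAST's 12-column tabular output (`-m 8` in legacy `blastall`, `-outfmt 6` in BLAST+), whose columns are query id, subject id, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value and bit score. The selection rule (best bit score after a hypothetical E-value cutoff), the function name and the example records are assumptions, not the criteria stated in this document.

```python
# Sketch: pick the best BLASTP hit for each query ORF from 12-column tabular
# output. The E-value cutoff is an illustrative assumption, not from the source.
import csv
import io
from typing import Dict, NamedTuple


class Hit(NamedTuple):
    subject: str
    identity: float
    evalue: float
    bitscore: float


def best_hits(tabular: str, max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Return the highest-bit-score hit per query that passes the cutoff."""
    best: Dict[str, Hit] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        hit = Hit(subject, float(row[2]), float(row[10]), float(row[11]))
        if hit.evalue > max_evalue:
            continue
        if query not in best or hit.bitscore > best[query].bitscore:
            best[query] = hit
    return best
```

The subject of each retained hit would then be looked up to propose a function and protein family; queries with no hit passing the cutoff are simply absent from the result.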
A contiguous nucleotide sequence spanning AVIA and the deduced amino acid sequences of the AVIA ORFs are provided as follows. The amino acid sequence of ORF 1 (SEQ ID NO 123) is deduced from the nucleic acid sequence of SEQ ID NO 124 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 2 (SEQ ID NO 203) is deduced from the nucleic acid sequence of SEQ ID NO 204 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 3 (SEQ ID NO 127) is deduced from the nucleic acid sequence of SEQ ID NO 128 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 4 (SEQ ID NO 19) is deduced from the nucleic acid sequence of SEQ ID NO 20 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 5 (SEQ ID NO 57) is deduced from the nucleic acid sequence of SEQ ID NO 58 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 6 (SEQ ID NO 253) is deduced from the nucleic acid sequence of SEQ ID NO 254 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 7 (SEQ ID NO 251) is deduced from the nucleic acid sequence of SEQ ID NO 252 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 8 (SEQ ID NO 187) is deduced from the nucleic acid sequence of SEQ ID NO 188 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 9 (SEQ ID NO 199) is deduced from the nucleic acid sequence of SEQ ID NO 200 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 10 (SEQ ID NO 255) is deduced from the nucleic acid sequence of SEQ ID NO 256 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 11 (SEQ ID NO 117) is deduced from the nucleic acid sequence of SEQ ID NO 118 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 12 (SEQ ID NO 87) is deduced from the nucleic acid sequence of SEQ ID NO 88 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 13 (SEQ ID NO 81) is deduced from the nucleic acid sequence of SEQ ID NO 82 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 14 (SEQ ID NO 181) is deduced from the nucleic acid sequence of SEQ ID NO 182 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 15 (SEQ ID NO 133) is deduced from the nucleic acid sequence of SEQ ID NO 134 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 16 (SEQ ID NO 1) is deduced from the nucleic acid sequence of SEQ ID NO 2 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 17 (SEQ ID NO 33) is deduced from the nucleic acid sequence of SEQ ID NO 34 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 18 (SEQ ID NO 165) is deduced from the nucleic acid sequence of SEQ ID NO 166 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 19 (SEQ ID NO 167) is deduced from the nucleic acid sequence of SEQ ID NO 168 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 20 (SEQ ID NO 45) is deduced from the nucleic acid sequence of SEQ ID NO 46 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 21 (SEQ ID NO 247) is deduced from the nucleic acid sequence of SEQ ID NO 248 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 22 (SEQ ID NO 99) is deduced from the nucleic acid sequence of SEQ ID NO 100 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 23 (SEQ ID NO 105) is deduced from the nucleic acid sequence of SEQ ID NO 106 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 24 (SEQ ID NO 153) is deduced from the nucleic acid sequence of SEQ ID NO 154 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 25 (SEQ ID NO 111) is deduced from the nucleic acid sequence of SEQ ID NO 112 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 26 (SEQ ID NO 193) is deduced from the nucleic acid sequence of SEQ ID NO 194 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 27 (SEQ ID NO 245) is deduced from the nucleic acid sequence of SEQ ID NO 246 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 28 (SEQ ID NO 249) is deduced from the nucleic acid sequence of SEQ ID NO 250 drawn from contig 1 (SEQ ID NO 277).
The amino acid sequence of ORF 29 (SEQ ID NO 149) is deduced from the nucleic acid sequence of SEQ ID NO 150 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 30 (SEQ ID NO 145) is deduced from the nucleic acid sequence of SEQ ID NO 146 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 31 (SEQ ID NO 51) is deduced from the nucleic acid sequence of SEQ ID NO 52 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 32 (SEQ ID NO 63) is deduced from the nucleic acid sequence of SEQ ID NO 64 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 33 (SEQ ID NO 159) is deduced from the nucleic acid sequence of SEQ ID NO 160 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 34 (SEQ ID NO 175) is deduced from the nucleic acid sequence of SEQ ID NO 176 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 35 (SEQ ID NO 27) is deduced from the nucleic acid sequence of SEQ ID NO 28 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 36 (SEQ ID NO 75) is deduced from the nucleic acid sequence of SEQ ID NO 76 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 37 (SEQ ID NO 69) is deduced from the nucleic acid sequence of SEQ ID NO 70 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 38 (SEQ ID NO 93) is deduced from the nucleic acid sequence of SEQ ID NO 94 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 39 (SEQ ID NO 7) is deduced from the nucleic acid sequence of SEQ ID NO 8 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 40 (SEQ ID NO 39) is deduced from the nucleic acid sequence of SEQ ID NO 40 drawn from contig 1 (SEQ ID NO 277).
The amino acid sequence of ORF 41 (SEQ ID NO 139) is deduced from the nucleic acid sequence of SEQ ID NO 140 drawn from contig 1 (SEQ ID NO 277). The amino acid sequence of ORF 42 (SEQ ID NO 13) is deduced from the nucleic acid sequence of SEQ ID NO 14 drawn from contig 1 (SEQ ID NO 277). Each AVIA ORF has been assigned a putative function and protein family designation based on homology to known proteins, as indicated in Table III-A. The position, length and orientation of each AVIA ORF within SEQ ID NO: 277 are provided in Table III-B.

Table III-A (putative function and protein family designation of each AVIA ORF; the table is not legible in the scanned source)
o o 0 0 0 0 0 ' o ~

\\ o o t!7d' N p ~ N ~ o oo o o \ ~ a N :

~ O op N ~.. _N N Iw ~ ~ 0 0 'd' Mi st CM is ~jp M N O O~
Is ~ cv a p c0O~ MM ~ i~ O
M ~

~ a I ' O~

C~ N N N ~ a p c0O -' O v oD
N ~

M O p N M (~ ~ ,- r" V' M ~, O 1 O \ ~ ~ O p '~'d' ~ O N
M ~

f~ N c N CA M 7 O p N M r,p Qy f7.' ~ O
(h : t ~ N M
.V

~O CD N ~ N

p O ~ p0p O I~I~ ~ ~ N N O
(~M7 Q7 N a ~ N a ~ ~ ~- ~ ~ ~ p f 0 N i 7 - 7 O r ' O N M G1 V N N
' N

N N ~ ~ ~ QS N c0 N ~

~ m ~ ~ M O N O' ~~ ~ 0 t0 O Q 0 O
c c M f' ~' 7 c . - ,... ,-.T N ~ M O ~ N N N h ,-n Oi Oi ~ . ~ V~ ~~ - C
~ ~
"

~ ~ r _ _ _ ~

O OV N M OD ~N pp N ~ c 0 pi fp M

' Y Y ~ p~ ~~ N ~ ''~M
p C ~ Q c a O
) Q ~q ~ ~

Z Z 1Q Q q Q 4 Q

O N

N N ~ M

N N M

Q
~ ~

w w z ~ a 4 ~ >

0 o I x ~ J ,~, ~ ~ ac ~

o o ~ ~ o N N

I N N N

N ~ N N
U

C U ~ C
O 'C U

.f0 O f~U 7 E 0 7 '~

N O O O ~ ~ ~ V

o ~ E E O

-o ~ co ' O
N O E N ~

O a C ~ ~ ~ U O ~ U O N
> ' ~ :o ~, E E ' a ' ' N ~ ~ ~ a. E ~ c o ~ v~ o ~ ~ ~ E

N ~ >. c E.. o m a .: o a o c ...: >
$ a~ E ~ a ~ ~ .., ~ ~ ~ ~ .~ , =o E

as o V

c o E ~ ~ ~' v? ~ ai ~ ai ~ '~ ~ o N

E c o a ~', c >, N c~ c, ~ ~ U
~ ~

L ~ E N _ E N C ~ ~ c~ U (O
O O U ~ O

L D C ~ ~ ~

N O V O ~ V O Ol d Ol ~ O
- ' V V (If _? p i. a. , i, o. O y ~ Q
' y a~ ~ y . fn N

_ O N ~ C N U ~ N ~ N U
r U O

7 N d ~ ~ r ~ C C ~ , t ~
~ O
C

~ O N O N - ~ O
N " ~ N ~-, a~ ~ a~ a~ c c a a~ a y w d ~

N N N N N 'D , 'p p7 y N
U T C U U U U N ~j N .~
' E (,n O a C Q a~ c a a~ a~ o ...5.
>. >~ >. >, o ~

E ~ o ~ ~o ~o Ed ~o ',o u~ ~ andv ~ N o v c~

' o a U ~ U U ~ o ~ ~ r > a a o o a L o - d U ~ a ~ , c '~ ~ ~

c Q Q~ a' ~o Q' Q' ao ~' ao ~ ~ ~
~

-o (O O In O N ~ U T~
(n fn O O U O O L O O ~N
L C C L C

> > ~ > _ ~ > U ~~ > ; > U
~ C .C ~ .> , j U
.

U's X iu its~ i0 m ~ io ~ i0 ' ~ is ~ ~ ~p ~ :: -p ~
~

, > > ~ 0 ~ ~ 0 ~ :~ U ~ a v 0 0 ~ 0 0 >. :c a E a a n. a n. a a a > m a d m a n.
a a '> a a a. '>

0 0 0 :o o a o .0 0 0 0 v o o a v o o~ 0 0 0 ~ d- 0 0 0 o a o N r cc ~.c>r' N yn cn ct m n o0 ~ O ~ I ~ - O M CO ~ V' ~ N I' I I~ p n N , p p of ~ a ~ c a ~

p ~ _ ~
~ M ~

M ~ N O O C CO mN N M N ~ c M
M t O 9 M M M N N N p M c'~ w M M N
' ~ 'd' o co co ~_ r o~ m Wn c o o~ c~
n ~

N M M N N N ~ ~ N~ ~t) N C~ N O

N ~

0 0 0 0 0 0 0 0 0 0 0 0 0 0 ~ o ~ o ~f7 1~ ~ N (O o CO M ~ i~
M ~ d a _ O) W tn OD O ~ c 0 tn N O
(D~ ~ N ~ ~ c0 tO r. r ~ Qj O~ oD u7 ~ av V_ M _I~ _c~ ~ V_ N_ N
M

CO ~ _ _ _ M tO O O u7 ~ ~ O
M OW - 07 j I~C'~ ~ N O (O CO N N N N N M M M

N ~ CO M M M N N N M M M M M M
~ v v v v v v -... ~.,,v ~ v N

~ O~ aD ~! N cD a0 ~ aD _ M ~ N M
c0 N N f~ N a0 f~ ~ O W N N N N M N c0 (' ~

N ~ r' ~ ~ N N CO p (9 M M a0 O ~ O) p O O ~ O~ 07 y 0 st Wfi ~ ~ ~ ~

O N ~ O ~ ~ O (O N ~ N O ~ V O f~

f N ~ ~ ~ 0 ~ ~ ~

c f 0 N M N
'- 0 D D ~ M p ~

N C9 M c~ ~ N N O ~ M

r ,- N r T' N
~ M ~ O <D ~ ~ ~ ~ r M

0 O C~ C,~ i C
0 p O ~ ~ ~ ~ O) ~ O
(D

M ~ .- O M M N M M M N f'~ .~ N t~ M M
M N a0 a0 i?'7 M M Y ~ Y ~ Y

Y N Y C7 U ~ U' U ~ ~ ~

a~a ~ a a ~ a as ~ a a a U U a a Z Z Z Z

a M ~

N M ~ M

~ ~

u > U ~ r .

Q g O i N N N M

N N M

E w ' E ~ c N a~ ~ E o Y ~ o c ~ _ o o ~ d ~~ Y a ~ ~ > ' . c o E . o v~ . .o ~
- E ~ ~ ~

o , ~ a c "

o ~ E s = O g ~ cc y a~

N j N ~ O .V ~ O .~ 'C> O c U
C

t0 ~ O 'C O ~ O O~f1a L T M
E

U N .~ L NU7 ~ .Nm N o tn N
C

L U v. t~O C
, O ~ N W ~ ~ ~ O~ ~ ~ N C7 O
V

c O f0E (f3 'S 7 C

~' jJ O N ~ pO m , ~ Y ti E ~ 7 ~ ca p ~0 w Z ~ C

U ~ ~ ~ a t U ''~'~ E ~ ~~'E
' ~, ~

. o. ~n ~ v~ v~
~

N U ~ ~ ~ ~ ~~ ~ Q(n~ NO N U CO

c _ ~ i ~ ~ ~ 'c c c in~ t c' ~ a~a a 'cc n ~ ~
a >, ~ a o a ~ -m ~n ~T m o >~ m p ~~ o ' , ~s ~ ~ _ ~, E~ ~ Y a E>,a c6 p O ~ y ~ >_ ~ O ?V~ . ' p O
' G ~
p1 N

, ~ od ~ . tO T , _ c~ i0. ~o o ~ ~ ' E ' ~n ~ , m x T ~ ~ t oL O) On. E C ya 0 O ~ ~ N O N

tn Cn a~a~ o O.~ Q1 - ocuo ~~ O t:
o ~ ~o N ~
a _ E ' ~ >v -m c ' a in. o' m rr o c!~E~ ' a O
a '~ >

N Y6 -L7~ ( ~ ~N D ~ j .C ~N ~
U O U C

a a Q a n$ Z ' ~~ ta ~ Y a o. ca a ~oc Z O, a d p O
~

QZ -a C\O ~ ~ ~ o~ 00 0~ o ~o~ o0 o v vv v o cD ~ N 07n ~ M~ ~ V M

M N I~CO ~ c hO OO M r O
L

v~ oo v ~ m ~~ m ~ n~

~ ~.v ~' v on o N ~ N m 000 O NN O 0 V

_ C (_'m Oc~ 0 N Nb,n ~S ") 0 m ~ M01 c~ ~ ~ N m c1c'"' M ~ V

a0 N M a N N c ~ OO " '~
N 0 ~~
O

N N N N NN c0~7 o 0 0 0 ~ ~0 0 0 o ~ o\ 0 0 00 0 ~ \

N N m ON N O O ~ ~
l 00 ~ n e M
O

7 ef01 O lO (p O O ~ ~ p Oa if c p V ~ , . .~M M ~ V.M N

N
tnN V ~00 1~ O O M
O

M m ~ .M- NM M ~ ~ M 000 N NV
~

N ~.. M o?M
V M M '~ O ~O N ~ v ~

c0 oDO~~ ~ ~V V ~ ,n N~ p~ M N~ I~
p~ ~

N ~ O N _ _ N NN N

~ O p NCO O ~ M N N
~

c S ~~ N N M
N p( O
O

N N ON ~V N W N
N v0 N N

~ - V c0 N N d I

i O OO

p N ~ M M ~ IN N ~ ~ ~ ~ N'~~

c t c O M ~ ~t7i N f0 f. ~ N N 6 C
M N O~
cC
c0 N M M M O N V' ~ c~0 M c' c _ _ _ _ ~ ,~ ~
~Y 7 M '~ '- M ~- '- -r r O ~ , d~

O O m V O ~ 'y17 p ~ ~ C'~~ O N OM
' c ~ ~

_ f0 ' M 0 O M ~ i c0 O o0 O M Ou 7~
~ u7:? f V Y a of ~ I 0 M O ap NN~ O O M tD

cr 0 l a l l ~ ~ ~ a Q
~

z z z ~ z ~ z ~ ~ aa a a a d U. U 4 U
i 4 f a d.d' y c;~~ ~ ao ao M N N iM d0' N

.

x O

C7 O ~ ~ O I- m Z

Z S ;g D D

~M M . m M M O
M

_g7_ a~

>.

. E

E o o E v, a o a~

> > >, M

~ UO 07 cu o> c c a ca ~
u ~

io c p L U OO N O a '0 E >.~.> O .~.

0 oE . Q X co E ~

0 O O L ~ ~rj N ~ ~ d N
d X U
N ~ ~

_ O
N

U O N
d Y

O~ p ~

N ~ a CO V
p X N VV O ~ O p 7, o -o >> E o a~ L
E

aa o ~ ~e ao a ~ ~~ r ~c v o a co a o ~ v ~ av ''' yY ' ~ ~ i ~' W

, n 7 o Y N VM 7 C a O
a <U

V N OO O O O
L O ca a O m tn> ~ L L E
a. -~ a d ~
~ ~ ~
~

o _~ a c _ >, . o ca ' p ~

c a Da o z a">
o v v v vv v v v 0 0 00 o a o 0 N I~ r00c0 N ~ a0 M O Nc0N !~ tn ~

00O ~f7 N V' N

_(Dt_0 _(D_c0_t0 _~ _d'_V

~Y CO OM ~ CO N O

V I~ a0N 00 V V NM N ('~~M M

O ~ rt0(D ~ 07 N

m O O1m OD M M M

N N

O ~ N m O
C

~ M ~ O

_~_ _ M_ d' ~ OM tn (O N O
V

V ap NM N M c~

M
'~ N ~ ~ 0 0 ( ~ 0 ~

N

NO 00 m N

~ a0ODf~ M M N

N N ~N V ~ ~

I

N ~c0N to c0 <0 caN N c0 c0 V ~ c0O r t0 N r ~ M N N O

V a ~ c C V
O ~

V M

~ M M M

l D

11 ~ CDLLp Q U~ ~ Q Q

co v M N

M M

a n.

Table III-B
AVIA, Contig 1
ORF   START (bp)...END (bp)   LENGTH (aa)   ORIENTATION   INTEGRITY
1     1...858                 285           POSITIVE      COMPLETE
2     1847...816              343           NEGATIVE      COMPLETE
3     2721...1969             250           NEGATIVE      COMPLETE
4     2972...3934             320           POSITIVE      COMPLETE
5     4870...3986             294           NEGATIVE      COMPLETE
6     5044...6114             356           POSITIVE      COMPLETE
7     6410...6111             99            NEGATIVE      COMPLETE
8     7412...6936             158           NEGATIVE      COMPLETE
9     7639...8049             136           POSITIVE      COMPLETE
10    7992...8741             249           POSITIVE      COMPLETE
11    8738...9961             407           POSITIVE      COMPLETE
12    10020...11390           456           POSITIVE      COMPLETE
13    11546...12580           344           POSITIVE      COMPLETE
14    12577...16431           1284          POSITIVE      COMPLETE
15    16623...17690           355           POSITIVE      COMPLETE
16    17723...18763           346           POSITIVE      COMPLETE
17    18842...19870           342           POSITIVE      COMPLETE
18    19960...20676           222           POSITIVE      COMPLETE
19    20723...21394           223           POSITIVE      COMPLETE
20    21442...22734           430           POSITIVE      COMPLETE
21    22731...23744           337           POSITIVE      COMPLETE
22    23741...24469           242           POSITIVE      COMPLETE
23    24512...25228           238           POSITIVE      COMPLETE
24    25239...26183           314           POSITIVE      COMPLETE
25    26177...27013           278           POSITIVE      COMPLETE
26    27010...28200           396           POSITIVE      COMPLETE
27    28197...29168           323           POSITIVE      COMPLETE
28    29168...29962           264           POSITIVE      COMPLETE
29    30003...30980           325           POSITIVE      COMPLETE
30    30980...31942           320           POSITIVE      COMPLETE
31    31981...32988           335           POSITIVE      COMPLETE
32    32985...34013           342           POSITIVE      COMPLETE
33    34813...34061           250           NEGATIVE      COMPLETE
34    35012...35737           241           POSITIVE      COMPLETE
35    35734...36678           314           POSITIVE      COMPLETE
36    38312...36855           485           NEGATIVE      COMPLETE
37    38516...39367           283           POSITIVE      COMPLETE
38    39369...40415           348           POSITIVE      COMPLETE
39    40636...41682           348           POSITIVE      COMPLETE
40    41676...43076           466           POSITIVE      COMPLETE
41    43081...44091           336           POSITIVE      COMPLETE
42    44081...45055           324           POSITIVE      COMPLETE
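The LENGTH column of Table III-B is consistent with inclusive coordinates that include the stop codon, i.e. length (aa) = (|END - START| + 1)/3 - 1; for ORF 1, (858 - 1 + 1)/3 - 1 = 285. The Python sketch below is a hypothetical helper, not part of the described methods: it applies that relation to a few rows of the table and infers the orientation from the order of the coordinates (END smaller than START meaning the NEGATIVE strand).

```python
def protein_length_aa(start: int, end: int) -> int:
    """Length in amino acids of the protein encoded by an ORF.

    Assumes inclusive coordinates that include the stop codon; END < START
    means the ORF lies on the NEGATIVE strand, so the absolute span is used.
    """
    span_bp = abs(end - start) + 1
    if span_bp % 3:
        raise ValueError(f"span of {span_bp} bp is not a whole number of codons")
    return span_bp // 3 - 1  # drop the stop codon, which encodes no residue


def orientation(start: int, end: int) -> str:
    """Strand inferred from coordinate order, as in the ORIENTATION column."""
    return "POSITIVE" if end > start else "NEGATIVE"


# (START, END, reported LENGTH) for a few AVIA ORFs from Table III-B
AVIA_ROWS = {
    1: (1, 858, 285),
    2: (1847, 816, 343),
    14: (12577, 16431, 1284),
    33: (34813, 34061, 250),
    36: (38312, 36855, 485),
}

for orf, (start, end, reported) in AVIA_ROWS.items():
    computed = protein_length_aa(start, end)
    print(f"ORF {orf}: {orientation(start, end)}, {computed} aa (table: {reported})")
```

The same check can be run over every row to confirm that each coordinate pair spans a whole number of codons.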

AVIA was compared to the avilamycin A locus of Streptomyces viridochromogenes Tu57 (herein referred to as AVIL), GenBank nucleotide accession AF333038, Weitnauer et al. 2001 Chemistry and Biology Vol. 8, pp. 569-581. Figure 5 illustrates the presence and orientation of homologous ORFs in AVIA and AVIL. The scale at the top of the figure is in kilobase pairs. Solid black arrows depict the relative positions of the individual ORFs in AVIA and AVIL, with the arrowhead indicating the orientation of each ORF; the corresponding four-letter family designation is indicated to the right of each ORF. The empty arrows between the two loci highlight segments containing a number of ORFs whose relative order and orientation are identical between the two loci. The order and orientation of ORFs in AVIA are identical to those in AVIL, with the exception of one ORF in the middle of the AVIL locus designated as a member of the OXRF family of oxidoreductases. The ORF designated OXRF in AVIL has no counterpart in the AVIA locus (as indicated by the 'X'). The AVIL ORFs whose four-letter protein family designation is underlined are not disclosed in the Streptomyces viridochromogenes Tu57 avilamycin A biosynthetic gene cluster in GenBank nucleotide accession AF333038. Using the compositions and methods of the present invention, we have now identified additional ORFs at the 3' end of the AVIL locus. The sequences of the AVIL ORFs corresponding to the proteins designated HOXG and UNKU appear to be disrupted by frameshifts. It is unclear whether these frameshifts reflect real perturbations of the ORFs (rendering them inactive) or are due to sequencing errors. We have detected portions of the AVIL UNKU ORF in the region in which three small ORFs (designated UNIQ) had earlier been reported. We believe that the multiple frameshifts in the region corresponding to the UNKU ORF of AVIL may have caused the three UNIQ ORFs reported earlier to be predicted on the wrong strand.
Example 3: Genes indicative of orthosomycin biosynthetic loci:
Certain genes in orthosomycin loci are associated with structural features that are common to all classes of orthosomycin oligosaccharides and are indicative of orthosomycin biosynthetic loci. Table IV lists the protein families and their respective ORF numbers in four orthosomycin loci, namely EVER (described in Example 1), AVIA (described in Example 2), EVEA (described in Example 10) and AVIL (described in Weitnauer et al. 2001 Chemistry and Biology Vol. 8, pp. 569-581). Each row in Table IV relates to a single protein family and identifies the ORFs considered to be members of that protein family in the respective loci. The protein family is identified by its four-letter designation (see Table I). Thus, if members of a particular protein family are found in one or more of EVEA, EVER, AVIA and AVIL, those members are listed in the same row. The symbols ## and ### and the lower-case family designations for locus AVIL denote ORFs that are not disclosed in the Streptomyces viridochromogenes Tu57 avilamycin A locus in GenBank nucleotide accession AF333038 but that have now been identified using the compositions and methods of the present invention. EVER and EVEA are examples of everninomicin-type orthosomycin loci, while AVIA and AVIL are examples of avilamycin-type orthosomycin loci.
The protein families in these four orthosomycin biosynthetic loci can be categorized into five groups based on their distribution: (i) seventeen (17) families that are common among orthosomycin loci but are also found in non-orthosomycin loci and are therefore not considered specific to orthosomycins; (ii) seventeen (17) families that are common to most orthosomycin loci and are considered diagnostic of orthosomycin loci, as described in more detail below; (iii) six (6) families that are diagnostic of avilamycin-type orthosomycin loci, particularly when found together with members of the protein families of group (ii), as described in more detail in Example 5; (iv) nine (9) families that are considered diagnostic of everninomicin-type orthosomycin loci, particularly when found together with members of the protein families of group (ii), as described in more detail in Example 4; and (v) twelve (12) miscellaneous families (not including those designated as 'UNIQ' in the AVIL locus) that are not present in all four orthosomycin loci and/or are not unique to orthosomycin loci. Using the compositions and methods of the invention, the region on the strand opposite AVIL ORFs 2, 3, and 4, as disclosed in Weitnauer et al. 2001 Chemistry and Biology Vol. 8, pp. 569-581, was found to exhibit homology to the AVIA member of protein family UNKU. Accordingly, it is believed that AVIL ORFs 2, 3, and 4 as disclosed in Weitnauer et al. may be incorrect conceptual translations; they are designated as UNIQ in Table IV.

Table IV
EVEA           EVER           AVIA           AVIL
ORF #  FAMILY  ORF #  FAMILY  ORF #  FAMILY  ORF #  FAMILY

(i) ORFs not necessarily unique to, but common to, orthosomycin loci
22     DEPE    39     DEPE    39     DEPE    ###    depe
25     DEPG    10     DEPG    42     DEPG    ###    depg
34     DEPH    41     DEPH    4      DEPH    6      DEPH
28     DEPI    34     DEPI    35     DEPI    ###    depi
23     DHYA    38     DHYA    40     DHYA    ###    dhya
7      GTFA    16     GTFA    20     GTFA    22     GTFA
20     HOXM    20     HOXM    36     HOXM    ###    hoxm
44     MTFA    11     MTFA    38     MTFA    ###    mtfa
24     OXRA    37     OXRA    41     OXRA    ###    oxra
16     PKSO    32     PKSO    14     PKSO    16     PKSO

(ii) ORFs which are diagnostic of orthosomycin loci
31     GTFH    8      GTFH    32     GTFH           GTFH
43     HOXG    12     HOXG    37     HOXG    ##     hoxg
12     MTFF    5      MTFF    25     MTFF           MTFF
45     MTLA    40     MTLA    3      MTLA           MTLA
30     OXRW    26     OXRW    33     OXRW    ##     oxrw
6      OXRW    31     OXRX    19     OXRW           OXRW
29     PHOD    33     PHOD    34     PHOD    ##     phod
32     UNKU    25     UNKU    2      UNKU    ##     unku
               13     MTIA    1      MTIA           MTIA

(iii) ORFs not necessarily unique to, but common to, avilamycin-type loci
                              10     UNBR    12     UNBR

(iv) ORFs not necessarily unique to, but common to, everninomicin-type loci

(v) Miscellaneous ORFs present in various orthosomycin loci
OXRT: ORFs 23, 30 and 33; REGL: ORF 49 (locus columns not recoverable)

Group (ii) of Table IV represents seventeen (17) protein families considered diagnostic of orthosomycin loci, namely GTFE, GTFG, GTFH, HOXG, MTFD, MTFE, MTFF, MTLA, OXRV, OXRW, OXRW, UNAJ, PHOD, UEVA, UNKU, UEVB, and MTIA. The 17 protein families include two families designated OXRW; in EVER, however, one of the OXRW proteins is fused with a member of the UNAJ protein family and is therefore designated OXRX. Hence, EVER contains a single freestanding member of OXRW and no freestanding member of UNAJ.
The UEVB and MTIA families are not present in the EVEA locus but are nonetheless considered diagnostic of orthosomycin loci, as they are found in the other three orthosomycin loci and no homologues have been described elsewhere to date. The seventeen protein families considered diagnostic of orthosomycin loci are those families for which no homologues exist that are naturally involved in the biosynthesis of compounds other than orthosomycins and/or no homologues exist in a context other than an orthosomycin biosynthetic locus. However, an orthosomycin biosynthetic locus is not necessarily expected to include a member of each of the seventeen protein families considered diagnostic of orthosomycin loci.
The following members of the seventeen protein families considered diagnostic of orthosomycin biosynthetic loci are identified in EVEA, EVER, AVIA and AVIL: GTFE (AVIA ORF 31, SEQ ID NO: 51; AVIL accession no. AAK83192; EVER ORF 24, SEQ ID NO: 53; EVEA ORF 33, SEQ ID NO: 55); GTFG (AVIA ORF 5, SEQ ID NO: 57; AVIL accession no. AAK83170; EVER ORF 35, SEQ ID NO: 59; EVEA ORF 27, SEQ ID NO: 61); GTFH (AVIA ORF 32, SEQ ID NO: 63; AVIL accession no. AAK83193; EVER ORF 8, SEQ ID NO: 65; EVEA ORF 31, SEQ ID NO: 67); HOXG (AVIA ORF 37, SEQ ID NO: 69; EVER ORF 12, SEQ ID NO: 71; EVEA ORF 43, SEQ ID NO: 73); MTFD (AVIA ORF 22, SEQ ID NO: 99; AVIL accession no. AAK83184; EVER ORF 15, SEQ ID NO: 101; EVEA ORF 8, SEQ ID NO: 103); MTFE (AVIA ORF 23, SEQ ID NO: 105; AVIL accession no. AAK83186; EVER ORF 19, SEQ ID NO: 107; EVEA ORF 10, SEQ ID NO: 109); MTFF (AVIA ORF 25, SEQ ID NO: 111; AVIL accession no. AAK83188; EVER ORF 5, SEQ ID NO: 113; EVEA ORF 12, SEQ ID NO: 115); MTLA (AVIA ORF 3, SEQ ID NO: 127; AVIL accession no. AAG32067; EVER ORF 40, SEQ ID NO: 129; EVEA ORF 45, SEQ ID NO: 131); MTIA (AVIA ORF 1, SEQ ID NO: 123; AVIL accession no. AAG32066; EVER ORF 13, SEQ ID NO: 125); OXRV (AVIA ORF 24, SEQ ID NO: 153; AVIL accession no. AAK83187; EVER ORF 18, SEQ ID NO: 155; EVEA ORF 11, SEQ ID NO: 157); OXRW (AVIA ORF 33, SEQ ID NO: 159; EVER ORF 26, SEQ ID NO: 161; EVEA ORF 30, SEQ ID NO: 163); OXRW (AVIA ORF 19, SEQ ID NO: 167; EVEA ORF 6, SEQ ID NO: 173; AVIL accession no. AAK83181), where in EVER the second member of the OXRW family is fused with a protein from the UNAJ family and the combined polypeptide is designated OXRX (EVER ORF 31, SEQ ID NO: 169); PHOD (AVIA ORF 34, SEQ ID NO: 175; EVER ORF 33, SEQ ID NO: 177; EVEA ORF 29, SEQ ID NO: 179); UNAJ (AVIA ORF 18, SEQ ID NO: 165; EVEA ORF 5, SEQ ID NO: 171), where in EVER the UNAJ protein is fused with the second member of the OXRW family and the combined polypeptide is designated OXRX (EVER ORF 31, SEQ ID NO: 169); UEVA (AVIA ORF 26, SEQ ID NO: 193; AVIL accession no. AAK83189; EVER ORF 17, SEQ ID NO: 195; EVEA ORF 14, SEQ ID NO: 197); UEVB (AVIA ORF 9, SEQ ID NO: 199; AVIL accession no. AAK83174; EVER ORF 9, SEQ ID NO: 201); and UNKU (AVIA ORF 2, SEQ ID NO: 203; EVER ORF 25, SEQ ID NO: 205; EVEA ORF 32, SEQ ID NO: 207).
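As an illustration only, the group (ii) membership listed above can be tallied for a locus with a short Python sketch. The helper name `diagnostic_hits` and the treatment of the OXRX fusion as one OXRW plus one UNAJ are assumptions made for this example; the two example lists simply restate the EVER and EVEA assignments given above.

```python
from collections import Counter

# Group (ii) of Table IV. OXRW is listed twice because two distinct OXRW
# members are counted among the seventeen diagnostic families.
ORTHOSOMYCIN_DIAGNOSTIC = Counter([
    "GTFE", "GTFG", "GTFH", "HOXG", "MTFD", "MTFE", "MTFF", "MTLA", "OXRV",
    "OXRW", "OXRW", "UNAJ", "PHOD", "UEVA", "UNKU", "UEVB", "MTIA",
])

# Assumption: a fused polypeptide counts toward each family it contains
# (EVER ORF 31, OXRX = OXRW fused with UNAJ).
FUSIONS = {"OXRX": ("OXRW", "UNAJ")}


def diagnostic_hits(locus_families):
    """Number of the 17 diagnostic slots filled by a locus (hypothetical helper)."""
    found = Counter()
    for family in locus_families:
        found.update(FUSIONS.get(family, (family,)))
    # Each slot is counted at most as often as it occurs in the diagnostic list.
    return sum(min(found[f], n) for f, n in ORTHOSOMYCIN_DIAGNOSTIC.items())


# Group (ii) families assigned to EVER and EVEA in the listing above
EVER = ["GTFE", "GTFG", "GTFH", "HOXG", "MTFD", "MTFE", "MTFF", "MTLA", "MTIA",
        "OXRV", "OXRW", "OXRX", "PHOD", "UEVA", "UEVB", "UNKU"]
EVEA = ["GTFE", "GTFG", "GTFH", "HOXG", "MTFD", "MTFE", "MTFF", "MTLA",
        "OXRV", "OXRW", "OXRW", "PHOD", "UNAJ", "UEVA", "UNKU"]

print(diagnostic_hits(EVER), diagnostic_hits(EVEA))
```

Because EVEA lacks UEVB and MTIA, it fills two fewer slots than EVER, which is consistent with the statement that a locus need not contain every diagnostic family.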
The homologues from the four orthosomycin loci belonging to each of the seventeen families diagnostic of orthosomycin loci were compared by BLAST. The percent identity and percent similarity of the amino acid sequences are reported in the sixteen tables identified as Tables V to XX. Values in Tables V to XX are expressed as % identity (% similarity) following a pairwise BLAST 2 Sequences comparison; n/a indicates that a comparison is not applicable, since UNAJ and OXRW are non-homologous ORFs; XXX denotes that a family homologue is not present in the locus. AVIL ORFs marked with an asterisk are present in the publicly available nucleotide sequence of the avilamycin locus (as shown in Figure 10) but were not submitted to the GenBank protein database; homology values listed for such ORFs were obtained with TBLASTN using the default settings and the corresponding AVIA homologues as queries. "Refer to figure" denotes those avilamycin ORFs that are segmented, presumably because of frameshifts in the publicly available sequence; see the corresponding TBLASTN alignments below.
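The % identity (% similarity) convention used in Tables V to XX can be illustrated with a minimal Python sketch. It assumes a blastp-style match line, in which identical residues appear as letters, conservative substitutions as '+', and mismatches or gaps as spaces, with both percentages taken over the full alignment length. The function names and the 10-residue match line are hypothetical and are not taken from any of the tables.

```python
def pct_identity_similarity(midline: str) -> tuple[float, float]:
    """Percent identity and percent similarity from a blastp-style match line.

    Identical positions show the residue letter, conservative substitutions
    show '+', and mismatches or gaps show a space. Both percentages are
    computed over the full alignment length.
    """
    length = len(midline)
    identical = sum(ch.isalpha() for ch in midline)
    positives = identical + midline.count("+")
    return 100.0 * identical / length, 100.0 * positives / length


def table_cell(midline: str) -> str:
    """Format a comparison as in Tables V to XX: % identity (% similarity)."""
    ident, sim = pct_identity_similarity(midline)
    return f"{ident:.0f}% ({sim:.0f}%)"


# Hypothetical match line: 7 identities and 1 conservative substitution in 10 positions
print(table_cell("MK AY+AKQ "))
```

Similarity is always at least as high as identity under this convention, since every identical position also counts as a positive.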

Without intending to be limited to any particular mechanism of action or biosynthetic scheme, the protein families found in all orthosomycin biosynthetic loci can explain the formation of structural elements that define orthosomycin compounds. Figure 2 shows one scheme for the biosynthesis of dichloroisoeverninic acid from acetyl CoA. In the scheme of Figure 2, the KASA enzyme (a putative ketoacyl synthase) is a priming enzyme that loads acetyl CoA onto PKSO (a putative orsellinic acid synthase). MTFA (similar to aromatic O-methyltransferases) methylates orsellinic acid, and HOXM (similar to non-heme hydroxylases/halogenases) chlorinates isoeverninic acid. Members of other protein families present in all orthosomycin loci may also be involved in the biosynthesis of the dichloroisoeverninic acid moiety (or moieties) of orthosomycins.
Figure 7 shows two schemes (A and B) for orthoester formation by the two OXRW proteins and OXRV, all of which have sequence similarity to iron- and alpha-ketoglutarate-dependent enzymes. Scheme A differs from scheme B in that the former does not invoke the action of a glycosyltransferase prior to the oxidative C-O coupling reaction. Similar oxidative C-O coupling has been observed in other iron- and alpha-ketoglutarate-dependent enzymes such as clavaminic acid synthase (Salowe SP, Marsh EN, Townsend CA, Biochemistry 29(27): 6499-6508). Members of other protein families present in all orthosomycin loci may also be involved in the formation of the orthoester linkage(s) of orthosomycins.
Example 4: Genes specific to everninomicin-type orthosomycin biosynthetic loci:
Protein families DATC, DEPF, EPIM, GTFA, MTFG, MTFV, OXBN, OXCO, and UNBB (group (iv) of Table IV) are considered diagnostic of everninomicin-type orthosomycin biosynthetic loci and everninomicin-type orthosomycin producers, particularly when a member of at least one, preferably 2, more preferably 3, still more preferably 4, still more preferably 5 and most preferably 6 or more of the nine protein families is found together with a member of one, preferably 2, more preferably 3, still more preferably 4, still more preferably 6, and most preferably 8 or more of the seventeen orthosomycin-specific protein families listed in group (ii) of Table IV. DATC, DEPF, EPIM, GTFA, MTFV, OXBN, and OXCO are not unique to everninomicin-type orthosomycin loci, as close relatives are associated with secondary metabolism unrelated to orthosomycin biosynthesis.
MTFG and UNBB represent two families that are considered to be unique to everninomicin-type orthosomycin loci, as no homologues exist that are naturally involved in the biosynthesis of compounds other than everninomicin-type orthosomycins and/or no homologues exist in a context other than an everninomicin-type orthosomycin biosynthetic locus. An everninomicin-type orthosomycin biosynthetic locus is not necessarily expected to contain a member of each of the nine protein families considered diagnostic of everninomicin-type orthosomycin loci.
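The combination rule above can be expressed as a simple set test. In this hypothetical Python sketch, the default thresholds are the minimum stated (a member of at least one of the nine families together with a member of at least one of the seventeen group (ii) families), and the most preferred thresholds (6 and 8) can be passed as arguments. For simplicity, group (ii) families are counted as distinct names, so the two OXRW members count once, and fused polypeptides such as OXRX are not expanded. The function name and the example locus are invented for illustration.

```python
# Group (iv) of Table IV: nine families diagnostic of everninomicin-type loci
EVERNINOMICIN_TYPE = {"DATC", "DEPF", "EPIM", "GTFA", "MTFG", "MTFV",
                      "OXBN", "OXCO", "UNBB"}
# Families with no known homologues outside everninomicin-type loci
EVERNINOMICIN_UNIQUE = {"MTFG", "UNBB"}
# Group (ii) of Table IV as distinct names (simplification: OXRW counted once)
ORTHOSOMYCIN_CORE = {"GTFE", "GTFG", "GTFH", "HOXG", "MTFD", "MTFE", "MTFF",
                     "MTLA", "OXRV", "OXRW", "UNAJ", "PHOD", "UEVA", "UNKU",
                     "UEVB", "MTIA"}


def everninomicin_type_evidence(families, min_type=1, min_core=1):
    """Summarize how strongly a set of protein families suggests an
    everninomicin-type orthosomycin locus (hypothetical helper)."""
    fams = set(families)
    type_hits = fams & EVERNINOMICIN_TYPE
    core_hits = fams & ORTHOSOMYCIN_CORE
    return {
        "type_hits": len(type_hits),
        "core_hits": len(core_hits),
        "unique_hits": sorted(type_hits & EVERNINOMICIN_UNIQUE),
        "meets_threshold": len(type_hits) >= min_type and len(core_hits) >= min_core,
    }


# Invented example locus
locus = ["MTFG", "OXBN", "UNBB", "GTFH", "HOXG", "PKSO"]
print(everninomicin_type_evidence(locus))
print(everninomicin_type_evidence(locus, min_type=6, min_core=8))
```

Reporting the unique families separately reflects the distinction drawn above between families shared with unrelated secondary metabolism and those, MTFG and UNBB, found only in everninomicin-type loci.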
Homologues of the nine protein families diagnostic of everninomicin-type orthosomycin loci and present in EVER and EVEA were compared by BLAST analysis with the default parameters. The percent identity and percent similarity of the amino acid sequences are reported in Table XXI.
Table XXI
        % identity   % similarity
DATC    78%          84%
DEPF    64%          74%
EPIM    74%          84%
GTFA    60%          74%
MTFG    70%          75%
MTFV    81%          85%
OXBN    80%          86%
OXCO    77%          85%
UNBB    71%          82%

Without intending to be limited to any particular mechanism or biosynthetic scheme, the protein families diagnostic of everninomicin-type orthosomycin biosynthetic loci can explain the formation of structural elements that characterize everninomicin compounds. Figure 8 shows one route for the formation of the nitrosugar residue of everninomicin. In Figure 8, the amine oxidation reactions are catalyzed sequentially by OXBN, which has sequence similarity to flavin-dependent monooxygenases.

Example 5: Genes specific to avilamycin-type orthosomycin biosynthetic loci:
Protein families ABCD, DEPN, MEMD, REBU, UNAI and UNBR (group (iii) of Table IV) are considered to be diagnostic of avilamycin-type orthosomycin biosynthetic loci, particularly when a member of one, preferably 2, more preferably 3, still more preferably 4 or more of the six protein families diagnostic of an avilamycin-type orthosomycin biosynthetic locus is found together with a member of one, preferably two, more preferably 4, still more preferably 6, still more preferably 8, and most preferably 10 or more of the seventeen orthosomycin-specific protein families listed in group (ii) of Table IV.
The six protein families considered diagnostic of avilamycin-type orthosomycin biosynthetic loci are ABCD (AVIA ORF 27, SEQ ID NO: 245; AVIL accession no. AAG32068); DEPN (AVIA ORF 21, SEQ ID NO: 247; AVIL accession no. AAK83183); MEMD (AVIA ORF 28, SEQ ID NO: 249; AVIL accession no. AAG32069); REBU (AVIA ORF 7, SEQ ID NO: 251; AVIL accession no. AAK83172); UNAI (AVIA ORF 6, SEQ ID NO: 253; AVIL accession no. AAK83171); and UNBR (AVIA ORF 10, SEQ ID NO: 255; AVIL accession no. AAK83175). ABCD, DEPN, MEMD, and UNAI are not unique to avilamycin-type orthosomycin loci, as close relatives of these protein families exist in secondary metabolism unrelated to orthosomycin biosynthesis. REBU and UNBR represent two families that are considered to be unique to avilamycin-type orthosomycin loci, as no homologues exist that are naturally involved in the biosynthesis of compounds other than avilamycin-type orthosomycins and/or no homologues exist in a context other than an avilamycin-type orthosomycin biosynthetic locus. An avilamycin-type orthosomycin biosynthetic locus is not necessarily expected to include a member of each of the six protein families considered diagnostic of avilamycin-type orthosomycin loci.
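Applied to AVIA, the rule above can be checked using the ORF-to-family assignments given in Examples 3 and 5. The Python sketch below is illustrative only: the dictionary `AVIA` restates those assignments, and the helper `avilamycin_evidence` is hypothetical. Counting the two OXRW members separately, AVIA contains all six avilamycin-type families and fills all seventeen group (ii) slots, above the most preferred thresholds of 4 and 10.

```python
from collections import Counter

# AVIA ORF number -> protein family, as assigned in Examples 3 and 5
AVIA = {
    1: "MTIA", 2: "UNKU", 3: "MTLA", 5: "GTFG", 6: "UNAI", 7: "REBU", 9: "UEVB",
    10: "UNBR", 18: "UNAJ", 19: "OXRW", 21: "DEPN", 22: "MTFD", 23: "MTFE",
    24: "OXRV", 25: "MTFF", 26: "UEVA", 27: "ABCD", 28: "MEMD", 31: "GTFE",
    32: "GTFH", 33: "OXRW", 34: "PHOD", 37: "HOXG",
}

# Group (iii) of Table IV
AVILAMYCIN_TYPE = {"ABCD", "DEPN", "MEMD", "REBU", "UNAI", "UNBR"}
# Group (ii) of Table IV: seventeen slots, with OXRW occurring twice
ORTHOSOMYCIN_DIAGNOSTIC = Counter([
    "GTFE", "GTFG", "GTFH", "HOXG", "MTFD", "MTFE", "MTFF", "MTLA", "OXRV",
    "OXRW", "OXRW", "UNAJ", "PHOD", "UEVA", "UNKU", "UEVB", "MTIA",
])


def avilamycin_evidence(orf_to_family):
    """Return (avilamycin-type families found, group (ii) slots filled)."""
    found = Counter(orf_to_family.values())
    type_hits = len(AVILAMYCIN_TYPE & set(found))
    core_hits = sum(min(found[f], n) for f, n in ORTHOSOMYCIN_DIAGNOSTIC.items())
    return type_hits, core_hits


type_hits, core_hits = avilamycin_evidence(AVIA)
# Most preferred combination in Example 5: 4 or more of the six, 10 or more of the seventeen
print(type_hits, core_hits, type_hits >= 4 and core_hits >= 10)
```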
Homologues of the six families diagnostic of avilamycin-type orthosomycin loci and present in AVIA and AVIL were compared by BLAST analysis. The percent identity and percent similarity of the amino acid sequences are reported in Table XXII.


Table XXII
        % identity   % similarity
ABCD    89%          93%
DEPN    89%          94%
MEMD    94%          95%
REBU    92%          93%
UNAI    86%          92%
UNBR    90%          94%

AVIA and AVIL both contain a two-component transport system that is not found in everninomicin-type loci. The ABCD and MEMD proteins in AVIL have been described as an ATP-binding transporter (AviABC1) and a transmembrane transporter (AviABC2), respectively, and are involved in conferring resistance of S. viridochromogenes to avilamycin A (Weitnauer et al., 2001, Antimicrob. Agents Chemother., Vol. 45, pp. 690-695). Based on the high sequence homology, the corresponding ORF 27 (SEQ ID NO: 245) and ORF 28 (SEQ ID NO: 249) in AVIA are believed to carry out analogous functions. These proteins are also similar to the DrrA and DrrB proteins of S. peucetius, which are involved in conferring resistance of that organism to daunorubicin and doxorubicin. The ABCD protein, the AviABC1 protein and the DrrA protein are similar to proteins encoded by the mdr genes of mammalian tumor cells, which confer resistance on these cells to many structurally unrelated chemotherapeutic agents. ABCD and MEMD act jointly to confer resistance to avilamycin-type orthosomycin oligosaccharides by a mechanism analogous to the antiport mechanism established for mammalian tumor cells that contain amplified or overexpressed mdr genes (Guilfoile et al., 1991, Proc. Natl. Acad. Sci. USA, Vol. 88, pp. 8553-8557). AVIA and AVIL both contain a dehydratase/epimerase designated 'DEPN', which is distinct from the dehydratase/epimerase enzymes in the everninomicin-type orthosomycin loci.
AVIA and AVIL both contain an ORF of unknown function designated 'UNAI', for which no homologue is present in the everninomicin-type orthosomycin loci but for which at least one homologue exists, hypothetical protein SCF55.28c of Streptomyces coelicolor A3(2).
Example 6: Design of diagnostic nucleic acid sequences for identifying orthosomycin genes by hybridization or by PCR amplification:
Three of the seventeen families of proteins common to orthosomycin oligosaccharide biosynthetic loci were used to design oligonucleotides that may be used either as hybridization probes or as PCR primers for identifying orthosomycin biosynthetic loci in other organisms. The three protein families used in this example are UEVA, UEVB, and HOXG.
The nucleotide sequences of the UEVA, UEVB, and HOXG protein families from EVER, namely EVER ORFs 17, 9, and 12 (SEQ ID NOS: 195, 201 and 71, respectively), and from AVIA, namely AVIA ORFs 26, 9, and 37 (SEQ ID NOS: 193, 199 and 69, respectively), were aligned by pairwise comparison using 'BLAST 2 Sequences', a BLAST-based tool for aligning two protein or nucleotide sequences (Tatusova and Madden 1999 FEMS Microbiol Lett. 174:247-250). All parameters were left at their default settings, except that filtering (masking of segments of the query sequence that have low compositional complexity) was not applied.
The alignments of the EVER and AVIA sequences for their UEVA, UEVB and HOXG proteins are shown below in Tables XXIII, XXIV and XXV, respectively. Table XXIII is a nucleic acid alignment of the UEVA protein family, comparing AVIA ORF 26 (SEQ ID NO: 193) and EVER ORF 17 (SEQ ID NO: 195). Table XXIV is a nucleic acid alignment of the UEVB protein family, comparing AVIA ORF 9 (SEQ ID NO: 199) and EVER ORF 9 (SEQ ID NO: 201). Table XXV is a nucleic acid alignment of the HOXG protein family, comparing AVIA ORF 37 (SEQ ID NO: 69) and EVER ORF 12 (SEQ ID NO: 71). Several well-conserved regions of the alignments that served as a basis for designing diagnostic oligonucleotides are highlighted: '>' indicates oligonucleotides oriented in the 'sense' direction; '<' indicates oligonucleotides oriented in the 'antisense' direction; and '~' indicates a control oligonucleotide that has the same sequence as one strand but with inverted polarity and hence is unable to hybridize to either strand, thus serving as a negative control.
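The selection of conserved regions and the three kinds of oligonucleotide described above can be sketched in Python. This is a minimal sketch under stated assumptions: the two input strings are the first aligned line of Table XXIII (AVIA ORF 26 from position 30, EVER ORF 17 from position 57) with OCR debris removed; positions are compared without gaps; and the 20-base minimum run and oligonucleotide length are arbitrary illustrative values, not values taken from this document. The antisense oligonucleotide is the reverse complement of the sense sequence, and the negative control is the sense sequence with its polarity inverted (reversed but not complemented).

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def conserved_runs(a: str, b: str, min_len: int):
    """Maximal stretches where two gap-free aligned sequences are identical.

    Returns (0-based offset, sequence) pairs for runs of at least min_len bases.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    runs, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y and start is None:
            start = i                      # a new identical stretch begins
        elif x != y and start is not None:
            if i - start >= min_len:       # stretch ended; keep it if long enough
                runs.append((start, a[start:i]))
            start = None
    if start is not None and len(a) - start >= min_len:
        runs.append((start, a[start:]))    # stretch running to the end
    return runs


def candidate_oligos(region: str, length: int):
    """Sense, antisense and inverted-polarity control oligos from a conserved region."""
    sense = region[:length]
    return {
        "sense": sense,                          # same strand, read 5'->3'
        "antisense": reverse_complement(sense),  # anneals to the sense strand
        "control": sense[::-1],                  # same bases, polarity inverted
    }


AVIA_26 = "gtgcgtgctgccgtggatccacatgtgcgcctccatcgacggcgtctacggccggtgctg"
EVER_17 = "gtgtgtgctgccgtggatccacctctgcgcctccatcgacggcgtctacggccggtgctg"

for offset, region in conserved_runs(AVIA_26, EVER_17, min_len=20):
    print(offset, candidate_oligos(region, length=20))
```

In practice, candidate oligonucleotides would also be screened for properties such as melting temperature and self-complementarity, which this sketch does not attempt.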

TABLE XXIII:

»» » » » » » » »» » » »
AVI:A_ORF26: 30 gtgcgtgct.gccgtggatccacatgtgcgcctccatcgacggcgtctacggccggtgctg II III II III'. II II I II'',II'!',''IIIIII !III'IIIIIIII''IIIIII
EVER ORF17: 57 gtgtgtgctgccgtggatccacctctgcgcctccatcgacggcgtc:tacggccggtgctg >
AVIA_ORF26: 90 cgtggacqactr_catgtaccacaacgagctgtacgagtccgtgga<vgagccggtcttc:aa Il II IIII 111111 1;' lilll V III 'I IIIIIII II!
EVER ORF17: 117 cgtcgacgactcgatgtaccacacggagctgt.acgacgagcaggaggagccggcgttcgc »» » » »»» » » »» » »
AVIA_ORF26: 150 gctcaacgccgacgccgtcggctgcgcgcccaactcccgctacgcc:aaggacaacccgga II! II ~ IIII~I IIII III I II I I II IIIII'III?IIIIIIIII
EVER_ORF17: 177 gctgaacgacgacgcgatcggttgctccccgggctcgcggtacgc.~aaggacaacccgga AVIA ORF26: 210 cgaggtacgcgggctgacggaggcgttcaacagccccaacatgcggcgcacccggctgaa I I! II ! III!II VIII III!IIIII,!I !!Ii IIIIIIIII
EVER ORF17: 237 ccgcgtgat:gggcatccgggaggccttcaacagccccaacatgaagcggacccggctggc AVIA_ORF26: 270 gatgctggccggcgagcgggtgtccgcgtgcgactactgctaccaccgcgaggaccgggg !1111 I 11'!111'1 II' !11111 Illlli'II 'II 1111111 II
EVER_ORF17: 297 gatgctcggtggcgagcgcgtggaggcgtgcaagtactgctacttu:cgggaggaccacgg AVIA_ORF26: 330 cgcgacctcgtaccggcagagcatcaacgagcggttcgccgacacggtggacttcgccga llli IIII'I ' I I I!I III
EVER ORF17: 357 cgcccagtcctaccggcagaacgtcaaccgccggttccaccaggagtacgacctcgat:gc AVIA_ORF26: 390 cctggccgaacggaccgcccccgacggctcgttcgacgagttcccgttcttcctggacat II II!I ! I Il ! '11!'I I IIII I'~IIIIII i III i1 V III
EVER_ORF17: 417 gctcgccgcccgtaccgccgcggacggcacggtcgaggagttcccgttctttctcgacat AVIA_ORF26: 450 ccggttcggcaacacctgcaacctgcggtgcgtgatgtgcgcctacccggtcagctccgg I II '111'1 III II'IIII'IIIIII 'illl' I'lil lll'I ',I I'' EVER_ORF17: 477 caggttcggcaa~~ctctgc:aacctgcggtgcgtcatgtgcacct:acccggtgagttcctc 536 «««««««««««««<
AVIA ORF26: 510 ctggggcgccaagaagcggccgtcgtggtcgtccgcggtgatcgacccgtaccgcgagga IIIIIIIII Ill I II II IIIIIIIIIII''IIIII IIIII'IIIIIIIIIII II
EVER ORF17: 537 ctggggcgccaagcaacgcccgtcgtggtcgtccgcggtcatcgaccc:gtaccgcgac:ga AVIA_ORF26: 570 cgaggagctgtgggcgacgct:ccgcgagaacgcccacctcat:ccgccggctgtacttcgc III II! III ~II!,IIII II III I I 11111 II''' lilllllll''.II
EVER_ORF17: 597 cgacgagttgtgggcgacgct:gcgggagaatgcgcacctgatccgc:aagctgtacttcgc AVIA_ORF26: 630 cggcggtgaaccgttcatgcagccgggccacttcgcgatgctcgacctgctg:atcgagac VIII IIII' III II I V III Ii V III 111111 I I III I II II
EVER ORF17: 657 gggcggcgaacc~ttcctgcaaccgggtcatttcgccatgctcgagctgctcgtggaaac AVIA_ORF26: 690 cggcaacgcgggcaacgtcgacatcgtctacaactccaacctcacggtgctcccggagaa I'I II''III IIIIII''III'~I' II''i,lll I II !I I II!III II
EVER_ORF17: 717 cgggaacgcgc<~caacgtcgacatccagtacaactcgaacctgaccgtctccccggacaa <.:««:«««<:<
AVIA_ORF26: 750 ggtcttcgaccgcttcccgcacttcaagagcgtcgggat:cggcgcctcctgcgacggcgt I I I I I I I !II!IIIilllllll II I'II' II IIIIIIIIIIIIII
EVER ORF17: 777 cgcgataaagctcctacggcacttcaagagcgtgggcatcggggct_tcctgc:gacggcqt ««««««<
AVIA ORF26: 810 cggcgaggtcttcgagcgcatccggcagcccgcgaaatgggacgtgttcgtcgccaacgt IIIII,II~I '1111 II'',,III II I I III'' I V III V III I
EVER ORF17: 837 cggcgaggtgttcgaatacatccgggccggcgggaagtgggcggacttc:gtg<Iccaatct AVIA_ORF26: 870 ccgccgggccaagaccgaggtgaacctctggctccaggtcgcgccccagcggctcaacct 9:?9 I !I I I IIII I II 'IIIIIII'~III I II '11111 V III
EVER ORF17: 897 gcgcctgctccggtccgacttcgacgtctggctccaggtgtccccgcagcggcacaacct AVIA ORF26: 930 gtgggggctgcgggacct:gctc3cacttcgcccgcgaggagggcctcgacgcggacctcgc IIII III I I II''I Ill II V III II I i I''''Ill I
EVER_ORF17: 957 gtgggacctgcgcaacgt:gctc:gagttcgcgcgtaccgaggggctggaggtgqacctggc AVIA_ORF26: 990 caacgtcgtgcagtggcccgacgactactccgtcgccaacctcccggacgagqagaagcg Iil ~l II' ;111;1 ' I ~I Ili 1l I '1l II'IIIIII
EVER ORF17: 1017 caacgtggtgcagtggccgcaggatctctcggtcqccagcctgtr_ggccgagqagaaggc AVIA_ORF26: 1050 gcgggcgaccgtcgagct:ggcc:gacctggccgagt:ggtgcgacagcctggact:gggccaa 1109 III II II''' 111111 '' 111111 II 111111! I II I illll III
EVER_ORF17: 1077 gcgcgccacccaggagctgacc~gacctgatcgcctggtgcgccgagctcggct:gggacaa 1136
AVIA_ORF26: 1110 gcccgc 1115
EVER_ORF17: 1137 gcclgc 1142
Identities = 840/1086 (77%)

TABLE XXIV:
AVIA_ORF 9: 2 tgaaaatcgaggtgctccaaccc~acctgcaacctggacacggtgcgggacggtc:gcggcg 4~ II I '11'111' I'' II III ~IIIII 111'111 I I i!,III' I'' IIII
F.VER_ORF 9: 2 tgaagatcgaggtcctgcagccgagctgcaacctggacaccgtccgggacggcc:ggggcg AVIA_ORF 9: 62 gaattttcacctgggttcccccggagcccatcctggaattcaatatgctgcaccagtacc '1l !II III I 'I V III 11'1111' II''','Il ' I I Il I'I
EVER ORF 9: 62 gcatcttcacctgggtgccaccagagccgatcctggagttcaacctcatcaccatgcacc 121.
UEVB-S1 » » » » » » » » » » » >
~JO AVIA_ORF 9: 122 cgggaaaggtgcgcggtctgcactaccacccgcacttcgtcgaatacctgctcttcgtcg 181 I 'I IIII II II IIIII!I''~I'III 11'111';11 ''Ill 111 1l 1111111 EVER ORF 9: 122 ccggcaaggtccgtgggctgcactaccacccgcacttcgtggaatacctgctgttcgtcg AVIA ORF 9: 182 agggctcgggcgtgctggtcaccaaggacgacgccgacgacccgaact_gcgagc~aagagt II II II IIII I Iill ~I '! Il~'III 'L1 III lil EVER ORF 9: 182 acggggagggggtgctggtgaccaaggacgatccggacgaccccgactgcccggaggagt 60 AVIA ORF 9: 242 tcatccacgtctcgcgcggcatctgcaccaggacgcccgcggggatcatgcacgccgtcc 301 III'IIIIII I '~I I' I 'I 'I ' I III I II I Illil III
EVER_ORF 9: 242 tcatccacgtcgcccgggggacgtgtacgcgcacgccctccggagtgatgcacgcggtct AVIA-ORF 9: 302 acgccatcacgccgctgacgttcatcgccatgctcaccaagccIctggga<:gagtgcgacc I il~ II ~I~Il~f~lll ~~ i EVER_ORF 9: 302 actcgatcacgtcgctgtccttcgtggccatgttgacccgaccgtgggacgagtgtgatc AVIA,-ORF 9: 362 cgccgctggtccaggtcgagccc7ctgccgcacaccct 398 I'II
EVER_ORF 9: 362 cgcccatcgtccaggtgcagccgctgccgcacaccct 399 .,."",.".
UEVB-CTL1
Identities = 314/397 (79%)

TABLE XXV:
AVIA_ORF37: 16 ctgaccgag--gagcaggtcgacaggcttcgtctccgacggcttcgtccacctgc:cgggtg IIIII II I'Il~l ~~II i''I~III il~lll~ ll~ll ~I~~I I
EVER_ORF12: 4 ctgac--agccgagcagatcgagagcttcgtcgccgacggcttcgtccgggtgccgaacg 61 AVIA-ORF37: 74 cgttcccgggggagctcgccgac~gaggcgcgcgcc -ctgctgtggcggcagct:ggacat 137.
I
EVER_ORF12: 62 ctttccccgccgcgctcgccgccgagt-gc-cgcaatctgctctggaagcaactcgacgt AVIA ORF3'7: 132 gga-cccggacgac--c-c'gggc::acctggacgc-gggaggtggtccggctcggcrgtgcgc lBEi I III',I!II I I,IIII Illl i~l'I'II IIII
EVER ORF12: 120 ggatcccg-acgacagctc:g- acctggac-cagggaggt:cgCCCggcr_cggtctgcgg AVIA_ORF37: 187 gacgacgacgtctttcgtcc-gtgccgccaacaccccg-c--tgct-gcac:gccc~cctacg 241.
I III,IIII III I I I I'' IIIIII''~IIII I II I I I II I IIII
EVER_ORF12: 175 ggcgacgacgcgttcgtgcagaqc--gccaacaccccggcgttggtcg-aggc-c_f--tacg 225~
AVIA_ORF37: 242 accagctcgccggggagggccgctggc-agccgctg-accca-ggtcggcacgttcccgg I III Il'' III I V II I I'II ' IIIIII ~ I II ll~ II!IIIIII
EVER-ORF12: 230 accagctcgtcggtgcgggccgc3tggcga-cc:gctggac--atggtcgggacgttcccga » »» »»»» »»
AVIA_ORF37: 299 tgcggttccccgtg-acgaagcgg--ccggaggagaccgaggactacggctggcacatcg I ! VIII Ii' II I II Ills IIIII-III !II IIIIIIIIII
E:VER_ORF12: 287 tccgtttcc cggtc3gacc---g-c~gatccggaacaggccgaggactacggctggc:acatcg 343 »»»»>
AVIA_ORF37: 356 acgccagcttcctc-gcc -gagggcgccgacg<:c--ga-c cg -.---gact.ggtccg II''III'IIIIII III III: III I Ili~ I' I II IIIIII
EVER_ORF12: 344 acgccagcttcctcagccccgagggcgtggccgccatgagcagcggccaggactgggagg AVIA_ORF37: 404 gcgagctcgacg-t-gatcccgc;cggactacgacaagatcttccggta-caacgtgtg-g III .II! ''I I I IIIIIII+IIIIIII IIII!III I I !III II I
EVER_ORF12: 404 gcgagctcc--cgctcg-tgccgccggactacgaccggatcttcc--gcagc:aaccag-gtt 459 AVIA_ORF37: 460 tcccgcggccgggcgctgctgc~cctgctgctgttctccgacaccggcgag-gaggacgc i ~~I~I
EVER_ORF12: 460 tcgcgtggccgggccctgcagg;.gctgctcctctactccgacaccggcgagcg--tgacgc AVIA_ORF37: 519 gcccacgctgatccgcgtcggctcccacctggacgtaccgccgctgctggcaccgtacgg 57~3 II~~~,I~ III ,III ~~III i ~'Il~',~ ~' ~I~III~'I,~II ~ ~IIII
EVER ORF12: 519 gcccacgctgatccgggtcggttcgcacctggacgrgccgcccctgctggcgccctacgg AVIA_ORF37: 579 cgccgagggcacctacctcfgaggcc g--gggaggtgggacg-ggaccggccgct-----ga 63;
~illll~ll ~I~Ill~ill ~~I~ ~~~ I I
EVER ORF12: 579 cgccgaggggacctacct --cc7cctgccgcgacgtggg-cgcggaccgccccctcgcga 63~!
HOXG-AS1 ««««««««««««««
AVIA_ORF37: 632 -ggtccgcga-cgggcaacfgccc7gg--gacgcctacctctgccaccccttcctgc_ftgcaca 68F3 II ~~~' I' I~~ I~~ ~~I~III
fiVER_ORF12: 635 tgg-cc --accgggc--cfggccrggcgacgcctacctctgccatccgttcctggtgcaca 688 AVIA_ORF37: 689 cgccggtcgccaacaccggcgtcc--gcccgcgcttcatggcccagccgaacct-gctgc-74'_:
Ii ~II~I~I~III I II~~ I~ ~I~~~~~Il'!~~
EVER ORF12: 689 r_gccgatcaccaacaccgcfc-accagcccccggttcatggcc,~agc----cctcgctgca 743 AVIA_ORF37: 746 -ccgtggggc~-agctcgaactcc~accggc-ccgacggccggtacacccccgtcgagcggg 80<:
II'' III I I'I~ II III II I ~ '~ I ~I' II 'I~I,IIII~
E:VER_ORF12: 744 accgaccggcgagttcgacctggacc--gcgccgacgggcagtacgtcccggtcgagcggg AVIA ORF37: 803 ccg-tgcgccggg 814 I
EVER ORF12: 803 -cgat--ccg<fg 811
Identities = 653/853 (76%), Gaps = 99/853 (11%)

The oligonucleotide sequences listed below in Table XXVI were supplied by Invitrogen™. Where necessary, degenerate oligonucleotides were designed in which "S" denotes a position in the oligonucleotide that consists of an approximately equimolar mixture of G and C, and "R" denotes a position that consists of an approximately equimolar mixture of G and A.
The oligonucleotides may be used as hybridization probes to identify orthosomycin genes as further described in Example 7. The oligonucleotides may also be used as PCR primers, as described in Example 8, to amplify portions of orthosomycin genes either from isolated DNA (from pure cultures, mixed cultures, or environmental samples) or directly from crude cell mass or environmental sample.
As further members of each gene family disclosed in this application are identified, those skilled in the art will be able to improve and refine diagnostic oligonucleotides for identifying and isolating orthosomycin genes, for example by using tools that carry out multiple sequence alignments, such as Clustal (Higgins and Sharp (1988) Gene 73:237-244).
Table XXVI.
Oligonucleotide    Sequence (5'->3')              Length (nt)
UEVA-AS1           GTCGATSACCGCGGACGACCACGACGG    27
UEVB-CTL1*         TCCCACACGCCGTCGCCGASSTGG       24
HOXG-S1            ACTACGGCTGGCACATCGACGCCAGCT    27
* This oligonucleotide serves as a negative control in the hybridization experiments.
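Each degenerate position doubles the number of distinct molecules in a synthesized oligonucleotide mixture. A short Python sketch follows, assuming only the S and R codes defined above plus the four standard bases; it expands an oligonucleotide into its component sequences and tests whether it matches a target segment. The two targets in the usage lines are the reverse complements of the conserved UEVA segments that UEVA-AS1 covers in Table XXIII (AVIA ORF 26 and EVER ORF 17, respectively). The function names are illustrative.

```python
from itertools import product

# Degeneracy codes used in Table XXVI: S = G or C, R = G or A.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "GC", "R": "GA"}


def expand(oligo: str) -> list:
    """All concrete sequences present in a degenerate oligonucleotide mixture."""
    choices = [IUPAC[base] for base in oligo.upper()]
    return ["".join(combo) for combo in product(*choices)]


def matches(oligo: str, target: str) -> bool:
    """True if the target (A/C/G/T only, same length) is one of the oligo's variants."""
    if len(oligo) != len(target):
        return False
    return all(t in IUPAC[o] for o, t in zip(oligo.upper(), target.upper()))


UEVA_AS1 = "GTCGATSACCGCGGACGACCACGACGG"
UEVB_CTL1 = "TCCCACACGCCGTCGCCGASSTGG"

print(len(expand(UEVA_AS1)), len(expand(UEVB_CTL1)))     # 2 4
print(matches(UEVA_AS1, "GTCGATCACCGCGGACGACCACGACGG"))  # AVIA ORF 26: True
print(matches(UEVA_AS1, "GTCGATGACCGCGGACGACCACGACGG"))  # EVER ORF 17: True
```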
Example 7: Use of diagnostic nucleic acid sequences for identifying orthosomycin genes by hybridization:
The microorganism Micromonospora carbonacea var. africana NRRL 15099 was obtained from the Agricultural Research Service Culture Collection of the United States Department of Agriculture. This organism was propagated on N-Z Amine agar medium (per liter of water: 10.0 g glucose, 20 g soluble starch, 5.0 g yeast extract, 5.0 g N-Z Amine Type A (Sigma C0626), 1.0 g reagent-grade CaCO3, 15.0 g agar) at 28 degrees Celsius for several days. For isolation of high-molecular-weight genomic DNA, cell mass from three freshly grown, near-confluent 100 mm Petri dishes was used. The cell mass was collected by gentle scraping with a plastic spatula. Residual agar medium was removed by repeated washes with STE buffer (75 mM NaCl; 20 mM Tris-HCl, pH 8.0; 25 mM EDTA). High-molecular-weight DNA was isolated by established protocols (Kieser et al., Practical Streptomyces Genetics, The John Innes Foundation, 2000), and its integrity was verified by field inversion gel electrophoresis (FIGE) using preset program number 6 of the FIGE MAPPER™ power supply (BIORAD).
A Micromonospora carbonacea var. africana genomic DNA cosmid library was prepared using the SuperCos-1 cosmid vector (Stratagene™). The cosmid arms were prepared as specified by the manufacturer. The high-molecular-weight DNA was subjected to partial digestion at 37 degrees Celsius with approximately one unit of Sau3AI restriction enzyme (New England Biolabs) per 100 micrograms of DNA in the buffer supplied by the manufacturer. At various time points, aliquots of the digestion were transferred to new microfuge tubes and the enzyme was inactivated by adding EDTA and SDS to final concentrations of 10 mM and 0.1%, respectively.
Aliquots judged by FIGE analysis to contain a significant fraction of DNA in the desired size range (30-50 kb) were pooled, extracted with phenol/chloroform (1:1 vol:vol), and pelleted by ethanol precipitation. The 5' ends of the Sau3AI DNA fragments were dephosphorylated using alkaline phosphatase (Roche) according to the manufacturer's specifications at 37 degrees Celsius for 30 min. The phosphatase was heat-inactivated at 70 degrees Celsius for 10 min, and the DNA was extracted with phenol/chloroform (1:1 vol:vol), pelleted by ethanol precipitation, and resuspended in sterile water. The dephosphorylated Sau3AI DNA fragments were then ligated overnight at room temperature to the SuperCos-1 cosmid arms in a reaction containing an approximately four-fold molar excess of SuperCos-1 cosmid arms. The ligation products were packaged using Gigapack III XL packaging extracts (Stratagene™) according to the manufacturer's specifications. A library of 864 isolated cosmid clones was picked and inoculated into nine 96-well microtiter plates containing LB broth (per liter of water: 10.0 g NaCl; 10.0 g tryptone; 5.0 g yeast extract), which were grown overnight and then adjusted to a final concentration of 25% glycerol. These microtiter plates were stored at -80 degrees Celsius and served as glycerol stocks. Duplicate microtiter plates were arrayed onto nylon membranes as follows. Cultures grown in microtiter plates were concentrated by pelleting and resuspending in a small volume of LB broth. A 3 x 3 grid of 96 pins per grid was spotted onto nylon membranes. These membranes, representing the complete cosmid library, were then layered onto LB agar and incubated overnight at 37 degrees Celsius to allow colonies to grow. The membranes were layered onto filter paper pre-soaked with 0.5 N NaOH/1.5 M NaCl for 10 min to denature the DNA and then neutralized by transferring them onto filter paper pre-soaked with 0.5 M Tris (pH 8)/1.5 M NaCl for 10 min. Cell debris was gently scraped off with a plastic spatula, and the DNA was crosslinked onto the membranes by UV irradiation using a GS GENE LINKER™ UV Chamber (BIORAD).

Orthosomycin-specific hybridization oligonucleotide probes were radiolabeled with ³²P using T4 polynucleotide kinase (New England Biolabs) in microliter reactions containing 5 picomoles of oligonucleotide and 6.6 picomoles of [γ-³²P]ATP in the kinase reaction buffer supplied by the manufacturer. After 1 hour at 37 degrees Celsius, the kinase reaction was terminated by the addition of EDTA to a final concentration of 5 mM. The specific activity of the radiolabeled oligonucleotide probes was estimated using a Model 3 Geiger counter (Ludlum Measurements Inc., Sweetwater, Texas) with a built-in integrator feature. The radiolabeled oligonucleotide probes were heat-denatured by incubation at 85 degrees Celsius for 10 minutes and quick-cooled in an ice bath immediately prior to use.
Cosmid library membranes were prehybridized by incubation for at least 2 hours at 42 degrees Celsius in Prehyb Solution (6X SSC; 20 mM NaH2PO4; 5X Denhardt's; 0.4% SDS; 0.1 mg/ml sonicated, denatured salmon sperm DNA) in a hybridization oven with gentle rotation. The membranes were then placed in Hyb Solution (6X SSC; 20 mM NaH2PO4; 0.4% SDS; 0.1 mg/ml sonicated, denatured salmon sperm DNA) containing 1 × 10⁶ cpm/ml of radiolabeled oligonucleotide probe and incubated overnight at 42 degrees Celsius in a hybridization oven with gentle rotation. The next day, the membranes were washed with Wash Buffer (6X SSC, 0.1% SDS) for 45 minutes each at 46, 48, and 50 degrees Celsius in a hybridization oven with gentle rotation. The membranes were then exposed to X-ray film to visualize and identify the positive cosmid clones. The results obtained with four representative orthosomycin-specific oligonucleotide probes are shown in Table XXVII. Cosmid clones that were positive in the hybridization experiment are indicated by a '+'. The ends of the inserts in these cosmids were sequenced using T7 and T3 universal primers and, as expected, were shown to contain sequences homologous to those in the EVER locus (data not shown). Because cosmid clone IH01 was detected by most of the orthosomycin-specific oligonucleotide probes, including one derived from the OXCO gene family of the EVER locus (data not shown), it was selected for further sequence analysis. This cosmid clone was completely sequenced using a shotgun method. Cosmid clones FB03 and DH01 were found to overlap and extend the IH01 sequence in the 5' and 3' directions, respectively, so they too were sequenced. Together, the overlapping cosmid clones IH01, FB03, and DH01 (hereafter referred to as 050CB, 050CA, and 050CG, respectively) span over 85 kilobase pairs that include the everninomicin biosynthetic locus of Micromonospora carbonacea var. africana (EVEA). EVEA is further described in Example 10.
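The hybridization step specifies probe activity per milliliter of Hyb Solution rather than an absolute amount, so the volume of labeled probe to add depends on the activity measured for the labeling reaction and on the volume of Hyb Solution used. A minimal Python sketch of that arithmetic follows; the 20 ml Hyb volume and the probe activity of 2 × 10⁶ cpm per microliter in the usage lines are hypothetical values, not figures from this example. The same style of calculation gives the ATP:oligonucleotide ratio of the kinase reaction described above (6.6 pmol to 5 pmol).

```python
def probe_volume_ul(target_cpm_per_ml: float, hyb_volume_ml: float,
                    probe_cpm_per_ul: float) -> float:
    """Microliters of labeled probe needed to reach the target activity."""
    total_cpm = target_cpm_per_ml * hyb_volume_ml
    return total_cpm / probe_cpm_per_ul


def molar_ratio(pmol_atp: float, pmol_oligo: float) -> float:
    """Picomoles of [gamma-32P]ATP per picomole of oligonucleotide."""
    return pmol_atp / pmol_oligo


# 1 x 10^6 cpm/ml (from the text) in a hypothetical 20 ml of Hyb Solution,
# using a hypothetical probe stock of 2 x 10^6 cpm per microliter.
print(probe_volume_ul(1e6, 20.0, 2e6))  # 10.0
print(round(molar_ratio(6.6, 5.0), 2))  # 1.32
```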
Table XXVII
Oligonucleotide probes: UEVA-S2, UEVA-AS2, HOXG-S1, HOXG-AS1
Cosmid clone    Hybridization signals
AF02            + +
CD01            +
DA01            +
DD06            +
DH01            +
FB03            +
FH08            +
FH09            +
GF08            +
HA08            +
HD01            +
HF10            +
HH12            +
IB04            +
ID08            + +
IF12            +
IH01            + + +

To verify the specificity of the diagnostic probes according to the invention, 50 ng aliquots of cosmid DNA from three microorganisms known to contain orthosomycin biosynthetic loci were spotted onto nylon membranes, denatured, crosslinked, and probed as described above. Cosmid DNA was isolated from 15 milliliter cultures according to the alkaline lysis method (Sambrook et al. 1989 Molecular Cloning: A Laboratory Manual, 2nd edition. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY). Cosmids used in this experiment included 050CA, 050CB, and 050CG of the everninomicin locus from Micromonospora carbonacea var. africana (EVEA); 010CA, 010CB, and 010CG of the everninomicin locus from Micromonospora carbonacea var. aurantiaca (EVER); and 017CH, 017CP, and 017CP of the avilamycin-type locus from Streptomyces mobarensis (AVIA). In addition, a Micromonospora carbonacea var. aurantiaca genomic DNA cosmid clone, 050CC, which is unrelated to orthosomycin loci, served as a negative control. The results obtained with eight orthosomycin-specific oligonucleotide probes are shown in Table XXVIII. Cosmid clones that were positive in the hybridization experiment are indicated by a '+', and those that were negative are indicated by a '-'.
The results of the experiment summarized in Table XXVIII are consistent with the sequence information available for EVER, EVEA, and AVIA. The members of the UEVA, HOXG, and UEVB protein families in EVER are all contained within cosmid 010CA; the same is not true for the other two loci, i.e., the members of the UEVA, HOXG, and UEVB protein families in EVEA and AVIA are located farther apart from one another. All four UEVA probes consistently detected the same cosmid(s) in EVER, EVEA, and AVIA, although the UEVA-S2 probe gave a weak signal for EVEA (indicated by parentheses in Table XXVIII). The UEVB-S1 probe did not hybridize to EVEA cosmids, as EVEA does not contain a UEVB homologue (see Example 10). None of the oligonucleotide probes hybridized to the negative control cosmid DNA, 050CC. The negative control oligonucleotide probe UEVB-CTL1 did not hybridize with any of the cosmid DNAs.

[Table XXVIII: hybridization of eight orthosomycin-specific oligonucleotide probes to cosmid DNA from EVEA, EVER, AVIA, and negative control cosmid 050CC; table body not recoverable]

Example 8: Use of diagnostic nucleic acid sequences for identifying orthosomycin genes by PCR amplification:
The oligonucleotides described in Example 6 may be used as PCR primers to identify orthosomycin genes, biosynthetic loci, and/or orthosomycin-producing organisms. Genomic DNA was prepared from Micromonospora carbonacea var. africana and Micromonospora carbonacea var. aurantiaca as described in Example 7. Cosmid 010CA DNA was prepared by the alkaline lysis method (Sambrook et al. 1989 Molecular Cloning: A Laboratory Manual, 2nd edition. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY). Aliquots of the genomic DNA and the cosmid DNA were used as template DNA in PCR reactions with the following four primer pairs: 1) UEVA-S2 and UEVA-AS1; 2) UEVA-S1 and UEVA-AS1; 3) UEVA-S2 and UEVA-AS2; and 4) UEVA-S1 and UEVA-AS2.
Each PCR amplification was carried out in a 50 microliter reaction containing 50-100 nanograms of template DNA; 37.5 picomoles of each primer; 0.2 mM (final concentration) each of dATP, dGTP, dCTP, and dTTP; 10% (final concentration) dimethyl sulfoxide; and 2 units of Pfu DNA polymerase (Stratagene™) in the reaction buffer supplied with the enzyme by the manufacturer. The PCR conditions included an initial two-minute denaturation step at 96 degrees Celsius, followed by thirty amplification cycles in which denaturation was performed at 96 degrees Celsius for 30 seconds, annealing at 45 degrees Celsius for 30 seconds, and extension at 72 degrees Celsius for 2.5 minutes.
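The reaction recipe above mixes absolute amounts (picomoles) with final concentrations. A short Python sketch follows that converts the primer amount to a final concentration and totals the programmed hold times; it uses only values stated above and deliberately ignores ramp times between temperatures, which depend on the thermal cycler. The function names are illustrative.

```python
REACTION_VOLUME_UL = 50.0
PRIMER_PMOL = 37.5


def final_concentration_um(pmol: float, volume_ul: float) -> float:
    """pmol per microliter equals micromolar (1e-12 mol / 1e-6 L = 1e-6 M)."""
    return pmol / volume_ul


def programmed_hold_seconds(initial_denature_s: int, cycles: int,
                            denature_s: int, anneal_s: int, extend_s: int) -> int:
    """Sum of programmed hold times only; temperature ramps are not included."""
    return initial_denature_s + cycles * (denature_s + anneal_s + extend_s)


print(final_concentration_um(PRIMER_PMOL, REACTION_VOLUME_UL))  # 0.75 (uM per primer)
total_s = programmed_hold_seconds(120, 30, 30, 30, 150)
print(total_s, total_s / 60)  # 6420 107.0
```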
The four primer pairs used were expected to amplify portions of the orthosomycin-specific UEVA gene and are listed in order of increasing expected product size. The relative positions of these oligonucleotides on the aligned UEVA nucleotide sequences are depicted in Figure 9.
Figure 9 is a picture of a 1% agarose gel stained with ethidium bromide in which 5 microliter aliquots of the PCR reactions were resolved by electrophoresis. Primer pairs are indicated at the top of the figure. The numbers indicate which template DNA was used in each PCR reaction: "1" represents Micromonospora carbonacea var. africana genomic DNA; "2" represents Micromonospora carbonacea var. aurantiaca genomic DNA; and "3" represents cosmid 010CA from the EVER locus. The leftmost lane contains the 1 Kb Plus DNA Ladder (Invitrogen™) molecular weight standards, some of which are labeled on the left in base pairs (bp). The schematic drawing below the picture in Figure 9 depicts the relative positions of the primer pairs and the expected sizes (in base pairs) of the PCR products, based on the known nucleotide sequence of the UEVA gene from the EVER locus (described in Example 1).
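The expected product sizes in the Figure 9 schematic follow from where each primer anneals on the known UEVA sequence. A hedged Python sketch of that in silico calculation follows: it scans a template for the forward primer and for the reverse complement of the reverse primer, honoring the S and R degeneracy codes, and reports the shortest product. The toy template in the usage line is hypothetical; the sketch would have to be run on the actual EVER UEVA sequence with the actual primer sequences to approximate the sizes shown in Figure 9, and it ignores mismatch-tolerant annealing of the kind that produces the smears discussed below.

```python
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG", "R": "AG", "Y": "CT"}
COMPLEMENT = str.maketrans("ACGTSRY", "TGCASYR")  # R (A/G) pairs with Y (C/T)


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def binding_sites(oligo: str, template: str) -> list:
    """Start positions where the (possibly degenerate) oligo matches the template exactly."""
    oligo, template = oligo.upper(), template.upper()
    n = len(oligo)
    return [
        i for i in range(len(template) - n + 1)
        if all(base in IUPAC[code] for code, base in zip(oligo, template[i:i + n]))
    ]


def expected_product_size(forward: str, reverse: str, template: str):
    """Length of the shortest amplicon, or None if no facing primer sites exist."""
    fwd_sites = binding_sites(forward, template)
    rev_target = reverse_complement(reverse)  # reverse primer site on the top strand
    rev_sites = binding_sites(rev_target, template)
    sizes = [r + len(rev_target) - f for f in fwd_sites for r in rev_sites if r >= f]
    return min(sizes) if sizes else None


# Hypothetical toy template: forward site, 10-nt spacer, reverse-primer site.
template = "AAAA" + "ACGTAC" + "GGGGGGGGGG" + "TTGCCA" + "AAAA"
print(expected_product_size("ACSTAC", "TGGCAA", template))  # 22
```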
Referring to Figure 9, the PCR reactions in which genomic DNA was used as template produced a smear with all four primer pairs tested. In contrast, the PCR reactions in which purified 010CA cosmid DNA was used as template gave rise to distinct bands that are consistent with the expected sizes. This result suggests that the PCR conditions used are suboptimal for amplification from genomic DNA but may be adequate for less complex, subcloned DNA fragments.
The smears that arise with genomic DNA templates are likely due to mispriming (i.e., inaccurate annealing of the PCR primers followed by extension) caused by a combination of a suboptimal annealing temperature in the thermal cycle, a high G/C content and complexity of the genomic DNA, relatively low abundance of the target sequence, and the presence of some degenerate positions in the oligonucleotide PCR primers.
Based on the assumption that a certain proportion of the amplified products arise from accurate priming events (as can be seen in several lanes in Figure 9), an aliquot of the products obtained with the UEVA-S1 and UEVA-AS2 primer pair was used as template DNA in a second PCR reaction with the UEVA-S2 and UEVA-AS1 primer pair in order to amplify the UEVA sequences specifically. In essence, this amounts to a two-step nested PCR in which the first round of amplification, with a pair of "outer" UEVA-derived primers, serves to enrich for UEVA sequences, and the second round, carried out with primers located within the region defined by the "outer" primers, selectively amplifies the enriched target. Using this two-step nested PCR approach on both Micromonospora carbonacea var. africana genomic DNA and Micromonospora carbonacea var. aurantiaca genomic DNA, a distinct band was obtained whose size was similar to that obtained with cosmid 010CA using the UEVA-S2 and UEVA-AS1 primer pair (data not shown). The band was resolved on an agarose gel and purified by spinning through a glass wool plug, extraction with phenol/chloroform (1:1 vol:vol), and pelleting by ethanol precipitation. The purified DNA was then sequenced using the UEVA-S2 and UEVA-AS1 primers.
Sequencing of the M. carbonacea var. africana PCR product yielded 302 nucleotides of high-quality sequence that is in perfect agreement with the region encoding amino acids 69-168 of the UEVA protein in EVEA (described in Example 10):
AACCCCGGCCGGGTGATGGGCCTGGCGGACGCCTTCAACAGCCCC 45 N P G R V M G L A D A F N S P
N M R R T R L A M L A G E R V
GACGCCTGCTCCTACTGCTACCACCGCGAGGACCACGGCGCGCTG 135 D A C S Y C Y H R E D H G A L
TCGTACCGGCAGGAGATCAACCAGCGGTTCCGGGACATCGCCGAC 180 S Y R Q E I N Q R F R D I A D
CCCGACCGGCTGGCCGCCCGCACCGCGCCCGACGGCACCGTCGAG 225 P D R L A A R T A P D G T V E
GACTTCCCGTTCTTCCTCGACATCCGGTTCGGCAACACCTGCAAC 270 D F P F F L D I R F G N T C N
L R C V M C A Y P V
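Each amino acid row above is a standard-code translation of the corresponding 45-nucleotide row, three bases per residue. A minimal Python sketch of that translation follows, assuming the standard genetic code and codons written with A/C/G/T only; it reproduces the rows whose nucleotide sequence survives in this text, for example nucleotides 1-45.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons in frame +1; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


print(translate("AACCCCGGCCGGGTGATGGGCCTGGCGGACGCCTTCAACAGCCCC"))  # NPGRVMGLADAFNSP
```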
Sequencing of the M. carbonacea var. aurantiaca PCR product yielded 343 nucleotides of high-quality sequence that is in perfect agreement with the region encoding amino acids 72-185 of the UEVA protein in the EVER locus (described in Example 1):

R Y A K D N P D R V M G I R E
GCCTTCAACAGCCCCAACATGAAGCGGACCCGGCTGGCCATGCTC 90 A F N S P N M K R T R L A M L
G G E R V E A C K Y C Y F R E
GACCACGGCGCCCAGTCCTACCGGCAGAACGTCAACCGCCGGTTC 180 D H G A Q S Y R Q N V N R R F
CACCAGGAGTACGACCTCGATGCGCTCGCCGCCCGTACCGCCGCG 225 H Q E Y D L D A L A A R T A A
GACGGCACGGTCGAGGAGTTCCCGTTCTTTCTCGACATCAGGTTC 270 D G T V E E F P F F L D I R F
GGCAACCTCTGCAACCTGCGGTGCGTCATGTGCACCTACCCGGTG 315 G N L C N L R C V M C T Y P V
S S S W G A K Q R
Example 9: In silico identification of orthosomycin biosynthetic genes:
Sequence information from the polypeptides and polynucleotides taught in the invention allows for in silico identification of orthosomycin biosynthetic loci in any biological sample. The biological sample may be an environmental sample (e.g., soil), or genetic material, including purified genetic material (DNA, RNA, cDNA), from environmental samples or from cultivated microorganisms.
Genomic DNA from cultured Micromonospora carbonacea var. africana NRRL 15099 was extracted and analyzed as described in Canadian patent application 2,352,451. Briefly, the extracted genomic DNA was randomly fragmented, size-fractionated to generate small DNA fragments, and cloned into an appropriate plasmid vector to generate a Genomic Sampling Library (GSL). The GSL is a library of small, random genomic DNA fragments that covers the entire genome of Micromonospora carbonacea var. africana NRRL 15099.
The GSL library was analyzed by sequence determination of the cloned genomic DNA inserts. The universal primers KS and/or SK, referred to as forward (F) and reverse (R) primers, respectively, were used to initiate polymerization of labeled DNA. Sequence analysis of the resulting Genomic Sequence Tags (GSTs) was performed using an ABI 3700 capillary electrophoresis DNA sequencer (Applied Biosystems). The GSTs were further analyzed by sequence homology comparison to various protein sequence databases. The DNA sequences of the GSTs were translated into amino acid sequences and compared to the National Center for Biotechnology Information (NCBI) nonredundant protein database and the DECIPHER™ database (Ecopia BioSciences, St.-Laurent, QC, Canada) using previously described algorithms (Altschul et al. 1990 J. Mol. Biol. 215(3):403-410). Sequence similarity to known proteins of defined function in the databases facilitates recognition of the protein families of the invention among the polypeptides encoded by the translated GSTs.
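Because a GST is a short, unoriented genomic read, comparing it to protein databases requires considering all six reading frames, which is what a BLASTX-type search does internally before scoring. A minimal, self-contained Python sketch of the six-frame translation step alone follows, assuming the standard genetic code; it performs no database search or scoring, and the function names are illustrative.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate complete codons only; a trailing partial codon is dropped."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frames(dna: str) -> dict:
    """Frames +1..+3 on the given strand and -1..-3 on its reverse complement."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


print(six_frames("ATGGCC"))
```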
Four hundred GSTs from the Micromonospora carbonacea var. africana GSL library were analyzed and compared to the above protein databases. Among the 400 analyzed GSTs, three (RAA12, RAC92, and FAE38) were found to have substantial sequence similarity to proteins taught by the invention to be diagnostic of orthosomycin biosynthetic loci (HOXG, OXRW, and MTFD, respectively). These three GSTs showed a much greater degree of similarity to homologous proteins from orthosomycin-specifying loci than to related proteins from non-orthosomycin-encoding loci. The degree of homology between the translated GST products and their homologs in the EVER, AVIA, and AVIL orthosomycin loci is shown in Table XXIX.
All three GSTs encode members of protein families that are unique to the biosynthesis of orthosomycin compounds. HOXG, OXRW, and MTFD are found only in orthosomycin-encoding loci, and their detection through genomic sampling of Micromonospora carbonacea var. africana clearly indicates the presence of an orthosomycin-specific locus within the genome of the microorganism. The GSTs used for the in silico determination of the orthosomycin locus were subsequently shown to belong to EVEA, as confirmed by complete sequence determination of the EVEA locus (see Example 10).
Further determination of the class of the predicted orthosomycin compound would have been possible if GSTs harboring members of the protein families diagnostic for everninomicins or avilamycins had been detected. The presence of the orthosomycin-specifying locus was confirmed by detection and complete sequence determination of the locus (see Example 10).
A similar approach was used to evaluate the potential of Streptomyces sp. (collection ATCC 39365) to encode orthosomycin compounds. Seven hundred GSTs were analyzed and compared to protein databases. Among these GSTs, two (FAF63 and FAA47) were shown to have substantial sequence homology to the HOXG and PKSO protein families, respectively, which are found in orthosomycin loci (see Table XXIX). HOXG is an orthosomycin-diagnostic protein family, as it is found only in orthosomycin biosynthetic loci, whereas PKSO is a protein family found in orthosomycin loci that may also be associated with secondary metabolism other than orthosomycin biosynthesis. Use of the compositions and methods of the invention with Streptomyces sp. (collection ATCC 39365) demonstrates the predictive ability of the invention for discovering orthosomycin loci in microorganisms or biomass for which no metabolite expression determination was previously performed.
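The reasoning above distinguishes GST hits to families found only in orthosomycin loci (HOXG, OXRW, and MTFD, per the text) from hits to families such as PKSO that also occur in other secondary metabolic pathways. A minimal Python sketch of that decision rule follows; the function name is hypothetical, and the family sets are limited to those named in this example rather than the full set of families disclosed in the application.

```python
# Families stated in this example to occur only in orthosomycin loci.
DIAGNOSTIC = {"HOXG", "OXRW", "MTFD"}
# Families found in orthosomycin loci but also in other secondary metabolism.
SUPPORTING = {"PKSO"}


def assess_gst_hits(families):
    """Split GST family assignments and flag a likely orthosomycin locus.

    Only diagnostic families count as evidence on their own; supporting
    families are reported but never decide the call by themselves.
    """
    hits = set(families)
    diagnostic = hits & DIAGNOSTIC
    supporting = hits & SUPPORTING
    return {
        "diagnostic": sorted(diagnostic),
        "supporting": sorted(supporting),
        "orthosomycin_locus_indicated": bool(diagnostic),
    }


# M. carbonacea var. africana GSTs RAA12, RAC92, FAE38:
print(assess_gst_hits(["HOXG", "OXRW", "MTFD"]))
# Streptomyces sp. ATCC 39365 GSTs FAF63, FAA47:
print(assess_gst_hits(["HOXG", "PKSO"]))
```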
Table XXIX compares translated GSTs from Micromonospora carbonacea var. africana and Streptomyces sp. with their homologs from orthosomycin loci. BLAST analysis was performed using the BLASTX algorithm (Altschul et al. 1990 J. Mol. Biol. 215(3):403-410). In each comparison, the first line indicates the number of identical amino acids and the degree of identity, whereas the second line indicates the number of similar amino acids and the degree of similarity between the two protein segments.
Table XXIX
Micromonospora carbonacea var. africana
GST (family)           EVER            EVEA            AVIA            AVIL
HOXG (RAA12)   iden.   75/93 (80%)     86/86 (100%)    78/94 (82%)     78/93 (83%)
               sim.    77/93 (82%)     NA              85/94 (89%)     86/93 (91%)
OXRW (RAC92)   iden.   83/132 (62%)    134/134 (100%)  86/132 (65%)    87/132 (65%)
               sim.    102/132 (76%)   NA              102/132 (76%)   101/132 (76%)
MTFD (FAE38)   iden.   56/87 (64%)     105/105 (100%)  50/91 (54%)     48/91 (52%)
               sim.    70/87 (80%)     NA              63/91 (68%)     62/91 (67%)

Streptomyces sp.
GST (family)           EVER            EVEA            AVIA            AVIL
HOXG (FAF63)   iden.   47/99 (47%)     47/99 (47%)     52/109 (47%)    49/109 (44%)
               sim.    56/99 (56%)     56/99 (56%)     63/109 (57%)    61/109 (55%)
PKSO (FAA47)   iden.   81/146 (55%)    76/146 (52%)    83/146 (56%)    82/146 (56%)
               sim.    97/146 (65%)    92/146 (62%)    97/146 (65%)    95/146 (64%)

Example 10: The everninomicin biosynthetic locus in Micromonospora carbonacea var. africana:
The microorganism Micromonospora carbonacea var. africana NRRL 15099 was obtained from the Agricultural Research Service Culture Collection of the United States Department of Agriculture, 1815 N. University Street, Peoria, IL 61604. The everninomicin compounds produced by strain NRRL 15099 are described in US Patent 4,597,968. The biosynthetic locus for everninomicin from strain NRRL 15099 (EVEA) was identified according to the method described in Canadian patent application CA 2,352,451. The sequences obtained from cosmids containing overlapping genomic inserts spanning EVEA were identified. Within the sequences of the cosmid inserts, numerous ORFs encoding polypeptides having homology to known proteins were identified. Contiguous nucleotide sequences and deduced amino acid sequences of EVEA are provided as follows: the amino acid sequence of ORF 1 (SEQ ID NO 271) is deduced from the nucleic acid sequence of SEQ ID NO 272 drawn from contig 1 (SEQ ID NO 278). The amino acid sequence of ORF 2 (SEQ ID NO 137) is deduced from the nucleic acid sequence of SEQ ID NO 138 drawn from contig 1 (SEQ ID NO 278). The amino acid sequence of ORF 3 (SEQ ID NO 5) is deduced from the nucleic acid sequence of SEQ ID NO 6 drawn from contig 1 (SEQ ID NO 278). The amino acid sequence of ORF 4 (SEQ ID NO 37) is deduced from the nucleic acid sequence of SEQ ID NO 38 drawn from contig 1 (SEQ ID NO 278). The amino acid sequence of ORF 5 (SEQ ID NO 171) is deduced from the nucleic acid sequence of SEQ ID NO 172 drawn from contig 1 (SEQ ID NO 278). The amino acid sequence of ORF 6 (SEQ ID NO 173) is deduced from the nucleic acid sequence of SEQ ID NO 174 drawn from contig 1 (SEQ ID NO 278). The amino acid sequence of ORF 7 (SEQ ID NO 49) is deduced from the nucleic acid sequence of SEQ ID NO 50 drawn from contig 1 (SEQ ID NO 278). The amino acid sequence of ORF 8 (SEQ ID NO 103) is deduced from the nucleic acid sequence of SEQ ID NO 104 drawn from contig 1 (SEQ ID NO 278). The amino acid sequence of ORF 9 (SEQ ID NO 269) is deduced from the nucleic acid sequence of SEQ ID NO 270 drawn from contig 1 (SEQ ID NO 278). The amino acid sequence of ORF 10 (SEQ ID NO 109) is deduced from the nucleic acid sequence of SEQ ID NO 110 drawn from contig 1 (SEQ ID NO 278). The amino acid sequence of ORF 11 (SEQ ID NO 157) is deduced from the nucleic acid sequence of SEQ ID NO 158 drawn from contig 1 (SEQ ID NO 278). The amino acid sequence of ORF 12 (SEQ ID NO 115) is deduced from the nucleic acid sequence of SEQ ID NO 116 drawn from contig 1 (SEQ ID NO 278). The amino acid sequence of ORF 13 (SEQ ID NO 121) is deduced from the nucleic acid sequence of SEQ ID NO 122 drawn from contig 1 (SEQ ID NO 278). The amino acid sequence of ORF 14 (SEQ ID NO 197) is deduced from the nucleic acid sequence of SEQ ID NO 198 drawn from contig 1 (SEQ ID NO 278). The amino acid sequence of ORF 15 (SEQ ID NO 91) is deduced from the nucleic acid sequence of SEQ ID NO 92 drawn from contig 1 (SEQ ID NO 278). The amino acid sequence of ORF 16 (SEQ ID NO 185) is deduced from the nucleic acid sequence of SEQ ID NO 186 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 17 (SEQ ID NO 85) is deduced from the nucleic acid sequence of SEQ ID NO 86 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 18 (SEQ ID NO 227) is deduced from the nucleic acid sequence of SEQ ID NO 228 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 19 (SEQ ID NO 239) is deduced from the nucleic acid sequence of SEQ ID NO 240 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 20 (SEQ ID NO 79) is deduced from the nucleic acid sequence of SEQ ID NO 80 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 21 (SEQ ID NO 275) is deduced from the nucleic acid sequence of SEQ ID NO 276 drawn from contig 2 (SEQ ID NO 279).
The amino acid sequence of ORF 22 (SEQ ID NO 11) is deduced from the nucleic acid sequence of SEQ ID NO 12 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 23 (SEQ ID NO 43) is deduced from the nucleic acid sequence of SEQ ID NO 44 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 24 (SEQ ID NO 143) is deduced from the nucleic acid sequence of SEQ ID NO 144 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 25 (SEQ ID NO 17) is deduced from the nucleic acid sequence of SEQ ID NO 18 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 26 (SEQ ID NO 191) is deduced from the nucleic acid sequence of SEQ ID NO 192 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 27 (SEQ ID NO 61) is deduced from the nucleic acid sequence of SEQ ID NO 62 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 28 (SEQ ID NO 31) is deduced from the nucleic acid sequence of SEQ ID NO 32 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 29 (SEQ ID NO 179) is deduced from the nucleic acid sequence of SEQ ID NO 180 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 30 (SEQ ID NO 163) is deduced from the nucleic acid sequence of SEQ ID NO 164 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 31 (SEQ ID NO 67) is deduced from the nucleic acid sequence of SEQ ID NO 68 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 32 (SEQ ID NO 207) is deduced from the nucleic acid sequence of SEQ ID NO 208 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 33 (SEQ ID NO 55) is deduced from the nucleic acid sequence of SEQ ID NO 56 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 34 (SEQ ID NO 25) is deduced from the nucleic acid sequence of SEQ ID NO 26 drawn from contig 2 (SEQ ID NO 279).
The amino acid sequence of ORF 35 (SEQ ID NO 223) is deduced from the nucleic acid sequence of SEQ ID NO 224 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 36 (SEQ ID NO 235) is deduced from the nucleic acid sequence of SEQ ID NO 236 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 37 (SEQ ID NO 211) is deduced from the nucleic acid sequence of SEQ ID NO 212 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 38 (SEQ ID NO 231) is deduced from the nucleic acid sequence of SEQ ID NO 232 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 39 (SEQ ID NO 219) is deduced from the nucleic acid sequence of SEQ ID NO 220 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 40 (SEQ ID NO 215) is deduced from the nucleic acid sequence of SEQ ID NO 216 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 41 (SEQ ID NO 243) is deduced from the nucleic acid sequence of SEQ ID NO 244 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 42 (SEQ ID NO 273) is deduced from the nucleic acid sequence of SEQ ID NO 274 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 43 (SEQ ID NO 73) is deduced from the nucleic acid sequence of SEQ ID NO 74 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 44 (SEQ ID NO 97) is deduced from the nucleic acid sequence of SEQ ID NO 98 drawn from contig 2 (SEQ ID NO 279). The amino acid sequence of ORF 45 (SEQ ID NO 131) is deduced from the nucleic acid sequence of SEQ ID NO 132 drawn from contig 2 (SEQ ID NO 279). Homology was determined using the BLASTP version 2.2.2 algorithm with the default parameters. Table XXX-A presents the results of the homology analysis. Table XXX-B presents the position, length and orientation of each EVEA ORF within SEQ ID NOS: 278 and 279.
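The "iden." and "sim." entries in the homology tables of this section are counts over the aligned length, in the same form that BLASTP reports as Identities and Positives. The minimal Python sketch below converts such a pair into percentages. It assumes that "sim." corresponds to BLASTP's positives, it uses the HOXG (FAF63) / EVER figures 47/99 and 56/99 listed earlier in this section, and the function name `percent` is illustrative only. The application does not state how its whole-number percentages are rounded, so the sketch prints one decimal place rather than trying to reproduce them.

```python
def percent(count: int, aligned_length: int) -> float:
    """Return count as a percentage of the aligned length.

    For a BLASTP hit, count is either the number of identical residues
    ("iden.") or, by assumption, the number of positively scoring residue
    pairs ("sim."), both taken over the same alignment length.
    """
    if aligned_length <= 0 or not 0 <= count <= aligned_length:
        raise ValueError("expected 0 <= count <= aligned_length and aligned_length > 0")
    return 100.0 * count / aligned_length


# HOXG (FAF63) versus EVER, as listed earlier in this section.
identity = percent(47, 99)    # iden. 47/99
similarity = percent(56, 99)  # sim.  56/99
print(f"identity {identity:.1f}%, similarity {similarity:.1f}%")
```

Because the identities are a subset of the positives, the similarity value for a given pair is never lower than the identity value.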

[Table XXX-A: results of the homology analysis of the EVEA ORFs; the table contents are not recoverable from the source text.]

Table XXX-B
EVEA ORF   START...END (bp)   LENGTH (aa)   ORIENTATION   INTEGRITY
Contig 1
1          1...747            248           POSITIVE      COMPLETE
2          895...1962         355           POSITIVE      COMPLETE
3          1962...2951        329           POSITIVE      COMPLETE
4          3054...4082        342           POSITIVE      COMPLETE
5          4137...4874        245           POSITIVE      COMPLETE
6          4871...5545        224           POSITIVE      COMPLETE
7          5598...6869        423           POSITIVE      COMPLETE
8          6946...7689        247           POSITIVE      COMPLETE
9          7735...8763        342           POSITIVE      COMPLETE
10         8753...9526        257           POSITIVE      COMPLETE
11         9523...10467       314           POSITIVE      COMPLETE
12         10464...11312      282           POSITIVE      COMPLETE
13         11314...12594      426           POSITIVE      COMPLETE
14         12627...13820      397           POSITIVE      COMPLETE
15         13867...15204      445           POSITIVE      COMPLETE
Contig 2
16         3806...6           1266          NEGATIVE      COMPLETE
17         4893...3859        344           NEGATIVE      COMPLETE
18         5247...6305        (not given)   POSITIVE      COMPLETE
19         7794...6313        493           NEGATIVE      COMPLETE
20         9421...7943        492           NEGATIVE      COMPLETE
21         11171...9579       530           NEGATIVE      COMPLETE
22         11575...12615      346           POSITIVE      COMPLETE
23         12612...14066      484           POSITIVE      COMPLETE
24         14071...15105      344           POSITIVE      COMPLETE
25         15122...16075      317           POSITIVE      COMPLETE
26         17064...17312      82            POSITIVE      COMPLETE
27         17463...18377      304           POSITIVE      COMPLETE
28         19301...18030      423           NEGATIVE      COMPLETE
29         20061...19309      250           NEGATIVE      COMPLETE
30         20262...21023      253           POSITIVE      COMPLETE
31         22144...21122      340           NEGATIVE      COMPLETE
32         23214...22159      351           NEGATIVE      COMPLETE
33         24254...23211      347           NEGATIVE      COMPLETE
34         25177...24251      308           NEGATIVE      COMPLETE
35         26343...25174      389           NEGATIVE      COMPLETE
36         26626...27864      412           POSITIVE      COMPLETE
37         27875...28996      373           POSITIVE      COMPLETE
38         29105...30355      416           POSITIVE      COMPLETE
39         30363...30965      200           POSITIVE      COMPLETE
40         32002...30923      359           NEGATIVE      COMPLETE
41         32933...32004      309           NEGATIVE      COMPLETE
42         33190...34254      354           POSITIVE      COMPLETE
43         34375...35229      284           POSITIVE      COMPLETE
44         35226...36314      362           POSITIVE      COMPLETE
45         36361...37116      251           POSITIVE      COMPLETE
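The LENGTH column of Table XXX-B follows from the coordinates: in every row with a listed length, the protein length equals the inclusive span in base pairs divided by three, minus one codon for the stop codon, and START is greater than END exactly for the NEGATIVE ORFs. The Python sketch below makes that relationship explicit. It is a consistency check written under those stated assumptions (1-based inclusive coordinates whose span includes the stop codon), not a description of how the table was generated, and `orf_summary` is a hypothetical helper name.

```python
from typing import Tuple


def orf_summary(start: int, end: int) -> Tuple[int, str]:
    """Return (protein length in aa, orientation) for an ORF written START...END.

    Assumes 1-based inclusive coordinates whose span includes the stop codon,
    and that START > END denotes an ORF read on the negative strand.
    """
    span_bp = abs(end - start) + 1
    if span_bp % 3 != 0:
        raise ValueError(f"span of {span_bp} bp is not a whole number of codons")
    length_aa = span_bp // 3 - 1  # the last codon is the stop codon, not a residue
    orientation = "NEGATIVE" if start > end else "POSITIVE"
    return length_aa, orientation


# A few rows copied from Table XXX-B: (ORF, START, END, LENGTH (aa) as listed).
rows = [
    (1, 1, 747, 248),
    (3, 1962, 2951, 329),
    (16, 3806, 6, 1266),
    (19, 7794, 6313, 493),
    (26, 17064, 17312, 82),
]
for orf, start, end, listed in rows:
    length_aa, orientation = orf_summary(start, end)
    print(f"ORF {orf}: {length_aa} aa, {orientation}, matches table: {length_aa == listed}")
```

Under the same convention, the coordinates of ORF 18 (5247...6305), whose length is not given in the source, would correspond to 352 aa. This value is inferred from the coordinates and is not stated in the application.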

Figure 10 is a schematic representation comparing the everninomicin biosynthetic locus from Micromonospora carbonacea var. aurantiaca (EVER) to the everninomicin biosynthetic locus from Micromonospora carbonacea var. africana (EVEA). The scale at the top of the figure is in kilobase pairs. Solid black arrows depict the relative positions of the individual ORFs in EVER and EVEA, with each arrowhead indicating the orientation of the ORF; the corresponding four-letter protein family designation is indicated to the right of each ORF. The empty arrows between the two loci highlight segments that contain a number of ORFs whose relative order and orientation are identical between the two loci. The orientation of the empty arrows indicates the relative order of the ORFs in each segment; the segments in the EVER locus have all arbitrarily been assigned the "left-to-right" orientation. A segment is defined as two or more adjacent ORFs whose relative order and orientation are identical in the loci being compared. The solid lines between the two loci link each segment in one locus to the corresponding segment in the other locus. The dashed lines between the two loci link individual pairs of homologous ORFs that do not form segments.
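The segment definition above can be made operational. The Python sketch below assumes that each locus is an ordered list of (protein family, strand) pairs in which each family occurs at most once, and that a segment may appear inverted in the second locus, as the arbitrary left-to-right orientation given to the EVER segments suggests. The loci used here are hypothetical placeholders (P1 to P7, Q1), not the actual EVER or EVEA gene order, and the function names `find_segments` and `unmatched` are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# One ORF = (protein family designation, strand), where strand is +1 or -1.
ORF = Tuple[str, int]


def _link(a: ORF, a_next: ORF, locus_b: List[ORF], pos_b: Dict[str, int]) -> Optional[str]:
    """Classify how two adjacent ORFs of locus A sit in locus B.

    Returns "same" if they are adjacent in B in the same order and orientation,
    "inverted" if they are adjacent in B with order and strands both flipped
    (the same relative arrangement read from the other end), else None.
    """
    (fam, strand), (fam_next, strand_next) = a, a_next
    if fam not in pos_b or fam_next not in pos_b:
        return None
    j, j_next = pos_b[fam], pos_b[fam_next]
    sb, sb_next = locus_b[j][1], locus_b[j_next][1]
    if j_next == j + 1 and strand == sb and strand_next == sb_next:
        return "same"
    if j_next == j - 1 and strand == -sb and strand_next == -sb_next:
        return "inverted"
    return None


def find_segments(locus_a: List[ORF], locus_b: List[ORF]) -> List[Tuple[List[str], str]]:
    """Return maximal runs of two or more adjacent ORFs of A conserved in B."""
    pos_b = {fam: j for j, (fam, _) in enumerate(locus_b)}
    segments: List[Tuple[List[str], str]] = []
    current: List[str] = []
    direction: Optional[str] = None
    for i in range(len(locus_a) - 1):
        kind = _link(locus_a[i], locus_a[i + 1], locus_b, pos_b)
        if kind is not None and kind == direction:
            current.append(locus_a[i + 1][0])  # extend the open segment
            continue
        if current:  # close the open segment
            segments.append((current, direction))
        current, direction = [], None
        if kind is not None:  # this adjacent pair opens a new segment
            current, direction = [locus_a[i][0], locus_a[i + 1][0]], kind
    if current:
        segments.append((current, direction))
    return segments


def unmatched(locus_a: List[ORF], locus_b: List[ORF]) -> List[str]:
    """Families in A with no counterpart in B (the ORFs marked 'X')."""
    families_b = {fam for fam, _ in locus_b}
    return [fam for fam, _ in locus_a if fam not in families_b]


# Hypothetical toy loci (placeholder names, not the real EVER/EVEA order).
locus_a = [("P1", 1), ("P2", 1), ("P3", 1), ("P4", -1), ("P5", 1), ("P6", 1), ("P7", 1)]
locus_b = [("P3", -1), ("P2", -1), ("P1", -1), ("Q1", 1), ("P5", 1), ("P6", 1), ("P4", 1)]

print(find_segments(locus_a, locus_b))
print(unmatched(locus_a, locus_b), unmatched(locus_b, locus_a))
```

In this toy example, P1 to P3 form an inverted segment and P5 and P6 a segment in the same orientation; P4 has a homologue in both loci but belongs to no segment, which corresponds to a dashed line in the figure. P7 and Q1 have no counterpart and would be marked 'X'.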
ORFs in each locus that do not have a counterpart in the other locus are indicated by an 'X'. EVER contains ten (10) ORFs for which no counterpart is found in EVEA; these are members of the protein families MTBA, MTFH, UEVB, MTIA, OXRU, OXRT, DEPD, ENGA, REGL and KINB. EVEA contains four (4) ORFs for which no counterpart is found in EVER; these comprise one member each of the HYDH and EFFA families and two members of the OXRF family. These fourteen ORFs, of the protein families MTBA, MTFH, UEVB, MTIA, OXRU, OXRT, DEPD, ENGA, REGL, KINB, HYDH, OXRF and EFFA, are not likely to be involved in the assembly of the core structure of the everninomicin-type orthosomycins.
Rather, they are believed to be involved in various modifications of the core structure including methylation (MTBA and MTFH); oxidation/reduction (OXRU, OXRT, OXRF); or in resistance mechanisms (MTIA, EFFA).
A search of NCBI's Conserved Domain Database with Reverse Position-Specific BLAST (Altschul et al., (1997) Nucleic Acids Res. 25:3389-3402) revealed that the UEVB family displays structural homology to the double-stranded beta-helix domain, which is involved in carbohydrate binding and in protein-protein interactions in different contexts. Thus the UEVB family may represent small carbohydrate-binding proteins that specifically recognize certain substructures of orthosomycins. One interesting possibility is that the UEVB proteins recognize and bind to sugar residue H so as to block further modifications. This hypothesis is based on the fact that the everninomicin locus from Micromonospora carbonacea var. africana does not contain a UEVB homologue, and that this organism has been described to produce everninomicins with various substitutions on sugar residue H, including an ester linkage to an orsellinic acid moiety. Thus, based on this hypothesis, one would predict that disruption of the UEVB ORF in the AVIA, AVIL or EVER loci, or in other orthosomycin loci that may contain such an ORF, may result in the production of new orthosomycins with additional substitutions on sugar residue H.
The finding that the ORFs of the EVER and EVEA loci are shuffled to such an extent, and that each locus contains ORFs with no counterpart in the other, is unexpected, as both loci produce related compounds and the respective organisms containing these loci are both classified as Micromonospora carbonacea.
It is to be understood that the embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.

SEQUENCE LISTING
Applicant name: Farnet, Chris; Staffa, Alfredo; Zazopoulos, Emmanuel
Title of invention: COMPOSITIONS AND METHODS FOR IDENTIFYING AND DISTINGUISHING ORTHOSOMYCIN BIOSYNTHETIC LOCI
Correspondence address: 7290 Frederick-Banting, Saint-Laurent, Quebec, H4S 2A1
Current Application Data: Application Number: 2,375,097; Filing Date: March 28, 2002; Classification: C12Q-1/68
Patent Agent Information: Name: Ywe J. Looper; Reference Number: 10961
File reference: 3001-11CA
Number of SEQ ID Nos: 282 Software: PatentIn version 3.0 Information for SEQ ID NO: 1 Length: 346 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 1 Met Ala Gly Ser Glu Asp Glu Arg Asp Gly Gly Thr Gly Ala Glu Arg Gly Arg Ser Val Leu Val Ile Gly Gly Ala Gly Phe Ile Gly Ser His Tyr Val Arg Glu Leu Ile Arg Asp Gly Gly Pro Ala Arg Val Thr Val Leu Asp Lys Leu Thr Tyr Ala Gly Asn Pro Ala Asn Leu Asp Pro Val Ala Gly Arg Tyr Thr Phe Val His Gly Asp Ile Cys Asp Thr Gly Leu Leu Ala Asp Val Val Pro Gly His Asp Leu Val Val Asn Phe Ala Ala Glu Ser His Val Asp Arg Ser Ile Ala Asp Ala Ala Pro Phe Ile Arg Thr Asn Val Leu Gly Val Gln Ala Leu Met Gln Val Cys Met Glu Ala Gly Thr Arg Lys Ile Val Gln Val Ser Thr Asp Glu Val Tyr Gly Ser Ile Glu Thr Gly Ser Trp Asp Glu Asp Ala Leu Ile Ala Pro Asn Ser Pro Tyr Ala Ala Ser Lys Ala Gly Gly Asp Met Val Ala Leu Ala Tyr Ala Arg Thr His Gly Leu Pro Val Ser Val Thr Arg Cys Gly Asn Asn Tyr Gly Pro Tyr Gln Phe Pro Glu Lys Val Val Pro Leu Phe Thr Thr Arg Leu Leu Asp Gly His Gly Ile Pro Leu Tyr Gly Asp Gly Gly Asn Val Arg Asp Trp Val His Val Ser Asp His Val Arg Gly Ile Arg Leu Val Ala Glu Arg Gly Leu Pro Gly Gln Val Tyr His Ile Ala Gly Thr Ala Glu Leu Thr Asn Leu Glu Leu Thr Arg His Leu Leu Asp Ala Leu Gly Ala Asp Trp Asp Arg Val Glu Arg Val Ala Asp Arg Lys Gly His Asp Arg Arg Tyr Ser Leu Ser Asp Ala Arg Leu Arg Ala Leu Gly Tyr Thr Pro Gln Val Pro Phe Glu Gln Gly Leu Ala Asp Thr Val Arg Trp Tyr Ala Glu Asn Arg Asp Trp Trp Glu Pro Leu Asn Glu Arg Ala Arg Ser Gly Ala Ala Ala Pro Ala Val Val Gly Information for SEQ ID NO: 2 Length: 1041 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 2 atggcgggcagcgaggacgagcgggacggcggcacgggcgcggagcggggcaggagcgtc60 ctcgtcatcggcggggcgggcttcatcggctcccactacgtgcgcgaactgatccgggac120 ggcggcccggcgcgggtgaccgtgctggacaagctgacctacgccggcaacccggccaac180 ctggacccggtcgccggccggtacaccttcgtccacggcgacatctgcgacaccgggctg240 ctcgccgacgtcgtccccggccacgacctggtggtcaacttcgcggcggagtcgcacgtc300 gaccggtcgatcgccgacgcggcccccttcatccgcaccaacgtgctgggcgtccaggcc360 ctgatgcaggtgtgcatggaggccggcacgcgcaagatcgtgcaggtctccaccgacgag420 gtgtacggcagcatcgagaccggctcctgggacgaggacgcgctgatcgcgcccaactcg480 ccctacgcggcctccaaggcgggcggcgacatggtcgccctggcctacgcccggacccac540 gggctgccggtgagcgtgacgcggtgcggcaacaactacgggccctaccagttcccggag600 aaggtcgtcccgctgttcaccacccggctgctggacgggcacggcatcccgctgtacggc660 gacggcggcaacgtccgcgactgggtgcacgtctccgaccacgtccgcggcatccggctg720 gtcgccgagcgcggcctgccggggcaggtctaccacatcgcgggcaccgccgagttgacc780 aacctggagctcacccggcacctgctggacgcgctgggcgccgactgggaccgggtcgag840 cgggtggccgaccgcaagggccacgaccggcgctactcgctctccgacgcccggctccgc900 gcactcggctacacgccccaggtgcccttcgagcagggcctggccgacaccgtgcgctgg960 tacgccgagaaccgggactggtgggagccgctcaacgagcgcgcccgttccggggccgcg1020 gcaccggccgtcgtcggctag 1041 Information for SEQ ID NO: 3 Length: 329 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence: 3 Val Pro Arg Val Phe Val Ala Gly Gly Ala Gly Phe Ile Gly Ser His Tyr Val Arg Glu Leu Val Ala Gly Ala Tyr Ala Gly Trp Gln Gly Cys Glu Val Thr Val Leu Asp Ser Leu Thr Tyr Ala Gly Asn Leu Ala Asn Leu Ala Gly Val Arg Asp Ala Val Thr Phe Val Arg Gly Asp Ile Cys Asp Gly Arg Leu Leu Ala Glu Val Leu Pro Gly His Asp Val Val Leu Asn Phe Ala Ala Glu Thr His Val Asp Arg Ser Ile Ala Asp Ser Ala Glu Phe Leu Arg Thr Asn Val Gln Gly Val Gln Ser Leu Met Gln Ala Cys Leu Thr Ala Gly Val Pro Thr Ile Val Gln Val Ser Thr Asp Glu Val Tyr Gly Ser Ile Glu Ala Gly Ser Trp Ser Glu Asp Ala Pro Leu Ala Pro Asn Ser Pro Tyr Ala Ala Ala Lys Ala Gly Gly Asp Leu Ile Ala Leu Ala Tyr Ala Arg Thr Tyr Gly Leu Pro Val Arg Ile Thr Arg Cys Gly Asn Asn Tyr Gly Pro Tyr Gln Phe Pro Glu Lys Val Ile Pro Leu Phe Leu Thr Arg Leu Met Asp Gly Arg Ser Val Pro Leu Tyr Gly Asp Gly Arg Asn Val Arg Asp Trp Ile His Val Ala Asp His Cys Arg Gly Ile Gln Thr Val Val Glu Arg Gly Ala Ser Gly Glu Val Tyr His Ile Ala Gly Thr Ala Glu Leu Thr Asn Leu Glu Leu Thr Gln His Leu Leu Asp Ala Val Gly Gly Ser Trp Asp Ala Val Glu Arg Val Pro Asp Arg Lys Gly His Asp Arg Arg Tyr Ser Leu Ser Asp Ala Lys Leu Arg Ala Leu Gly Tyr Ala Pro Arg Val Pro Phe Ala Asp Gly Leu Ala Glu Thr Val Ala Trp Tyr Arg Ala Asn Arg His Trp Trp Glu Pro Leu Arg Lys Gln Leu Asp Ala Val Pro His Asp Information for SEQ ID NO: 4 Length: 990 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence: 4 gtgccgagggtcttcgtggccggtggcgccggcttcatcggctcgcactacgtgcgggaa60 ctcgtcgccggggcgtacgccgggtggcagggctgcgaggtcacggtgctcgacagcctc120 acctatgcgggaaacctcgcgaatctcgccggggtgcgggacgccgtcaccttcgtccgc180 ggtgacatctgcgacggccgactgctcgccgaggtcctgcccggccacgacgtggtgctg240 aacttcgcggccgagacccacgtcgaccggtccatcgccgactcggcggagttcctgcgg300 accaacgttcagggcgtccagtcgctcatgcaggcgtgcctgaccgccggagtgccgacc360 atcgtccaggtctccaccgacgaggtgtacggcagcatcgaggccggatcctggagcgag420 gacgcgccgctggcgccgaactcgccgtacgccgcggccaaggcgggcggtgacctgatc480 gccctggcgtacgcgcggacgtacggactgccggtccgcatcaccaggtgcggcaacaac540 tacggtccataccagttcccggagaaggtgatccccctcttcctcacccgtctgatggac600 ggtcggtcggtcccgctcta cggcgacgggcgcaacgtccgcgactggat ccacgtggcc660 gaccactgccgtggcatcca gacggtggtcgaacgcggtgcgtccggcga ggtctaccac720 atcgccgggacggccgagct gaccaacctggaactcacccagcacctgct ggacgcggtc780 ggcggaagctgggacgccgt cgagagggtgcccgaccgtaagggccacga ccgccgctac840 tcgctttccgacgcgaagct ccgggccctgggctacgccccgcgcgtccc cttcgccgac900 ggcctggccgagacggtcgc gtggtaccgcgcgaaccggcactggtggga gccgctgcgg960 aaacaactcgacgccgtccc gcacgactga 990 Information for SEQ
ID NO: 5 Length: 329 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 5 Met Arg Arg Val Leu Val Thr Gly Gly Ala Gly Phe Ile Gly Ser Gln Tyr Val Arg Asp Leu Ala Thr Gly Ala Tyr Pro Asp Thr Ala Gln Ala Arg Val Thr Val Leu Asp Lys Leu Thr Tyr Ala Gly Asn Leu Ala Asn Leu Glu Pro Val Gln Asp Arg Ile Thr Phe Val Gln Gly Asp Val Cys Asp Thr Ala Leu Leu Ala Glu Val Leu Pro Gly His Asp Val Val Val Asn Phe Ala Ala Glu Ser His Val Asp Arg Ser Ile Ala Asp Ser Ala Glu Phe Val Arg Thr Asn Val Gln Gly Val Gln Thr Leu Met Gln Ala Cys Leu Asp Ala Gly Val Ala Arg Val Val Gln Val Ser Thr Asp Glu Val Tyr Gly Ser Ile Asp Glu Gly Ser Trp Ala Glu Asp Thr Pro Leu Ala Pro Asn Ser Pro Tyr Ala Ala Ala Lys Ala Gly Gly Asp Leu Ile Ala Leu Ala Tyr Ala Arg Thr His Gly Leu Pro Val Cys Leu Thr Arg Cys Gly Asn Asn Tyr Gly Pro Tyr Gln Phe Pro Glu Lys Leu Ile Pro Leu Phe Val Thr Glu Leu Leu Asn Gly Arg Arg Val Pro Leu Tyr Gly Asp Gly Gly Asn Val Arg Asp Trp Ile His Val Thr Asp His Cys Arg Gly Ile Gln Thr Val Val Asp Arg Gly Val Pro Gly Glu Val Tyr His Ile Ala Gly Thr Ala Glu Leu Ser Asn Met Glu Leu Thr Gly Arg Leu Leu Asp Ala Leu Gly Ala Gly Trp Asp Arg Val Glu Arg Val Pro Asp Arg Lys Gly His Asp Arg Arg Tyr Ser Leu Thr Asp Ala Lys Leu Arg Ala Leu Gly Tyr Arg Pro Glu Val Ala Phe Ala Asp Gly Leu Ala Glu Thr Ile Asp Trp Tyr Arg Thr His Arg Asp Trp Trp Glu Pro Leu Lys Lys Gln Ala Asp Arg Ser Pro Val Pro Information for SEQ ID NO: 6 Length: 990 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 6 atgcgtcgcgtcctggtcaccggcggtgccggtttcatcggctcgcagtacgtccgcgac60 ctggccaccggtgcctaccccgacacggcgcaggcccgggtgacggtgctggacaagctg120 acgtacgcgggcaacctcgccaacctcgaaccggtccaggaccggatcaccttcgtccag180 ggcgacgtctgcgacacggcgctgctggccgaggtgctgcccgggcacgacgtggtggtc240 aacttcgccgccgagtcgcacgtcgaccggtccatcgccgactcggcggagttcgtccgc300 accaacgtgcagggcgtccagacgctcatgcaggcgtgtctcgacgccggggtcgcccgg360 gtggtccaggtctccaccgacgaggtctacggcagcatcgacgagggttcctgggccgag420 gacacccccctggcgccgaactccccgtacgcggcggcgaaggccggcggggacctgatc480 gccctggcctacgcccgcacccacgggctgccggtctgcctcacccggtgcggcaacaac540 tacgggccgtaccagtttccggagaagctgatcccgctgttcgtcaccgagctgttgaac600 gggcgacgggtgccgctgtacggcgacggcgggaacgtccgcgactggatccacgtgacg660 gaccactgccggggcatccagaccgtcgtcgaccgcggtgtccccggcgaggtctaccac720 atcgccggcacggctgagctgtccaacatggagctgaccgggcggctgctggacgccttg780 ggggccgggt gggaccgggt cgagcgggtg ccggaccgca agggccacga ccgccgctac 840 tcgctgacgg acgcgaaact gcgggcgctc ggctaccggc ccgaggtcgc cttcgccgac 900 ggcctggccg agacgatcga ctggtaccgg acgcaccggg actggtggga gccgctgaag 960 aagcaggccg accggtcgcc ggtgccctga 990 Information for SEQ ID NO: 7 Length: 348 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 7 Met Ile Leu Ser Arg Arg Ala Leu Ile Thr Gly Ile Thr Gly Gln Asp Gly Thr Tyr Leu Ala Arg Gln Leu Leu Glu Ser Gly Tyr Glu Val Phe Gly Met Val Arg Gly Gln Gly Arg Pro Tyr Ala Arg Asp Gly Ser Pro Leu His Pro Asp Ile Arg Val Val Ser Gly Asp Leu Leu Asp Gln Thr Ser Leu Ile Ala Ala Val Glu Gln Ala Ser Pro Thr Glu Val Tyr Asn Leu Gly Ala Leu Ser Tyr Val Pro Val Ser Trp Lys Gln Pro Ala Val Ala Ala Glu Val Thr Gly Lys Gly Val Leu Arg Met Leu Glu Ala Ile Arg Ser Val Ala Gly Leu Asn Ala Ser Arg Thr Thr Gly Ser Gly Val Pro Arg Phe Tyr Gln Ala Ser Thr Ser Glu Met Phe Gly Lys Val Arg Glu Thr Pro Gln Ser Glu Ser Thr Pro Phe His Pro Arg Ser Pro Tyr Gly Val Ala Lys Ala Phe Gly His Tyr Met Val Gln Asn Tyr Arg Glu Ser Tyr Gly Met Phe Ala Val Ser Gly Ile Leu Phe Asn His Glu Ser Pro Ile Arg Gly Pro Glu Phe Val Thr Arg Lys Val Ser Leu Gly Val Ala Ala Val Lys Leu Gly Leu Val Asp Lys Leu Arg Leu Gly Asn Leu Asp Ala Glu Arg Asp Trp Gly Phe Ala Gly Asp Tyr Val Arg Gly Met Arg Met Met Leu Ala Gln Asp Glu Pro Glu Asp Ile Val Leu Gly Thr Gly Val Thr His Ser Val Arg Asp Leu Val Glu Phe Ala Phe Ala His Ala Gly Leu Asp Trp Arg Asp His Val Glu Val Asp Pro Arg Leu Leu Arg Pro Ala Glu Val Glu Leu Leu Cys Ala Asp Leu Ser Arg Ala Arg Glu Lys Leu Gly Trp Lys Pro Glu Val Ser Phe Glu Glu Leu Ile Ala Met Met Val Asp Asn Asp Leu Arg Leu Leu Thr Glu Ser Glu Asn Ala Ala Gly Glu Ala Val Leu Glu Gln Ala Gly Leu Trp Information for SEQ ID NO: 8 Length: 1047 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 8 atgattctgtctaggcgagcgctcattacgggaatcacaggccaggatggcacttatctc60 gcgcgtcagctcctcgagtccggatacgaggtgttcggaatggtgcgcgggcaggggcgg120 ccgtacgcgcgcgacggcagtcccctgcacccggacatccgggtggtgagcggcgacctg180 ctggaccagacgagcctgatagccgcggtcgagcaggcgagccccaccgaggtctacaac240 ctcggcgccctgtcgtacgtcccggtctcctggaagcagcccgccgtcgccgcggaggtg300 accgggaagggcgtgctgcggatgctggaggccatccgcagcgtggcggggctcaacgcg360 tcccggaccaccggcagcggcgtcccgcgcttctaccaggcgtccacctccgagatgttc420 ggaaaggtgcgcgagaccccgcagagcgagtccacgcccttccacccccgcagcccctac480 ggcgtggccaaggcgttcgggcactacatggtgcagaactaccgcgagtcctacgggatg540 ttcgcggtcagcggcatcctgttcaaccacgagtcccccatccgcggccccgagttcgtc600 acccggaaggtctcgctcggggtggccgccgtgaagctcggcctggtggacaagctccgg660 ctgggcaacctcgacgccgagcgcgactggggcttcgccggggactacgtgcgcggcatg720 cggatgatgctcgcccaggacgagccggaggacatcgtcctgggcaccggcgtcacccac780 agcgtccgcgacctggtggagttcgccttcgcgcacgccggcctcgactggcgcgaccac840 gtggaggtcgacccccggctgctccgcccggcggaggtggaactcctctgcgccgacctc900 agccgcgccc gggagaagct cggctggaag cccgaggtgt cgttcgagga actgatcgcc 960 atgatggtcg acaacgacct gcgcctgctc accgagagcg agaacgccgc cggtgaagcc 1020 gtcctcgaac aggccggtct gtggtaa 1047 Information for SEQ ID NO: 9 Length: 346 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 9 Leu Thr Arg Arg Ala Leu Ile Thr Gly Ile Thr Gly Gln Asp Gly Thr Tyr Leu Ala Arg His Leu Leu Ala Ala Gly Tyr Asp Val Tyr Gly Met Val Arg Gly Gln Asn Ser Pro Ser Ala Arg Cys Gly Arg Gln Val His Pro Asp Val Arg Leu Val Asn Gly Asp Leu Met Asp Gln Ser Ser Leu Ile Ser Ala Val Asp Arg Val Arg Pro Asp Glu Ile Tyr Asn Leu Gly Ala Leu Ser Tyr Val Pro Thr Ser Trp Arg Gln Pro Asn Thr Thr Ala Glu Ile Thr Gly Thr Gly Val Val Arg Met Leu Glu Ala Val Arg Ile Val Ala Gly Ile Thr Ser Ser Arg Thr Pro Gly Pro Ser Arg Pro Arg Phe Tyr Gln Ala Ser Ser Ser Glu Met Phe Gly Lys Val Arg Glu Thr Pro Gln Asn Glu Leu Thr Pro Phe His Pro Arg Ser Pro Tyr Gly Val Ala Lys Val Phe Gly His Tyr Thr Val Gln Asn Tyr Arg Glu Ser Tyr Gly Met Tyr Ala Val Ser Gly Met Leu Phe Asn His Glu Ser Pro Ile Arg Gly Pro Glu Phe Val Thr Arg Lys Val Ser Leu Gly Ala Ala Ala Val Lys Leu Gly Leu Arg Asp Ser Leu Arg Leu Gly Asn Leu Met Ala Glu Arg Asp Trp Gly Phe Ala Gly Asp Tyr Val Arg Gly Met Ser Met Met Leu Ala Gln Asp Glu Pro Asp Asp Tyr Val Leu Gly Thr Gly Ile Ala His Ser Val Arg Glu Leu Val Glu Leu Ala Phe Ala His Val Asp Leu Asp Trp Arg Asp His Val Val Leu Asp Glu Ala Leu Gln Arg Pro Ala Glu Val Asp Leu Leu Cys Ala Asp Ala Thr Lys Ala Gln Gln Arg Leu Gly Trp Lys Pro Thr Val Tyr Phe Glu Glu Leu Val Gly Met Met Val Asp Ser Asp Leu Arg Leu Leu Ser Gly Pro Gln Cys Ser Glu Gly Gly Val Ile Arg Glu Leu Ala Asp Leu Trp Information for SEQ ID NO: 10 Length: 1041 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence:10 ctgactaggcgagcgctgatcacagggattaccggtcaggacggtacttatcttgcgcgg60 cacctcctggccgctggatatgacgtctatggaatggtgcggggccagaattcgccgagc120 gcgcgctgcggccggcaggtgcaccccgacgttcggcttgtcaacggcgacctgatggac180 cagtcgagcctgatctcggcggtggacagggttcggccggacgagatctacaacctcggc240 gcactgtcctacgtccccacctcgtggcggcagcccaacaccaccgcggagatcaccggg300 acgggcgtggtgcgcatgctcgaagccgtccggatcgtcgcaggcatcaccagctcgcgt360 acccccggcccgagtcggccacgcttctatcaggcctcgtcgtccgagatgttcggcaag420 gtgcgggagaccccccagaatgagctgacgcccttccacccacgcagtccgtacggcgtc480 gccaaagtgttcggccactacacggtgcagaactaccgcgagtcgtacggcatgtacgcg540 gtctccggcatgctcttcaaccatgaatcacccatccgcggaccggagttcgtgacgcga600 aaggtgtcgctgggcgcggcggccgtgaaactggggctccgggactcgttgcgactgggc660 aatctgatggccgagcgggactggggtttcgccggcgactacgtccgcggcatgtcgatg720 atgctcgcccaggacgagcctgacgactacgtcctgggtaccgggatcgcgcacagcgtg780 cgggagctggtcgagctggccttcgcccacgtcgacctggactggcgtgaccacgtcgtc840 ctcgacgaggcgctccagcgtccggccgaggtcgacctgctctgcgccgacgcgacgaag900 gcccagcagcgcctcggctggaagccgacggtctacttcgaggagctggtcgggatgatg960 gtcgacagcg atctccgtct gctgtcgggt ccgcagtgca gtgagggcgg cgtcattcgc 1020 gagctggccg atctgtggtg a 1041 Information for SEQ ID NO: 11 Length: 346 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 11 Leu Thr Arg Arg Ala Leu Ile Thr Gly Ile Thr Gly Gln Asp Gly Thr Tyr Leu Ala Glu His Leu Leu Gln Ser Gly Tyr Glu Val Phe Gly Leu Val Arg Gly Gln Thr Ala Pro Ser Val Arg Ser Leu Arg Gln Pro Asp Pro Ala Val Lys Leu Ile Ser Gly Asp Leu Leu Asp Gln Thr Ser Leu Val Ala Ala Ile Glu Arg Ala Ala Pro Asp Glu Val Tyr Asn Leu Gly Ala Leu Ser Tyr Val Pro Val Ser Trp Arg Gln Ser Thr Thr Thr Ala Glu Val Thr Gly Met Gly Val Leu Arg Met Leu Glu Ala Leu Arg Ile Val Gly Gly Leu Ser Asp Ser Arg Ser Pro Ala Ala Gly Gln Pro Arg Phe Tyr Gln Ala Ser Ser Ser Glu Met Phe Gly Lys Val Arg Glu Pro Val Gln Asn Glu Leu Thr Pro Phe His Pro Arg Ser Pro Tyr Gly Ala Ala Lys Ala Phe Gly His Tyr Met Val Gln Asn Tyr Arg Glu Ser Tyr Gly Met Tyr Ala Val Ser Gly Ile Leu Phe Asn His Glu Ser Pro Val Arg Gly Pro Glu Phe Val Thr Arg Lys Val Ser Leu Gly Val Ala Ala Val Lys Leu Gly Ile Arg Ser Ser Leu Arg Leu Gly Asn Leu Ser Ala Glu Arg Asp Trp Gly Phe Ala Gly Asp Tyr Val Arg Gly Met Val Leu Met Leu Ala Gln Asp Glu Pro Glu Asp Tyr Val Leu Gly Thr Gly Val Thr His Ser Val Arg Glu Leu Val Glu Ala Ala Phe Ala His Val Gly Leu Asn Trp Arg Asp His Val Val Val Asp Glu Ser Leu Ile Arg Pro Ala Glu Val Glu Leu Leu Cys Ala Asp Pro Thr Lys Ala Arg Gln Arg Leu Gly Trp Lys Pro Ser Val Ser Phe Glu Glu Met Val Ala Met Met Val Asp Ser Asp Leu Arg Leu Leu Ala Asp Thr Asp Gly Ser Ser His Ala Phe Ser Ala Glu Leu Ala Glu Leu Trp Information for SEQ ID NO: 12 Length: 1041 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence:12 ctgacacggcgggcgctgatcactggaattaccggccaggacggcacgtatctcgcggag60 cacctgcttcagtccggatacgaggtatttggattggtgcgcgggcagaccgcgccctcg120 gtccgcagccttcggcaacctgatccagcggtcaagctgatcagcggcgaccttctggat180 cagacgagcctggtggcggcgatcgaacgcgcggcgccggacgaggtctacaacctcggc240 gcgctgtcgtacgtgccggtgtcgtggcggcagtccaccacgacggcggaggtcaccggc300 atgggtgtgctccgcatgctcgaagccttgcggatcgtggggggcctgtcggattcccgc360 agtcccgcagccggtcagccgcgcttttatcaggcgtcttcgtcggagatgttcggcaag420 gtgcgggagcccgtccagaatgagctgaccccgttccatccgcgtagcccgtacggcgcg480 gccaaggcgttcgggcattacatggtgcagaactaccgtgagtcgtacggcatgtatgcc540 gtctccggcattctgttcaaccacgaatcaccggtgcgtggtcccgagttcgtcacccgc600 aaggtgtcgctgggcgtggcggcggtgaagctgggcattcgcagctcgcttcgcctgggc660 aacctctcggccgagcgggactggggcttcgcgggcgactacgtgcggggcatggtcctg720 atgctggcccaggacgagccggaggactacgtcctcggcacgggggtcacgcacagcgtc780 cgcgagctggtcgaggcggccttcgcccacgtgggcctcaactggcgggaccacgtggtg840 gtggacgagtcgctcatccggcccgccgaggtcgagctgctctgcgcggatccgacgaag900 gcccgccagcggctcggctggaaaccctccgtctccttcgaggagatggtcgccatgatg960 gtcgacagcgatctgcgcctgttggcggacacggacgggtcttcccacgccttttccgcc1020 gaactggccg agctgtggtg a 1041 Information for SEQ ID NO: 13 Length: 324 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 13 Met Ala Asp Glu Arg Glu Arg Val Leu Val Ala Gly Gly Thr Gly Phe Val Gly Arg Arg Leu Cys Ala Asp Leu Val Ala Ala Gly Ala Glu Val Ala Ala Val Ala Arg Arg Ala Pro Asp Leu Pro Pro Pro Cys Arg Ile Leu Thr Leu Asp Val Thr Thr Ala Ser Pro Gly Glu Leu Ala Gly Leu Ile Asp Ser Phe Arg Pro His Thr Leu Val Asn Ala Ile Gly Ser Asn Trp Gly Ile Ala Glu Arg Asp Leu Glu Ala Asn Cys Ala Val Pro Ala Arg Arg Leu Met Ala Ala Leu Arg Arg Thr Thr Cys Arg Pro Tyr Val Val His Leu Gly Ser Val Leu Glu Tyr Gly Pro Thr Ala Pro Gly Glu Thr Thr Arg Thr Ala Thr Pro Ala Arg Pro Thr Thr Thr Tyr Gly Lys Ala Lys Leu Ala Ala Ser Arg Ala Val Leu Glu Ala Ala Ala Glu Gly Val Val Glu Ala Gly Val Leu Arg Ile Gly Asn Val Ala Gly Pro Gly Thr Pro Ala Val Ser Leu Leu Gly Arg Val Ala Gly Arg Leu Ala Glu Ala Val Ala Arg Asp Thr Leu Pro Ala Val Val Glu Leu Ser Gln Leu Arg Ala His Arg Asp Tyr Val Asp Val Arg Asp Val Ser Glu Ala Val Leu Ala Ala Thr Arg Ala Arg Ile Pro Gly Leu Val Val Pro Ile Gly Arg Gly Glu Ala Val Ala Val Arg Trp Leu Val Asp Leu Leu Val Glu Val Ser Gly Val Pro Ala Glu Val Arg Glu Leu Pro Ala Ala Thr Thr Gly Thr Ala Gly Asp Asp Trp Val Gln Val Asp Pro Glu Pro Ala Arg Arg Leu Leu Gly Trp Thr Ala Val Arg Ser Leu Arg Glu Ser Val Ser Gly Leu Trp Asp Glu Thr Leu Arg Ala His Gly Ile His Asp Pro Ala Gly Ala Arg Arg Information for SEQ ID NO: 14 Length: 975 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 14 atggccgatg agcgcgagcg ggttctcgtg gccgggggaa cggggttcgt gggccgccgg 60 ctctgcgcgg acctcgtcgc cgccggggcc gaggtggccg cggtcgcccg gcgggccccg 120 gacctccccc cgccgtgccg catcctcacc ctggacgtca ccacggcctc gcccggcgaa 180 ctggccggcctgatcgactcgttccgcccgcacacgctggtcaacgccatcggcagcaac240 tgggggatcgccgaacgggacctggaggccaactgcgccgtgcccgcgcggcgcctgatg300 gccgccctgcggcggaccacctgccggccctacgtcgtccacctcggctcggtgctggag360 tacggcccgaccgcgcccggcgagacgacccggaccgcgacgccggcccggccgacgacc420 acctacggcaaggccaagctcgcggcgagccgcgccgtgctggaggccgccgcggagggc480 gtcgtcgaggcgggcgtgctgcggatcggcaatgtcgcgggcccgggcaccccggcggtc540 agcctgctgggccgggtggccgggcggctcgccgaggccgtcgcccgggacaccctcccg600 gcggtcgtggagctgtcccagctgcgcgcccaccgggactacgtcgacgtgcgcgacgtg660 tcggaagcggtgctggccgccacccgcgcccggataccggggctcgtcgtcccgatcgga720 cgcggcgaggccgtggcggtgcgctggctggtggacctcctcgtggaggtcagcggcgtg780 cccgccgaggtgcgggaactccccgccgccaccacgggaaccgccggcgacgactgggtc840 caggtcgaccccgagcccgcccgccggctcctcggctggaccgccgtccgctcgctgcgc900 gagtcggtgagcggactgtgggacgagacgctgcgggcccacgggatccacgaccccgcg960 ggggcgcggcggtga 975 Information for SEQ ID NO: 15 Length: 314 Type: PRT

Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 15 Val Ser Asp Asn Arg Val Ile Val Phe Gly Gly Thr Gly Phe Leu Gly Arg Gln Val Ala Lys Asn Leu Val Ala Ala Gly His Asp Val Leu Val Val Ala Arg Asn Ala Pro Arg Ala Thr Thr Gly Tyr Arg Phe Arg Ala Ile Asp Val Ser Gly Val Arg Pro Gly Glu Leu Ala Ala Met Leu Ala Ala Glu Arg Pro Ala Ala Ile Val Asn Ala Thr Gly Gly Lys Trp Gly Leu Thr Gly Arg Gly Leu Glu Ala Ser Cys Val Gly Ala Thr Glu Ala Ile Leu Thr Ala Leu Ala Met Thr Ser Leu Val Pro Arg Phe Val His Leu Gly Ser Val Leu Glu Cys Gly Leu Ala Ala Pro Asp Ala Pro Gly Ala Ala Gln Arg Ser Ser Arg Pro Ala Ser Glu Tyr Asp Arg Phe Lys Leu Ala Ala Thr Glu Ala Val Leu Glu Ala Ala Ala Gln Gly Thr Val Asp Pro Val Val Leu Arg Leu Ala Asn Val Thr Gly Pro Gly Val Pro Pro Ala Ser Leu Leu Gly Leu Val Ala Gly Ser Leu Val Glu Ala Ala Arg Arg Gly Gly His Ala Asn Ile Glu Leu Thr Ala Leu Asp Ala Arg Arg Asp Tyr Val Asp Val Arg Asp Val Ala Glu Ala Ile Arg Ala Ala Ile Arg Val Pro Gly Thr Thr Val Pro Ile Ala Ile Gly Arg Gly Glu Ser Val Ser Val Arg Thr Leu Val Ala Met Leu Val Asp Ile Ser Gln Val Pro Ala Thr Val Val Glu Leu Pro Ala Pro Ala Ala Gly Ala Glu Asp Trp Thr Arg Val Asp Leu Arg Pro Ala Arg Glu Leu Leu Gly Trp Thr Pro Arg Arg Thr Leu Ser Glu Ala Ile Gly Ala Leu Trp Arg His Ala Leu Glu Gly Asp Pro Val Glu Ser Arg Information for SEQ ID NO: 16 Length: 945 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence:16 gtgagcgacaaccgcgtcatcgtgttcggaggcaccggcttcctggggcgccaggtcgcg60 aagaacctcgtggccgcggggcacgacgtgctcgtcgtggcgagaaacgcgcccagggcc120 acgaccgggtaccggttccgggcgatcgacgtctcgggggtacggcccggggagctggcc180 gcgatgctggccgccgagcgcccggcggcgatcgtcaacgccacgggcggcaagtggggt240 ctgaccggacggggccttgaggcgagttgcgtcggggcgaccgaggccatcctgacggcg300 ctggcgatgacctcgttggtgccacggttcgtgcacctcggatcggtgctcgagtgtggg360 ctcgccgcaccggacgcgcccggggccgcccagcggtcgtcccgacccgccagcgagtac420 gaccggttcaagctcgccgcgaccgaggccgtgctggaggccgcggcgcaggggaccgtg480 gacccggtggttttgcgcctggccaacgtcaccggccccggtgtgccgccggccagcctg540 ctcggcctggtggccggcagtctggtcgaggcggcacgtcgtggcgggcacgcgaacatc600 gagctgaccgcgctggacgcccgccgcgactacgtggacgtgcgcgacgtcgccgaggcg660 atccgggccgcgatccgggtgcccggcacgaccgttcccatcgccatcggccggggcgag720 tcggtgtccgtgcgcacgctggtcgccatgctcgtcgacatcagccaggtgccggccacc780 gtggtcgaac tgccggcgcc ggccgcaggc gccgaggact ggacccgggt cgacctgcgg 840 ccggcgcgtg agctgctcgg ctggacgccg cggcggacgc tgtccgaggc gatcggggcg 900 ctctggcgcc acgcgctgga aggcgacccg gtggaaagcc ggtga 945 Information for SEQ ID NO: 17 Length: 317 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 17 Met Gly Ala Arg Arg Val Val Val Val Gly Gly Thr Gly Phe Val Gly Arg His Val Ser Ala Ala Leu Ala Ala Arg Gly Asp Asp Val Leu Val Leu Ala Arg Arg Val Pro Ser Ala Gly Leu Pro Tyr Arg Ala Arg Ala Leu Asp Val Ala Thr Leu Glu Pro Ala Ala Leu Ala Ala Val Phe Asp Ala Glu Gln Pro Asp Ala Val Val Asn Ala Thr Gly Gly Lys Trp Asn Leu Thr Asp Ala Glu Leu Pro Ser Ser Cys Thr Ile Pro Thr Trp Ser Val Thr Ala Ala Leu Glu Arg Thr Arg Cys Arg Pro Arg Leu Val His Leu Gly Ser Val Leu Glu Arg Val Gln Glu Pro Pro Gly Ala Pro Ala Gly Ala Thr Val Pro Thr Gln Pro Glu Ser Met Tyr Gly Arg Ala Lys Leu Ala Ala Thr Gln Ala Val Leu Ala Ala Thr Arg Ala Gly Ser Val Asp Ala Thr Val Leu Arg Leu Ala Asn Val Val Gly Pro Gly Val Pro Pro Asp Ser Leu Leu Gly Arg Val Val Val Arg Leu Val Asp Ala Ala Gly Arg Asp Arg Ser Ala Arg Val Glu Leu Ser Pro Leu Arg Ala His Arg Asp Tyr Val Asp Val Arg Asp Val Ala Glu Ala Val Val Ser Ala Thr Arg Glu Ser Val Thr Gly Arg Val Ile Gly Val Gly Arg Gly Glu Ala Val Pro Val Arg Ser Leu Val Glu Met Leu Ile Glu Val Ser Gly Val Pro Thr Glu Val Val Glu Leu Pro Asp Arg Pro Gly Ser Val Glu Val Val Asp Trp Ala Arg Val Asp Pro Gly Pro Ala Arg Asp Leu Leu Gly Trp Arg Pro Arg Arg Ser Leu Arg Asp Ala Val Gly Gly Leu Trp Asp Glu Ala Ala Ser Arg Leu Pro Asp Arg Ser Arg Arg Information for SEQ ID NO: 18 Length: 954 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 18 atgggcgcgc ggcgtgtcgt ggtcgtcggt ggtacgggct tcgtcgggcg tcacgtgagc 60 gccgcgcttg ccgcccgggg cgacgacgtc ctcgtgttgg cccgccgcgt cccgtcggcg 120 gggctgccgt accgggcccg ggcgctggac gtcgccaccc tggagcccgc cgcgctggcc 180 gccgtgttcgacgccgagcagcccgacgcggtggtcaacgccaccggcggcaagtggaac240 ctgacggacgccgagctgccgtcgagctgcacgatcccgacgtggagtgtcaccgcggcg300 ctggagcgcacccgttgccggcccaggctggtgcacctcggctcggtcctggaacgcgtc360 caggagccacccggggctcccgccggtgccacggtgcccacccagccggagagcatgtac420 ggcagggcgaagctggccgccacccaggccgtgctcgcggcgacgcgggccggctcggtg480 gacgcgacggtgctgcggctcgcgaacgtggtgggtccgggcgtgccacccgacagcctg540 ttggggcgggtcgtcgttcgcctggtcgacgcggcgggccgcgaccggtcggccagggtg600 gagttgtctccgttgcgcgcccaccgggactacgtcgacgtgcgggacgtcgccgaggcg660 gtggtgtcggccacgcgggagtccgtcaccgggcgggtgatcggcgtggggcggggggag720 gccgttccggtccgctccctcgtggagatgttgatcgaggtgagcggggtgccgaccgag780 gtggtggagttgccggatcggcccggctcggtggaggtcgtcgactgggcccgcgtcgac840 ccggggcccgcgcgtgacctgctcgggtggcggccccgacgatcgctgcgggatgcggtg900 ggcgggctctgggacgaggccgcgagtcggcttcccgatcggtcccggcgctga 954 Information for SEQ ID NO: 19 Length: 320 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 19 Met Phe Pro Gly Phe Pro Asp Gly Thr Arg Leu Leu Val Thr Gly Gly Ala Gly Phe Ile Gly Ser His Leu Val Asp Ala Leu Leu Glu Ala Gly Ala Glu Val Thr Val Leu Asp Asp Leu Thr Thr Gly Asp Pro Glu Arg Leu Asp Pro Arg Ala Glu Leu Arg Arg Val Asp Val Ala Asp Ala Ala Ala Leu Gly Glu Ala Val Arg Ser Val Arg Pro Asp Ala Val Cys His Leu Ala Ala Gln Ile Asp Val Arg Val Ser Val Ala Asp Pro Ala Ala Asp Ala Arg Val Asn Val Glu Gly Thr Ile Asn Val Leu Glu Ala Ala Arg Ala Val Gly Ala Arg Val Val Phe Ala Ser Thr Gly Gly Ala Leu Tyr Gly Glu Gly Val Pro Val Pro Thr Gly Glu Asp Thr Leu Pro Gly Pro Gly Ala Pro Tyr Gly Thr Ala Lys Tyr Cys Ala Glu Gln Tyr Val Gly Leu Phe Asn Arg Leu His Gly Thr Arg His Ser Val Leu Arg Leu Gly Asn Val Tyr Gly Pro Arg Gln Ser Pro Gly Gly Glu Ala Gly Val Ile Ala Ile His Cys Gly Leu Ala Arg Asp Gly Glu Val Pro Thr Val Phe Gly Asp Gly Thr Gln Thr Arg Asp Tyr Val Tyr Val Gly Asp Val Ala Glu Ala Phe Leu Ala Ala Leu Arg His Pro Ala Pro Gly Val Trp Asn Ile Gly Thr Gly Lys Gly Ser Thr Val Leu Glu Val Leu Asp His Ile Ala Ala Ala Ser Gly Arg Glu Leu Arg Pro Arg Phe Ala Pro Arg Arg Pro Gly Glu Ile Gln His Ser Thr Leu Asp Val Ser Arg Ala Ala Ala Asp Leu Gly Trp Thr Ala Ser Val Pro Leu Glu Lys Gly Ile Ala Ala Thr Tyr Asp Trp Val Arg Ser Gly Ser Pro Val Arg Gln Arg Ala Information for SEQ ID NO: 20 Length: 963 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 20 atgttcccggggttcccggacggaacgcgtctgctggtgaccggaggggcgggcttcatc60 ggctcgcacctcgtcgacgcccttctggaggccggggccgaggtcaccgtcctggacgac120 ctgaccaccggcgatccggagcggctggacccgcgggcggagctgcgccgcgtcgacgtg180 gccgacgccgccgccctcggcgaggcggtgcggtccgtacgccccgacgccgtctgccac240 ctcgcggcgcagatcgacgtccgggtctcggtggccgacccggccgccgacgcgcgggtc300 aacgtggaggggacgatcaacgtgctggaggcggcccgggccgtcggggcgcgggtggtg360 ttcgcctccaccggcggggccctctacggggagggcgtcccggtcccgaccggcgaggac420 acgctgcccggtccgggcgccccctacggcaccgccaagtactgcgcggagcagtacgtc480 ggcctcttcaaccggctgcacggaacgcggcacagcgtgctgcggctcggcaatgtgtac540 gggccccggcagagcccgggcggagaggcgggcgtcatcgccatccactgcgggctggcc600 cgcgacggcgaggtgcccaccgtgttcggcgacggtacgcagacccgcgactacgtgtac660 gtcggcgacgtcgccgaggcgttcctcgccgccctccggcatcccgcgcccggcgtctgg720 aacatcggcacgggcaaggggagcacggtgctggaggtcctcgaccacatcgccgccgcc780 tcgggccgcgagctgcgtccccgcttcgcgccccgccggccgggcgagatccagcacagc840 acgctggacgtctcccgcgccgccgccgacctgggctggaccgcgtccgtcccgctggag900 aagggcatcgccgccacctacgactgggtccgctccggctcgcccgtccggcagcgggcc960 tga 963 Information for SEQ ID NO: 21 Length: 159 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Feature Name/Key: misc_feature Other information: N-terminus only Sequence: 21 Met Arg Val Leu Val Thr Gly Gly Ala Gly Phe Ile Gly Ser His Leu Val Asp Ala Leu Leu Glu Arg Gly Asp Thr Val Thr Val Val Asp Asp Leu Ser Thr Gly Arg Cys Gly Arg Leu Ala Val Arg Val Ala Phe His Gln Glu Ser Ile Thr Asp Gly Lys Ala Leu Ala Ala Ile Val Ala Asp Ala Arg Pro Asp Leu Ile Tyr His Leu Ala Ala Gln Ala Asp Val Arg Thr Ser Val Ala Asp Ala Ser Gly Asp Thr Gly Val Asn Val Leu Gly Thr Val Asn Val Leu Glu Ala Ala Arg Ala Val Gly Ala Arg Val Val Phe Ala Ser Thr Gly Gly Ala Leu Tyr Gly Ala Ile Ser Ala Ile Pro Ser Pro Glu Asp Ala Arg Pro Glu Pro Ala Ala Pro Tyr Gly Ala Ala Lys Tyr Cys Ala Glu Gln Tyr Val Ala Leu Tyr Asn Arg Leu Tyr Information for SEQ ID NO: 22 Length: 480 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Feature Name/Key: misc_feature Other information: N-terminus only Sequence: 22 atgcgcgtcctcgtgacaggtggcgccggcttcatcggctcacacctggtcgacgccctg60 ctggagcgcggcgacaccgtcaccgtggtcgacgacctctccaccggccggtgcggccgg120 ttggccgtccgtgtcgccttccatcaggagtccatcaccgacgggaaggctctcgccgcg180 atcgtggcggacgcccgtccggacctgatctaccacctcgccgcgcaggccgacgtccgc240 acctcggtcgcggatgccagcggcgacaccggggtcaacgtgctcggcaccgtcaacgtc300 ctggaggcggcccgagccgtcggggcccgggtggtgttcgcctccaccggcggagccctg360 tacggggcgatcagcgcgatcccgtcccccgaggacgcccgcccggagcctgcggcgccg420 tacggcgccgccaagtactgcgcggagcagtacgtggccctctacaaccggctgtacggc480 Information for SEQ ID NO: 23 Length: 49 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Feature Name/Key: misc_feature Other information: C-terminus only Sequence: 23 Ala Arg Leu Gly Glu Leu Gln His Ser Ala Leu Asp Val Thr Arg Ala Gly Arg Glu Leu Asn Trp Thr Ala Arg Thr Ala Leu Ala Glu Gly Ile Ala Arg Ala Tyr Gln Trp Ile Arg Asp Gln Asp Gly Cys Asp Val Glu Ala Information for SEQ ID NO: 24 Length: 150 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Feature Name/Key: misc_feature Other information: C-terminus only Sequence: 24 gcccgtctcg gcgagctcca gcattccgcc ctcgatgtga cccgggccgg ccgggagttg 60 aactggaccg cacggacagc cctcgccgag ggaatcgcca gggcctacca gtggatcagg 120 gaccaggacg gttgcgacgt cgaggcatga 150 Information for SEQ ID NO: 25 Length: 308 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: negative Sequence: 25 Val Thr Gly Gly Ala Gly Phe Ile Gly Ser His Leu Thr Asp Ala Leu Leu Glu Arg Gly Asp Ser Val Thr Val Leu Asp Asp Leu Ser Thr Gly Arg Pro Glu Arg Leu Pro Ala Gly Val Pro Leu His His Gly Ser Ile Thr Asp Arg Ala Gly Leu Thr Arg Leu Ala Glu Gln Cys Arg Pro Glu Val Ile Cys His Leu Ala Ala Gln Ala Asp Val Arg Asn Ser Val Ala Asp Ala Thr Ser Asp Thr Gly Val Asn Val Val Gly Thr Val Asn Val Leu Glu Ala Ala Arg Ala Ile Asp Ala Arg Val Val Phe Ala Ser Ser Gly Gly Ala Leu Tyr Gly Glu Val Asp Glu Leu Pro Ser Pro Glu Asp Val Arg Pro Ala Pro Trp Ala Pro Tyr Gly Ala Ala Lys Tyr Cys Ala Glu Gln Tyr Leu Ala Leu Tyr Asn Arg Leu Tyr Gly Ser Thr His Ala Ala Leu Arg Leu Gly Asn Val Tyr Gly Pro Arg Gln Asp Pro Thr Gly Glu Ala Gly Val Val Ser Ile Phe Cys Gly Cys Leu Val Ala Gly Arg Arg Pro Thr Val Phe Gly Asp Gly Glu Gln Thr Arg Asp Tyr Ile Tyr Val Ala Asp Val Val Glu Ala Phe Leu Leu Ala Val Gly His Gly Gly Pro Gly Leu Trp Asn Ile Gly Thr Gly Thr Ser Thr Ser Ile Arg Lys Leu Leu Asp Leu Val Gly Arg Thr Ala Gly Arg Val Pro Asp Pro Arg Phe Glu Pro Pro Arg Leu Gly Glu Leu Lys His Ser Ala Leu Glu Val Thr Arg Ala Ala Arg Glu Leu Arg Trp Ala Ala Arg Thr Arg Leu Ala Asp Gly Ile Ala Lys Val Tyr Lys Trp Val Glu Ala Asp Glu Pro Val Arg Gly Glu Arg Information for SEQ ID NO: 26 Length: 927 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: negative Sequence: 26 gtgaccggcggagccgggttcatcggctcccacctcaccgacgcgctgctcgaacgcggc60 gacagcgtcaccgtgctcgacgacctgtccaccgggcggcccgagcggctgcccgccggg120 gtgccgctgcaccacgggtcgatcaccgaccgggccgggttgacccggctggccgagcag180 tgtcgcccggaggtcatctgccacctggccgcccaggcggacgtgcgcaactcggtggcc240 gacgccacctcggacaccggggtcaacgtggtcggcaccgtcaacgtcctggaggccgcc300 cgggccatcgacgcccgggtggtcttcgcctccagcggcggcgccctctacggggaggtc360 gacgagctgccctcccccgaggacgtccggccggcgccgtgggcgccgtacggggccgcc420 aagtactgcgcggagcagtacctggcgctctacaaccggctctacggctcgacccacgcg480 gcgctgcggctcggcaacgtgtacgggccacgccaggacccgaccggcgaggccggggtc540 gtctcgatcttctgcggctgcctggtggccgggcgccggccgacggtgttcggcgacggc600 gagcagacccgggactacatctacgtggccgacgtggtggaggcgttcctgctcgcggtc660 gggcacggtggccccggcctgtggaacatcggcaccgggacctccaccagcatccgcaaa720 ctactggacctggtcggccgcaccgccgggcgcgtcccggacccccgcttcgagccaccc780 cgcctgggcg agctgaagca ctccgcgctg gaggtgaccc gcgcggcccg ggagctgcgc 840 tgggcggccc gaacgaggct cgccgacggc atcgcgaagg tctacaagtg ggtcgaggcg 900 gacgaaccgg tccgggggga gcgatga 927 Information for SEQ ID NO: 27 Length: 314 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 27 Val Thr Thr Ala His Pro Leu Pro Gly Gly Arg Thr Ala Val Ile Gly Ala Thr Gly Phe Ile Gly Ser Arg Leu Thr Ala Ala Leu Thr Ala Gly Asp Pro Arg Val Pro Ala Ala Phe Asn Arg Ala Val Pro Pro Val Ala Asp Gly Arg Ala Ala Pro Gly Leu Ala Glu Ala Asp Ile Val Tyr Phe Leu Ala Ala Gly Leu Ser Pro Ile Leu Ala Glu Arg Arg Pro Asp Leu Val Glu Ala Glu Arg Arg Leu Leu Ile Glu Val Leu Asp Ala Leu Ala Ala Ala Gly Arg Arg Pro Val Phe Val Leu Ala Gly Ser Gly Gly Ala Val Tyr Ala Pro Glu Val Ala Pro Pro Tyr Arg Glu Thr Thr Pro Thr Arg Pro Asp Ser Ala Tyr Gly His Ala Lys Leu Arg Leu Glu His Glu Leu Phe Arg Arg Arg Asp Ala Val Arg Ala Val Val Ala Arg Leu Ser Asn Val Tyr Gly Pro Gly Gln Arg Pro Val Arg Gly Phe Gly Val Leu Pro His Trp Leu Arg Ala Ala Val Arg Gly Glu Pro Val Arg Val Phe Gly Asp Pro His Val Val Arg Asp Tyr Val His Val Asp Asp Val Thr Arg Phe Leu Leu Ala Leu Arg Thr Arg Ile Ala Gly Gly Arg Leu Pro Ser Val Val Asn Ile Gly Ser Gly Val Pro Thr Ser Leu Ser Gly Leu Leu Asp Ile Val Ser Glu Val Thr Gly Gly Ser Val Ala Val Arg Trp Glu Arg Gly Arg Ser Phe Asp Arg Gln Gly Asn Trp Leu Asp Val Ala Arg Ala Asp Ala Glu Phe Gly Trp Arg Ala Ala Ile Pro Leu Ala Glu Gly Val Arg Ala Cys Trp Glu Arg Val Leu Asp Asp Ala Gly Arg Ala Ala Ser Val Ser Thr Ser Val Ser Thr Leu Information for SEQ ID NO: 28 Length: 945 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence:28 gtgaccacggcacacccgctgcccggcggccgtacggcggtgatcggcgcgaccggcttc60 atcggatcccggctgacggccgccctcaccgccggggacccgcgcgtcccggccgccttc120 aaccgggcggtcccgccggtggccgacggccgcgcggcaccgggactggccgaggccgac180 atcgtctacttcctggccgccgggctgagcccgatcctggccgaacggcggcccgacctg240 gtggaggccgaacgccgcctcctgatcgaggtgttggacgcgctggccgcggcgggccgc300 cgtccggtgttcgtgctggccggctcgggcggcgcggtctacgcgcccgaggtcgcgccg360 ccgtaccgggagaccaccccgacccgccccgactccgcctacgggcacgccaaactccgc420 ctggaacacgagctgttccggcgccgggacgcggtccgcgccgtggtggcgcggctgagt480 aacgtctacggccccggccagcggcccgtccgcgggttcggcgtgctgccgcactggctg540 cgggcggccgtccggggcgagcccgtgcgggtgttcggcgatccgcacgtggtccgggac600 tacgtccacgtcgacgacgtcacccggttcctgctcgcgctgcgcacccggatcgccggc660 ggccggctgccctccgtggtcaacatcggctccggcgtgccgacttcgctgagcggcctg720 ctggacatcgtgtccgaggtgaccggcggctccgtcgcggtgcgctgggagcggggccgc780 tccttcgaccggcagggcaactggctggacgtcgcgcgggccgacgccgagttcggctgg840 cgggccgcgatcccgctggccgagggcgtccgcgcgtgctgggagcgggtgctcgacgac900 gccggccgggccgcttccgtctcgacgtccgtctcgacgctctag 945 Information for SEQ ID NO: 29 Length: 307 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence: 29 Val Thr Ala Ala Gln Val Arg Arg Cys Pro Thr Ala Val Ile Gly Ala Thr Gly Phe Ile Gly Ser Arg Leu Val Ala Gln Leu Thr Arg Ala Gly His Pro Val Ala Arg Phe Asn Gln Ala His Pro Pro Val Val Asp Gly Arg Pro Ala Ala Gly Leu Cys Asp Ala Glu Ile Val Leu Phe Leu Ala Ala Arg Leu Ser Pro Ala Leu Ala Glu Arg His Pro Glu Leu Ile Val Ala Glu Arg Arg Leu Leu Val Asp Val Leu Thr Ala Leu Arg His Ser Ala Pro Phe Pro Val Phe Val Leu Ala Ser Ser Gly Gly Thr Val Tyr Ser Pro Asp Ala Cys Pro Pro Tyr Asp Glu Ser Ala Leu Thr Arg Pro Thr Ser Ala Tyr Gly Arg Ala Lys Leu Gly Leu Glu Arg Glu Leu Leu Gly His Ala Asp His Val Arg Pro Val Ile Leu Arg Leu Ser Asn Val Tyr Gly Pro Gly Gln Arg Pro Ala His Gly Tyr Gly Val Leu Ser His Trp Leu Asp Ala Ala Ala Arg Arg Gln Pro Ile Arg Val Phe Gly Asp Pro Glu Val Val Arg Asp Tyr Val His Val Asp Asp Val Ala Glu Ile Leu Lys Ala Val His Arg Arg Thr Val Thr Thr Gly Pro Glu Gly Ile Pro Thr Val Leu Asn Val Gly Ser Gly Ala Pro Thr Ser Leu Ala Asp Leu Leu Ala Val Val Ser Thr Val Val Asp Gln Arg Ile Glu Val Ile Trp Glu Gly Gly Arg Gln Phe Asp Arg Gly Gly Asn Trp Leu Asp Ser Ser Leu Ala His Glu Thr Leu Gly Trp Arg Ala Arg Ile Gly Leu Thr Asp Gly Val Arg Glu Cys Trp Glu His Val Leu Ala His Gln Thr Ala Ala Glu Arg Information for SEQ ID NO: 30 Length: 924 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence:30 gtgacggcagcccaggtgaggagatgcccgacggccgtcatcggcgccaccgggttcatc60 gggtcccggctcgtggcccaactgacccgcgcggggcacccggtcgcccgcttcaaccag120 gcgcacccgccggtggtcgacgggcgcccggctgccggcctgtgcgacgccgagatcgta180 ctgttcctcgccgcacggttgagcccggcgctcgccgagcgccatccggaactgatcgtc240 gccgagcgcaggctgctcgtcgacgtcctgacggccctgcggcactccgcccccttcccg300 gtgttcgtactggccagctcaggcggcacggtgtactcgccggacgcgtgcccgccgtac360 gacgaatcggcgttgaccaggcccacgtcggcgtacgggcgcgccaagctcgggctggaa420 cgcgaactgttgggtcacgccgaccatgtccgtcccgtgatcctgcggctcagtaacgtc480 tatgggcccggccagcgcccggcgcacggctacggcgtgctgtcgcactggctggacgcc540 gcggccaggcggcagccgatccgggtcttcggtgatccggaggtggtccgcgactacgtg600 cacgtggacgacgtcgccgagatcctcaaggccgtgcaccgccgtacggtcactaccggt660 ccggagggaatcccgaccgtgttgaacgtcggctcaggggcgcccacctccctggccgat720 ctgctcgcggtggtgtcgacagtggtcgaccagcggatcgaggtgatctgggaaggcggt780 cgccagttcgacagaggtggcaactggctggactcctcgttggcacacgagaccctcggc840 tggcgggccaggatcggtctgacggacggcgtacgtgaatgctgggaacacgtgctcgcg900 catcagaccgccgccgagcgatga 924 Information for SEQ ID NO: 31 Length: 423 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: negative Sequence: 31 Val Gln Leu Arg Arg Pro Ala Ala Val Val Gly Ala Thr Gly Phe Ile Gly Ser Arg Leu Val Ser Arg Leu Ala Glu Ala Gly His Pro Val Ala Arg Phe Ser Arg Ala Ala Pro Pro Val Val Asp Gly Arg Pro Ala Pro Gly Leu Arg Glu Ala Gln Val Val Tyr Phe Leu Ala Ala Arg Leu Ser Pro Ala Leu Ala Glu Gln Gln Pro Glu Arg Val Val Arg Glu Arg Glu Leu Leu Leu Asp Val Leu Ser Ala Leu Ala Gly Val Asp His Arg Pro Val Phe Val Leu Ala Ser Ser Gly Gly Ala Val Tyr Thr Pro Thr Val Trp Pro Pro Tyr His Glu Arg Ser Ala Thr Gly Pro Ala Ser Ala Tyr Gly Arg Ala Lys Leu Arg Leu Glu Gln Glu Leu Leu Arg His Thr Asp Arg Val Gln Pro Val Val Thr Arg Leu Ser Asn Val Tyr Gly Pro Gly Gln Arg Pro Thr Pro Gly Tyr Gly Val Leu Ser His Trp Leu Glu Ala Thr Val Arg Gly Glu Pro Ile Arg Leu Phe Gly Asp Pro Ala Val Val Arg Asp Tyr Val His Val Asp Asp Val Thr Ala Ile Met Glu Val Ile Ala Gln Arg Ala Gly Asp Gly Asp Arg Asp Arg Leu Pro Thr Val Val Asn Val Gly Ser Gly Leu Pro Thr Ser Leu Ala Glu Leu Leu Gln Thr Met Ser Thr Val Ala Gly Arg Glu Leu Glu Val Ile Arg Asp Val Arg Arg Gln Phe Asp His Arg Gly Asn Trp Leu Asp Thr Thr Leu Ala Arg Glu Thr Leu Gly Trp Gln Ala Arg Ile Ser Leu Pro Asp Gly Val Arg Gln Cys Trp Glu Ala Val Leu Thr Arg Ala Gly Gly Pro Gly Gly Ser Pro Ala Arg Pro Ser Ala Arg Leu Gly Arg Ala Ser Arg Gly Arg Glu Pro Pro Gln Pro Arg Pro Ser Gln Gln Phe Val Ala Gln Pro Gly Gly Gly Arg Arg Gly Val Ala Glu Gly Gln Trp Gln Gly Glu Pro Gly Phe Arg Gly Arg Thr Val Pro Val Pro Phe Leu Ala Gln Ala Gly Pro Cys Leu Val Pro Ser Gly Leu Ala Pro Pro Leu Pro Val Arg Pro Pro Gln Gln Val Pro Gly Gly Gln Pro Ala Arg Val Asp Val Met Gly Asp Arg Val Val Arg Glu Gln Leu Leu Ala Gly Ala Gly Gly Leu His Gly Ala Asp Glu Gly Gly Val Leu Pro Information for SEQ ID NO: 32 Length: 1272 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: negative Sequence: 32 gtgcagttgc gacgccccgc agcggtggtc ggcgccaccg gcttcatcgg ctcacgcctc 60 gtctcccgcc tggccgaggc cgggcatccg gtggcgcgct tcagccgtgc cgccccaccc 120 gtcgtcgacg gccggcccgc gccggggctc cgcgaggcgc aggtcgtcta cttcctcgcc 180 gcccggctgagcccggcgctggcggagcagcaaccggaacgggtcgtccgggaacgcgag240 ttgttgctggacgtgctaagtgcgctggcgggggtggaccaccggccggtgttcgtcctg300 gccagctcgggcggggcggtgtacacgccgacggtgtggccgccctaccacgagcggtcg360 gccaccgggcccgcctcggcgtacggccgggcgaagctgcggctggaacaggagctgctg420 cgccacaccgaccgggtgcagccggtggtgacccggctgagcaacgtctacggtccgggg480 cagcggccgacccccgggtacggtgtcctgtcacactggctggaggccaccgtgcgcgga540 gagccgatccggctcttcggcgatccggccgtggtgcgggactacgtacacgtcgacgac600 gtcaccgcgatcatggaggtcatcgcgcagcgggccggtgacggcgaccgggaccggctg660 cccacggtcgtgaacgtcggctcgggcctgcccacctccctcgccgagttgctccagacg720 atgtccacggtggccggtcgtgagctggaggtcatccgggacgtccgccggcagttcgac780 catcggggcaactggctcgacaccaccctcgcccgggagaccctgggctggcaggcgcgg840 atcagcctccccgacggcgtccgccagtgctgggaggccgtcctcacccgggcgggcggc900 ccggggggttccccggcccgaccgtcagcccggctcgggagagcgtctcgggggcgggaa960 ccgccgcaaccgcgcccttcgcagcagttcgtggctcaacccggcggcggtcgccgcggt1020 gtagccgagggccagtggcagggcgagccgggattccggggccgtacggtgccggtccca1080 ttccttgcgcaggccggcccgtgcctggttccgtccggcctcgcaccgcccctgccagta1140 cgcccgccgcagcaggtaccggggggtcagccggcccgggtcgatgtcatgggtgaccgc1200 gtggtccggg agcagttgct cgcgggcgcc ggcggccttc atggcgctga tgaaggaggt 1260 gtcctcccct ga 1272 Information for SEQ ID NO: 33 Length: 342 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 33 Met Ala His Cys Leu Val Thr Gly Gly Ala Gly Phe Ile Gly Ser His Val Ala Glu Ala Leu Leu Gly Leu Gly His Arg Val Ser Val Leu Asp Asp Leu Ser Gly Gly Ser Ala Glu Arg Val Pro Asp Gly Ala Glu Leu Phe Val Gly Ser Val Thr Asp Ala Glu Leu Val Asp Lys Leu Phe Ala Glu Gln Arg Phe Asp Arg Val Phe His Phe Ala Ala Phe Ala Ala Glu Ala Ile Ser His Ser Val Lys Ser Leu Asn Tyr Gly Thr Asn Val Met Gly Ser Val Asn Leu Ile Asn Ala Ala Leu His His Gly Val Ser Phe Phe Cys Phe Ala Ser Ser Val Ala Val Tyr Gly His Gly Glu Thr Pro Met Arg Glu Ser Ser Ile Pro Val Pro Ala Asp Ser Tyr Gly Asn Ala Lys Leu Thr Val Glu Arg Glu Leu Glu Thr Thr Met Arg Thr Gln Gly Leu Pro Phe Thr Ala Phe Arg Met His Asn Val Tyr Gly Glu Trp Gln Asn Met Arg Asp Pro Tyr Arg Asn Ala Val Ala Ile Phe Phe Asn Gln Ile Leu Arg Gly Glu Pro Ile Ser Val Tyr Gly Asp Gly Gly Gln Val Arg Ala Phe Ser Tyr Val Lys Asp Ile Val Asp Val Ile Val Arg Ala Pro Glu Thr Glu Ala Ala Trp Gly Arg Ala Phe Asn Val Gly Ser Ser Arg Thr Asn Thr Val Leu Glu Leu Ala Gln Ala Val Arg Arg Ala Ala Gly Ala Pro Asp His Pro Ile Ala His Leu Pro Ala Arg Asp Glu Val Met Val Ala Tyr Thr Ala Thr Glu Glu Ala Arg Glu Val Phe Gly Asp Trp Ala Asp Thr Pro Leu Ala Asp Gly Leu Ala Arg Thr Ala Ala Trp Ala Ala Ser Val Gly Pro Ala Glu Leu Ser Pro Ser Phe Asp Ile Glu Ile Gly Gly Glu His Val Pro Glu Trp Ala Gln Cys Val Ala Asp Arg Leu Ser Ala Ala Gly Arg Information for SEQ ID NO: 34 Length: 1029 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 34 atggcgcactgtctggtgaccggcggagccggcttcatcggctcgcacgtggcggaggcc 60 ctgctcggcctcgggcaccgggtgtcggtcctcgacgacctcagcggcggcagcgccgag 120 cgcgtacccgacggtgcggagctgttcgtcggctcggtgaccgacgcggagctggtcgac 180 aagctcttcgccgagcagcggttcgaccgcgtcttccacttcgcggcgttcgccgccgag 240 gccatcagccactcggtcaagagcctcaactacggcaccaacgtcatgggcagcgtgaac 300 ctcatcaacgccgccctgcaccacggggtctccttcttctgcttcgcctcctcggtcgcg 360 gtgtacggccacggcgagaccccgatgcgggagtcgtcgatcccggtcccggccgacagc 420 tacggcaacgccaagctcacggtcgagcgcgagctggaaacgaccatgcgcacccagggc 480 ctgcccttcacggccttccgcatgcacaacgtgtacggcgaatggcagaacatgcgcgac 540 ccgtaccgcaacgcggtcgccatcttcttcaaccagatcctgcgcggcgagccgatctcc 600 gtgtacggcgacggcggccaggtccgggcgttcagctacgtgaaggacatcgtggacgtc 660 atcgtccgcgcccccgagaccgaggccgcctggggccgggccttcaacgtcggctcgtcc 720 cgcaccaacaccgtgctggagctcgcccaggccgtgcgccgggcggccggcgcccccgac 780 caccccatcgcccacctgccggcccgcgacgaggtgatggtcgcctacaccgccaccgag 840 gaggcccgcgaggtcttcggcgactgggcggacaccccgctggcggacggcctggcccgc 900 accgccgcctgggccgcctccgtcggccccgccgagctgagcccctcgttcgacatcgag 960 atcggcggggagcacgtcccggagtgggcgcagtgcgtggccgaccggctcagcgccgcc 1020 ggccgctga 1029 Information for SEQ ID NO: 35 Length: 342 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence: 35 Met Ala His Cys Leu Val Thr Gly Gly Ala Gly Phe Ile Gly Ser His Leu Ala Gly Arg Leu Thr Ser Asp Gly His Arg Val Thr Val Leu Asp Asp Leu Ser Gly Gly Ser Ala Ser Arg Val Pro Ala Gly Ala Asp Leu Ile Val Gly Ser Val Thr Asp Ala Asp Leu Val Glu Arg Ala Phe Ala Glu His Arg Phe Asp Arg Val Phe His Phe Ala Ala Phe Ala Ala Glu Ala Ile Ser His Ser Val Lys Lys Leu Asn Tyr Gly Thr Asn Val Met Gly Ser Ile Asn Leu Ile Asn Ala Ser Leu Gln Thr Gly Val Ser Phe Phe Cys Phe Ala Ser Ser Val Ala Val Tyr Gly His Gly Glu Thr Pro Met Arg Glu Thr Ser Ile Pro Val Pro Ala Asp Ser Tyr Gly Asn Ala Lys Leu Val Ile Glu Arg Glu Leu Glu Val Thr Ala Arg Thr Gln Gly Leu Pro Phe Thr Ala Phe Arg Met His Asn Val Tyr Gly Glu Trp Gln Asn Met Arg Asp Pro Tyr Arg Asn Ala Val Ala Ile Phe Phe Asn Gln Ile Leu Arg Gly Glu Pro Ile Thr Val Tyr Gly Asp Gly Gly Gln Val Arg Ala Phe Thr Tyr Val Gly Asp Val Val Asp Val Val Cys Gln Ala Pro Asp Val Glu Glu Ala Trp Gly Arg Ser Phe Asn Val Gly Ala Ala Ser Thr Asn Thr Val Leu Glu Leu Ala Glu Ala Val Arg Val Ala Ala Gly Val Pro Asp His Pro Ile Val His Leu Pro Ala Arg Asp Glu Val Arg Val Ala Tyr Thr Ala Thr Asp Ser Ala Arg Lys Val Phe Gly Asp Trp Ala Asp Thr Pro Leu Ala Asp Gly Leu Ala Arg Thr Ala Thr Trp Ala Ala Gly Val Gly Pro Thr Glu Leu Arg Ser Ser Phe Asp Ile Glu Ile Gly Gly His Gln Val Pro Glu Trp Ala Arg Leu Val Glu Lys Arg Leu Gly Ser Ala Pro Arg Information for SEQ ID NO: 36 Length: 1029 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence: 36 atggctcactgcctggtcacgggtggcgccggtttcatcggttcgcacctggcgggacgg 60 ttgaccagtgacgggcaccgggtcaccgtgctcgacgatctcagcggcggcagcgcctcc 120 cgcgtgcccgcgggcgccgatctgatcgtcggctcggtgaccgacgccgacctggtggaa 180 cgggccttcgccgagcaccgcttcgaccgggtcttccacttcgcggccttcgcagccgaa 240 gcgatcagccactcggtcaagaagctcaactacggcaccaacgtgatgggcagcatcaac 300 ctcatcaacgcgtcgttgcagaccggggtgtcgttcttctgcttcgcctcctcggtcgcc 360 gtctacggtcacggtgaaacgccgatgcgagaaacctccatcccggtgccggcggacagc 420 tacggcaacgccaagctcgtcatcgagcgcgaactcgaggtgacggcgcggacgcagggc 480 cttccgttcaccgccttccgcatgcacaacgtctacggcgagtggcagaacatgcgcgac 540 ccgtaccggaacgcggtcgcgatcttcttcaaccagatcctgcgtggcgagccgatcacg 600 gtctacggcgacggcggtcaggtgcgggcgttcacgtacgtgggcgacgtcgtggacgtg 660 gtgtgccaggcgcccgacgtcgaggaggcctggggccggagcttcaacgtgggcgcggcc 720 agcaccaacaccgtgctggagctcgcggaggcggtccgggtggcggccggcgttccggat 780 catccgatcgtgcacctgcccgcgcgcgacgaggtccgggtggcgtacaccgcgaccgac 840 agcgcccggaaggtcttcggcgactgggcggacaccccgctggcggacggactggcccgg 900 accgccacgtgggcggccggtgtgggaccgacggaactgcgatcgtcgttcgacatcgag 960 atcggcggccatcaggttccggagtgggcgcggcttgtcgaaaagcgcctgggatcggcg 1020 cctcgctga 1029 Information for SEQ ID NO: 37 Length: 342 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 37 Met Val Arg Cys Leu Val Thr Gly Gly Ala Gly Phe Ile Gly Ser His Leu Val Glu Ser Leu Val Arg Asn Gly His Arg Val Thr Val Leu Asp Asp Leu Ser Gly Gly Ser Arg Gln Arg Val Pro Ala Gly Val Asp Leu Ala Val Gly Ser Val Thr Asp Val Asp Phe Val Asp Ser Leu Phe Ala Glu Asn Arg Phe Glu Arg Val Phe His Phe Ala Ala Phe Ala Ala Glu Ala Ile Ser His Ser Val Lys Gln Leu Asn Tyr Gly Thr Asn Val Met Gly Ser Ile Asn Leu Ile Asn Ala Ser Leu Arg Thr Gly Val Arg Phe Phe Cys Phe Ala Ser Ser Val Ala Val Tyr Gly His Gly Glu Thr Pro Met Arg Glu Ser Val Val Pro Val Pro Ala Asp Ser Tyr Gly Leu Ala Lys Tyr Leu Val Glu Arg Glu Leu Glu Val Thr Met Arg Thr Gln Gly Leu Pro Phe Thr Ala Phe Arg Met His Asn Val Tyr Gly Glu Trp Gln Asn Met Arg Asp Pro Tyr Arg Asn Ala Val Ala Ile Phe Phe Asn Gln Ile Leu Arg Gly Glu Pro Ile Thr Val Tyr Gly Asp Gly Gly Gln Val Arg Ala Phe Thr Tyr Val Gly Asp Val Val Asn Val Val Ser Arg Ala Ala Glu Thr Glu Ala Ala Trp Gly Arg Ala Phe Asn Val Gly Ser Ser Ser Thr Asn Thr Val Leu Glu Leu Ala Gln Ala Val Arg Ser Ala Ala Gly Val Pro Glu His Pro Ile Ala His Leu Pro Ser Arg Asp Glu Val Arg Thr Ala Tyr Thr Ala Thr Glu Leu Ala Arg Ser Val Phe Gly Asp Trp Thr Asp Thr Pro Leu Ala Glu Gly Leu Ala Arg Thr Ala Arg Trp Ala Ala Asp Ala Gly Pro Ala Glu Leu Gln Ser Ser Phe Asp Ile Glu Ile Gly Gly Asp Arg Ile Pro Glu Trp Ala Arg Leu Val Asn Glu Arg Leu Ser Thr Ala Ser Arg Information for SEQ ID NO: 38 Length: 1029 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 38 atggttcgttgtctggtgactggtggtgccggattcatcggctcgcacctggtggagtca 60 ttggtcaggaatgggcaccgggtcaccgttctggacgacctcagcggcggcagccggcag 120 cgggttccggccggggtggacctggccgtcggttcggtgaccgacgtggacttcgtcgat 180 tcactgttcgccgagaaccgattcgagcgggtcttccactttgccgccttcgcggcggag 240 gcgatcagccattcggtgaagcagctcaactacggcaccaatgtgatgggcagcataaat 300 ctgatcaacgcgtcgctgcgtaccggcgtgcggttcttctgcttcgcctcctccgtggcg 360 gtctacggccacggcgagacgccgatgcgcgagtccgtcgtccccgtccccgcggacagc 420 tacggcctggccaagtacctggtcgagcgcgagctggaggtgacgatgcggacccagggg 480 ctgcccttcaccgccttccgcatgcacaacgtctacggcgagtggcagaacatgcgggac 540 ccgtaccgcaacgcggtcgccatcttcttcaaccagatcctgcgaggcgagccgatcacc 600 gtgtacggcgacggcggccaggtccgcgcgttcacgtacgtcggtgacgtggtgaacgtg 660 gtcagccgcgccgccgagaccgaggccgcgtgggggcgggcattcaacgtgggctcgtcg 720 agcaccaacaccgtgctggagctggcccaggcggtacggtcggcggccggcgtgccggag 780 catccgatcgcccacctgccgtcgcgggacgaggtgcgaaccgcgtacaccgcgacggag 840 ctggcccgatcggtcttcggcgactggacggacaccccgctggccgagggactggcccgc 900 accgccaggtgggcggccgacgccgggccggcggaactccagtcgtccttcgatatcgag 960 atcggcggggaccggattccggaatgggcccggctggtcaacgaaaggctcagcacggcg 1020 agccgctga 1029 Information for SEQ ID NO: 39 Length: 466 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 39 Val Val Ser Leu Pro Ala Gly Pro Ala Gly Gly Leu Val Arg Ser Leu Pro Pro Glu Ser Tyr Leu Ala Asp Phe Ala Gly Ala Glu Phe Ile Asp Trp Phe Ala Arg Cys Ser Ala Ala Ala Arg Cys Arg Val Thr Arg Thr Pro Leu Thr Glu Leu Arg Arg Trp Arg Phe Asp Ala Asp Thr Gly Asn Leu Gly His Asp Ser Gly Ala Phe Phe Val Val Glu Gly Leu Gln Val Arg Thr Gly Tyr Gly Pro Val Arg Glu Trp Ser Gln Pro Ile Ile Asn Gln Pro Glu Ile Gly Ile Leu Gly Met Leu Val Lys Glu Val Asp Gly Val Pro Tyr Cys Leu Val Gln Ala Lys Ile Glu Pro Gly Asn His Asn Gly Ile Gln Val Ser Pro Thr Val Gln Ala Thr Arg Ser Asn Tyr Thr Arg Ile His Gln Gly Arg Ser Thr Arg Tyr Leu Glu Tyr Phe Thr Asp Pro Gly Ala Gly Arg Thr Leu Val Asp Val Leu Gln Ser Glu Gln Gly Ser Trp Phe Leu Arg Lys Arg Asn Arg Asn Met Val Val Gln Val Thr Glu Asp Val Pro Ala Gly Glu Asp His His Trp Leu Pro Leu Pro Glu Leu Arg Arg Leu Leu Arg Ile Asp Gly Leu Val Asn Met Asp Thr Arg Thr Val Leu Ala Cys Leu Pro Ala Asp Thr Phe Pro Ala Ser Gln Pro Val Pro Ala Gly Asp Ala Ala Ser Ala Leu Val Leu Ser Thr Thr Gly Arg Gly Arg Ala Leu Asn Asp Thr Ala Thr Val Leu Arg Trp Phe Thr Gly Ala Lys Ser Arg His Glu Leu Ser Ala His Arg Ile Pro Leu Arg Asp Leu Pro Gly Trp Arg Ser Thr Pro Glu Gln Ile Thr His Glu Asp Gly Arg His Phe Ser Ile Ile Gly Val Thr Ala Glu Val Gly Asn Arg Glu Val Ala Ala Trp Asp Gln Pro Leu Leu Tyr Pro Glu Gly Arg Gly Val Val Ala Phe Val Ile Lys Val Ile Glu Gly Val Ala His Leu Leu Val His Ala Arg Phe Gln Pro Gly Leu Leu Asp Gly Met Glu Met Gly Pro Thr Val Gln Cys Val Pro Glu Asn Tyr Pro Asp Gly Pro Pro Arg Phe Leu Asp Tyr Val Leu Asn Ala Pro Arg Glu Arg Val Leu Tyr Asn Ala Val Leu Ala Glu Glu Gly Gly Arg Phe Tyr His Ser Gln Asn Asn Tyr Leu Leu Val Glu Ala Gly Asp Asp Phe Pro Thr Ala Val Pro Glu Asp Tyr Cys Trp Met Thr Ala His Gln Leu Thr Leu Leu Leu Arg His Gly Tyr Tyr Val Asn Val Glu Ala Arg Ser Leu Leu Ala Cys Leu Gln Ser Trp Information for SEQ ID NO: 40 Length: 1401 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 40 gtggtaagcctcccggccggaccggccggcgggctcgtacggtccctcccgccggagtcg 60 tacctcgcggacttcgccggcgccgagttcatcgactggttcgcccggtgctccgccgcc 120 gcgcggtgccgggtgacccggacgccgctgaccgagctgcggcgctggcggttcgacgcc 180 gacacggggaacctcgggcacgactccggggccttcttcgtcgtcgagggcctccaggtg 240 cggaccgggtacggaccggtccgcgagtggagccagcccatcatcaaccagcccgagatc 300 ggcatcctcggcatgctggtcaaggaggtcgacggcgtcccgtactgcctcgtgcaggcc 360 aagatcgagcccggcaaccacaacggcatccaggtgtcgccgaccgtccaggccacgcgc 420 agcaactacacccgcatccaccagggccggagcacccgctacctcgagtacttcacggac 480 cccggcgcgggccgcaccctggtcgacgtcctgcagtccgagcagggctcctggttcctg 540 cgcaagcgcaaccgcaacatggtcgtgcaggtcaccgaggacgtcccggccggcgaggac 600 caccactggctgccgctgcccgaactgcggcggctgctgcggatcgacggcctggtcaac 660 atggacacccggaccgtactggcctgcctcccggcggacacgttccccgcgtcccagccg 720 gtgccggccggggacgccgcgtcggcgctcgtcctctcgacgacgggccggggccgcgcc 780 ctgaacgacacggcgaccgtgctgcgctggttcaccggcgccaagagccggcacgaactg 840 tcggcgcaccggatccccctgcgcgacctgcccgggtggcgcagcaccccggagcagatc 900 acccacgaggacgggcggcacttcagcatcatcggggtgaccgccgaggtcggcaaccgc 960 gaggtggccgcatgggaccagcccctgctgtacccggaagggcgcggcgtcgtcgcgttc 1020 gtcatcaaggtgatcgaaggggtcgcgcacctgctggtccacgcccggttccagcccggc 1080 ctgctggacgggatggagatggggccgaccgtgcagtgcgtcccggagaactacccggac 1140 gggccgccgcggttcctcgactacgtcctgaacgccccccgggagcgcgtcctctacaac 1200 gccgtcctggccgaggaggggggccgcttctaccactcgcagaacaactacctcctcgtc 1260 gaagccggcgacgacttccccacggccgtccccgaggactactgctggatgaccgcgcac 1320 cagctcaccctgctcctccggcacggctactacgtcaacgtcgaggcgcgcagcctgctg 1380 gcctgcctgc agagttggtg a 1401 Information for SEQ ID NO: 41 Length: 470 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 41 Val Ser Glu Leu Pro Ala Leu Ser Glu Gly Lys Leu Met Thr Ser Gly Asp Asp Pro Met Glu Trp Trp Glu Pro Ala Ala Gly Gly Val Ser Pro Ala Phe Arg Ser Trp Leu Ala Glu Arg Ser Thr Met Thr Ser Cys Glu Val Glu Arg Ile Arg Leu Asp Glu Leu Arg Gly Trp Ala Phe Asp Glu Thr Thr Gly Asn Leu Ala His Glu Ser Gly Arg Phe Phe Val Val Glu Gly Val His Val Arg Thr Thr Tyr Gly Ala Val Ala Glu Trp Tyr Gln Pro Ile Ile Asn Gln Pro Glu Ile Gly Ile Leu Gly Met Leu Met Lys Leu Val Asp Gly Val Pro His Cys Leu Leu Gln Ala Lys Val Glu Pro Gly Asn Ile Asn Met Met Gln Leu Ser Pro Thr Val Gln Ala Thr Arg Ser Asn Tyr Thr Arg Val His Gly Gly Gly Gly Thr Arg Tyr Leu Glu Tyr Phe Thr Arg Pro Gly Ala Gly Arg Val Leu Val Asp Val Leu Gln Ser Glu Gln Gly Ser Trp Phe Leu His Lys Arg Asn Arg Asn Met Val Val Leu Val Asp Asp Val Pro Pro Ser Asp Tyr His Tyr Trp Leu Pro Leu His Glu Val Arg Arg Leu Leu Arg Ile Asp Val Leu Val Asn Met Asp Thr Arg Ser Val Leu Ser Cys Leu Pro Ser Thr Phe Phe Ala Gly Ser Gly Val Val Pro Ala Asn Ser Ala Met Ala Ala Ala Leu Ala Arg Ser Ala Ser Gly Glu Gly Pro Ser His Arg Ser Thr Glu Ala Val Leu Ser Trp Phe Thr Glu Ala Lys Ser Arg His Glu Leu Ala Val Ser Arg Ile Ser Leu Arg Asp Leu Arg Gly Trp Arg His Thr Pro Tyr Glu Ile Ser Arg Glu Asp Gly Arg His Phe Ser Ile Val Gly Val Thr Val Arg Ile Asn Asn Arg Glu Val Thr Glu Trp Ser Gln Pro Leu Leu His Pro Arg Gln Arg Gly Ile Ile Ala Phe Ala Leu Arg Ile Ile Asn Gly Val Ala His Val Leu Val His Ala Arg Phe Gln Val Gly Leu Leu Asp Ala Met Glu Met Gly Pro Thr Val Gln Cys Thr Pro Ser Ser Asp Ala Glu Gln Arg Pro Pro Phe Leu Asp Leu Ile Leu Asn Ala Pro Ser Glu Arg Ile Leu Phe Asp Thr Val Leu Ala Glu Glu Gly Gly Arg Phe Tyr Arg Ala Glu Asn Arg Tyr Met Leu Val Glu Val Gly Asp Asp Leu Pro Ala Val Pro Asp Thr Phe Cys Trp Val Ala Val His Gln Leu Ala Thr Leu Leu Arg His Gly Tyr Tyr Leu Asn Val Glu Ala Arg Ser Leu Leu Ala Cys Leu His Ser Leu Trp Information for SEQ ID NO: 42 Length: 1413 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 42 gtgagcgagcttcccgcgttgtccgagggcaagctcatgacgtccggggatgacccgatg 60 gagtggtgggagccggcggcgggcggggtgagccccgccttccggtcctggcttgccgaa 120 cggtcgacgatgacctcctgcgaggtggagcggatccggctggacgagctgcgtggttgg 180 gcgttcgacgagaccaccggcaacctggcgcacgagagcggccgcttcttcgtggtcgag 240 ggcgtgcacgtgcgaacgacgtacggtgcggtggcggagtggtaccagccgatcatcaat 300 cagccggagatcgggattctcggcatgttgatgaagctcgtcgacggcgtcccgcactgc 360 ctgttgcaggcgaaggtcgagccgggcaacatcaacatgatgcagctgtcgcccaccgtg 420 caggcgacccgcagcaactacacgcgggtccacgggggcggcggcacgaggtatctcgaa 480 tacttcacccggccgggcgccggccgggtcctggtcgacgtgttgcagtccgagcagggc 540 tcctggttcctgcacaagcggaaccgcaacatggtcgtgctggtcgacgacgtgccgccc 600 tccgactaccactactggctgccgctgcacgaggttcggcggctcctgcgcatcgacgtt 660 ctggtcaacatggacacccgttccgtgctgtcctgcctgccgtcgaccttcttcgccggt 720 tcgggagtggtccccgcgaactccgccatggccgccgcgctggcccggtccgccagcggc 780 gagggtccgtcgcaccgctccacggaggcggtgctgagctggttcaccgaggccaagagc 840 cggcacgagctggcggtgagccggatctcgctccgcgacctgcgcggctggcggcacacg 900 ccgtatgagatctcccgcgaggacgggcgacacttcagcatcgtcggggtcacggtacgc 960 atcaacaaccgtgaggtcaccgagtggagtcaaccgttgctgcatccgcggcagcggggg 1020 atcatcgcattcgcgctcaggatcatcaatggggtggcgcacgtgctggtgcacgctcgg 1080 ttccaggtggggctgctggatgccatggagatgggacccacggtccagtgcacgccgagt 1140 tcggatgccgaacagcggcctcctttcctcgacctgatactcaacgccccgagcgagcgg 1200 atcctgttcgacaccgtcctcgcggaggagggcggccggttctaccgcgccgagaaccgc 1260 tacatgctgg tggaggtcgg cgacgacctt ccggcggtgc cggacacctt ctgctgggtc 1320 gccgtccatc agctcgccac gctgctgcgg cacggctact acctcaacgt cgaagcacgc 1380 agtctgcttg cctgcctgca cagcctgtgg tga 1413 Information for SEQ ID NO: 43 Length: 484 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 43 Val Ser Asp Ser Ser Pro Asp Pro Lys Val Arg Ala Asp Gly Pro Leu Leu Thr Arg Asp Ala Gly Pro His Arg Pro Gly Pro Val Asp Gly Gly Ser Trp Ser Ala Leu His Ala Glu Gly Val Arg Pro Asp Phe Leu Ser Trp Phe Ala Glu Arg Thr Arg Ser Thr Tyr Cys Arg Val Asp Arg Val Pro Leu Asp Arg Leu Pro Gly Trp Ala Phe Asp Pro Val Thr Gly Asn Leu Gly His Glu Ser Gly Arg Phe Phe Val Ile Glu Gly Leu His Val Gln Thr Thr Tyr Gly Ala Val Arg Glu Trp His Gln Pro Ile Ile Asn Gln Pro Glu Ile Gly Ile Leu Gly Met Leu Val Lys Val Val Asp Gly Thr Pro Tyr Cys Leu Leu Gln Ala Lys Val Glu Pro Gly Asn Ile Asn Val Met Gln Leu Ser Pro Thr Val Gln Ala Thr Arg Ser Asn Tyr Thr Arg Val His Arg Gly Gly Gly Thr Lys Tyr Leu Asp Tyr Phe Thr Arg Pro Gly Ala Gly Arg Val Leu Val Asp Val Leu Gln Ser Glu Gln Gly Ser Trp Phe Leu Arg Lys Arg Asn Arg Asn Met Val Val Gln Val Asp Glu Asp Val Pro Ala Gly Asp Tyr His Arg Trp Leu Pro Leu Arg Glu Leu Leu Ala Leu Leu Arg Val Asp Gly Leu Val Asn Met Asp Thr Arg Thr Val Leu Ser Cys Leu Pro Ser Ala Phe Tyr Ala Ala Ala Gln Glu Thr Glu Ala Pro Ser Ser Pro Ala Val Ala Ala Ile Val Arg Ser Ala Ala Gly Ala Pro Gly Arg His Asp Leu Val Ser Val Leu Ser Trp Phe Thr Gly Ala Lys Gly Arg His Glu Met Thr Val Arg Arg Val Pro Leu Arg Gly Leu Pro Asp Trp Arg His Thr Ala Asp Gly Ile Ala Arg Asp Asp Gly Arg His Phe Ser Val Val Gly Val Thr Val Arg Ile Asp Asn Arg Glu Val Thr Gly Trp Ser Gln Pro Leu Leu Tyr Pro Arg His Arg Gly Val Val Ala Phe Leu Val Lys Glu Ile Asp Gly Val Ala His Leu Leu Val His Ala Arg Tyr Gln Ala Gly Leu Leu Asp Ala Met Glu Met Gly Pro Thr Val Gln Cys Ile Pro Asp Asn Gln Pro Gly Pro Arg Pro Leu Phe Leu Ala Glu Val Leu Glu Ala Ala Pro Glu Arg Val Leu Tyr Asp Thr Val Leu Thr Glu Glu Gly Gly Arg Phe Tyr Arg Ser Glu Asn Arg Tyr Leu Leu Val Asp Ala Gly Asp Asp Phe Pro Thr Glu Val Pro Asp Glu Phe Cys Trp Val Thr Val Arg Gln Leu Glu Ala Leu Leu Arg His Gly Tyr Tyr Leu Asn Ile Glu Ala Arg Ser Leu Leu Ala Cys Leu Arg Ser Leu Trp Information for SEQ ID NO: 44 Length: 1455 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 44 gtgagcgatt cgtcgcccga cccgaaggtc cgcgccgacg ggccgttgct gacccgggac 60 gcgggcccgc accggcccgg cccggtcgac ggcggctcgt ggtcggcgtt gcacgccgag 120 ggggtccggc cggacttcct ctcctggttc gccgagcgga cccggtcgac ctactgccgg 180 gtggaccgcgtgccgctggaccggctgcccgggtgggcgttcgacccggtgaccggcaac 240 ctcgggcacgagagcggccggttcttcgtgatcgaggggctgcacgtccagaccacctac 300 ggcgcggtgcgcgaatggcaccagccgatcatcaaccagccggagatcggcatcctcggc 360 atgctcgtcaaggtcgtcgacgggacaccgtactgcctgctccaggccaaggtggagccc 420 ggcaacatcaacgtcatgcagctctcgccgacggtgcaggccacccggagcaactacacc 480 cgggtgcaccgtggcggcggcacgaagtacctcgactacttcacccgcccgggggccggt 540 cgggtgctggttgacgtcctgcagtcggagcagggctcctggttcctgcgcaagcgcaac 600 cggaacatggtggtccaggtcgacgaggacgtgccggccggcgactaccaccggtggctc 660 ccgctgcgcgaactgctcgcgctgctgcgggtggacggcctggtcaacatggacacgcgt 720 acggtgctgtcatgtctgccctcggccttctacgccgcggcccaggagaccgaggcgccg 780 tcgtcgccggcggtggcggcgatcgtgcgctcggcggcgggcgcccccggccgccacgac 840 ctggtgtcggtgttgagctggttcaccggggccaagggccggcacgagatgaccgtgcgg 900 cgggtgccgctacgagggctgccggactggcggcacaccgcggacgggatcgcgcgggac 960 gacggccggcacttctccgtggtcggcgtcacggtgcgcatcgacaaccgcgaggtgacc 1020 gggtggagtcagccgctgttgtatccgcggcaccggggcgtggtcgccttcctggtcaag 1080 gaaatcgacggagtggcgcacctgctggtgcacgcgcgctaccaggcggggctgctcgac 1140 gcgatggagatgggcccaacggtgcagtgcatcccggacaaccagcccggcccccgacca 1200 ctgttcctggccgaggtgctggaagccgcccccgagcgtgtcctctacgacaccgtgctc 1260 accgaggagggcgggcgattctaccggtcggagaaccgttacctcctggtcgacgccggc 1320 gacgacttcccgaccgaggtgccggacgagttctgctgggtgacggtgcgccagctggag 1380 gccctgctccggcacggctactacctcaacatcgaggcccgcagcttgctggcctgcctg 1440 cgcagcctgtggtga 1455 Information for SEQ ID NO: 45 Length: 430 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 45 Met Arg Ile Met Phe Thr Ala Ser Asn Trp Ala Gly His Tyr Met Cys Met Val Pro Leu Ala Trp Ala Leu Arg Gly Ala Gly His Glu Ile Arg Val Ala Cys Pro Pro Ser Gln Val Arg Gly Val Ala Ala Ala Gly Leu Met Pro Val Pro Val Leu Asp Ala Pro Asp Met Met Glu Ser Ala Arg Leu Ala Phe Tyr Val Gln Ala Met Tyr Thr Pro Pro Gln Ser Gly Pro Arg Pro Leu Pro Leu His Pro Phe Thr Gly Glu Pro Met Glu Ser Leu Asn Asp Phe Asp Ala Ser Asp Leu Arg Asp Phe Trp Gln Lys Ser Ile Asp Ala Val Gln Arg Ser Tyr Asp Asn Ala Val Ser Phe Gly Asp His Trp Lys Pro Asp Leu Val Val His Asp Ile Met Ala Val Glu Gly Ala Leu Val Ala Ala Leu Arg Gly Val Pro Ser Val Tyr Val Ser Pro Gly Phe Ile Gly Thr Val Glu Thr Glu Pro Gly Leu Asp Leu Val Ser Ala Asp Pro Leu Ser Cys Phe Glu Lys Tyr Gly Val Asp Trp Gly Arg Asp Arg Ile Arg Tyr Ala Val Asp Pro Ser Pro Asp Ser Ala Ile Pro Pro Leu Gly Asp Ala Leu Arg Leu Pro Met Arg Tyr Val Pro Tyr Asn Gly Ala Gln Ser Ala Asp Pro Trp Gln Leu Gly Pro Ile Arg Gly Lys Arg Val Cys Val Val Trp Gly Asn Ser Ala Ser Gly Ile Phe Gly Ser Asp Val Pro Ala Leu Arg His Ala Ile Asp Ala Ala Val Ala Gln Gly Ala Glu Val Val Leu Thr Ala Pro Pro Glu Gln Val Glu Arg Leu Gly Ser Leu Pro Asp Gly Val Arg Leu Leu Arg Asn Cys Pro Leu Glu Leu Ile Leu Pro Tyr Cys Asp Leu Leu Ile His His Gly Ser Ala Asn Cys Tyr Met Asn Gly Ile Val Ala Gly Ile Pro Gln Leu Ser Leu Ala Leu Asn Tyr Asp Thr Leu Val Cys Gly Arg Arg Val Asp Pro Thr Gly Ala Thr Val Thr Leu Ser Gly Leu Glu Ala Thr Ala Glu Lys Ile Asp Glu Ala Leu Arg Gly Ala Leu Phe Asp His Arg Tyr Arg Leu Ala Ala Glu Lys Leu Arg Asp Gly Val Glu Arg Ala Pro Ser Pro Ala Ala Val Ala Gly Leu Leu Glu Arg Leu Val Ala Asp Gly Gly Leu Thr Ala Glu Asp Val Ala Glu Thr Val Arg Asn Ala Asn Glu Val Arg Arg Val Ala Information for SEQ ID NO: 46 Length: 1293 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 46 atgaggattatgttcaccgcctcgaactgggcgggacattacatgtgcatggtgcccctg 60 gcctgggcgctgcgcggcgcggggcacgagatccgcgtggcgtgcccgccgtcgcaggtg 120 cgcggggtcgccgcggccggcctgatgccggtcccggtgctggacgcgccggacatgatg 180 gagagcgcgcggctggcgttctacgtgcaggccatgtacacgccgccgcagtccggcccg 240 cggccgctgccgctgcacccgttcaccggggagccgatggagtcgctcaacgacttcgac 300 gcctccgacctgcgggacttctggcagaagtcgatcgacgcggtgcagcgcagctacgac 360 aacgcggtgagcttcggcgaccactggaagcccgacctcgtggtgcacgacatcatggcc 420 gtcgagggcgccctggtcgccgccctccggggtgtgcccagcgtctacgtctcccccggg 480 ttcatcgggaccgtggagaccgagccgggcctcgacctggtctcggcggacccgctgtcg 540 tgcttcgagaagtacggggtcgactggggccgcgaccggatccggtacgcggtggacccc 600 tcgccggactcggcgatcccgccgctcggcgacgcgctgcggctgcccatgcgctacgtg 660 ccgtacaacggcgcgcagtcggccgacccctggcagctgggcccgatccgcggcaagcgc 720 gtctgcgtcgtctggggcaactcggcgtcgggcatcttcggctccgacgtccccgcgctg 780 cggcacgccatcgacgcggcggtggcgcagggcgccgaggtggtgctcacggcgccgccg 840 gagcaggtggagcggctgggctcgctgcccgacggcgtccggctgctgcgcaactgcccg 900 ctggagctgatcctcccctactgcgacctgctcatccaccacggcagcgccaactgctac 960 atgaacggcatcgtggcggggatcccccagctctccctcgccctcaactacgacacgctg 1020 gtctgcggcc gccgcgtcga ccccaccggc gccacggtga ccctctccgg cctcgaggcc 1080 accgccgaga agatcgacga ggcgctgcgc ggcgcgctgt tcgaccaccg ctaccgcctt 1140 gcggcggaga agctccggga cggcgtcgag cgggcgccct cgcccgccgc ggtcgccggg 1200 ctgctggagc gcctggtggc ggacggcggg ctgacggcgg aggacgtcgc cgagaccgtc 1260 aggaacgcca acgaggttcg gagggtcgcg tga 1293 Information for SEQ ID NO: 47 Length: 438 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 47 Met Arg Ile Leu Phe Thr Val Ser Asn Trp Ala Gly His Tyr Met Cys Met Val Pro Leu Ala Trp Ala Phe Arg Ala Ala Gly His Glu Val Arg Val Ala Cys Pro Pro Gln Gln Val Ser Gly Val Gln Ala Thr Gly Leu Met Pro Val Ser Met Leu Asp Ser Ala Asp Met Met Glu Ser Ala Arg Leu Ala Tyr Trp Ser Leu Ala Ile Asn Thr Pro Pro Gln Ser Gly Glu Met Pro Leu Pro Leu His Pro Phe Thr Gly Glu Ala Leu Gly Ser Val Arg Asp Phe Asp Thr Gly Met Leu Ser Asp Phe Trp Lys Arg Ser Ile Ala Ala Val Gln Arg Ser Phe Asp Asn Ala Val Asp Tyr Ala Ala Ser Trp Arg Pro Asp Leu Val Val Tyr Asp Ile Met Ala Val Glu Gly Ala Leu Val Gly Ile Leu Asn Asp Val Pro Ser Val Phe Phe Gly Pro Gly Phe Ile Gly Thr Val Glu Thr Glu Pro Gly Leu Asn Met Met Ala Gly Asp Pro Leu Ser Cys Phe Glu Lys Tyr Gly Val Gln Trp Thr Arg Arg Asp Ile Lys Tyr Ala Val Asp Pro Ser Pro Asp Val Ala Ile Pro Pro Met Gly Asp Ala Leu Arg Ile Pro Ile Arg Tyr His Pro Phe Asn Gly Ser Gln Asp Val Asp Pro Trp Leu Leu Gly Pro Val Lys Gly Lys Arg Val Cys Val Val Trp Gly Asn Ser Ala Thr Gly Val Phe Gly Glu Arg Leu Pro Ala Leu Arg Gln Ala Val Glu Thr Ala Ala Gln Leu Ala Thr Glu Val Val Leu Thr Ala Ala Leu Ser Glu Val Asp Ala Met Gly Thr Leu Pro Pro Asn Val Arg Val Leu Arg Asn Cys Pro Leu Glu Leu Ile Leu Pro Asp Cys Asp Leu Leu Ile His His Gly Ser Ala Asn Cys Leu Met Asn Gly Ile Ala Met Gly Val Pro Gln Leu Ser Leu Ala Leu Asn Phe Asp Gly Gln Ile Tyr Gly Arg Arg Leu Asp Pro Gln Gly Ala Thr Lys Thr Leu Pro Gly Leu Leu Ile Asp Arg Asp Ala Ile Asp Lys Ala Ile Gly Glu Val Leu Phe Asp His Arg Tyr Arg Arg Arg Ala Val Glu Leu Ser Glu Ser Val Gly Ala Ala Pro Thr Ala Ala Gln Val Ala Asp Leu Leu Val Thr Leu Ala Arg Glu Gly Glu Leu Thr Ala Ser Asp Val Ala Gly Leu Val Thr Gly Arg Gly Pro Gln Arg Lys Glu Ile Thr Gln Asp Thr Val Ser Glu Val Information for SEQ ID NO: 48 Length: 1317 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 48 atgcggattc tgtttaccgt gtccaactgg gctggacact acatgtgcat ggttccgctc 60 gcatgggcgt tccgggcggc cgggcacgag gtccgggtcg cctgcccgcc ccagcaggtt 120 tcgggggtcc aggcgaccgg cctgatgccg gtgtcgatgc tcgactccgc cgacatgatg 180 gaaagcgccc ggctggccta ctggtcgttg gcgatcaaca ccccgccgca gagcggggag 240 atgcctctgc cgctgcatcc cttcaccggt gaggcgctcg gctccgtgcg cgatttcgac 300 accggcatgc tgagcgactt ctggaagagg tccatcgccg cggtccagcg cagcttcgac 360 aacgccgtcg actacgccgc gtcctggcgt cccgacctgg tggtgtacga catcatggcg 420 gtcgaggggg cgctggtcgg cattttgaac gacgtgccca gcgtcttctt cgggccgggc 480 ttcatcggcacggtggagaccgagccggggctcaacatgatggcgggggatcccctctcc 540 tgcttcgagaagtacggcgtgcagtggacccggcgcgacatcaagtacgccgtcgatccg 600 tcgccggacgtggccattccgcccatgggcgacgcgctgcggatacccatccgataccac 660 cccttcaacggatctcaggacgtcgacccctggctgttgggcccggtcaagggcaagcgg 720 gtgtgcgtggtgtggggcaactccgcgacgggggtgttcggcgagcggctgccggccctt 780 cgacaggcggtcgagaccgcggcgcaactggccaccgaggtggtgctcacggcggcgctg 840 tccgaggtggacgcgatgggcacgctgccaccgaacgtccgggtcctgcgcaactgccca 900 ctcgaactgatcctgcccgactgcgacctgctcatccaccacggcagcgcgaactgcctg 960 atgaacggcatcgccatgggggtgccgcagctgtcactcgcgctgaacttcgacgggcag 1020 atctacggtcggcggctggatccgcagggagcgaccaagacgttgcccgggctgctgatc 1080 gaccgcgacgcgatcgacaaggccatcggtgaggtgctgttcgaccaccggtaccggcgc 1140 cgggcggtcgagctcagcgagtccgtcggtgccgcgccgaccgccgcgcaggtcgccgac 1200 ctgctcgtcaccctggcccgcgagggtgagctgaccgcctccgacgtcgccgggttggtg 1260 accgggcgagggccgcaacgtaaggaaatcacccaggacaccgtgagcgaggtctga 1317 Information for SEQ ID NO: 49 Length: 423 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 49 Met Lys Val Leu Phe Thr Val Ser Asn Trp Ala Gly His Tyr Met Cys Met Val Pro Leu Ala Trp Ala Leu Arg Ala Ala Gly His Asp Val Lys Val Ala Cys Ser Pro Ser Gln Val Arg Gly Val Ala Ala Ala Gly Met Met Pro Val Ser Val Leu Asp Gly Pro Asp Met Met Glu Ser Ala Arg Leu Gly Phe Tyr Val Gln Ala Leu Tyr Thr Pro Gln His Met Val Glu Gln Pro Leu Pro Leu Asn Pro Phe Thr Gly Arg Pro Met Asp Ser Leu Ala Asp Phe Asp Thr Asp Leu Leu Ala Asp Tyr Trp Lys Arg Thr Val Thr Ala Val Gln Arg Ser Tyr Asp Asn Ala Val Asp Tyr Ala Ala His Tyr Arg Pro Asp Leu Val Val His Asp Ile Met Ala Val Glu Gly Ala Leu Val Ala Glu Leu His His Ile Pro Ser Val Tyr Phe Ser Pro Gly Phe Ile Gly Thr Ile Glu Thr Glu Pro Gly Leu Asp Leu Val Ser Gly Asp Pro Val Thr Glu Phe Arg Lys Tyr Gly Val Glu Trp Ser Arg His Gln Ile Arg Tyr Ala Ile Asp Pro Ser Pro Asp Val Ala Ile Pro Pro Met Gly Asp Ala Leu Arg Ile Pro Ile Arg Tyr Gln Pro Tyr Asn Gly Ser Gln Asp Val Asp Pro Trp Leu Leu Gly Pro Arg Arg Gly Lys Arg Val Cys Val Val Trp Gly Asn Ser Ala Thr Gly Val Phe Gly Ala Gln Val Pro Ala Leu Arg His Ala Val Asp Ala Ala Ala Gln Arg Gly Val Glu Val Val Ile Thr Ala Ala Ser Ser Glu Val Glu Gly Leu Gly Ala Leu Pro Pro Asn Val Arg Val Leu Ser Asn Cys Pro Leu Glu Leu Ile Leu Pro Asp Cys Asp Leu Leu Val His His Gly Ser Ala Asn Cys Tyr Met Asn Gly Leu Ala Met Gly Val Pro Gln Leu Ser Leu Ala Leu Asn Tyr Asp Ala Leu Ile Tyr Gly Arg Arg Leu Asp Pro Gln Gly Ala Thr Lys Thr Leu Pro Gly Leu Lys Ala Ser Arg Glu Glu Val Asp Glu Ala Leu Gly Ser Val Leu Tyr Asp His Arg Phe Arg Val Ala Ala Gln Arg Met Arg Glu Ser Val Thr Thr Gly Pro Thr Ala Val Gln Val Ala Glu Leu Leu Ala Arg Leu Ala Asp Thr Gly Ala Leu Ser Pro Glu Asp Val Ala Glu Phe Ala Arg Arg Pro Information for SEQ ID NO: 50 Length: 1272 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 50 atgaaggttc tgttcaccgt gtccaactgg gccgggcact acatgtgcat ggtgccgctg 60 gcctgggcgc tgcgggcggc cggccacgac gtcaaggtcg cctgttcgcc gtcccaggtc 120 cggggcgtgg ccgcggcggg catgatgccc gtctcggtgc tcgacggacc cgacatgatg 180 gagagcgccc ggctgggctt ctacgtccag gccctctaca ccccgcagca catggtggag 240 caaccgctgc cgttgaaccc cttcacgggc cggccgatgg actcgctcgc cgacttcgac 300 accgacctgctcgccgactactggaagcgcacggtcaccgcggtccagcgcagctacgac 360 aacgcggtcgactacgccgcccactaccggcccgacctggtggtccacgacatcatggcc 420 gtggagggcgcgctggtcgccgagctgcaccacatccccagcgtctacttctcgcccggg 480 ttcatcggcaccatcgagaccgagcccgggctcgacctggtctccggcgacccggtgacc 540 gagttccgcaagtacggcgtcgagtggagccggcaccagatccggtacgccatcgacccg 600 tcgcccgacgtggcgatcccgccgatgggcgacgcgctgcggatcccgatccgctaccag 660 ccctacaacggctcccaggacgtggacccctggctgctcggtccgcgccggggcaagcgg 720 gtctgcgtggtgtggggcaactccgccacgggcgtgttcggcgcgcaggtgccggcgctg 780 cggcacgccgtcgacgccgccgcccagcggggcgtggaggtcgtgatcaccgccgcctcc 840 tccgaggtggaggggctgggcgcgctgccgccgaacgtgcgggtgctcagcaactgcccg 900 ctggagctcatcctccccgactgcgacttgctggtgcaccacggcagcgccaactgctac 960 atgaacgggctcgccatgggcgtgccgcagttgtcgctggcgctcaactacgacgccctg 1020 atctacgggcggcggctcgacccgcagggcgcgacgaagacgctgcccggcctgaaggcg 1080 tcccgcgaggaggtcgacgaggccctcggctccgtcctctacgaccaccgattccgggtg 1140 gcggcgcagcggatgcgggagtccgtcaccaccgggccgaccgccgtccaggtggccgaa 1200 ctcctggcccggctggccgacaccggcgcgctgagccccgaggacgtcgccgagttcgcc 1260 cggcgtccgtga 1272 Information for SEQ ID NO: 51 Length: 335 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 51 Met Asp Ala Glu Arg Arg Pro Lys Val Val Thr Ile Thr Val Gly Thr Asn Glu Leu Arg Trp Leu Asp Arg Cys Leu Gly Ser Leu Leu Asp Ser Asp Thr Thr Gly Ile Asp Leu Glu Val Cys Tyr Val Asp Asn Asp Ser Ala Asp Gly Ser Val Glu His Val Arg Glu Lys Tyr Pro Arg Ala Thr Val Ile Arg Asn Asp Arg Asn Leu Gly Phe Ala Gly Ala Asn Asn Val Gly Met Arg Arg Ala Leu Glu Ser Gly Ala Asp Tyr Val Phe Leu Val Asn Pro Asp Thr Trp Thr Pro Pro Gly Leu Val Arg Gly Leu Thr Glu Val Ala Glu Glu Trp Pro Glu Tyr Gly Ile Leu Gly Pro Leu Gln Tyr Arg Tyr Asp Ala Asp Ser Thr Ala Leu Asp Glu Phe Asn Asp Trp Thr His Thr Val Leu Trp Leu Gly Glu Gln His Ala Phe Ala Gly Asp Gly Ile Ala His Pro Ser Thr Ala Gly Pro Ala Glu Gly Arg Ala Pro Arg Thr Leu Glu His Ala Tyr Val Gln Gly Ser Ala Leu Phe Ala Arg Thr Ala Met Leu Arg Glu Thr Gly Leu Phe Asp Glu Val Leu His Thr Tyr Tyr Glu Glu Thr Asp Leu Cys Arg Arg Ala Arg Trp Ala Gly Trp Arg Val Ala Leu His Leu Asp Leu Gly Ile Gln His Arg Gly Gly Gly Gly Ala Ala Val Pro Ser Glu Tyr Ser Arg Val His Met Arg Arg Asn Arg Tyr Tyr Tyr Leu Leu Thr Asp Ile Asp Trp His Pro Ala Lys Ala Ala Arg Leu Ala Gly Arg Trp Leu Val Ala Asp Leu Arg Gly His Ser Val Val Gly Arg Val Pro Ala Ala Thr Gly Ala Arg Glu Thr Ala Glu Ala Leu Arg Trp Leu Ala Gly Arg Val Pro Thr Met Arg Ser Arg Arg Arg Ser His Arg Ala Leu Arg Ala Arg Gly Ala Lys Gly Ala Ser Arg Information for SEQ ID NO: 52 Length: 1008 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 52 atggacgccgagcgccggccgaaggtcgtgacgatcaccgtcgggaccaacgagctgcgc 60 tggctggaccggtgcctcggttccctgctggacagcgacaccaccggcatcgacctggag 120 gtctgctacgtcgacaacgactcggcggacggcagcgtcgagcacgtgcgggagaagtac 180 ccccgggcgacggtcatccgcaacgaccgcaacctgggcttcgccggggcgaacaacgtc 240 ggcatgcgccgcgcgctggagtccggcgcggactacgtgttcctcgtgaaccccgacacc 300 tggaccccgcccggcctggtgcgcggcctcaccgaagtggccgaggagtggccggagtac 360 ggcatcctcggcccgctccagtaccgctacgacgccgactccacggcgctcgacgagttc 420 aacgactggacgcacacggtcctgtggctgggcgagcagcacgccttcgccggcgacggg 480 atcgcccacccctccaccgccggccccgccgagggccgggccccgcgcacgctggagcac 540 gcctacgtccagggttcggcgctgttcgcccggaccgcgatgctccgcgagaccgggctc 600 ttcgacgaggtgctgcacacctactacgaggagaccgacctgtgccgccgggcccgctgg 660 gcgggctggcgggtggcgctccacctggacctgggcatccagcaccggggcggcggaggc 720 gccgccgtccccagcgagtacagccgggtccacatgcggcgcaaccgctactactacctc 780 ctgacggacatcgactggcatccggcgaaggccgcccggctggccgggcgctggctggtg 840 gcggacctcaggggccacagcgtcgtcggccgcgtgccggccgcgacgggggcgcgggag 900 acggccgaggcgctgcgctggctcgccggccgggtgccgaccatgcggtcccggcgccgg 960 agccaccgcgcgctgcgggcccggggcgcgaagggggcttcccgatga 1008 Information for SEQ ID NO: 53 Length: 337 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence: 53 Met Thr Ser Gly Arg Pro Arg Val Ala Thr Val Thr Val Thr Thr Asn Glu Ser Lys Trp Leu Arg Arg Cys Leu Gly Ala Leu Val Asp Ser Asp Thr Glu Gly Phe Asp Leu Asp Val His Leu Ile Asp Asn Ala Ser Thr Asp Gly Ser Ala Glu Leu Val Ala Arg Glu Phe Pro Ser Val Lys Ile Thr Arg Asn Pro Thr Asn Leu Gly Phe Ala Gly Ala Asn Asn Val Gly Ile Arg Ala Ala Leu Ala Ala Gly Ala Asp Tyr Val Phe Leu Val Asn Pro Asp Thr Trp Thr Pro Pro Arg Leu Val Arg Ala Met Val Glu Phe Ala Glu Arg Trp Pro Glu Tyr Gly Ile Val Gly Pro Leu Gln Tyr Arg Tyr Asp Ala Glu Ser Thr Glu Leu Val Glu Phe Asn Asp Trp Thr Asn Thr Ala Leu Trp Leu Gly Glu Gln His Ala Phe Ala Gly Asp Gly Met Ala His Pro Ser Pro Ala Gly Ser Pro Gln Gly Arg Ala Pro Arg Thr Leu Glu His Ala Tyr Val Gln Gly Ala Ala Leu Phe Ala Arg Val Ala Met Leu Arg Glu Val Gly Val Phe Asp Glu Val Phe His Thr Tyr Tyr Glu Glu Val Asp Leu Cys Arg Arg Ala Arg Trp Ala Gly Trp Arg Val Ala Leu Leu Leu Asp Glu Gly Leu Gln His His Gly Gly Gly Gly Ala Ala Thr Arg Ser Ala Tyr Thr Arg Val His Met Arg Arg Asn Arg Tyr Tyr Tyr Leu Leu Thr Asp Val Asp Trp His Pro Thr Lys Ala Thr Arg Leu Ala Ala Arg Trp Leu Val Ala Asp Leu Val Gly Arg Thr Val Val Gly Arg Val Asp Pro Met Thr Gly Ala Arg Glu Thr Leu Ala Ala Val Arg Trp Leu Ala Gly His Ala Pro Thr Ile Ala Glu Arg Arg Arg Ser His Arg Ala Leu Arg Ala Gly Arg Thr Pro Ala Arg Arg Glu Val Ala Ser Information for SEQ ID NO: 54 Length: 1014 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence:54 atgacatcgggacgcccgcgggtggcgaccgtcacggtgaccaccaacgagagcaagtgg60 ctgcgtcgctgcctgggggcgcttgtcgacagtgacaccgaaggattcgatcttgacgtg120 cacctgatcgacaacgcctccaccgacggcagcgcggagctggtcgcgcgggagttcccg180 agcgtgaagatcacccgtaatcccaccaacctcgggttcgccggcgccaacaacgtcggc240 atccgggccgcgctcgccgccggcgccgactacgtgttcctggtcaacccggacacctgg300 accccgccacggctcgtccgggcgatggtcgaattcgccgagcgttggccggagtacggc360 atcgtcggcccgctgcaataccgctacgacgccgagtcgaccgagctcgtcgagttcaac420 gactggaccaacacggcactctggctgggcgaacagcacgcgttcgcgggcgacgggatg480 gctcatccctccccggccggcagcccgcaaggccgcgcgccgaggaccctggagcacgcg540 tacgtccagggcgcggcgctgttcgcgcgggtggcgatgctgcgcgaggtgggcgtgttc600 gatgaggtgttccacacgtactacgaggaggtggacctgtgccggcgggccagatgggcg660 ggctggcgggtggccctcctgctcgacgagggcctgcaacaccacggcggcggcggtgcg720 gccacgcgcagcgcgtacacccgggtgcacatgcggcgcaaccgttactactacctgctc780 acggacgtggactggcacccgaccaaggcgacccggctggccgcccggtggctggtggcg840 gacctggtcggccggaccgtggtcggcagggtggacccgatgaccggggcccgggaaacc900 ctggcggcggtgcgctggctggcgggccacgcgccgaccatagcggaacgtcgacgcagt960 caccgggcgttgcgcgcgggccgtacgccggcacggcgtgaggtggcgtcgtga 1014 Information for SEQ ID NO: 55 Length: 347 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: negative Sequence: 55 Met Thr Arg Glu Gly Ser Thr Pro Pro Val Arg Val Ala Thr Ile Thr Val Gly Thr Asn Glu Ile Arg Trp Leu Asp Arg Ala Leu Gly Ser Leu Leu Ala Ser Asp Thr Thr Gly Phe Glu Leu Thr Val Phe Tyr Val Asp Asn Ala Ser Ala Asp Gly Ser Val Ala His Val Met Ser Ala Phe Pro Gly Val Arg Val Ile Arg Asn Pro Arg Asn Leu Gly Phe Thr Gly Ala Asn Asn Val Gly Met Arg Ala Ala Leu Ala Glu Gly Phe Asp His Ile Phe Leu Val Asn Pro Asp Thr Trp Thr Pro Pro Gly Leu Val Arg Gly Leu Val Glu Phe Ala Gln Arg Trp Pro Gln Tyr Gly Val Ile Gly Pro Leu Gln Tyr Arg Tyr Asp Pro Ala Ser Thr Glu Leu Thr Asp Phe Asn Asp Trp Thr Gln Val Ala Leu Tyr Leu Gly Glu Gln His Thr Phe Ala Gly Asp Leu Leu Asp His Pro Ser His Val Thr Ala Thr Val Arg Asp Arg Ala Pro Arg Thr Leu Glu His Ala Tyr Val Gln Gly Ser Ala Leu Phe Val Arg Ala Ala Val Leu Arg Glu Val Gly Leu Leu Asp Glu Val Phe His Thr Tyr Tyr Glu Glu Val Asp Leu Cys Arg Arg Ala Arg Trp Ala Gly Trp Arg Val Ala Leu Leu Leu Asp Leu Gly Ile Gln His Lys Gly Gly Gly Gly Thr Ala Ala Ser Ala Tyr Ser Arg Ile His Met Arg Arg Asn Arg Tyr Tyr Tyr Leu Leu Thr Asp Val Asp Trp Pro Pro Ala Lys Ala Ala Arg Leu Ala Ala Arg Trp Leu Phe Ser Asp Val Arg Gly Arg Gly Val Thr Gly Arg Thr Ser Ala Gly Val Gly Ala Arg Glu Thr Phe Val Ala Leu Gly Trp Leu Ala Arg Gln Ala Pro Val Ile Arg Glu Arg Arg Arg Arg His Arg Leu Leu Arg Ala Arg Gly Thr Gly Val Asp Arg Ala Arg Glu Arg Lys Glu Thr Val Arg Gly Information for SEQ ID NO: 56 Length: 1044 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: negative Sequence:56 atgacccgcgaggggtcaacgccgccggttagggtcgccaccatcacggtcggcaccaac60 gagatccgttggctggaccgcgcgctcggctcgctgctcgccagcgacacgaccggcttc120 gagctgacggtcttctacgtggacaacgcctcggccgacggcagcgtggcgcacgtcatg180 tcggcgtttcccggcgtccgggtcatccgaaacccccgcaatctcggcttcaccggcgcg240 aacaacgtcggcatgcgggcggccctggcggagggcttcgaccacatcttcctggtcaac300 ccggacacctggacaccgccggggctggtccgcgggctggtcgagttcgcgcagcggtgg360 ccgcagtacggcgtcatcggcccgttgcagtaccgctacgacccggcgtcgaccgagttg420 accgacttcaacgactggacgcaggtcgccctctacctgggcgagcagcacaccttcgcc480 ggcgacctgctggatcatccctcgcacgtcaccgcgacggtccgcgaccgcgcgccgcgc540 accctggagcacgcgtacgtgcagggctcggcgctgttcgtccgggccgccgtgctacgc600 gaggtcggcctgctcgacgaggtgttccacacctactacgaggaggtcgacctgtgccgg660 cgggcccggtgggcgggctggcgggtggcgctcctactcgacctcggcatccagcacaaa720 ggcggcggtggcaccgccgcgagcgcgtacagccggatacacatgcgccgcaaccgctac780 tactatctgctgaccgatgtggactggcccccggccaaggccgcccggctcgccgcccgc840 tggctgttctccgacgtccgtgggcggggcgtgacgggtcggacgagcgcgggcgtcggg900 gcgcgggagaccttcgtggcgctcgggtggctggcccgccaggccccggtgatccgggaa960 cgtcgtcggcggcaccggctgctgcgggcacgagggacgggcgtggaccgcgcccgagag1020 cggaaggaaaccgtgcggggatga 1044 Information for SEQ ID NO: 57 Length: 294 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: negative Sequence: 57 Val Asp Lys Lys Glu Thr Ala Asp Arg Gly Thr Pro Gly Gln Arg Pro Met Val Val Ala Val Cys Ala Phe Arg Leu Glu Asn Val Arg Arg His Leu Arg His Asn Leu Asp Gln Leu Asn Gly Asp Glu Tyr Leu Val Leu Leu Asp Arg Pro Val Thr Pro Glu Ala Glu Lys Val Ala Ala Gln Val Asn Glu Gly Gly Gly Thr Met Arg Val Leu Gly Ala Thr Asn Gly Leu Ser Ala Ser Arg Asn Thr Val Leu Arg Glu Tyr Ala Gly Arg His Ile Leu Phe Val Asp Asp Asp Val Arg Leu Asp Ala Ser Ala Val Asp Ala Val Arg Ala Ala Phe Arg Ala Gly Ala His Val Val Gly Ala Arg Leu Arg Pro Pro His Asp Ile Gly Arg Leu Pro Trp Phe Leu Ser Ser Gly Gln Tyr His Leu Val Gly Trp His Arg Ala Arg Gly Pro Val Lys Ile Trp Gly Ala Cys Met Gly Val Asp Ala Asp Phe Ala His Arg Arg Gly Leu Thr Phe Asp Leu Gly Leu Ser Arg Thr Gly Gly Asn Leu Gln Ser Gly Glu Asp Thr Thr Phe Ile Ala Leu Met Lys Glu Ala Gly Ala Val Glu Arg Leu Leu Pro Glu His Ala Val Val His Asp Ile Asp Pro Gly Arg Leu Thr Leu Arg Tyr Leu Leu Arg Arg Ala Tyr Trp Gln Gly His Ser Glu Ala Arg Arg His Gln Ser Val Ala Gly Leu Arg Lys Glu Leu Asp Arg His Arg Thr Ala Pro Glu Ser Arg Cys Thr Pro Leu Leu Phe Cys Leu Tyr Gly Ala Ala Thr Ala Ile Gly Val Gly His Gly Leu Leu Leu Arg Leu Arg Gly Lys Information for SEQ ID NO: 58 Length: 885 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: negative Sequence:58 gtggacaagaaagagaccgcggaccggggcactcccgggcagcgtcccatggtggtggcg60 gtctgtgccttccgtctggagaacgtgcgccggcatctgcggcacaacctcgaccagttg120 aacggcgacgaatacctggtcctgctggaccggccggtcaccccggaggccgagaaggtc180 gccgcccaggtgaacgagggcggcgggacgatgcgcgtcctcggcgccacgaacgggctg240 tcggcgtcccgcaacacggtgctgcgcgagtacgccgggcggcacatcctgttcgtcgac300 gacgacgtgcgcctcgacgcctccgccgtggacgccgtccgcgccgcgttccgcgccggg360 gcgcacgtggtcggggcccggctgcgaccgccgcacgacatcggccggctgccctggttc420 ctgtcctccggccagtaccacctcgtcggctggcaccgagcccgcggtcccgtcaagatc480 tggggcgcctgcatgggcgtcgacgccgacttcgcccaccgccggggcctcaccttcgac540 ctgggcctcagccgcacgggcggcaacctgcagtccggcgaggacaccaccttcatcgcg600 ctgatgaaggaggcgggcgccgtggagcgcctgctgcccgagcacgcggtcgtccacgac660 atcgaccccggccggctgaccctccgctacctcctgcgcagggcctactggcaggggcac720 tcggaggcgcggcgccaccagtccgtcgcgggtctccgcaaggagctggaccggcaccgc780 accgcccccgagtcccgctgcacccccctgctgttctgcctgtacggcgccgccaccgcg840 atcggtgtcggccacggcctcctgctgcgcctgcgcggcaagtga 885 Information for SEQ ID NO: 59 Length: 295 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 59 Met Glu Thr Lys Lys Lys Gly Pro Ser Asp Tyr Pro Met Val Val Ala Ile Cys Ala Phe Ser Val Glu Asn Val Arg Lys His Leu His His Asn Leu Ala Gln Leu Ser Gly Asp Glu Tyr Phe Val Gln Leu Asp Arg Pro Ser Thr Pro Glu Ala Glu Ser Val Ala Ala Glu Val Asp Ala Ala Gly Gly Thr Met Arg Val Leu Gly Ala Thr Gly Gly Leu Ser Ala Ser Arg Asn Leu Met Leu Ala Arg Trp Pro Asn His His Val Met Phe Val Asp Asp Asp Val Arg Leu Asp Ala Lys Ala Val Thr Ala Val Arg Glu Ser Leu Arg Ala Gly Thr His Val Val Gly Thr Arg Leu Ala Arg Pro Pro Arg Pro Leu Pro Trp Tyr Val Thr Ser Gly Gln Phe His Leu Leu Gly Trp His Arg Asp Asp Arg Glu Ile Lys Ile Trp Gly Ala Cys Met Ala Val Asp Thr Ala Phe Ala His Ala Lys Gly Leu Asp Phe Asp Leu Ala Leu Ser Arg Thr Gly Gly Asn Leu Gln Ser Gly Glu Asp Thr Thr Phe Val Lys Leu Met Lys Asp Ala Gly Ala Arg Glu Gln Leu Leu Pro Asp Cys Ser Val Thr His Asp Val Asp Pro Ala Arg Leu Thr Leu Arg Tyr Leu Leu Arg Arg Ala Tyr Trp Gln Gly Arg Cys Glu Ser Arg Arg Gly Gln Ala Trp Ala Gly Phe Arg Lys Glu Trp Asp Arg His Arg Thr Ala Pro Glu Ser Arg Leu Arg Leu Pro Leu Ala Cys Leu Tyr Gly Ala Val Thr Ala Trp Gly Val Leu His Asp Gln Leu Leu Leu Arg Trp Gly Arg Asp His Arg Ser Ala Ala Val Information for SEQ ID NO: 60 Length: 888 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 60 atggaaacgaagaaaaaagggccctccgattatccgatggtagtggctatatgcgcattc60 agtgtcgagaatgtgcgtaagcacctacaccacaacctggcccaactgtccggtgacgag120 tacttcgtgcagttggaccggccgagcactccggaggcggagtcggtcgcggccgaggtg180 gatgccgccggcggaacgatgcgcgttctcggcgccaccggcggcctgtccgcctcgcgg240 aatctcatgttggcccggtggccgaatcaccacgtgatgttcgtcgacgacgacgtgcgg300 ctggacgccaaggccgtcacggcggtccgcgagagcctgcgcgcgggcacccacgtcgtc360 ggcactcggctcgcccgtccaccacgccccctgccctggtacgtgacgtccggtcagttc420 cacctgctcggctggcaccgggacgaccgcgagatcaagatctggggtgcctgtatggcg480 gtggacaccgccttcgcccatgccaagggcctcgacttcgatctcgcgttgagccggacg540 ggtggcaacctccagtccggcgaggacaccacgttcgtcaagctcatgaaggacgcgggg600 gcccgggagcaactcctgccggactgctccgtgacccacgacgtggatcccgcccggctg660 accctgcgctacctgctgcggcgggcgtactggcagggtcggtgcgagtcacggcgtggc720 caggcgtggg cggggttccg gaaggaatgg gaccggcatc gcaccgcgcc ggagtcccgc 780 ctgcgcctgc cgctggcgtg tctctacggt gccgtgaccg cgtggggcgt ccttcatgat 840 cagctcctgc tgaggtgggg gcgtgatcat cgctcggcgg cggtctga 888 Information for SEQ ID NO: 61 Length: 304 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 61 Met Pro Asn Asn Ala Ser Val Val Ser Arg Asp Pro Ser Asp His Pro Met Val Val Ala Ile Cys Ala Phe Arg Val Glu Asn Val Arg Lys His Leu Ala His Asn Met Ala Gln Leu Ser Gly Asp Glu Tyr Tyr Val Leu Leu Asp Arg Pro Val Thr Ala Glu Ala Glu Glu Val Ala Glu Glu Val Arg Ala Ala Gly Gly Thr Met Arg Ile Leu Gly Ala Thr Asn Gly Leu Ser Ala Ser Arg Asn Ala Met Leu Ala Arg Trp Pro His His His Leu Met Phe Val Asp Asp Asp Val Arg Leu Asp Ala Ala Ala Val Asp Ala Val Arg Lys Ser Leu Arg Asp Gly Ala His Val Val Gly Thr Arg Leu Ala Arg Pro Ala Leu Arg Leu Pro Trp Tyr Val Thr Ser Gly Gln Phe His Leu Val Gly Trp His Arg Asp Gln Gly Asn Ile Lys Ile Trp Gly Ala Cys Met Gly Val Asp Ser Ala Phe Ala His Ala His Gly Leu Asp Phe Asp Leu Ala Leu Ser Arg Thr Gly Gly Asn Leu Gln Ser Gly Glu Asp Thr Ser Phe Ile Ser Ala Met Lys Ala Ala Gly Ala Arg Glu Gln Leu Leu Pro Asp His Ala Val Thr His Asp Ile Asp Pro Gly Arg Leu Thr Pro Arg Tyr Leu Leu Arg Arg Ala Tyr Trp Gln Gly Arg Cys Glu Ala Gly Arg Asn Gln Ala Arg Ala Gly Leu Arg Lys Glu Trp Asp Arg His Arg Thr Ala Pro Glu Ser Arg Leu Ala Leu Pro Leu Ala Leu Gly Tyr Thr Ala Ala Thr Ala Ala Gly Leu Ser His Glu Leu Leu Arg Arg Ala Arg Leu Arg Arg Phe Pro Pro Pro Arg Arg Ser Pro Glu Pro Gly Information for SEQ ID NO: 62 Length: 915 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence:62 atgccgaacaacgcctctgtggtcagccgcgatccgtccgaccacccgatggtggtggcg60 atctgcgcgttccgggtggagaacgtcaggaaacacctcgcgcacaacatggcccagctc120 tccggcgacgagtactacgtcctgctggaccggcccgtcacggccgaggcggaggaggtc180 gccgaggaggtccgggccgccggcggcaccatgcgcatcctcggtgccaccaatggcctg240 tcggcctcccgcaacgcgatgctcgcccgctggccgcaccaccatctgatgttcgtcgac300 gacgacgtgcggctcgacgccgctgccgtcgacgccgtccgcaagagcctgcgcgacggc360 gcgcacgtggtcggcacccggctggcccgccccgcgctgcgtctgccgtggtacgtcacc420 tccggccagttccacctggtcggctggcaccgtgaccagggaaacatcaagatctggggc480 gcgtgcatgggggtggactccgcgttcgcgcacgcccacgggttggatttcgacctggcc540 ctcagccgtaccgggggcaacctccagtcaggggaggacacctccttcatcagcgccatg600 aaggccgccggcgcccgcgagcaactgctcccggaccacgcggtcacccatgacatcgac660 ccgggccggctgaccccccggtacctgctgcggcgggcgtactggcaggggcggtgcgag720 gccggacggaaccaggcacgggccggcctgcgcaaggaatgggaccggcaccgtacggcc780 ccggaatcccggctcgccctgccactggccctcggctacaccgcggcgaccgccgccggg840 ttgagccacgaactgctgcgaagggcgcggttgcggcggttcccgcccccgagacgctct900 cccgagccgg gctga 915 Information for SEQ ID NO: 63 Length: 342 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 63 Met Arg Pro Leu Lys Val Ala Leu Val Asn Ile Pro Leu Arg Val Pro Gly Ser Asp Glu Trp Ile Thr Val Pro Pro Gln Gly Tyr Gly Gly Ile Gln Trp Val Val Ala Asn Leu Met Asp Gly Leu Leu Glu Leu Gly His Glu Val Phe Leu Leu Gly Ala Pro Gly Ser Pro Ala Gly Ala Pro Gly Leu Thr Val Val Pro Val Gly Glu Pro Glu Glu Ile Gln Arg Trp Leu Arg Glu Ser Asp Val Asp Val Val His Asp His Ser Gly Gly Leu Ile Gly Pro Ala Gly Leu Arg Pro Gly Thr Ala Phe Ile Ser Ser His His Phe Thr Thr Arg Ala Val Asn Pro Ala Gly Cys Thr Tyr Ser Ser Arg Ala Gln Arg Ala His Cys Gly Gly Gly Asp Asp Ala Pro Val Ile Pro Ile Pro Val Asp Pro Ala Arg Tyr Arg Ser Ala Ala Asp Ala Val Pro Lys Glu Asp Phe Leu Leu Phe Met Gly Arg Ile Ser Pro His Lys Gly Ala Leu Glu Ala Ala Ala Phe Ala Ala Ala Ser Gly Arg Arg Leu Val Leu Ala Gly Pro Ala Trp Glu Pro Asp Tyr Phe Ala Glu Ile Thr Ser Arg Tyr Gly Ser Thr Val Glu Val Ile Gly Glu Val Gly Ala Glu Arg Arg Leu Asp Leu Leu Ala Ser Ala His Ala Val Leu Ala Met Ser Gln Ala Val Glu Gly Pro Trp Gly Gly Ile Trp Cys Glu Pro Gly Ala Thr Val Val Ser Glu Ala Ala Val Ser Gly Thr Pro Val Val Gly Thr Arg Asn Gly Cys Leu Ala Glu Ile Val Pro Ser Val Gly Glu Val Val Ala Tyr Gly Thr Asp Phe Thr Pro Glu Glu Ala Arg Arg Val Leu Asp Gly Leu Pro Ser Pro Asp Glu Val Arg Arg Glu Ala Val Arg Leu Trp Gly His Val Glu Ile Ala Gly Arg Tyr Val Glu Gln Tyr Arg Arg Leu Leu Ser Gly Val Thr Trp Lys Information for SEQ ID NO: 64 Length: 1029 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence:64 atgaggccgttgaaggtcgcgctggtcaacatcccgctccgggtcccggggagcgacgag60 tggatcaccgtgccgccgcaggggtacggcgggatccagtgggtcgtcgccaacctgatg120 gacgggctgctcgaactcggccacgaggtcttcctgctgggcgccccgggcagcccggcg180 ggggccccgggactgaccgtcgtgccggtcggcgagccggaggagatccagcggtggctg240 cgggagtcggacgtcgacgtggtccacgaccacagcggcggcctgatcggcccggccggg300 ctgcgtcccggcaccgcgttcatcagctcgcaccacttcaccacccgggcggtcaaccca360 gcgggctgcacctacagttcccgggcgcagcgcgcgcactgcgggggcggcgacgacgcg420 cccgtcatcccgatccccgtcgacccggcccgctaccggtcggccgcggacgcggtgccc480 aaggaggacttcctgctcttcatgggccggatctcgccgcacaagggggcgctggaggcc540 gccgcgttcgcggcggcgagcggccggcgcctggtgctggccgggccggcctgggagccc600 gactacttcgcggagatcacctcccggtacggctcgaccgtcgaggtgatcggggaggtg660 ggcgccgaacggcggctcgacctgctcgcctccgcgcacgcggtgctcgccatgtcccag720 gcggtcgaggggccgtggggcggcatctggtgcgaaccgggcgccacggtggtctccgag780 gccgcggtgagcggcacccccgtcgtgggcacgcgcaacggctgcctggcggagatcgtg840 ccgtcggtcggcgaggtcgtggcctacgggacggacttcacgcccgaggaggcccgccgg900 gtgctggacgggctgccgtcgcccgacgaggtccggcgcgaggcggtccggctgtggggc960 cacgtggagatcgccggacggtatgtggagcagtaccgcaggctgctctccggggtgacc1020 tggaagtga 1029 Information for SEQ ID NO: 65 Length: 339 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence: 65 Met Arg Val Val Leu Val Thr Met Ala Leu Arg Val Pro Thr Asp Pro Ser His Trp Ile Thr Val Pro Pro Gln Gly Tyr Ala Gly Ile His Trp Ile Val Ala Asn His Met Asp Gly Leu Leu Glu Leu Gly His Glu Val Phe Leu Leu Gly Ala Pro Gly Thr Thr Pro Val Ala Pro Ala Val Thr Val Val Asp Ala Gly Glu Ile Glu Asp Met His Ala Trp Leu Asn Gly Pro Glu Ala Ala Thr Ile Asp Val Val His Asp Phe Ser Cys Gly Gln Ile Asp Pro Asp Arg Leu Pro Arg Gly Met Ala Tyr Leu Ser Thr His His Leu Thr Gly Lys Pro Lys Tyr Pro Arg Asn Cys Val Tyr Ala Ser Tyr Ala Gln Arg Ala Gln Ala Glu Asn Asp Val Ala Pro Val Val Arg Ile Ser Val Asn Gln Ala Arg Tyr Pro Phe Arg Ala Asp Lys Asp Asp Tyr Leu Leu Tyr Leu Gly Arg Ile Ser Glu Trp Lys Gly Thr Tyr Glu Ala Ala Ala Phe Ala Ser Ala Ala Gly Arg Arg Leu Val Val Ala Gly Pro Ser Trp Glu Glu Asp Tyr Leu Ala Arg Ile Leu Arg Asp Phe Gly Asp Ser Val Asp Leu Val Gly Glu Val Gly Gly Asp Arg Arg Leu Asp Leu Ile Ser Arg Ala Thr Ala Met Met Val Leu Ser Gln Ser Thr Met Gly Pro Trp Gly Val Val Trp Cys Glu Pro Gly Ser Thr Val Val Ser Glu Ala Ala Ala Cys Gly Thr Pro Val Ile Gly Thr Pro Asn Gly Cys Leu Ala Glu Ile Val Pro Ala Val Gly Thr Val Val Pro Glu Gly Ala Asp Phe Thr Val Glu Gln Ala Arg Ser Val Val Ala Ala Leu Pro Gly Pro Asp Ala Val Arg Ala Ala Ala Leu Glu Arg Trp Asp His Val Val Val Ala Lys Glu Phe Glu Ala Ile Tyr His Asp Val Leu Ala Gly Arg Thr Trp Thr Information for SEQ ID NO: 66 Length: 1020 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence:66 atgcgtgtcgtgttggtgacgatggcactgcgggtgccgacggatccgagccactggatc60 acggtcccgccgcagggctatgccggcatccactggatcgtggcgaaccacatggacggc120 ctgctcgaactcggccacgaggtgttcctgctcggcgcgccgggcacgacgccggtcgca180 ccggcggtcaccgtggtggacgcgggcgagatcgaggacatgcacgcctggctgaacggc240 cctgaggcggccacgatcgacgtcgtccacgacttctcctgcgggcagatcgatcccgac300 cggcttccccggggcatggcgtacctgtccacccaccacctgaccggcaagccgaagtat360 ccgcgcaactgtgtgtacgcctcgtatgcccaacgggcccaggcggagaacgacgtcgcg420 ccggtggtccgcatctcggtgaaccaggcgcgctacccgttccgggccgacaaggacgac480 tacctgctctacctcggtcggatctcggaatggaagggcacctacgaggcggccgccttc540 gccagcgccgccgggcgtcgcctcgtcgtggcgggcccgtcctgggaagaggactacctg600 gcccggatcctgcgcgacttcggggacagcgtcgaccttgtcggcgaggtggggggcgac660 cggcggctcgacctgatctcccgcgcgaccgcgatgatggtcctgtcgcagagcaccatg720 gggccgtggggcgtggtgtggtgtgagcccggatcgaccgtggtgtcggaggccgcggcg780 tgcgggacgcccgtcatcggcacgccgaacggatgcctggccgagatcgtgcccgcggtc840 ggaacggtcgtgcccgagggcgcggacttcaccgtcgaacaggcccggagcgtcgtggcg900 gcgctgcccgggccggacgcggtccgggcggcggcgctggagcggtgggaccacgtcgtg960 gtggccaaggagttcgaggccatctaccacgacgtgctcgccggtcgtacctggacgtga1020 Information for SEQ ID NO: 67 Length: 340 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: negative Sequence: 67 Met Thr Pro Leu Arg Ile Ala Met Val Asn Ile Pro Phe Arg Leu Pro Ser Asp Glu Arg Gln Trp Ile Thr Val Pro Pro Gln Gly Tyr Gly Gly Ile Gln Trp Ile Val Ala Asn Lys Ile Lys Gly Leu Leu Glu Leu Gly His Glu Val Phe Leu Leu Gly Ala Pro Gly Ser Pro Arg Thr His Pro Arg Leu Thr Val Val Pro Ala Gly Glu Pro Glu Asp Ile Arg Ala Trp Leu Lys Ser Ala Pro Val Asp Val Val Asn Asp Tyr Ser Cys Gly Lys Val Asp Pro Ile Glu Leu Pro Pro Gly Val Gly Leu Val Ala Ser His His Met Thr Thr Arg Pro Ser Tyr Pro Ala Gly Cys Val Tyr Ala Ser Lys Ala Gln Arg Glu Gln Cys Gly Gly Gly Ala Asp Ala Pro Val Ile Pro Ile Gly Val Asp Pro Ser Leu Tyr Arg Pro Gly Asp Arg Lys Asp Asp Phe Leu Leu Phe Met Gly Arg Ile Ser Pro Phe Lys Gly Ala Leu Glu Ala Ala Ala Phe Ala Arg Ala Ala Gly Arg Arg Leu Leu Met Ala Gly Pro Ala Trp Glu Pro Glu Tyr Leu Asp Arg Ile Met Gly Glu Tyr Gly Asp His Val Thr Leu Val Gly Glu Val Gly Gly Gln Glu Arg Met Asp Leu Leu Ala Thr Ala Ala Ala Ile Leu Val Leu Ser Gln Pro Val Pro Gly Pro Trp Gly Gly Thr Trp Cys Glu Pro Gly Ala Thr Val Val Ser Glu Ala Ala Ala Ser Gly Thr Pro Val Val Gly Thr Ser Asn Gly Cys Leu Ala Glu Ile Val Pro Ala Val Gly Glu Val Val Gly Phe Gly Thr Gly Phe Asp Glu Arg Glu Ala Arg Ala Val Leu Ser Arg Leu Pro Ser Pro Ala Gln Ala Arg Lys Ala Ala Ile Arg Cys Trp Gly His Val Glu Ile Ala Arg Arg Tyr Glu Ala Val Tyr Arg Asp Val Leu Ala Gly Ala Arg Trp Ser Information for SEQ ID NO: 68 Length: 1023 Type: DNA

Organism: Micromonospora carbonacea africana Strandedness: negative Sequence:68 atgacccccctgcggatcgcgatggtcaacataccgttccggttgccgagcgacgagcgg60 cagtggatcacggtcccgccgcaggggtacggcgggatccagtggatcgtggccaacaag120 atcaagggcctgctcgaactcgggcacgaggtgttcctgctcggtgccccgggcagtccg180 cgtacgcatccacgcctgaccgtggtgccggcgggcgagcccgaggacatccgggcatgg240 ttgaagtccgctccggtggacgtcgtcaacgactacagctgcggcaaggtggatccgatc300 gagctgcccccgggggtcggcctggtggcctcgcaccacatgaccacccgcccgtcctat360 ccggccggctgcgtgtacgcctcgaaggcgcagcgggagcagtgcggcggcggcgcggac420 gccccggtcatcccgatcggggtggatccgtcgctctaccgccccggcgaccgcaaggac480 gacttcctgctcttcatgggccggatctccccgttcaagggcgcgctggaggcggccgcg540 ttcgcccgggccgccggccgccggctactgatggccggtccggcctgggagccggagtac600 ctcgaccggatcatgggcgagtacggcgaccacgtcaccctcgtcggcgaggtggggggt660 caggaacgtatggacctgctcgccacggcggctgccatcctggtgctctcccagccggtg720 cccggcccgtggggcggcacgtggtgcgagccgggtgcgaccgtggtgtccgaggcggcg780 gccagcggcaccccggtggtcggcacgagcaacggctgcctggcggagatcgtgccggcc840 gtcggcgaggtggtgggcttcggcaccggcttcgacgagcgggaggcccgagcggtgctg900 tcccgactgccgtcgcccgcccaggcgcggaaggccgcgatccggtgctgggggcacgtg960 gagatcgcccggcgctacgaggcggtgtaccgcgacgtgctggccggcgcgcgctggtcc1020 tga 1023 Information for SEQ ID NO: 69 Length: 283 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 69 Met Gly Thr Arg Val Leu Thr Glu Glu Gln Val Glu Gly Phe Val Ser Asp Gly Phe Val His Leu Pro Gly Ala Phe Pro Gly Glu Leu Ala Glu Glu Ala Arg Ala Leu Leu Trp Arg Gln Leu Asp Met Asp Pro Asp Asp Pro Gly Thr Trp Thr Arg Glu Val Val Arg Leu Gly Val Arg Asp Asp Asp Val Phe Val Arg Ala Ala Asn Thr Pro Leu Leu His Ala Ala Tyr Asp Gln Leu Ala Gly Glu Gly Arg Trp Gln Pro Leu Thr Gln Val Gly Thr Phe Pro Val Arg Phe Pro Val Thr Lys Arg Pro Glu Glu Thr Glu Asp Tyr Gly Trp His Ile Asp Ala Ser Phe Leu Ala Glu Gly Ala Asp Ala Asp Arg Asp Trp Ser Gly Glu Leu Asp Val Ile Pro Pro Asp Tyr Asp Lys Ile Phe Arg Tyr Asn Val Trp Ser Arg Gly Arg Ala Leu Leu Leu Leu Leu Leu Phe Ser Asp Thr Gly Glu Glu Asp Ala Pro Thr Leu Ile Arg Val Gly Ser His Leu Asp Val Pro Pro Leu Leu Ala Pro Tyr Gly Ala Glu Gly Thr Tyr Leu Glu Ala Gly Glu Val Gly Arg Asp Arg Pro Leu Arg Ser Ala Thr Gly Lys Ala Gly Asp Ala Tyr Leu Cys His Pro Phe Leu Val His Thr Pro Val Ala Asn Thr Gly Val Arg Pro Arg Phe Met Ala Gln Pro Asn Leu Leu Pro Val Gly Gln Leu Glu Leu Asp Arg Pro Asp Gly Arg Tyr Thr Pro Val Glu Arg Ala Val Arg Arg Gly Leu Gly Glu Asp Ala Pro Arg Arg Glu Ser Arg Information for SEQ ID NO: 70 Length: 852 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 70 atgggaacccgcgtactgaccgaggagcaggtcgagggcttcgtctccgacggcttcgtc60 cacctgccgggtgcgttcccgggggagctcgccgaggaggcgcgcgccctgctgtggcgg120 cagctggacatggacccggacgacccgggcacctggacgcgggaggtggtccggctcggg180 gtgcgcgacgacgacgtgttcgtccgtgccgccaacaccccgctgctgcacgccgcctac240 gaccagctcgccggggagggccgctggcagccgctgacccaggtcggcacgttcccggtg300 cggttccccgtgacgaagcggccggaggagaccgaggactacggctggcacatcgacgcc360 agcttcctcgccgagggcgccgacgccgaccgcgactggtccggcgagctcgacgtgatc420 ccgccggactacgacaagatcttccggtacaacgtgtggtcccgcggccgggcgctgctg480 ctcctgctgctgttctccgacaccggcgaggaggacgcgcccacgctgatccgcgtcggc540 tcccacctggacgtaccgccgctgctggcaccgtacggcgccgagggcacctacctggag600 gccggggaggtgggacgggaccggccgctgaggtccgcgacgggcaaggccggggacgcc660 tacctctgccaccccttcctggtgcacacgccggtcgccaacaccggcgtccgcccgcgc720 ttcatggcccagccgaacctgctgcccgtggggcagctcgaactcgaccggcccgacggc780 cggtacacccccgtcgagcgggccgtgcgccggggcctcggcgaggacgccccccgacga840 gagagccggtga 852 Information for SEQ ID NO: 71 Length: 276 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 71 Met Leu Thr Ala Glu Gln Ile Glu Ser Phe Val Ala Asp Gly Phe Val Arg Val Pro Asn Ala Phe Pro Ala Ala Leu Ala Ala Glu Cys Arg Asn Leu Leu Trp Lys Gln Leu Asp Val Asp Pro Asp Asp Ser Ser Thr Trp Thr Arg Glu Val Val Arg Leu Gly Leu Arg Gly Asp Asp Ala Phe Val Gln Ser Ala Asn Thr Pro Ala Leu Val Glu Ala Tyr Asp Gln Leu Val Gly Ala Gly Arg Trp Arg Pro Leu Asp Met Val Gly Thr Phe Pro Ile Arg Phe Pro Val Asp Arg Asp Pro Glu Gln Ala Glu Asp Tyr Gly Trp His Ile Asp Ala Ser Phe Leu Ser Pro Glu Gly Val Ala Ala Met Ser Ser Gly Gln Asp Trp Glu Gly Glu Leu Pro Leu Val Pro Pro Asp Tyr Asp Arg Ile Phe Arg Ser Asn Leu Val Ser Arg Gly Arg Ala Leu Leu Val Leu Leu Leu Tyr Ser Asp Thr Gly Glu Arg Asp Ala Pro Thr Leu Ile Arg Val Gly Ser His Leu Asp Val Pro Pro Leu Leu Ala Pro Tyr Gly Ala Glu Gly Thr Tyr Leu Ala Cys Arg Asp Val Gly Ala Asp Arg Pro Leu Ala Met Ala Thr Gly Arg Ala Gly Asp Ala Tyr Leu Cys His Pro Phe Leu Val His Thr Pro Ile Thr Asn Thr Gly Thr Ser Pro Arg Phe Met Ala Gln Pro Ser Leu Gln Pro Thr Gly Glu Phe Asp Leu Asp Arg Ala Asp Gly Gln Tyr Val Pro Val Glu Arg Ala Ile Arg Ala Gly Leu Ala Arg Gly Information for SEQ ID NO: 72 Length: 831 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 72 atgctgacag ccgagcagat cgagagcttc gtcgccgacg gcttcgtccg ggtgccgaac 60 gctttccccg ccgcgctcgc cgccgagtgc cgcaatctgc tctggaagca actcgacgtg 120 gatcccgacg acagctcgac ctggaccagg gaggtcgtcc ggctcggtct gcggggcgac 180 gacgcgttcg tgcagagcgc caacaccccg gcgttggtcg aggcgtacga ccagctcgtc 240 ggtgcgggccggtggcgaccgctggacatggtcgggacgttcccgatccgtttcccggtg300 gaccgggatccggaacaggccgaggactacggctggcacatcgacgccagcttcctcagc360 cccgagggcgtggccgccatgagcagcggccaggactgggagggcgagctcccgctcgtg420 ccgccggactacgaccggatcttccgcagcaacctggtttcgcgtggccgggccctgctg480 gtgctgctcctctactccgacaccggcgagcgtgacgcgcccacgctgatccgggtcggt540 tcgcacctggacgtgccgcccctgctggcgccctacggcgccgaggggacctacctcgcc600 tgccgcgacgtgggcgcggaccgccccctcgcgatggccaccgggcgggcgggcgacgcc660 tacctctgccatccgttcctggtgcacacgccgatcaccaacaccggcaccagcccccgg720 ttcatggccc agccctcgct gcaaccgacc ggcgagttcg acctggaccg cgccgacggg 780 cagtacgtcc cggtcgagcg ggcgatccgg gcgggactcg cccgtggata g 831 Information for SEQ ID NO: 73 Length: 284 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 73 Met Ser Ala Gln Val Leu Ser Asp Glu Gln Val Glu Arg Phe Val Ser Asp Gly Phe Val Lys Leu Glu Ala Ala Phe Pro Ala Glu Leu Ala Gln Gln Gly Arg Glu Leu Leu Trp Arg Gln Leu Gly Met Asp Pro Glu Asp Arg Ser Thr Trp Ser Arg Glu Val Val Arg Leu Gly Leu Gln Asp Ala Glu Pro Phe Val Arg Ser Ala Thr Thr Pro Arg Leu His Ala Ala Phe Asp Gln Leu Val Gly Val Gly Arg Trp Lys Pro Leu Asp Arg Ile Gly Thr Phe Pro Val Arg Phe Pro Val Pro Lys Arg Pro Glu Gln Thr Glu Asp Tyr Gly Trp His Ile Asp Ala Ser Phe Leu Ala Asp Asp Ala Gln Arg Leu Gly Pro Gln Asn Trp Glu Gly Glu Leu Asp Leu Val Pro Pro Asn Tyr Ala Glu Val Phe Arg Cys Asn Leu Arg Ser Arg Gly Arg Ala Leu Leu Leu Leu Phe Leu Phe Ser Asp Thr Asp Glu Arg Glu Ala Pro Thr Leu Val Arg Val Gly Ser His Leu Asp Val Pro Pro Leu Leu Glu Pro Tyr Gly Pro Glu Gly Thr Tyr Leu Asp Val Gly Asp Val Gly Arg Asp Arg Pro Leu Ala Ser Ala Thr Gly Arg Ala Gly Asp Val Tyr Leu Cys His Pro Phe Leu Val His Thr Pro Val Ala Asn Thr Gly Thr Arg Pro Arg Phe Met Ala Gln Pro Ser Leu Gln Pro Val Gly Glu Leu Asp Leu Asp Arg Pro Asp Gly Asp Tyr Ser Pro Val Glu Arg Ala Val Arg Ile Gly Leu Gly Leu Pro Ala Ala Glu Pro Val Arg Information for SEQ ID NO: 74 Length: 855 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence:74 atgagcgcccaggtcctcagcgacgagcaggtcgagcggttcgtctccgacgggttcgtg60 aagttggaggcggcgttcccggccgagctcgcgcagcagggtcgcgaactgctgtggcgg120 caactcggcatggaccccgaggaccggagcacctggtcccgcgaggtggtccggctgggc180 ctccaggacgccgagcccttcgtgcgcagcgccaccacgccccggctgcacgccgccttc240 gaccagctcgtcggggtggggcgctggaagccgttggaccggatcggcaccttcccggtc300 cggttcccggtgcccaagcgcccggagcagaccgaggactacggctggcacatcgacgcc360 agtttcctggccgacgacgcgcagcggctgggcccccagaactgggagggcgagctggac420 ctggtcccgcccaactacgcggaggtcttccgctgcaacctccggtcgcgggggcgggcc480 ctgctgctgctgttcctcttctccgacaccgacgagcgggaggcgccgacgctggtcagg540 gtcggctcccacctggacgtgccgccgctgctcgaaccgtacggcccggagggcacctac600 ctcgacgtgggcgacgtcggccgggaccgcccgctcgcctccgccaccggacgcgccggt660 gacgtctacctgtgccacccgttcctggtgcacaccccggtggccaacaccggcacgcgg720 ccccggttcatggcccagccgagcctgcagccggtgggcgagctggacctggaccgcccg780 gacggcgactactcgccggtcgaacgggccgtgcgcatcgggctgggcctgccggccgcc840 gagccggtccgatga 855 Information for SEQ ID NO: 75 Length: 485 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: negative Sequence: 75 Met Pro Lys Thr Val Leu Val Ile Gly Gly Gly Pro Ala Gly Ser Thr Ala Ala Ser Leu Leu Ser Lys Ala Gly Met Ser Val Lys Leu Leu Glu Arg Glu Thr Phe Pro Arg Tyr His Ile Gly Glu Ser Ile Ala Ser Ser Cys Arg Thr Ile Val Asp Leu Val Gly Ala Leu Asp Glu Val Asp Ser Arg Gly Tyr Thr Val Lys Asn Gly Val Leu Leu Arg Trp Gly Lys Glu Asp Trp Ala Ile Asp Trp Pro Lys Ile Phe Gly Pro Asp Val Arg Ser Trp Gln Val Asp Arg Asp Asp Phe Asp His Val Leu Leu Lys Asn Ala Val Lys Gln Gly Ala Asp Val Thr Glu Gly Val Thr Val Lys Arg Val Leu Phe Asp Gly Asp Arg Ala Val Gly Ala Glu Trp Thr Asp Pro Asp Ser Gly Glu Leu Val Ser Glu Glu Phe Asp Tyr Val Ile Asp Ala Ser Gly Arg Thr Gly Val Ile Ser Arg His Leu Lys Asn Arg Gln Pro His Glu Ile Phe Arg Asn Val Ala Ile Trp Gly Tyr Trp Gln Gly Gly Ser Leu Leu Pro Thr Ser Pro Thr Gly Gly Ile Asn Val Ile Gly Ala Pro Asp Gly Trp Tyr Trp Val Ile Pro Leu Arg Gly Asp Arg Tyr Ser Val Gly Phe Val Cys His Gln Asp Arg Phe Leu Glu Arg Arg Lys Glu His Asp Asp Leu Glu Ala Met Leu Ala Ser Leu Val Gln Glu Asn Pro Thr Val Arg Asp Leu Met Ala Glu Gly Glu Tyr Gln Pro Gly Val Arg Val Glu Gln Asp Phe Ser Tyr Val Ala Asp Ser Phe His Gly Pro Gly Tyr Tyr Leu Ala Gly Asp Ala Ala Cys Phe Leu Asp Pro Leu Leu Ser Thr Gly Val His Leu Ala Leu Tyr Ser Gly Met Leu Ala Ala Thr Ser Val Leu Ala Thr Val Asn Glu Asp Val Thr Glu Lys Glu Ala Gly Ala Phe Tyr Glu Ser Leu Tyr Arg Asn Ala Tyr Gln Arg Leu Phe Thr Leu Val Ser Gly Val Tyr Gln Gln Gln Ala Gly Lys Ala Ala Tyr Phe Gly Leu Ala Asp Ala Met Val Pro Glu Arg Ala Thr Glu Glu Tyr Glu Gln Val Asp Gly Ala Val Ala Phe Ala Glu Leu Val Ala Gly Leu Ala Asp Ile His Asp Ala Val Thr Gly Thr His Glu Asp His Ala His Gln Ala His Thr Gln Ala Val Ala Leu Pro Glu Asp Asn Ser Val Arg Gln Leu Leu Ala Ala Ala Glu Asn Ala Arg Leu Met Ala Glu Ala Gly Thr Pro Ser Ala Pro Val Ser Glu Ala Pro Gly Lys Met Asp Ala His Asp Leu Tyr Asp Pro Ala Thr Gly Leu Tyr Leu Arg Thr Thr Pro Thr Leu Gly Ile Gly Arg Ser Arg Ala Information for SEQ ID NO: 76 Length: 1458 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: negative Sequence:76 atgcccaaaacagtgctcgtaatcggtggtgggccggcagggtccactgcggcctccttg60 ctcagcaaggccggaatgtccgtcaagctgctggagcgggagactttcccgcggtaccac120 atcggcgagtcgatcgcctcttcctgccgcaccatcgtcgatctcgtcggcgcgctggac180 gaggtcgactcccgcggttacacggtcaagaacggtgtgctgctgcgctggggcaaggag240 gactgggccatcgactggccgaagatcttcggcccggacgtgcggtcctggcaggtcgac300 cgcgacgacttcgaccacgtgctcctgaagaacgcggtcaagcagggcgccgacgtcacc360 gagggcgtcacggtcaagcgcgtgctgttcgacggcgaccgcgcggtcggggccgagtgg420 acggacccggactccggtgagctggtcagcgaggagttcgactacgtcatcgacgcctcc480 ggccgtacgggcgtcatctcccggcacctgaagaaccgccagccgcacgagatcttccgc540 aacgtcgcgatctggggctactggcagggcggttccctgctgcccacctcgcccaccggc600 ggcatcaacgtcatcggcgcccccgacggctggtactgggtgatcccgctgcgcggcgac660 cggtacagcgtcggcttcgtctgccaccaggaccgcttcctggagcgccgcaaggagcac720 gacgacctggaggcgatgctcgcctcgctggtccaggagaacccgaccgtccgcgacctc780 atggcggagg gcgagtacca gccgggcgtc cgggtggagc aggacttctc ctacgtcgcc 840 gacagcttcc acggccccgg ctactacctc gccggcgacg ccgcctgctt cctcgacccg 900 ctgctgtcca ccggcgtcca cctggcgctc tacagcggca tgctggccgc gacgtccgtg 960 ctcgccacgg tgaacgagga cgtcacggag aaggaggccg gcgccttcta cgagtcgctc 1020 taccgcaacgcctaccagcggctgttcaccctggtgtccggtgtgtaccagcagcaggcc1080 ggcaaggccgcctacttcggcctggccgacgcgatggtcccggagcgcgccacggaggag1140 tacgagcaggtggacggcgcggtggccttcgccgagctggtcgcggggctcgccgacatc1200 cacgacgccgtcaccgggacgcacgaggaccacgcccaccaggcgcacactcaggccgtc1260 gccctgccggaggacaactccgtccgccagctgctcgccgcggccgagaacgcccgcctg1320 atggcggaggccggcacgccgagcgcgccggtcagcgaggcccccggcaagatggacgcc1380 cacgacctctacgaccccgccacgggcctgtacctgcggaccaccccgaccctggggatc1440 ggccggtccagggcctga 1458 Information for SEQ ID NO: 77 Length: 481 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 77 Met Val Ser Thr Val Leu Val Ile Gly Gly Gly Pro Ala Gly Ser Thr Ala Ala Ala Leu Leu Ala Arg Ala Gly Leu Ser Val Thr Leu Leu Glu Lys Glu Thr Phe Pro Arg Tyr His Ile Gly Glu Ser Ile Ala Ser Ser Cys Arg Thr Ile Val Asp Phe Val Gly Ala Leu Ser Asp Val Asp Ala Arg Gly Tyr Thr Gln Lys Asn Gly Val Leu Leu Arg Trp Gly Lys Glu Asp Trp Ala Ile Asp Trp Thr Glu Ile Phe Gly Pro Gly Val Arg Ser Trp Gln Val Asp Arg Asp Asp Phe Asp His Val Leu Leu Asn Asn Ala Ala Lys Gln Gly Ala Thr Val Ile Gln Asn Ala Glu Val Lys Arg Val Ile Phe Asp Gly Asp Arg Ala Val Ala Ala Glu Trp Ala Glu Pro Asp Ser Gly Glu Arg Arg Thr Thr Glu Phe Asp Phe Val Val Asp Ala Ser Gly Arg Ala Gly Met Ile Pro Ala Arg His Phe Lys His Arg Arg Ala Asn Asp Thr Phe Lys Asn Val Ala Ile Trp Gly Tyr Trp Asp Gly Gly Ser Leu Leu Pro Asn Ser Pro Gln Gly Gly Ile Asn Val Ile Gly Ala Pro Asp Gly Trp Tyr Trp Val Ile Pro Leu Arg Gly Asn Arg Tyr Ser Val Gly Phe Val Cys His Gln Lys Arg Phe Leu Glu Arg Arg Ser Glu His Gly Ser Leu Glu Asp Met Leu Ala Ala Leu Val Glu Glu Ser Pro Thr Val Arg Ser Leu Val Ala Thr Gly Thr Tyr Gln Pro Gly Val Arg Val Glu Gln Asp Phe Ser Tyr Val Ser Asp Ser Phe Cys Gly Pro Gly Tyr Phe Ala Ala Gly Asp Ser Ala Cys Phe Leu Asp Pro Leu Leu Ser Thr Gly Val His Leu Ala Leu Tyr Ser Gly Met Leu Ala Ser Ala Ser Ile Leu Gly Ile Val Asn Gly Asp Val Glu Glu Glu Gln Ala Tyr Gly Phe Tyr Glu Thr Leu Tyr Arg Asn Ala Phe Glu Arg Leu Phe Thr Leu Val Ala Ala Val Tyr Gln Gln Gln Ala Gly Lys Ala Asn Tyr Phe Ala Leu Ala Asp Arg Leu Ile Gly Glu His Asp Glu Ala Glu Phe Glu Arg Val Asp Gly Ala Lys Ala Phe Ala Gln Leu Ile Ala Gly Leu Ala Asp Val Ser Asp Ala Met Ala Gly Arg Ser Val Pro Pro Gln Leu Pro Ala Ala Asp Ala Gly Asn Ser Val Gly Gln Leu Phe Leu Ala Ala Glu Gln Ala Arg Arg Met Ala Glu Ala Gly Val Pro Lys Ala Pro Val Ser Glu Gly Leu Asn Lys Ile Asp Gly Leu Glu Leu Phe Asp Pro Glu Thr Gly Leu Tyr Leu Met Thr Ser Pro Arg Leu Gly Ile Gly Arg Thr Arg Pro Ala Information for SEQ ID NO: 78 Length: 1446 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 78 atggtcagcacggtcttggttatcggtggcggtccggccggatcgaccgccgcggcactg60 ctcgcgcgtgcgggactgtcggtgaccttgctggagaaggagaccttcccgcgctaccac120 atcggcgagtcgatcgcgtcctcgtgccggaccatcgtcgacttcgtcggcgcgctgagc180 gacgtcgacgcccgcggctacacccagaagaacggtgtgctgctgcggtggggcaaggag240 gactgggccatcgactggaccgagatcttcggtcccggagtcaggtcctggcaggtggac300 cgcgacgacttcgaccacgtgctgctgaacaacgccgccaaacagggcgcgacggttatc360 cagaacgccgaggtcaagcgggtgatcttcgacggcgaccgcgcggtggccgcggagtgg420 gccgagccggacagcggcgagcggcgcaccaccgagttcgacttcgtcgtggacgcctcc480 ggccgtgccggcatgatccccgcccgccacttcaagcaccggcgggcgaacgacacgttc540 aagaacgtcgccatctggggctactgggacggcggatcactgctgcccaactcgccgcag600 ggtggcatcaacgtgatcggcgcgccggacggctggtactgggtcatcccgctgcggggc660 aaccgctacagcgtcgggttcgtgtgtcaccagaagcgcttcctcgaacgccgcagcgaa720 cacggctcactcgaggacatgctcgccgcgctcgtcgaggagtcgccgacggtgcggagc780 ctggtggcgaccgggacgtaccagccgggtgtccgggtcgagcaggacttctcgtacgtg840 tccgacagcttctgcggccccggctacttcgccgcgggcgacagcgcctgcttcctggac900 ccgctgctgtcgaccggcgtgcacctcgcgctctacagcggcatgctcgcgtcggcgtcg960 atcctgggcatcgtcaacggcgacgtcgaggaggagcaggcctacgggttctacgagacg1020 ctctaccgcaacgccttcgagcgcctgttcaccctggtcgcggccgtctaccagcagcag1080 gccggcaaggcgaactacttcgccctcgccgaccggctgatcggcgagcacgacgaggcc1140 gagttcgaacgggtcgacggcgccaaggcgttcgcccagctgatcgccgggctcgccgac1200 gtgagcgacgccatggccggtcgctcggttccgccgcagctaccggcggccgacgccggc1260 aactcggtcggccagctcttcctcgccgccgagcaggcccgccggatggcggaggccggt1320 gtccccaaagcgccggtcagcgaggggctgaacaagatcgacgggctcgagctgttcgac1380 ccggagaccggcctgtacctgatgacgagcccgcgactgggcatcggacggacccggccg1440 gcgtga 1446 Information for SEQ ID NO: 79 Length: 492 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: negative Sequence: 79 Met Ser Ser Lys Ile Leu Val Ile Gly Gly Gly Pro Ala Gly Ser Thr Ala Ala Ala Leu Leu Ala Arg Ser Gly Leu Ser Val Thr Leu Leu Glu Lys Glu Thr Phe Pro Arg Tyr His Ile Gly Glu Ser Ile Ala Ser Ser Cys Arg Thr Ile Val Asp Phe Val Gly Ala Leu Asp Glu Val Asp Ser Arg Gly Tyr Pro Gln Lys Asn Gly Val Leu Leu Arg Trp Gly Asn Glu Asp Trp Ala Ile Asp Trp Ala Lys Ile Phe Gly Pro Gly Val Arg Ser Trp Gln Val Asp Arg Asp Asp Phe Asp His Val Leu Leu Asn Asn Ala Gly Lys Gln Gly Ala Lys Ile Ile Gln Gly Ala Ala Val Lys Arg Val Leu Phe Asp Gly Glu Arg Ala Thr Ala Ala Glu Trp Phe Asp Pro Glu Ser Gly Glu Val Arg Thr Ile Asp Phe Asp Tyr Val Val Asp Ala Ser Gly Arg Ala Gly Leu Ile Pro Ser Gln His Phe Lys His Arg Arg Pro Thr Glu Thr Phe Lys Asn Val Ala Ile Trp Gly Tyr Trp Gln Gly Gly Ser Leu Leu Pro Asn Ser Pro Ser Gly Gly Ile Asn Val Ile Ser Ala Pro Asp Gly Trp Tyr Trp Val Ile Pro Leu Arg Gly Asp Arg Tyr Ser Ile Gly Phe Val Cys His Gln Ser Arg Phe Leu Glu Arg Arg Lys Glu His Ala Ser Leu Glu Asp Met Leu Ala Ala Leu Val Gln Glu Ser Pro Thr Val Arg Gly Leu Thr Ala Asn Gly Thr Tyr Gln Pro Gly Val Arg Val Glu Gln Asp Phe Ser Tyr Ile Ser Asp Ser Phe Cys Gly Pro Gly Tyr Phe Ala Ala Gly Asp Ser Ala Cys Phe Leu Asp Pro Leu Leu Ser Thr Gly Val His Leu Ala Leu Tyr Ser Gly Met Leu Ala Ser Ala Ser Ile Leu Ala Thr Ile His Gly Asp Val Thr Glu Glu Glu Ala Arg Ala Phe Tyr Glu Ser Leu Tyr Arg Asn Ala Tyr Gln Arg Leu Phe Thr Leu Val Ala Gly Val Tyr Gln Gln Gln Ala Gly Lys Arg Ala Tyr Phe Gly Leu Ala Asp Ala Leu Val His Asp Ser Gly Glu Pro Glu Tyr Glu Lys Val Asp Gly Ala Arg Ala Phe Ala Gln Leu Val Ala Gly Leu Ala Asp Leu Asp Asp Ala Ala Glu Gly Arg His Asp Ser Thr Ala Ala Ala Ala Pro Ala Glu Gln Asp Asn Ser Val Arg Gln Leu Phe Leu Ala Ala Glu Glu Ala Arg Arg Met Ala Asp Ala Arg Thr Pro Ser Ala Pro Val Ser Glu Ala Pro Gly Lys Leu Asp Ser His Asp Leu Phe Asp Ser Ala Thr Gly Leu Tyr Leu Val Thr Thr Pro Arg
Leu Gly Ile Arg Arg Ala Lys Pro Ala Asp Thr Gln Ala Ala Ala Glu Gln Ser Ala Information for SEQ ID NO: 80 Length: 1479 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: negative Sequence: 80 atgtccagcaagatcctagtcatcggtggaggtccggccggatccacggccgccgcgctg60 ctcgcccgatcggggctgtcggtgacgctcctggaaaaggagacgttcccgcgataccac120 atcggcgagtcgatcgcgtcctcgtgccgcaccatcgtcgatttcgtgggcgctctcgac180 gaggtcgactcgcggggctacccgcagaagaacggggtcctgctgcgctggggcaacgag240 gactgggccatcgactgggccaagatcttcggtccgggcgtgcggtcctggcaggtcgac300 cgggacgacttcgaccacgtcctgctcaacaacgccggcaagcagggcgccaagatcatc360 cagggcgcgg ctgtcaagcg ggtgttgttc gacggtgagc gggccaccgc cgccgagtgg 420 ttcgaccccg agtcgggtga ggtccgcacc atcgatttcg actacgtggt cgacgcgtcc 480 ggccgggccgggctgatcccgtcccagcacttcaagcaccggcgccccaccgagacgttc540 aagaacgtggccatctggggctactggcagggtggctcgctgctgccgaactctccctcc600 ggcgggatcaacgtcatctccgcgcccgacggctggtactgggtcattccgctgcgcggc660 gaccggtacagcatcggcttcgtctgccaccagagccgcttcctggagcggcgcaaggag720 cacgcctcgctggaggacatgctcgccgcactggtacaggagtccccgaccgtgcgcggc780 ctgacggcgaacgggacgtaccagccgggcgtgcgggtggagcaggacttctcgtacatc840 tccgacagcttctgcgggcccggctacttcgcggccggcgactccgcctgcttcctggac900 ccactgctgtccaccggcgtgcacctcgccctctacagcggcatgctcgcctcggcgtcc960 atcctggccaccatccacggtgacgtcaccgaggaggaggcgcgggcgttctacgagtcc1020 ctctaccgcaacgcctaccagcgcctgttcaccctcgtcgccggcgtctaccagcagcag1080 gccggcaagagggcatacttcggcctggccgacgcgctggtgcacgacagcggcgaaccc1140 gagtacgagaaggtagacggggcccgcgccttcgcccagctcgtcgccggcctcgccgac1200 ctggacgacgcggcggagggacggcacgacagcaccgcggcggcggcaccggcggagcag1260 gacaactccgtccggcagctcttcctggccgccgaggaggcccgccggatggccgacgcg1320 cgcacgccgagcgccccggtcagcgaggcgccgggcaagctcgacagccacgacctcttc1380 gactcggcaaccggcctctacctggtcaccaccccgcgactggggatccgccgggccaag1440 ccggccgacacgcaggcggcggcagagcagtctgcctga 1479 Information for SEQ ID NO: 81 Length: Type: PRT
Organism: Streptomyces mobaraensis Strandedness:
positive Sequence: 81 Val Gln Ile Ala Asp Leu Tyr Ile Gly Ala Leu Gly Val Phe Val Pro Pro Val Val Ser Val Glu Trp Ala Val Glu Arg Gly Leu Tyr Pro Ala Glu Glu Ala Glu Ala His Glu Leu Gly Gly Val Ala Val Ala Gly Asp Ile Pro Pro Pro Glu Met Ala Leu Arg Ala Ala Gln Gln Ala Val Lys Arg Trp Gly Gly Ser Pro Lys Glu Phe Asp Leu Leu Leu Tyr Ala Ser Thr Trp His Gln Gly Pro Asp Gly Trp Pro Pro Gln Ala Tyr Leu Gln Arg His Leu Val Gly Gly Asp Met Leu Ala Leu Glu Ile Arg Gln Gly Cys Asn Gly Val Phe Ser Ala Leu Glu Leu Ala Ala Ala Tyr Leu Arg Ala Asp Pro Asp Arg Thr Ser Ala Leu Ile Val Ala Ala Asp Asn Tyr Gly Thr Pro Leu Val Asp Arg Trp Arg Met Gly Pro Gly Phe Ile Gly Gly Asp Ala Ala Ser Ala Leu Val Leu Thr Lys Arg Pro Gly Phe Ala Arg Leu Cys Ser Leu Ala Ser Lys Gly Leu Pro Glu Ile Glu Ser Leu His Arg Gly Asp Glu Pro Leu Phe Pro Pro Ser Ile Thr Arg Gly Arg Pro Thr Asp Phe Ser Ala Arg Ile Gly Gln Gln Phe Ala Thr Arg Ser Pro Ala Ser Leu Ala Met Ala Asp Ile Gln Asp His Met Thr Glu Ile Ala Glu Arg Ala Leu Ala Gly Ala Gly Ile Gly Met Ala Asp Val Ala Arg Val Ser Phe Met Asn Tyr Ser Arg Glu Val Val Glu Gln Arg Cys Met Ala Ala Trp Gly Leu Pro Leu Ser Arg Ser Thr Trp Glu Phe Gly Arg Gly Ile Gly His Cys Gly Ala Ser Asp His Leu Leu Ser Met Glu His Leu Val Arg Thr Gly Glu Leu Ala Pro Gly Asp His Val Leu Gln Leu Ala Thr Ala Pro Gly Leu Val Val Ser Ser Ala Val Leu Gln Val Leu Glu Ser Pro Asp Trp Asp Ala Information for SEQ ID NO: 82 Length: 1035 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 82 gtgcagatag ccgacctgta catcggtgct ctcggcgtat tcgtaccgcc cgtggtgagc 60 gtcgaatggg ccgtggaacg cggcctctat ccggccgagg aagccgaggc gcacgaactc 120 ggcggggtcgccgtcgccggcgacatcccccctccggaaatggccttgcgggcggcgcag180 caggcggtgaaacggtggggcggatcgccgaaggaattcgatctgctgctgtacgcgagc240 acctggcaccagggccccgacggctggccgccgcaggcgtacctgcagcggcacctggtg300 ggcggggacatgctcgccctggagatccggcagggctgcaacggcgtcttcagcgcgctg360 gaactcgccgccgcctacctgcgggccgatccggaccgcacgagcgcgctgatcgtcgcc420 gccgacaactacggcacgccgctggtcgaccggtggcggatgggtcccggcttcatcggc480 ggcgacgccgcgtcggccctggtgctgaccaagcggcccggcttcgcccggctgtgctcg540 ctggcgtccaagggcctgccggagatcgagtcgctgcaccggggcgacgagccgctgttc600 ccgccgagcatcacccggggccgtccgaccgacttcagcgcccgcatcggccagcagttc660 gccacccggagccccgcctccctcgcgatggccgacatccaggaccacatgaccgagatc720 gccgagcgcgccctggcgggggccggcatcggcatggcggacgtcgcccgcgtctccttc780 atgaactactcccgggaagtggtcgaacagcgctgcatggcggcctgggggctgccgctg840 tcccgttcgacctgggagttcggccgcggcatcggccactgcggggccagcgaccacctg900 ctctccatggaacacctcgtacggaccggtgagctcgcccccggcgaccacgtcctgcaa960 ctcgccaccgctcccggcctcgtggtgtccagcgccgtccttcaggttctcgaatcgccg1020 gactgggacgcatga 1035 Information for SEQ ID NO: 83 Length: 344 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence: 83 Met Arg Thr Pro Asp Met Phe Ile Gly Gly Val Gly Thr Phe Ile Pro Pro Arg Val Ser Val Asp Trp Ala Val Ala Arg Gly Leu Tyr Ser Ala Glu Asp Ala Glu Ala His Glu Leu Val Gly Val Ala Val Ala Gly Asp Met Pro Pro Pro Glu Met Ala Leu Arg Ala Ala Gln Gln Ala Val Lys Arg Trp Gly Gly Ser Pro Lys Glu Phe Asp Leu Leu Leu Tyr Ala Ser Thr Trp His Gln Gly Pro Asp Gly Trp Pro Pro Gln Ser Tyr Leu Gln Arg His Leu Val Gly Gly Asp Leu Leu Ala Leu Glu Ile Arg Gln Gly Cys Asn Gly Leu Phe Ser Ala Met Glu Leu Ala Ala Ser Tyr Leu Thr Ala Val Pro Glu Arg Thr Ser Ala Leu Leu Val Ala Ala Asp Asn Tyr Gly Thr Pro Leu Ile Asp Arg Trp Ser Met Gly Pro Gly Phe Ile Gly Gly Asp Ala Ala Ser Ala Ile Val Leu Thr Lys Gln Pro Gly Phe Ala Arg Leu Arg Ser Val Cys Thr Arg Thr Met Thr Thr Ala Glu Ala Leu His Arg Gly Asp Glu Pro Leu Phe Pro Pro Ser Ile Thr Val Gly Arg Thr Thr Asp Phe Ser Ala Arg Ile Gly Gln Gln Phe Ala Ser Arg Ser Pro Ala Ala Ala Ala Met Ala Asp Val Pro Gln Arg Val Val Glu Leu Val Asp Gln Ala Leu Ala Glu Ala Glu Ile Gly Ile Gly Asp Ile Ala Arg Val Gly Phe Met Asn Tyr Ser Arg Glu Val Val Glu Gln Arg Val Met Thr Met Trp Asp Leu Pro Met Ser Arg Ser Thr Trp Glu Tyr Gly Arg Gly Ile Gly His Cys Gly Ala Ser Asp Thr Ile Leu Ser Phe Asp His Leu Val Arg Thr Gly Glu Leu Arg Pro Gly Asp His Met Leu Met Leu Gly Thr Ala Pro Gly Val Val Leu Ser Cys Val Ile Val Gln Val Leu Glu Ser Pro Ala Trp Thr Lys Information for SEQ ID NO: 84 Length: 1035 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence: 84 atgcggacac cggacatgtt catcggcggt gtcgggacgt tcattccgcc gcgggtgagc 60 gtcgactggg cggtcgcccg gggcctctat tcggccgagg acgccgaggc gcacgaactc 120 gtcggcgtcg cggtcgcggg cgacatgcct ccccccgaga tggcactccg ggccgcacag 180 caggcggtcaagcggtggggcgggtcgccgaaggagttcgacctgctgctgtacgccagc240 acgtggcaccagggaccggacggctggccgccgcagtcgtacctgcaacggcatctggtg300 ggcggcgacctgctcgccctggagatccggcagggctgcaacggtctgttcagcgcgatg360 gaactcgccgccagctacctgaccgccgttccggaacgcacgagcgccctgctcgtcgcg420 gcggacaactacggcacgccgctgatcgaccgctggtcgatgggacccggcttcatcggt480 ggcgacgccgcctcggccatcgtgctgaccaaacaaccggggttcgcccggctgcgttcg540 gtgtgcacacggacgatgacgaccgccgaagccctgcaccgcggcgacgagccgctgttc600 ccgcccagcatcacggtcggccgcaccacggacttcagcgcccggatcggccagcagttc660 gccagccgcagcccggcggccgcagccatggccgacgtgccgcagcgggtcgtcgagctg720 gtcgaccaggcgctggcggaggccgagatcgggatcggcgacatcgcccgggtggggttc780 atgaactactcccgcgaggtggtcgagcagcgggtgatgacgatgtgggacctgccgatg840 tcgcgttcgacctgggagtacggtcgcgggatcgggcactgcggcgccagcgacaccatc900 ctgtccttcgatcacctggtgcgcacgggggagctccggccgggcgaccacatgttgatg960 ctgggcaccgcacccggcgtcgtgctgtcctgtgtcatcgtccaggtcctcgaatcgccg1020 gcctggacgaagtga 1035 Information for SEQ ID NO: 85 Length: 344 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: negative Sequence: 85 Val Arg Thr Pro Asp Leu Phe Ile Gly Ala Val Gly Ala Phe Val Pro Pro Thr Val Ser Val Glu Trp Ala Ile Asp Arg Gly Leu Tyr Ser Arg Glu Gln Val Glu Leu His Glu Leu Ala Gly Thr Ala Ile Ala Gly Asp Leu Pro Ala Pro Glu Met Ala Leu Arg Ala Ala Gln Gln Ala Val Lys Arg Trp Gly Gly Ser Pro Thr Glu Phe Asp Leu Leu Leu Tyr Ala Ser Thr Trp His Gln Gly Pro Asp Gly Trp Pro Pro His Ser Tyr Leu Gln Arg His Leu Val Gly Gly Asp Leu Leu Ala Leu Glu Ile Arg Gln Gly Cys Asn Gly Met Phe Ser Ala Phe Glu Leu Ala Ala Ser His Leu Gln Ala Val Pro Glu Arg Thr Ser Ala Leu Leu Val Ala Ala Asp Asn Tyr Gly Thr Pro Met Val Asp Arg Trp Arg Met Gly Pro Gly Phe Ile Gly Gly Asp Ala Gly Ser Ala Leu Ile Leu Thr Lys Arg Pro Gly Phe Ala Arg Leu Arg Ser Val Cys Thr Lys Ser Val Pro Glu Ala Glu Arg Leu His Arg Gly Asp Glu Pro Leu Phe Pro Pro Ser Val Leu Thr Gly Arg Glu Leu Asn Phe Thr Ala Arg Ile Asp Gln Gln Phe Ala Ala Arg Ser Pro Ala Ser Ile Ala Met Ala Asp Val Gly Asp His Ile Glu Glu Val Val Gly Arg Ala Leu Ala Glu Ala Glu Ile Glu Val Gly Asp Leu Ala Arg Val Ala Phe Met Asn Phe Ser Arg Glu Ile Met Glu Gln Arg Cys Leu Ala Asn Trp Gly Leu Pro Met Ser Arg Ser Thr Phe Asp Phe Gly Arg Arg Ile Gly His Cys Gly Ala Ser Asp Pro Leu Leu Ala Leu Glu His Leu Ala Arg Thr Gly Gly Leu Gly Pro Gly Asp His Leu Leu Thr Leu Gly Thr Ala Pro Gly Val Val Val Ser Cys Ala Ile Val Gln Val Ile Glu Ser Pro Thr Trp Arg Glu Information for SEQ ID NO: 86 Length: 1035 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: negative Sequence: 86 gtgcgtacac cggatctgtt catcggcgcc gtcggcgcct tcgtcccgcc gacggtgagc 60 gtcgagtggg cgatcgaccg cggtctttac tcccgcgagc aggtggagct gcacgagctg 120 gcgggcacgg ccatcgccgg cgacctgccc gcgccggaga tggcgctgcg cgccgcccaa 180 caggcggtca agcgctgggg cggctcgccg acggagttcg acctgctgct ctacgccagc 240 acctggcaccaggggcccgacggctggccgccgcactcctatctccagcggcacctggtc300 ggcggcgacctgctggcgttggagatccggcagggctgcaacgggatgttcagcgcgttc360 gagctggccgccagccacctccaggcggtacccgagcgcaccagcgccctgctggtcgcc420 gccgacaactacggcaccccgatggtcgaccgctggcggatgggccccggcttcatcggt480 ggcgatgccggcagcgccctcatcctcaccaagcgacccggcttcgcgcggctccgctcg540 gtctgcaccaagtcggtcccggaggccgagcggctgcaccggggcgacgagccgctgttc600 cccccgagcgtcctgaccggccgggagctgaacttcaccgcccggatcgaccaacagttc660 gccgcccgcagccccgcctcgatcgccatggcggacgtcggcgaccacatcgaggaggtc720 gtggggcgcgccctcgccgaggcggagatcgaggtcggcgacctcgccagggtcgccttc780 atgaacttttcccgggagatcatggagcagcgctgcctggccaactggggcctgcccatg840 agccggtccaccttcgacttcggtcgccggatcgggcactgcggggcgagcgaccccttg900 ctggccctggaacacctggccaggacggggggcctcggccccggcgatcacctgctgacc960 ctcggcaccgcgccgggcgtggtggtgtcgtgcgcgatcgtccaggtgatcgagtcgccg1020 acgtggcgggagtga 1035 Information for SEQ ID NO: 87 Length: 456 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 87 Val Leu Pro Leu Leu Ala Ala Gly Leu Leu Leu Trp His Gly Ala Ala Gly Lys Gly Gly Gly His Ser Gly Ser Gly Gly Ala Ala Arg Gly Gly Pro Pro Val Glu Ala Trp Arg Leu Leu Leu Ala Leu Ala Val Val Val Ala Val Ala Arg Gly Leu Gly Thr Leu Ala Ser Arg Tyr Leu Gly Gln Pro Arg Val Val Gly Glu Met Val Ser Gly Ile Val Leu Gly Pro Ser Met Leu Gly Leu Val Ala Pro Arg Ala Tyr Asp Ala Leu Phe Pro Ala Ala Leu His Ala Tyr Leu Asn Leu Val Ala Gln Ile Gly Leu Ala Leu Phe Met Phe Leu Ile Gly Met Glu Phe Gly Glu Thr Arg His Glu Asp Thr Gly Arg Thr Gly Val Ala Val Gly Leu Val Gly Val Cys Leu Ser Phe Gly Leu Gly Cys Gly Leu Gly Tyr Ala Leu Tyr Thr Gly Tyr Ala Pro Glu Gly Val Gly Phe Leu Pro Phe Thr Leu Phe Leu Gly Leu Ala Met Ser Val Thr Ala Phe Pro Val Leu Ala Arg Leu Leu Met Glu Arg Gly Met Leu Gln Ser Arg Ala Gly Ala Tyr Ala Ile Val Gly Ala Ala Thr Ala Asp Leu Ala Cys Trp Leu Leu Leu Ala Gly Val Val Ala Leu Leu Arg Gly Gly Ser Pro Leu Gly Val Leu Arg Thr Leu Ala Leu Thr Ala Ala Phe Phe Gly Val Met Val Val Val Val Arg Pro Ala Leu Arg Arg Leu Leu Glu Arg Pro Glu Arg Arg Leu Ala Asp Gly Gly Val Leu Thr Leu Ile Ile Pro Gly Val Leu Leu Ser Ala Val Ala Thr Glu Leu Ile Gly Ile His Leu Ile Phe Gly Ala Phe Leu Phe Gly Ala Val Cys Pro Lys Ser Thr Pro Val Leu Glu Asn Ala Arg Gly Lys Leu Gln Glu Leu Val Thr Ala Val Leu Leu Pro Pro Phe Phe Ala Ser Val Gly Val Lys Thr Asp Leu Leu Arg Leu Gly Asp Gly Gly Gly Ala Leu Trp Val Trp Ala Gly Val Ala Leu Leu Val Ala Val Ala Gly Lys Leu Thr Gly Ser Ala Ala Ala Ala Ala Leu Met Ser Val Glu Arg Val Asp Ala Leu Arg Ile Gly Val Leu Met Asn Cys Arg Gly Leu Thr Glu Leu Val Ile Leu Thr Ile Gly Leu Asp Leu Gly Val Leu Ser Pro Ala Leu Phe Thr Met Leu Val Val Val Thr Leu Cys Ala Thr Val Met Thr Ala Pro Leu Leu Asp Leu Leu Asp Arg Ala Glu Ala Arg Arg Ala Ala Pro Ala Pro Ala Lys Ala Ala Ala Val Ala Arg Information for SEQ ID NO: 88 Length: 1371 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 88 gtgctgccgctgctggccgcgggcctgctgctgtggcacggcgccgcgggcaagggcggc60 gggcactccgggtccgggggcgccgcgcggggcggaccccccgtcgaggcgtggcggttg120 ctgctcgccctcgccgtggtggtggccgtcgcgcggggcctcgggacgctggcgagccgg180 tacctggggcagccgcgcgtggtgggcgagatggtctccgggatcgtcctgggcccctcg240 atgctcggcctcgtcgccccccgcgcctacgacgccctcttccccgccgccctccacgcg300 tacctcaacctcgtcgcccagatcgggctggcgctgttcatgttcctgatcggcatggag360 ttcggcgagacccggcacgaggacacgggccgcaccggcgtcgccgtgggactcgtcggc420 gtctgcctgtcgttcgggctcggctgcggcctggggtacgcgctgtacaccggcaacgcc480 ccggagggcgtcggcttcctgccgttcaccctgttcctgggcctggccatgagcgtgacc540 gccttccccgtactggcccggctgctgatggaacgcggcatgctgcagtcccgggcgggc600 gcgtacgccatcgtgggggcggccaccgccgacctcgcctgctggctgctgctcgcgggc660 gtcgtcgcgctgctgcgcggcggttcgccgctgggcgtgctgcgcaccctcgcgctgacc720 gcggccttcttcggcgtcatggtggtggtcgtccgccccgccctgcgccgcctcctggaa780 cggcccgagcgccggctcgccgacggcggtgtactcaccctgatcatcccgggcgtcctg840 ctgtcggcggtcgccaccgagctgatcggcatccacctcatcttcggcgccttnctcttc900 ggtgccgtctgcccgaagtcgacgccggtcctggagaacgcccgcggcaaactgcaggaa960 ctcgtcacggccgtgctgctgcccccgttcttcgcctcggtcggggtgaagaccgacctg1020 ctgcggctcggtgacggcggcggggccctgtgggtgtgggccggtgtcgccctcctcgtc1080 gccgtcgcggggaagttgacgggcagcgcggccgcggcggcgctgatgtcggtggaacgc1140 gtcgacgccctgcgcatcggcgtgctcatgaactgccgcgggctgaccgagctggtgatc1200 ctcaccatcggactcgacctcggcgtcctgtcgccggcgctgttcaccatgctggtcgtg1260 gtcaccctgtgcgccaccgtcatgaccgctccgctgctcgacctgctggaccgcgcggag1320 gcgcgccgcgccgcgcccgctcccgcgaaggcggcggccgtggcccgctga 1371 Information for SEQ ID NO: 89 Length: 442 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence: 89 Met Ile Ala Ser Ala Ala Pro Val Ala Pro Leu Ala Ser His Gln Leu Leu Leu Phe Leu Leu Glu Val Gly Val Leu Leu Leu Leu Ala Val Ile Leu Gly Arg Leu Ala Gln Arg Phe Gly Leu Pro Ala Val Val Gly Glu Leu Leu Thr Gly Ile Leu Leu Gly Pro Ser Leu Leu Gly Gln Leu Ala Pro Ser Val Gly His Trp Leu Leu Pro Gly Glu Pro Ser Gln Met His Leu Leu Asp Ala Leu Gly Gln Phe Ser Val Val Leu Leu Val Gly Val Ala Gly Leu His Val Asp Leu Arg Leu Ile Arg Arg Arg Ala Gly Thr Val Ala Thr Val Thr Met Gly Gly Leu Leu Leu Pro Leu Gly Leu Gly Val Ala Thr Gly Leu Leu Val Pro Ala Ala Leu Leu Ala Ala Thr Asp Gln Arg Val Met Phe Ala Phe Phe Leu Gly Val Ala Met Ala Val Ser Ala Val Pro Val Ile Ala Lys Thr Leu Thr Asp Met Arg Leu Met His Arg Asp Val Gly Gln Leu Ile Leu Ala Ala Ala Ser Leu Asp Asp Ala Phe Ala Trp Phe Met Leu Ser Leu Ile Ser Ser Met Ala Val Ser Ala Leu Thr Val Gly Asn Val Leu Ala Ser Leu Leu Asn Leu Val Leu Phe Ile Val Ala Ala Ala Leu Ile Gly Arg Pro Val Val Arg Arg Ala Met Arg Trp Ala Asn Ala Gln Ile Asp Val Gly Pro Ala Val Ala Ile Ala Val Val Thr Val Leu Leu Phe Ser Ala Ala Gly His Ala Leu Gly Leu Glu Ala Ile Phe Gly Ala Leu Val Ala Gly Val Leu Leu Gly Leu Pro Gly Gly Val Glu Pro Ala Arg Leu Ala Pro Leu Arg Thr Val Val Leu Ser Val Leu Ala Pro Leu Phe Leu Ala Thr Ala Gly Leu Arg Val Asp Leu Arg Ala Leu Ala Asp Pro Val Val Leu Val Ala Gly Leu Val Ile Leu Val Leu Ala Val Leu Gly Lys Phe Cys Gly Ala Tyr Leu Ala Gly Arg Leu Thr Arg Gln Ser His Trp Glu Ala Val Ala Leu Gly Ala Gly Leu Asn Ser Arg Gly Val Val Glu Ile Val Ile Ala Met Val Gly Leu Arg Leu Gly Ile Leu Asn Thr Ala Thr Tyr Thr Ile Val Val Leu Val Ala Val Leu Thr Ser Val Met Ala Pro Pro Met Leu Gln Arg Ala Met Arg Arg Ile Glu His Asn Ala Glu Glu Ala Leu Arg Glu Glu Asn Gln Ala Gln Leu Ile Thr Arg Pro Val Val Arg Information for SEQ ID NO: 90 Length: 1329 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence: 90 atgatcgcga gcgccgcacc cgtggctccc ctggcttcac atcaattgtt gttgtttctt 60 ctcgaggtcggggtcctgttgctgctggccgtcattctgggccggttggcgcagcgtttc120 ggcctgccggcggtcgtcggcgagttgctgaccgggatcttgctcgggccgtcgttgctg180 ggccagttggcgccctccgtcgggcattggctgctgcccggtgagccgtcgcagatgcac240 ctgctggacgcgctcgggcagttcagcgtcgtgctgctggtcggcgtggccgggctgcac300 gtcgacctgcgactgatccgacgccgggccggcacggtcgccacggtgaccatgggtggc360 ctcctgctgcccctcgggttgggcgtggccaccggcctgctggtgccggcggcgctgttg420 gcggcgacggaccagcgcgtgatgttcgccttcttcctcggggtggcgatggccgtcagc480 gccgtgccggtgatcgccaagacgctcaccgacatgcggctgatgcaccgtgacgtcggt540 cagctcatcctcgccgcagcgtccctggacgacgcgttcgcctggttcatgctgtcgctg600 atctcgtccatggcggtcagcgccctcaccgtggggaacgtgctggcctcgctgctcaac660 ctcgtcctgttcatcgtcgcggcggcgctgatcggccgcccggtggtcaggcgtgcgatg720 cggtgggcgaacgcccagatcgacgtggggccagccgtcgccatcgcggtcgtcaccgtc780 ctgctgttctcggcggccggacacgcgctcggccttgaggcgatcttcggcgcattggtg840 gcgggagtcctgctcgggctgcccggaggcgtcgagccggcccggctggcgccgttgcgt900 accgtggtgctctccgtgctggcgccgctcttcctggccaccgccgggctccgggtcgac960 ctgcgcgccctcgccgacccggtggtgctcgtggccggtctggtgatcctggtgctcgcc1020 gtcctgggcaagttctgcggcgcgtacctggcaggccggctgacgcgccagagccactgg1080 gaggcggtcgccctcggggcgggactcaactcacggggcgtcgtggagatcgtcatcgcg1140 atggtcgggctgcgcctgggcatcctcaacaccgccacctacacgatcgtggtgctcgtc1200 gccgtcctcacgtccgtcatggcgccgccgatgctccagcgggcgatgcgccggatcgag1260 cacaatgccgaggaggcgctgcgggaggagaaccaggcgcagttgatcacccgcccggtg1320 gtgcggtga Information for SEQ ID NO: 91 Length: 445 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 91 Val Ile Val Ala Ala Pro Val Pro Pro Leu Gly Ser His Gln Leu Leu Leu Phe Leu Leu Gln Val Gly Leu Leu Leu Leu Leu Ala Val Val Leu Gly Arg Val Ala Gln Arg Phe Gly Leu Pro Ala Val Val Gly Glu Leu Leu Thr Gly Val Leu Leu Gly Pro Ser Val Leu Gly Ala Leu Ala Pro Asp Ile Gly Arg Trp Leu Leu Pro Ala Asp Pro Asp Gln Val His Leu Leu Asp Ala Ile Gly Gln Phe Gly Val Val Leu Leu Val Ala Val Ala Gly Leu His Leu Asp Leu Arg Leu Val Arg Arg Arg Ala Gly Thr Ile Gly Ala Val Ala Val Gly Gly Leu Ala Val Pro Leu Gly Leu Gly Ile Ala Ala Gly Leu Leu Ala Pro Ala Ala Leu Leu Ala Ala Gly Gln Glu Arg Thr Val Phe Ala Leu Phe Val Gly Val Ala Met Ala Val Ser Ala Val Pro Val Ile Ala Lys Thr Leu Thr Asp Met Arg Leu Leu His Arg Asp Val Gly Gln Ile Ile Leu Ala Ala Ala Ser Leu Glu Asp Ala Ala Ala Trp Phe Leu Leu Ser Leu Ile Ser Ser Val Ala Val Ser Thr Leu Thr Ala Gly Gln Val Val Thr Ala Leu Leu Tyr Leu Val Ala Tyr Leu Ala Val Ala Val Leu Val Gly Arg Pro Val Thr Arg Arg Ala Met Arg Trp Ala Asn Ala Gln Pro Asp Gly Gly Ala Ala Ser Ala Val Ala Val Val Ile Val Leu Ala Phe Ala Ala Gly Ala His Ala Leu Gly Leu Glu Ala Ile Phe Gly Ala Leu Val Ala Gly Val Leu Ile Gly Leu Pro Gly Asn Gly Glu Pro Ala Arg Leu Ala Pro Leu Arg Thr Val Val Leu Ser Val Leu Ala Pro Ile Phe Leu Ala Ser Ala Gly Leu Arg Val Asp Leu Arg Ala Leu Ala Asp Pro Glu Val Leu Ala Ala Gly Ala Val Ile Leu Ala Leu Ala Val Leu Gly Lys Tyr Thr Gly Ala Tyr Leu Gly Ala Arg Leu Ala Arg Gln Ser His Trp Glu Gly Val Ala Leu Gly Ala Gly Leu Asn Ala Arg Gly Ala Val Glu Ile Ile Ile Ala Met Val Gly Leu Arg Leu Gly Val Leu Asn Thr Ala Ser Tyr Thr Ile Val Val Leu Val Ala Val Val Thr Ser Val Met Ala Pro Pro Met Leu Arg Val Ala Met Arg Arg Val Glu Gln Asn Ala Glu Glu Thr Leu Arg Glu Ser Arg His Leu Glu Trp Ala Ala Thr Pro Ala Val Asp Gln Arg Pro Gly Information for SEQ ID NO: 92 Length: 1338 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 92 gtgatcgtggccgcgccggtgcccccgctgggctcccaccagctactgctgttcctgctc60 caggtgggcctgctgctgctgctcgccgtcgtcctgggacgggtggcgcaacgcttcggc120 ctgccggcggtggtcggtgagctgctgaccggggtgctgctcggcccctcggtgctgggg180 gccctggcacccgacatcggacggtggctgctgcccgccgaccccgaccaggtccacctg240 ctcgacgccatcggtcagttcggcgtcgtactgctggtcgccgtggccggtctgcacctg300 gacctgcggctggtccggcggcgggccggcacgatcggcgcggtggccgtcggcggcctc360 gcggtgcccctcggcctgggcatcgccgccggcctgctggccccggcggcgcttctcgcg420 gccgggcaggagcggactgtcttcgcgctgttcgtcggcgtggcgatggcggtcagcgcc480 gtgccggtgatcgcgaagacgctcaccgacatgcgcctgctgcaccgcgacgtggggcag540 atcatcctggctgcggcgtcgctggaggacgctgcggcctggttcctgctgtcgctcatc600 tcgtcggtggcggtgagcaccctcaccgccgggcaggtggtgaccgccctgctttacctc660 gtggcctacctcgcggtggccgtcctggtcggccggccggtgacccggcgcgccatgcgc720 tgggcgaacgcccagcccgacggcggggccgccagcgccgtcgccgtggtgatcgtgctg780 gccttcgcggcgggggcgcacgcgctgggcctggaggcgatcttcggcgcgctggtggcg840 ggtgtcctgatcggccttcccggcaacggggagccggcccggctggcaccgctgcgcacg900 gtggtgctgtccgtgctcgccccgatcttcctggccagcgcggggctccgggtcgatctg960 cgtgccctcgccgacccggaggtgctcgccgccggggcggtgatcctggcgctcgccgtg1020 ctcggcaagtacaccggcgcgtacctgggtgcacggctggcccggcagagccactgggag1080 ggcgtcgccctcggcgccgggctcaacgcgcgcggtgccgtggagatcatcatcgcgatg1140 gtcgggctgcgcctgggcgtgctgaacaccgcctcatacaccatcgtggtgctggtcgcg1200 gtcgtcacctcggtcatggcgccgccgatgctgcgcgtcgccatgcggcgggtcgaacag1260 aacgccgaggagaccctgcgggagagccgccacctggagtgggccgccacccctgcggtg1320 gaccagcggccgggctga 1338 Information for SEQ ID NO: 93 Length: 348 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 93 Met Glu Glu Gln His Asn Ala Pro Ala Pro Trp Ala Asp Val Ile Arg Leu Val Phe Gly Gly Met Ala Thr Gln Val Val Gly Leu Ala Val Arg Leu Arg Leu Pro Asp Ala Ile Gly Ala Gly Glu Arg Thr Ala Asp Gly Leu Ala Ala Arg Phe Asp Gly Glu Pro Ala Ala Met Asn Arg Leu Leu Arg Gly Leu Ala Ala Leu Gly Val Leu Arg Glu Pro Glu Pro Gly Val Phe Asp Leu Thr Pro Val Gly Glu Leu Leu Arg Ala Asp Arg Ser Pro Ser Phe His Ala Leu Ser Arg Met Leu Thr Asp Pro Ala Val Ser Thr Ala Trp Gln His Leu Asp His Ser Val Arg Thr Gly Gly Pro Ala Phe Asp Gln Val Phe Gly Arg Asp Phe Phe Ala His Leu Ala Asp Asp Pro Asp Leu Ser Gly Leu Tyr Asn Ala Ala Met Ser Gln Gly Thr Arg Gly Ile Ala Asp Leu Val Ala Leu Arg Gln Asp Phe Ser Gly Val Arg Thr Val Val Asp Val Gly Gly Gly Asp Gly Thr Leu Leu Ala Ala Val Leu Arg Ala His Pro Ala Leu Arg Gly Val Leu Tyr Asp Thr Ala Thr Gly Ala Ala Arg Ala Gly Glu Glu Leu Ala Ala Ala Gly Val Ala Asp Arg Ala Thr Val Glu Thr Gly Asp Phe Phe Ala Ala Val Pro Pro Gly Gly Asp Leu Tyr Leu Leu Lys Ser Val Val His Gly Trp Glu Asp Glu Arg Ala Ala Ala Ile Leu Ala His Cys Arg Arg Ala Leu Pro Ala His Gly Arg Val Val Met Val Glu His Val Leu Pro Asp Thr Val Pro Ala Asp Ala Val Pro Thr Thr Tyr Leu Asn Asp Leu Asn Leu Leu Val Asn Gly Asn Gly Leu Glu Arg Thr Arg Gly Asp Phe Arg Arg Leu Cys Ala Ala Ala Gly Leu Thr Ala Gly Ala Phe Thr Pro Leu Asp Gly Thr Asp Leu Trp Leu Ile Glu Ala Val Pro Ala Ala Pro Ala Asp Information for SEQ ID NO: 94 Length: 1047 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 94 atggaagagc agcacaacgc gccggccccc tgggcggacg tgatccggct ggtgttcggc 60 ggcatggcca cccaggtcgt cggcctggcg gtacggctcc ggctgcccga cgcgatcggc 120 gccggcgagc ggaccgccga cggcctggcc gcccgcttcg acggcgaacc cgccgccatg 180 aaccggctgc tgcgcggcct cgccgcgctc ggggtgctcc gcgagcccga gccgggcgtg 240 ttcgacctga cgccggtcgg cgagctgctg cgcgccgacc gctcgccgtc cttccacgcg 300 ctgtcccgcatgctcaccgacccggcggtctccacggcctggcagcacctggaccacagc360 gtccgcaccggcggcccggccttcgaccaggtcttcggccgcgacttcttcgcccacctc420 gcggacgaccccgacctgtcggggctctacaacgccgccatgagccagggcacccgcggc480 atcgccgacctggtcgcgctgcgccaggacttctccggcgtccgcaccgtcgtggacgtc540 gggggcggtgacgggacgctgctcgccgcggtgctgcgcgcgcacccggcgctgcgcggc600 gtgctgtacgacacggcgaccggggccgcccgggccggcgaggagctggcggcggcgggc660 gtcgcggaccgcgccacggtggagaccggcgacttcttcgccgccgtgcccccgggcggc720 gacctctacctgctcaagagcgtcgtccacggctgggaggacgagcgggccgcggcgatc780 ctcgcgcactgccgccgggcgctgcccgcgcacggccgggtcgtcatggtcgagcacgtg840 ctgcccgacaccgtccccgccgacgcggtgcccacgacgtacctcaacgacctcaacctg900 ctggtcaacggcaacgggctggagcgcacccgcggcgacttccggcggctctgcgcggcg960 gcgggcctgacggccggcgcgttcaccccgctggacggcaccgacctgtggctgatcgag1020 gccgtccccgccgccccggcggactga 1047 Information for SEQ ID NO: 95 Length: 342 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 95 Val Asp Ser Pro Ala Ser Ser Pro Trp Pro Ala Val Leu Arg Leu Val Phe Gly Gly Met Ala Thr His Val Val Ala Leu Ala Val Arg Leu Arg Leu Pro Asp Ala Ile Gly Asp Glu Glu Arg Thr Ala Ala Gly Val Ala Ala Glu Tyr Gly Phe Gln Glu Gly Pro Met Leu Arg Leu Leu Arg Ala Leu Ala Ala Leu Asp Leu Leu Ala Glu Pro Arg Pro Gly Arg Phe Thr Val Thr Pro Val Gly Ala Leu Phe Arg Ser Asp Arg Pro Gly Ser Met Tyr Pro Leu Ala Arg Met Leu Thr Asp Pro Thr Met Thr Ser Ala Trp Gln Asn Leu Glu Phe Ser Leu Arg Thr Gly Gly Pro Ala Phe Asp Glu Ala Phe Gly Ile Asp Phe Phe Gly Tyr Leu Ser Ser His Pro Glu Leu Ser Glu Leu Tyr Asn Ala Ala Met Ser Gln Gly Thr Arg Gly Val Ala Arg Val Leu Ala Gly Ala Tyr Asp Phe Gly Arg Phe Arg Thr Val Val Asp Val Gly Gly Gly Asp Gly Thr Ser Leu Val Glu Ile Leu Ala Glu His Pro Arg Leu Gly Gly Val Leu Phe Asp Ser Pro Ser Gly Val His Ala Ala Glu Gln Thr Leu Glu Ala Ala Gly Leu Thr Ala Arg Cys Arg Ile Glu Thr Gly Asp Phe Phe Ser Glu Val Pro Arg Asp Gly Asp Leu Tyr Leu Leu Lys Ser Val Ile His Gly Trp Asp Asp Glu His Ala Ala Val Ile Leu Arg Asn Cys Ala Arg Ala Ala Arg Glu Gln Gly Arg Ile Leu Leu Val Asp His Leu Met Pro Asp Thr Val Leu Pro Gly Gln Ser Pro Thr Thr Tyr Leu Thr Asp Leu Gly Leu Leu Val Asn Gly Gln Gly Met Glu Arg Thr Arg Asp Asp Phe Ala Gly Leu Cys Ala Lys Ala Gly Leu Arg Ile Ala Glu Val Gly Ser Leu Pro Ser Thr Gly Phe His Trp Ile Glu Leu Cys Pro Asp Information for SEQ ID NO: 96 Length: 1029 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 96 gtggatagccctgcgagctccccgtggccggcagtgctgcggctggtgttcggcgggatg60 gcgacgcacgtggtcgcgctcgcggtccggctgcggctgcccgacgcgatcggcgacgag120 gaacggacagcggcaggtgtcgccgccgagtacggcttccaggagggtccgatgctgcgg180 ctgctgcgtgcgctcgccgcgctcgacctgctcgccgaaccccgccccggccggttcacc240 gtcacccccgtgggcgcgctgttccgcagcgaccggccgggatcgatgtacccgctggcc300 cggatgctgaccgatccgacgatgacgagcgcctggcagaacctcgagttcagcctgcgc360 accggcggcccggccttcgacgaggcgttcgggatcgacttcttcggctacctgtcgtcc420 catccggagctgtccgagctgtacaacgccgcgatgagtcagggcacccggggagtcgcc480 agggtgctggccggcgcgtacgacttcggccgtttccggacggtcgtcgatgtcggcggt540 ggcgacgggacgtcgctcgtcgagatcctggccgagcacccccggctgggcggggtgctc600 ttcgacagcccgtccggtgtgcacgcggccgagcagaccctggaagcggcaggtctgacg660 gcccgctgccggatcgaaacgggagacttcttctcggaggtgccgcgcgatggcgacctg720 tacctgctcaagagcgtgatccacggctgggacgacgagcatgccgcggtgatcctgcgc780 aactgtgcccgcgccgccagggaacagggacgcatcctgctcgtcgaccacctgatgccg840 gacaccgtgctgcccgggcagagccccaccacctacctcaccgacctgggcctgctcgtc900 aacggtcaggggatggagcggacgagggacgacttcgccggcctctgcgccaaggcgggt960 ctgcggatcgccgaggtcgggtcgctgccgtccacgggcttccactggatcgagctctgt1020 cccgattga 1029 Information for SEQ ID NO: 97 Length: 362 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 97 Met Thr Val Pro Asp Thr Asp Glu Arg Ala Thr Thr Thr Asp Glu Pro Ser Ala Arg Arg Ala Gln Thr Gly Ala Asp Ala Ala Trp Pro Glu Leu Met Arg Leu Val Phe Gly Gly Met Ala Ser Arg Leu Val Gly Tyr Cys Val Arg Leu Gly Leu Pro Asp Ala Ile Gly Asp Asp Glu Arg Thr Pro Gln Glu Leu Ala Leu Arg Tyr Asp Ala Arg Ala Asp Thr Met Phe Arg Val Leu Arg Ala Leu Ala Ala Leu Arg Val Leu Thr Glu Thr Thr Pro Gly Arg Phe Ala Leu Ala Pro Met Gly Ala Leu Leu Arg Gly Asp Arg Pro Gly Thr Leu Arg Pro Leu Ala Arg Met Leu Thr Asp Pro Ala Met Thr Thr Ala Trp Asp Gly Leu Ala His Ser Val Arg Thr Gly Glu Pro Ala Phe Asp Gly Ile Phe Gly Thr Asp Phe Phe Ser Tyr Val Gly Gly Arg Pro Asp Leu Ser Glu Leu Tyr Asn Ala Ala Met Ser Gln Val Thr His Ser Val Ala Ala Ala Val Ala Glu Arg Thr Asp Leu Ala Gly Val Arg Thr Val Val Asp Val Gly Gly Gly Asp Gly Thr Leu Leu Ala Ala Val Leu Ala Ala Asn Pro Gly Val Arg Gly Val Leu Tyr Asp Ser Ala Ser Gly Ser Ala Glu Ala Ala Gly Asn Leu Arg Arg Ala Gly Val Gly Asp Arg Cys Arg Ile Glu Val Gly Asp Phe Phe Glu Arg Val Pro Ala Asp Ala Asp Leu Tyr Leu Leu Lys Ser Val Ile His Gly Trp Gly Asp Gly Arg Ala Thr Gly Ile Leu Arg His Cys Ala Glu Ala Val Ala Pro Gly Gly Arg Ile Val Met Ile Asp His Val Leu Pro Asp Val Val Gly Pro Ala Ala Asn Ala Leu Ala Tyr Leu Thr Asp Val Gly Met Leu Val Asn Gly Gln Gly Leu Glu Arg Thr Arg Gly Asp Leu Glu Arg Leu Cys Gly Lys Ala Gly Leu Ser Leu Glu Asp Val Thr Pro Leu Pro Pro Thr Asp Phe His Trp Ile Glu Ser Arg Pro Ala Information for SEQ ID NO: 98 Length: 1089 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 98 atgaccgtcccggacacggacgagcgggccacgacgaccgacgagcccagcgcccgccgg60 gcgcagaccggcgcggacgccgcctggccggagctgatgcggttggtgttcggcgggatg120 gccagccggctggtcggctactgcgtccggctggggctgcccgacgcgatcggcgacgac180 gagcgcaccccgcaggagttggcgctgcggtacgacgcccgagcggacaccatgttccgg240 gtgctgcgcgccctggccgcgctgcgggtgctcaccgagaccacacccggccggttcgcg300 ctcgccccgatgggggcgctgctgcgtggggaccgacccggcacgctgcgcccgctggcc360 cggatgctgaccgacccggccatgaccacggcctgggacggcctggcgcacagcgtccgc420 accggcgagccggccttcgacggcatcttcggcaccgacttcttcagctacgtgggcggg480 cgacccgacctttccgagctgtacaacgcggcgatgagccaggtgacccacagcgtcgcg540 gcggccgttgccgagcgtaccgacctggccggcgtgcggacggtggtcgacgtgggcggc600 ggagacggcaccctgctggccgccgtgctcgccgcgaaccccggcgtgcggggcgtgctc660 tacgacagcgccagcggcagcgcggaggcggcggggaacctgcgccgcgccggggtcggc720 gaccggtgccggatcgaggtgggtgacttcttcgagagggtccccgccgacgccgacctc780 tacctgctcaaaagtgtgatccacggttggggcgacggccgggcgacgggaatcctccgg840 cactgtgccgaggcggtcgccccgggcggccggatcgtcatgatcgaccacgtgctgccg900 gacgtggtcggcccggcggccaacgcgctggcgtacctgaccgacgtggggatgctggtc960 aacgggcagggcctggagcggacccgaggcgacctggaacggctgtgcggcaaggcgggg1020 ctgtccctggaggacgtcacgccgctgcctcccaccgacttccactggatcgagtcccgg1080 cctgcctga 1089 Information for SEQ ID NO: 99 Length: 242 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 99 Met Thr Ser Glu Thr Pro Ala Gly Lys Gly Val Arg Met Pro Arg Ser Val Glu Asp Leu Asn Val Val Trp Glu Trp Asp Asp Ala Glu Gly Met Gln Val Arg Leu Ala Gly Tyr Gln Pro Arg Asp Glu Tyr Leu Arg Asp Arg Ala Glu Arg Ile Gly Gln Leu Thr Glu Ala Leu Ser Leu His Gln Gly Val Asp Leu Phe Glu Ile Gly Ser Gly Glu Gly Val Met Ala Arg Glu Leu Ala Pro Arg Val Asn Ser Leu Leu Cys Ala Asp Val Ser Gln Ser Phe Leu Asp Lys Thr Arg Ala Thr Cys Ala Gly Val Pro Asn Val Glu Tyr His His Ile Arg Asn Asp Tyr Leu Ala Gly Leu Pro Asp Ala Ser Phe Asp Ala Gly Phe Ala Leu Asn Val Phe Ile His Leu Asn Ala Phe Glu Val Tyr Leu Tyr Leu Arg Glu Ile Arg Arg Val Leu Arg Pro Gly Gly Arg Phe Leu Phe Asn Tyr Leu Asp Phe Gly Asp Val Thr Arg Pro Gln Phe His Glu Tyr Val Ala Ser Tyr Pro Gly Ala His Pro Val Ala Val Lys Gly Phe Met Ser Trp Leu Gly Ser Asp Val Val Gly Lys Leu Ala Ala Glu Ala Gly Leu Thr Pro Val Pro Gly Ser Leu Val Asp Gln Gly Gly Val Cys Phe Leu Thr Leu Arg Arg Asp Asp Glu Gly Ala Thr Ala Information for SEQ ID NO: 100 Length: 729 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 100 atgaccagcg agacaccggc cggcaagggc gtgcgcatgc cgcggagcgt cgaggacctc 60 aacgtcgtct gggagtggga cgacgccgag gggatgcagg tgcggctcgc cggctaccag 120 ccgcgcgacg agtacctgcg cgaccgcgcc gagcggatcg gccagttgac ggaggcgctg 180 tccctgcacc agggcgtcga cctgttcgag atcggcagtg gcgagggcgt catggcccgc 240 gagctggcgccccgggtcaacagcctgctctgcgcggacgtcagccagtcgttcctggac300 aagacccgcgccacctgcgcgggcgtcccgaacgtcgagtaccaccacattcggaacgac360 tacctggccgggctgccggacgcctcgttcgacgccgggttcgcgctgaacgtgttcatc420 cacctgaacgccttcgaggtctacctctacctgcgcgagatccgccgggtgctgcgcccc480 ggcggccggttcctcttcaactacctggacttcggcgacgtcaccaggccgcagttccac540 gagtacgtggcgagctacccgggcgcccacccggtggccgtgaagggcttcatgtcctgg600 ctcggcagcgacgtggtcggcaagctcgcggcggaggccgggctgacgcccgtccccggg660 tcgctggtggaccagggcggcgtctgcttcctcaccctgcgcagggacgacgaaggggcg720 accgcgtga 729 Information for SEQ ID NO: 101 Length: 240 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 101 Met Ser Asn Ala Gln Gly Thr Pro Ala Thr Gly Pro Lys Pro Pro Leu Arg Ser Met Gly Asp Leu Asn Met Val Trp Glu Trp Arg Thr Pro Asp Glu Met Gln Ile Gln Leu Ala Gly Thr Gln Pro Arg Asp Glu Tyr Leu Gln Asp Arg Val Asp Arg Ala Lys Trp Met Ala Glu Arg Leu Gly Ile Thr Pro Glu Ser Ser Ile Phe Glu Ile Gly Ser Gly Glu Gly Ile Met Ala Asn Val Leu Ala Pro Ser Val Arg Arg Met Leu Cys Thr Asp Val Ser Arg Ser Phe Leu Asp Lys Ala Arg Val Thr Cys Gln Asp His Ala Asn Val Asp Tyr His His Ile Asp Asn Asp Tyr Leu Ala Ala Leu Pro Ser Ala Glu Phe Asp Ala Gly Phe Ser Leu Asn Val Phe Ile His Leu Asn Val Phe Glu Phe Phe His Tyr Phe Arg Gln Ile Ala Arg Ile Leu Arg Pro Gly Gly Arg Phe Gly Val Asn Phe Leu Asp Ile Gly Ala Ser Thr Arg Ser Phe Phe His Phe Tyr Ala Glu Arg Tyr Leu Thr Ala Asn Pro Val Glu Phe Lys Gly Phe Leu Ser Phe His Gly Ile Asp Val Ile Ser Ser Leu Ala Val Glu Ala Gly Leu Thr Pro Leu Leu Asp Glu Phe Val Asn Glu Asp Gly Val Cys Tyr Leu Ile Leu Arg Arg Asp Gln Lys Information for SEQ ID NO: 102 Length: 723 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence:102 atgagcaacgcgcaaggaacacccgccaccggccccaagccgccgctgcggagcatgggc60 gacctcaacatggtctgggagtggaggacgccggacgagatgcagatccagctcgccggc120 acccagccccgcgacgagtacctgcaggaccgcgtcgaccgggcgaagtggatggccgag180 cgcctcgggatcaccccggagtcgtcgatcttcgagatcggcagcggcgaggggatcatg240 gccaacgtcctcgcgccttcggtgcgacggatgctgtgcaccgacgtcagccggtccttc300 ctggacaaggcacgggtcacctgccaggaccatgccaacgtcgactaccaccacatcgac360 aacgactacctggcggcgttgccgtcggcggagttcgacgcgggcttttcgctgaacgtg420 ttcattcacctcaacgtcttcgagttcttccactacttccgtcagatcgcccggatcctg480 cggcccggcggcaggttcggggtcaacttcctcgacatcggcgcgtcgacgcgcagcttc540 ttccacttctacgcggagcggtacctgacggcgaacccggtcgagttcaagggtttcctc600 tccttccacggcatcgacgtgatctcctcgttggccgtcgaggccggcctgacgccgctg660 ctggacgagttcgtcaacgaggacggggtctgctacctgatcctgcgccgcgaccagaag720 tag 723 Information for SEQ ID NO: 103 Length: 247 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 103 Met Asn Asp Ala Lys Gln Ala Thr Pro Pro Pro Gln Asp Ala Pro Arg Gly Cys Pro Leu Arg Ser Thr Gly Asp Leu Asn Tyr Val Trp Glu Trp Asn Thr Pro Glu Glu Met Gln Met Gln Leu Ala Gly Tyr Gln Pro Arg Glu Glu Tyr Leu Gln Asp Arg Val Asp Lys Val Ala Leu Val Val Glu Gln Leu Gly Leu Gly Pro Glu Ser Glu Ile Phe Glu Ile Gly Ser Gly Glu Gly Ile Met Ala Ala Gly Leu Ala Asp Arg Val Arg Ala Val Leu Cys Ala Asp Val Ser Arg Ser Phe Leu Asp Lys Ala Arg Ala Thr Cys Glu Gly Arg Glu Asn Val Ser Tyr His His Ile Glu Asn Asp Phe Leu Glu Lys Leu Pro Thr Ala Ala Phe Asp Ala Gly Phe Ser Leu Asn Val Phe Ile His Leu Asn Val Phe Glu Val Phe Leu Tyr Phe Arg Gln Ile Arg Arg Ile Leu Arg Pro Gly Gly Leu Phe Cys Phe Asn Phe Leu Asp Leu Gly Asp Asn Thr Arg Gly Phe Phe His Thr Tyr Ala Glu Arg Tyr Arg Asp Ala Asn Pro Val Glu Phe Lys Gly Phe Leu Asn Trp His Gly Val Glu Leu Met Thr Gly Ile Ala Ala Glu Ala Gly Leu Thr Pro Val Thr Asp Lys Met Ile Asn His Asp Gly Val Val Phe Leu Thr Leu Arg Ser Asp Gly Glu Pro Ala Ala Information for SEQ ID NO: 104 Length: 744 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 104 atgaacgacgcgaagcaggcgacgccgccgccgcaggacgccccccggggctgtccgctc60 cggagcaccggcgacctcaactacgtctgggagtggaacacgccggaggagatgcagatg120 cagctcgccggctaccagccgcgcgaggagtacctccaggaccgtgtcgacaaggtcgcc180 ctggtcgtcgagcagctcgggctcggcccggaatcggagatcttcgagatcggcagcggc240 gagggcatcatggcggccgggctcgccgaccgggtgcgcgccgtgctctgcgccgacgtc300 agccgatccttcctcgacaaggcgcgcgccacctgcgagggccgggagaacgtctcctac360 caccacatcgagaacgacttcctggagaagctgccgaccgccgcgttcgacgccgggttc420 tccctgaacgtgttcatccacctcaacgtcttcgaggtcttcctctacttccgtcagatc480 cggcggatcctgcggcccggcgggctgttctgcttcaacttcctggacctcggcgacaac540 acgcgcggcttcttccacacctacgccgagcggtaccgcgacgcgaaccccgtggagttc600 aaggggttcctgaactggcacggcgtggagttgatgaccggcatcgccgccgaggccggg660 ctgacgccggtcaccgacaagatgatcaatcacgacggcgtggtgttcctgacgctgcgc720 agcgacggggagccggcggcgtga 744 Information for SEQ ID NO: 105 Length: 238 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 105 Val Ala Glu Pro Ile Pro Val Asp Arg Leu Asn Glu Trp Tyr Asp Glu Leu Thr Val Glu Ile Ile Glu Arg Val Cys Ala Pro Asp Ser Gly Cys Leu Asp Ile Gly Ala Gly Gly Gly Glu Ile Leu Ala His Met Arg Arg Ala Ala Pro Glu Gly Arg His Phe Ala Val Glu Pro Leu Pro His Tyr Ala Glu Gly Leu Arg Arg Asp Phe Pro Glu Val Thr Val Trp Gln Ala Ala Ala Ala Asp Ala Gln Gly Arg Asp Ser Phe Val His Val Val Ser Asn Pro Gly Tyr Ser Gly Leu Arg Arg Arg Pro Tyr Asp Arg Ala Asp Glu Thr Leu Val Glu Ile Ala Val Asp Thr Val Arg Leu Asp Asp Val Val Pro Ala Asp Ala Arg Val Asp Leu Val Lys Val Asp Val Glu Gly Gly Glu Val Gly Ala Leu Arg Gly Ala Ala Glu Leu Leu Arg Arg Gln Ser Pro Val Val Val Phe Glu His Gly Gly Asp His Ala Met Arg Asp Tyr Gly Thr Thr Ser Asp Asp Leu Trp Ala Leu Leu Val Asp Asp Leu Gly Tyr Thr Ile His Thr Leu Ala Gly Trp Leu Ala Ala Glu Pro Gly Leu Asp Arg Ala Ala Phe Ala Ala Glu Leu Arg Thr Gln Trp Tyr Phe Val Ala Ala Arg Gly Pro Leu Pro Thr Arg Ser His Glu Glu Information for SEQ ID NO: 106 Length: 717 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 106 gtggctgagccgatacccgtcgaccggctcaacgagtggtacgacgagctcaccgtcgag60 atcatcgagcgggtctgcgcccccgactccggctgcctggacatcggcgccggaggcggc120 gagatcctcgcgcacatgcgccgggccgcccccgagggccggcacttcgcggtcgagccg180 ctgccgcactacgcggaggggctgcgccgggacttccccgaggtcacggtgtggcaggcc240 gccgcggccgacgcccagggccgggacagcttcgtccacgtcgtctccaatcccggctac300 agcggcctgcgccgccggccgtacgaccgcgccgacgagacgctcgtggagatcgcggtg360 gacaccgtccgcctcgacgacgtcgtcccggccgacgcccgcgtcgacctggtgaaggtg420 gacgtggagggcggcgaggtgggcgccctgcgcggggcggccgagctgctgcgccggcag480 tcccccgtcgtggtcttcgagcacggcggcgaccacgcgatgcgcgactacggcaccacc540 agcgacgacctgtgggccctgctggtcgacgacctcggctacacgatccacaccctcgcg600 ggctggctggccgcggagccgggcctcgaccgggccgccttcgccgccgagctgcgcacc660 cagtggtacttcgtcgccgcccgcgggcccctgcccacccgttcccacgaggagtag 717 Information for SEQ ID NO: 107 Length: 243 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 107 Val Ser Gly Ser Leu Ala Ala Asp Pro Ala Gln Arg Asn Glu Asp Tyr Asp Arg Leu Thr Val Glu Ile Ile Glu Arg Val Cys Gly Arg Thr Ala Val Ser Val Asp Leu Gly Ala Gly Val Gly Glu Ile Thr Gln His Leu Val Arg Val Ala Pro Glu Gly Asn His Phe Ala Val Glu Pro Leu Pro Ala Leu Ala Asp Glu Leu Ala Asp Arg Leu Pro Ser Val Thr Val Val Arg Ala Ala Ala Ala Asp Ala Ala Gly Arg His Ser Phe Val His Val Val Ser Asn Pro Gly Tyr Ser Gly Leu Arg Arg Arg Pro Tyr Asp Arg Pro Ala Glu Thr Leu His Glu Ile Thr Val Asp Thr Val Arg Leu Asp Asp Val Ile Pro Ala Asp Val Arg Val Asp Leu Ile Lys Ile Asp Ile Glu Gly Gly Glu Val Leu Ala Leu Arg Gly Ala Arg Asp Thr Leu Arg Arg Gly Arg Pro Val Ile Val Phe Glu His Gly Gly Asp Asn Val Met Arg Glu Tyr Gly Thr Thr Thr Asp Asp Leu Trp Ser Leu Leu Val Glu Glu Leu Gly Tyr Gln Val Phe Thr Leu Pro Ala Trp Leu Ala Gly Gly Arg Ala Leu Ser Arg Ala Ala Leu Thr Thr Ala Leu Glu Gln Asp Trp Tyr Phe Val Ala Asp Arg Ala Ala Gly Pro Ala Thr Ala Asp Gln Lys Gly Val Asp Information for SEQ ID NO: 108 Length: 732 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: negative Sequence: 108 gtgagcggctccctggcggccgaccccgcccagcgcaacgaggactacgaccgactcacc60 gtcgagatcatcgagcgggtctgcggccggacggccgtgtcggtggacctcggcgccggc120 gtcggcgagatcacccagcacctggtccgggtggcgccggagggcaaccacttcgccgtc180 gaaccgttgcccgcactcgccgacgagttggcagaccggctgccgtcggtcacggtcgtc240 agggcggcagccgccgacgccgcgggccggcacagcttcgtgcacgtggtgtccaacccc300 ggctacagcggcctgcgccgacgtccctacgaccggccggcggagacgctgcacgagatc360 accgtggacacggtgcgcctggacgacgtcattcccgcggacgtacgtgtcgacctgatc420 aagatcgacatcgaggggggcgaggtgctggccctgcggggggcgcgcgacacgttgcgg480 cgtggtcggccggtgatcgtcttcgagcatggcggcgacaacgtcatgcgggagtacggc540 acgacgacggatgacctgtggtcactgctggtggaggagctgggttaccaggtgttcacc600 ctgccggcgtggctcgcgggcgggcgggcgctgagccgcgccgcgctcaccacggccctg660 gagcaggactggtacttcgtcgcggaccgcgccgccgggcccgcgaccgctgatcagaag720 ggtgtggactga 732 Information for SEQ ID NO: 109 Length: 257 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 109 Met Ser Thr Glu Leu Ser Ala Thr Asp Glu Ala Gly Ala Leu Ser Met Asn Asp Trp Tyr Asp Gln Leu Thr Val Ala Leu Ile Glu Gln Ile Cys Glu Pro Asp Ala Asn Thr Val Asp Ile Gly Ala Gly Ala Gly Asp Ile Leu Arg His Leu Leu Arg Val Ala Pro Arg Gly Arg His Val Ala Val Glu Ala Leu Pro Ser Tyr Ala Glu Gly Leu Arg Arg Asp Phe Pro Gly Val Thr Val Val Ala Ala Ala Ala Ala Glu Arg Thr Gly Arg Asp Ser Phe Val His Val Val Ser Asn Pro Gly Tyr Ser Gly Leu Arg Arg Arg Pro Tyr Asp Arg Pro Asp Glu Thr Leu Arg Glu Leu Thr Val Asp Thr Val Arg Leu Asp Asp Val Leu Pro Gly Asp Arg Arg Ile Asp Leu Val Lys Val Asp Thr Glu Gly Gly Glu Val Leu Ala Leu Arg Gly Ala Val Glu Leu Leu Arg Arg Trp Arg Pro Val Ile Val Phe Glu His Gly Gly Asp His Ala Met Arg Glu Tyr Gly Thr Thr Ser Ala Asp Leu Trp Ala Leu Leu Val Thr Glu Leu Gly Tyr Glu Leu Arg Thr Leu Pro Gly Arg Arg Ala Gly Gln Pro Ala Leu Asp Arg Ala Gly Phe Ala Asp Ala Leu Arg Glu His Trp Tyr Phe Val Ala Asp Arg Pro Ser Pro Gly Pro Ala Gly Gly Ser Gly Gln His Glu Pro Ser Ala Trp Gln Lys Gly Thr Glu Pro Information for SEQ ID NO: 110 Length: 774 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 110 atgtccactg agctttccgc gaccgacgag gccggcgcgc tgtcgatgaa cgactggtac 60 gaccagctca ccgtggcgct gatcgagcag atctgcgaac cggacgccaa caccgtggac 120 atcggggccg gcgccggcga catcctgcgt cacctgctgc gggtcgcccc ccgtggccgg 180 cacgtggccg tcgaggcgct gccgtcgtac gccgaggggc tgcgccggga cttccccggc 240 gtgacggtgg tggccgccgc cgccgccgag cgcaccggcc gggacagctt cgtccacgtg 300 gtctccaaccccggctacagcgggctgcgccggcgtccctacgaccgcccggacgagacc360 ctgcgggagctgacggtcgacaccgtccgcctggacgacgtgctccccggtgaccgccgg420 atcgacctggtcaaggtggacaccgagggcggcgaggtgctcgccctgcgcggtgccgtg480 gagctgctccgccgctggcggccggtgatcgtcttcgagcacggcggcgaccacgccatg540 cgggagtacggcaccaccagcgccgacctgtgggcgctgctcgtgaccgagctgggttac600 gagctgcgta ccctgcccgg gcgccgcgcc gggcagccgg cgctggaccg ggcggggttc 660 gccgacgcgc tgcgggagca ctggtacttc gtcgccgacc gaccgagccc gggcccggcc 720 ggcggctccg ggcagcacga accgtccgcc tggcagaagg gaaccgaacc atga 774 Information for SEQ ID NO: 111 Length: 278 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 111 Val Val Ser Leu Leu Asn Arg Leu Pro Leu Val Gly Ala Leu Thr Gly Ala Pro Ala Arg Pro Arg Arg Glu Pro Thr His Asp Glu Ile Val Ala Glu Arg Tyr Arg Glu Arg Thr Asp Pro Arg Pro Gly Asp Trp Ala Tyr Ala His Leu Leu Asp Leu Arg Asp Ala Leu Ala Glu Glu Leu Arg Gly Ala Ser Gly Arg Trp Leu Asp Phe Gly Ala Gly Thr Ser Pro Tyr Arg Gly Leu Leu Pro Gly Ala Glu Leu Glu Thr Ala Glu Met Arg Gly Gly Glu Asp Leu Thr Ala Asp His Glu Leu Asp Ala Asp Gly Leu Ser Ala Leu Pro Asp Gly Ser Phe Asp Gly Ile Leu Ser Thr Gln Val Leu Glu His Val Thr Asp Pro Asp Thr His Leu Arg Glu Ala Leu Arg Leu Leu Arg Pro Gly Gly Lys Leu Val Leu Ser Thr His Gly Val Trp Glu Glu His Gly Gly Gln Asp Leu Trp Arg Trp Thr Ala Asp Gly Leu Ala Ala Gln Ala Glu Arg Ser Gly Phe Thr Val Asp Arg Ala Val Lys Leu Thr Cys Gly Pro Arg Gly Leu Leu Leu Leu Leu Arg Tyr His Gly Arg Gln His Gly Trp Pro Ala Gly Gly Pro Val Gly Leu Leu Leu Arg Thr Leu Thr Leu Ala Asp Arg Leu Arg Pro Arg Leu Val Asp Asp Tyr Leu Asp Arg Val Phe Gly Gly Leu Gly Arg Val Glu Gly Pro Ala Glu Pro Phe Tyr Leu Asp Ile Leu Leu Thr Ala Thr Lys Pro Ser Ala Pro Glu Thr Pro Glu Arg Asp Ala Lys Information for SEQ ID NO: 112 Length: 837 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 112 gtggtgagcc tgctgaaccg gctgccgctg gtgggggccc tcaccggagc cccggcccgc 60 ccgcgccgcg aacccaccca cgacgagatc gtcgccgagc gctaccgcga gcggaccgac 120 ccgcgccccg gcgactgggc ctacgcccac ctgctcgacc tccgcgacgc gctggccgag 180 gagctccgcg gggcgtccgg ccgctggctc gacttcggcg ccggcacgtc gccctaccgc 240 ggcctgctcc cgggcgccga gctggagacc gccgagatgc gcggcggcga ggacctgacc 300 gccgaccacg agctggacgc ggacggcctg agcgccctgc cggacggctc gttcgacggg 360 atcctctcca cccaggtcct ggagcacgtg acggacccgg acacccacct gcgggaggcc 420 ctccggctgctgcggcccggcggcaagctggtgctgtccacgcacggggtgtgggaggag480 cacggcggccaggacctgtggcgctggacggccgacgggctcgcggcccaggccgagcgg540 tccgggttcacggtggaccgcgcggtcaagctcacctgcggcccccgcggcctgctgctc600 ctgctgcgctaccacggccggcagcacggctggccggccggcgggccggtgggactgctg660 ctgcgcaccctgacgctggccgaccggctgcgcccgcgcctcgtcgacgactacctggac720 cgcgtcttcggcggtctgggacgcgtcgagggcccggccgagcccttctacctggacatc780 ctgctgaccgccaccaagccgagcgcaccggagacgccggagagggacgcgaaatga 837 Information for SEQ ID NO: 113 Length: 274 Type: PRT
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence: 113 Val Ile Gly Leu Leu Gly Arg Leu Pro Gly Val Asn Ala Val Leu Gly Ala Val Ser Lys Gln Gln Ala Glu Pro Thr Leu Asp Glu Val Met Ala Glu Arg Phe Arg Glu Arg Thr Asp Pro Arg Arg Gly Asp Trp Ala Tyr Ala His Phe Ile Asp Leu Arg Asp Ala Leu Ala Glu Val Leu Gly Asp Ala Ser Gly Asn Trp Leu Asp Tyr Gly Ala Gly Thr Ser Pro Tyr Arg Asn Leu Phe Thr Ala Ala Asp Leu Lys Thr Ala Asp Ile Pro Gly Gly Glu Ser Tyr Pro Ala Asp Tyr Ala Leu Asp His Asp Gly Arg Cys Pro Ala Pro Asp Ala Thr Phe Asp Gly Val Leu Ser Thr Gln Val Leu Glu His Val Thr Asp Ala Asp Ala Tyr Leu Arg Glu Ala Leu Arg Leu Leu Arg Pro Gly Gly Arg Leu Val Leu Ser Thr His Gly Val Trp Glu Glu His Gly Gly Gln Asp Leu Trp Arg Trp Thr Ala Asp Gly Leu Ala Arg Gln Ala Glu Leu Ala Gly Phe Ala Val Asp Arg Val Leu Lys Leu Thr Cys Gly Pro Arg Gly Leu Leu Leu Leu Leu Arg Trp Tyr Gly Arg Glu Asn Gly Trp Pro Ala Ile Gly Pro Val Gly Leu Val Leu Arg Ser Leu Trp Leu Val Asp His Leu Leu Pro Ser Ser Leu Asp Thr Tyr Leu Asp Arg Ala Phe Gly Asp Leu Gly Arg Arg Glu Gly Pro Asp Ala Pro Phe Tyr Leu Asp Leu Leu Leu Val Ala Arg Lys Pro His Thr Lys Glu Thr Ala Thr Information for SEQ ID NO: 114 Length: 825 Type: DNA
Organism: Micromonospora carbonacea aurantiaca Strandedness: positive Sequence: 114 gtgatcggcttgctgggccggctcccgggggtgaacgccgtgctcggggccgtctcgaag60 cagcaggccgagccgaccctcgacgaggtgatggccgaacgtttccgcgaacggacggat120 ccgcgccggggcgactgggcctacgcgcacttcatcgatctgcgcgacgcgctcgccgag180 gtgctgggcgacgcttccggcaactggctcgactacggcgcgggcacgtcgccgtaccgg240 aacctgttcaccgcggccgatctgaagacggccgacattcccggcggcgagtcctacccg300 gccgactacgcgctcgaccacgacggacgctgtccggcacccgacgcgacgttcgacggc360 gtgctgtccacccaggtcctcgagcacgtgaccgacgcggacgcctacctgcgtgaggcg420 ctgcggctgttgcggcccgggggccggctggtgctgtccacccacggcgtgtgggaggag480 cacggcggtcaggacctctggcggtggacggcggacggcctggcccggcaggccgaactg540 gccgggttcgccgtcgaccgggtgctgaagctgacctgcgggccgcgaggactgctgctc600 ctgctgcgctggtacggacgcgagaacggctggcccgcgatcggcccggtcgggttggtg660 ctgcgctccctgtggttggtggaccacctgctacccagctccctggacacgtatctggat720 cgcgcattcggcgatctcgggagacgcgagggcccggacgcgccgttctatctggacctt780 ctgctcgtcgcccggaaaccccacacgaaggagaccgctacgtga 825 Information for SEQ ID NO: 115 Length: 282 Type: PRT
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence: 115 Val Thr Leu Leu Arg Arg Val Pro Gly Leu Gly Pro Leu Leu Thr Gly Ala Gly Thr Ala Pro Thr Gly Pro Ser Leu Asp Glu Val Met Ala Glu Arg Phe Arg Glu Arg Ile Glu Pro Arg Pro Gly Asp Trp Ala Tyr Ala His Phe Leu Asp Leu Arg Asp Ala Leu Ala Glu Ala Val Arg Asp Ala Thr Gly Val Trp Leu Asp Tyr Gly Ala Gly Thr Ser Pro Tyr Arg Gly Leu Phe Arg Ser Ala Glu Leu Gln Thr Ala Asp Ile Pro Gly Gly Glu Ser Leu Pro Ala Asp His Ala Leu Asp Arg Asp Gly Arg Cys Pro Val Pro Asp Gly Thr Phe Asp Gly Val Leu Ser Thr Gln Val Leu Glu His Val Ser Asp Ala Asp Ala Tyr Leu Arg Glu Ala Tyr Arg Leu Leu Arg Pro Gly Gly Arg Leu Val Leu Ser Thr His Gly Val Trp Glu Glu His Gly Gly Gln Asp Leu Trp Arg Trp Thr Ala Asp Gly Leu Ala Arg Gln Ala Glu Trp Ala Gly Phe Thr Val Asp Arg Thr Val Lys Leu Thr Cys Gly Pro Arg Gly Leu Leu Leu Leu Leu Arg Trp Tyr Gly Arg Glu His Gly Trp Pro Ser Gly Gly Pro Val Gly Leu Ala Leu Arg Ala Leu Trp Leu Val Asp Arg Leu Arg Pro Arg Ala Leu Asp Glu Tyr Leu Asp Arg Ala Phe Arg His Leu Gly Arg Ser Glu Gly Pro Asp Gln Pro Phe Tyr Leu Asp Ile Leu Leu Val Ala Ser Lys Pro His Asp Gly Ala Pro Gly Pro Ala His Arg His Glu Thr Arg Arg Thr Information for SEQ ID NO: 116 Length: 849 Type: DNA
Organism: Micromonospora carbonacea africana Strandedness: positive Sequence:116 gtgaccctgctgcggcgcgtacccgggcttggtcccctgctgaccggtgccggcaccgcc60 cccaccggcccgtcgctggacgaggtgatggccgaacggttccgggagcggatcgagccc120 cggcccggggactgggcatacgcccacttcctggacctgcgcgacgcgctggcggaggcg180 gtccgggacgccacgggagtctggctcgactacggcgcgggcacctcgccataccggggc240 ctgttccgctccgccgagttgcagaccgccgacatcccgggcggtgagtccctgccggcc300 gaccacgccctcgaccgggacgggcgctgcccggtgccggacgggacgttcgacggggtg360 ctctccacccaggtgctcgaacacgtctcggacgcggacgcgtacctgcgggaggcgtac420 cggctgctgcgcccgggcggccggctggtgctctccacccacggggtgtgggaggagcac480 ggcggccaggacctgtggcggtggaccgccgacgggctggcccgccaggccgagtgggcc540 ggcttcaccgtcgaccgcaccgtcaagctgacctgcgggccgcgcggcctgttgctgctg600 ctgcgctggtacggccgcgagcacggctggccgtccggcggcccggtgggtctggccctg660 cgcgccctgtggctggtggaccggctgcggccccgcgcgctcgacgagtacctcgaccgg720 gcgttccggcatctcggccgcagcgagggccccgaccagcccttctatctggacatcctg780 ctggtcgcca gcaaaccgca cgacggggcg cctggtcccg cacaccggca cgagacgagg 840 aggacctga 849 Information for SEQ ID NO: 117 Length: 407 Type: PRT
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 117 Met Thr Thr Thr Gly His Ser Thr Val Ile Asp Arg Cys Arg Ile Cys Asp Asn Thr Glu Leu Leu Pro Val Leu Asp Leu Gly Pro Gln Ala Leu Thr Gly Val Phe Pro Arg Thr Arg Gly Glu Asp Val Pro Tyr Val Pro Leu Glu Leu Val Arg Cys Ser Pro Ala Gly Cys Glu Leu Val Gln Leu Arg His Thr Ala Asp Phe Asp Leu Met Tyr Gly Glu Gly Tyr Gly Tyr Arg Ser Ser Leu Asn Arg Ser Met Ala Asp His Leu Arg Gly Lys Val Ala Ala Ile Thr Arg Leu Val Asp Leu Gly Pro Gly Asp Leu Val Leu Asp Ile Gly Ser Asn Asp Gly Thr Leu Leu Ser Ala Tyr Pro Ala Gly Gly Pro Ala Leu Val Gly Val Asp Pro Ala Ala Ser Val Phe Ala Glu Thr Tyr Pro Pro Gly Ala Glu Leu Ile Pro Asp Phe Phe Ala Ala Glu Leu Leu Gly Glu Arg Arg Ala Lys Val Val Thr Ser Ile Ala Met Phe Tyr Asp Leu Pro Arg Pro Met Asp Phe Met Arg Glu Ile Arg Arg Val Leu Thr Asp Asp Gly Ile Trp Val Thr Glu Gln Ser Tyr Leu Pro Ser Met Leu His Ala Ala Ala Tyr Asp Val Val Cys His Glu His Leu Asp Tyr Tyr Gly Leu Arg Gln Ile Glu Trp Met Ala Glu Arg Thr Gly Leu Lys Val Val Asp Ala Glu Leu Thr Pro Val Tyr Gly Gly Ser Leu Ser Leu Val Leu Ala Arg Arg Asp Ser Pro Arg Glu Val Asn Glu Pro Ala Leu Ala Arg Ile Arg Ala Gly Glu Thr Asp Leu Pro Tyr Ala Glu Phe Ala Arg Arg Thr Glu Glu Ser Arg Asp Arg Leu Val Asp Phe Leu Thr Thr Ser Arg Asp Lys Gly Leu Arg Thr Leu Gly Tyr Gly Ala Ser Thr Lys Gly Asn Val Ile Leu Gln Tyr Cys Gly Leu Asp Glu Thr Leu Leu Pro Cys Ile Ala Glu Val Asn Glu Asp Lys Phe Gly Cys Phe Thr Pro Gly Thr Asp Ile Pro Ile Val Ser Glu Lys Glu Ala Arg Ala Leu Glu Pro Asp Arg Phe Leu Val Leu Pro Trp Ile Tyr Arg Asp Ala Met Val Ala Arg Glu Arg Asp Phe Leu Ala Ala Gly Gly Asn Leu Val Phe Pro Leu Pro Ala Leu Glu Val Val Information for SEQ ID NO: 118 Length: 1224 Type: DNA
Organism: Streptomyces mobaraensis Strandedness: positive Sequence: 118 atgaccacgaccggtcactcgacggtgatcgaccgttgccggatctgcgacaacaccgag60 ttgctgcccgtgctcgacctcggtccgcaggcactcaccggggtgttcccgcggacccgc120 ggcgaggacgtcccgtacgtcccgctggagctggtgcgctgctcgcccgccggctgcgag180 ctggtgcagctccggcacaccgccgacttcgacctcatgtacggcgagggctacggctac240 cggtccagcctcaaccgctccatggcggaccacctgcgcggcaaggtcgccgccatcacc300 cggctggtcgacctcggccccggcgacctggtcctggacatcggcagcaacgacggcacc360 ctgctgtcggcctaccccgcgggcggccccgccctggtcggcgtggaccccgccgcctcc420 gtcttcgccgagacctacccgccgggcgccgagctgatccccgacttcttcgccgccgaa480 ctgctcggcgagcgccgcgccaaggtcgtcacctcgatcgcgatgttctacgacctgccc540 cgtcccatggacttcatgcgcgagatccgccgcgtcctgacggacgacgggatctgggtg600 accgagcagagctacctgccgtcgatgctgcacgccgccgcctacgacgtcgtctgccac660 gagcacctggactactacgggctccgccagatcgagtggatggccgaacgcaccggcctg720 aaggtcgtggacgccgagctgacccccgtctacggcggcagcctctcgctcgtcctggcc780 cggcgcgactccccgcgcgaggtcaacgagccggccctggcccggatccgcgccggcgag840 acggacctgccctacgccgagttcgcccggcggaccgaggaatcccgcgaccggctcgtg900 gacttcctcaccacctcgcgggacaaggggctgcgcaccctcgggtacggcgcctccacc960 aagggcaacgtcatcctccagtactgcggcctggacgagacgctcctgccgtgcatcgcc1020 gaggtgaacgaggacaagttcggctgcttcacgcccggcacggacatcccgatcgtctcc1080 gagaaggaggcccgggcgctggagccggaccggttcctggtcctcccgtggatctaccgg1140 gacgcgatggtcgcccgggaacgcgacttcctggcggccggcggcaacctggtcttcccg1200 ctgcccgccctggaagtggtgtga 1224 Information for SEQ ID NO: 119 Length: 429

JUMBO APPLICATIONS / PATENTS
THIS SECTION OF THE APPLICATION / PATENT CONTAINS MORE THAN ONE VOLUME.
THIS IS VOLUME 1 OF 2
NOTE: For additional volumes please contact the Canadian Patent Office.
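In the listing above, each protein entry is the translation of the DNA entry that follows it, and each DNA length includes the stop codon, so a protein of n residues pairs with 3(n + 1) bases (for example, SEQ ID NO: 117 has 407 residues and SEQ ID NO: 118 has 1224 bp). Entries that begin with a GTG codon (for example SEQ ID NOS: 106 and 112) are listed with Val, not Met, as the first residue, so the codon appears to have been translated literally. The following Python sketch, which is not part of the application, checks a DNA entry against its protein entry under those assumptions; the function names are illustrative only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "tcag"
AA_1LETTER = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA_1LETTER)}

# Three-letter residue names, in the style used by the sequence listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def clean_listing(text: str) -> str:
    """Keep only a/c/g/t, dropping spaces and position numbers such as '60'."""
    return "".join(ch for ch in text.lower() if ch in "acgt")


def translate(dna: str) -> list[str]:
    """Translate codon by codon with the literal table (GTG -> Val),
    stopping at the first stop codon; a trailing partial codon is ignored."""
    residues = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        residues.append(THREE_LETTER[aa])
    return residues


def expected_protein_length(dna_length: int) -> int:
    """Residue count for a complete ORF whose last codon is a stop codon."""
    if dna_length % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return dna_length // 3 - 1


if __name__ == "__main__":
    # First 24 bases of SEQ ID NO: 106, against the start of SEQ ID NO: 105.
    print(" ".join(translate(clean_listing("gtggctgagc cgatacccgt cgac"))))
    # -> Val Ala Glu Pro Ile Pro Val Asp
    print(expected_protein_length(1224))  # -> 407 (SEQ ID NOS: 117/118)
```

Applied to whole entries, the same length check reproduces the pairs above, for example 717 bp to 238 residues (SEQ ID NOS: 106/105) and 732 bp to 243 residues (SEQ ID NOS: 108/107).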

Claims (42)

1. A method of identifying an orthosomycin biosynthetic gene, gene fragment, or gene cluster comprising the steps of providing a sample containing genomic DNA, and detecting in the sample the presence of a nucleic acid sequence coding for a polypeptide from at least two of the groups consisting of:
a. SEQ ID NO:51; Genbank accession no. AAK83192; SEQ ID NO:53; SEQ ID NO:55; and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS: 51, 53, 55 or Genbank accession no. AAK83192;

b. SEQ ID NO:57; Genbank accession no. AAK83170; SEQ ID NO:59; SEQ ID NO:61; and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS: 57, 59, 61 or Genbank accession no. AAK83170;

c. SEQ ID NO:63, Genbank accession no. AAK83193, SEQ ID NO:65, SEQ ID NO:67, and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS: 63, 65, 67 or Genbank accession no. AAK83193;

d. SEQ ID NO:69, SEQ ID NO:71, SEQ ID NO:73, and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS:69, 71 or 73;

e. SEQ ID NO:99, Genbank accession no. AAK83184, SEQ ID NO:101, SEQ ID NO:103, and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS:99, 101, 103 or Genbank accession no. AAK83184;

f. SEQ ID NO:105, Genbank accession no. AAK83186, SEQ ID NO:107, SEQ ID NO:109, and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS: 105, 107, 109 or Genbank accession no. AAK83186;

g. SEQ ID NO:111, Genbank accession no. AAK83188, SEQ ID NO:113, SEQ ID NO: 115, and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS: 111, 113, 115 or Genbank accession no. AAK83188;

h. SEQ ID NO: 127, Genbank accession no. AAG32067, SEQ ID NO: 129, SEQ ID NO: 131 and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS: 127, 129, 131 or Genbank accession no. AAG32067;
i. SEQ ID NO: 123, Genbank accession no. AAG32066, SEQ ID NO: 125 and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS: 123, 125 or Genbank accession no. AAG32066;
j. SEQ ID NO: 153, Genbank accession no. AAK83187, SEQ ID NO: 155, SEQ ID NO: 157, and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS: 153, 155, 157 or Genbank accession no. AAK83187;
k. SEQ ID NO: 159, SEQ ID NO: 161, SEQ ID NO: 163 and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS: 159, 161 or 163;
l. SEQ ID NO: 167, SEQ ID NO: 173, Genbank accession no. AAK83181, SEQ ID NO: 169 and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS: 167, 169, 173 or Genbank accession no. AAK83181;
m. SEQ ID NO: 175, SEQ ID NO: 177, SEQ ID NO: 179 and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS: 175, 177 or 179;
n. SEQ ID NO: 165, SEQ ID NO: 171, SEQ ID NO: 169 and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS: 165, 169 or 171;
o. SEQ ID NO: 193, Genbank accession no. AAK83189, SEQ ID NO: 195, SEQ ID NO: 197 and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS: 193, 195, 197 or Genbank accession no. AAK83189;
p. SEQ ID NO: 199, Genbank accession no. AAK83174, SEQ ID NO: 201 and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS: 199, 201 or Genbank accession no. AAK83174; and
q. SEQ ID NO: 203, SEQ ID NO: 205, SEQ ID NO: 207 and polypeptides having at least 65% homology to a polypeptide having the sequence of SEQ ID NOS: 203, 205 or 207.
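Claim 1 combines two tests: whether a detected sequence encodes a polypeptide with at least 65% homology to a reference member of a group, and whether at least two of groups (a) to (q) are represented (at least four under claim 11). The claims do not say how homology is to be measured, so the Python sketch below assumes percent identity over a pre-computed pairwise alignment (for example, one produced by BLASTP, which the sketch does not run). The function names, the group labels used as dictionary keys and the identity values in the usage lines are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_ref: str) -> float:
    """Percent of aligned columns with identical residues.

    Assumes both strings come from one pairwise alignment ('-' = gap);
    columns where both sequences have a gap are ignored.
    """
    if len(aligned_query) != len(aligned_ref):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_query, aligned_ref)
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def groups_detected(best_identity: dict[str, float],
                    threshold: float = 65.0) -> set[str]:
    """Group labels (e.g. 'a' to 'q') whose best hit meets the threshold."""
    return {group for group, pid in best_identity.items() if pid >= threshold}


def meets_group_rule(best_identity: dict[str, float],
                     min_groups: int = 2, threshold: float = 65.0) -> bool:
    """True if at least `min_groups` distinct groups are detected
    (2 under claim 1, 4 under claim 11)."""
    return len(groups_detected(best_identity, threshold)) >= min_groups


if __name__ == "__main__":
    # Hypothetical best identities of ORFs in a sample to each group's references.
    hits = {"a": 72.4, "e": 58.0, "g": 81.0}
    print(meets_group_rule(hits))                # -> True (groups a and g)
    print(meets_group_rule(hits, min_groups=4))  # -> False
```

Keeping identity scoring separate from the group-count rule means the homology measure can be swapped (for example, for a similarity score) without touching the counting logic.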
2. The method of claim 1 further comprising the step of detecting the presence of either:
a. a nucleic acid sequence coding for a polypeptide from at least one of the groups consisting of:
r. SEQ ID NO: 209, SEQ ID NO: 211 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 209 or SEQ ID NO: 211;
s. SEQ ID NO: 213, SEQ ID NO: 215 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 213 or SEQ ID NO: 215;
t. SEQ ID NO: 217, SEQ ID NO: 219 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 217 or SEQ ID NO: 219;
u. SEQ ID NO: 221, SEQ ID NO: 223 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 221 or SEQ ID NO: 223;
v. SEQ ID NO: 225, SEQ ID NO: 227 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 225 or SEQ ID NO: 227;
w. SEQ ID NO: 229, SEQ ID NO: 231 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 229 or SEQ ID NO: 231;
x. SEQ ID NO: 233, SEQ ID NO: 235 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 233 or SEQ ID NO: 235;
y. SEQ ID NO: 237, SEQ ID NO: 239 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 237 or SEQ ID NO: 239; and
z. SEQ ID NO: 241, SEQ ID NO: 243 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 241 or SEQ ID NO: 243;
or
b. detecting the presence of a nucleic acid sequence coding for a polypeptide from at least one of the groups consisting of:
aa. SEQ ID NO: 245, Genbank accession no. AAG32068 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 245 or Genbank accession no. AAG32068;
bb. SEQ ID NO: 247, Genbank accession no. AAK83183, and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 247 or Genbank accession no. AAK83183;
cc. SEQ ID NO: 249, Genbank accession no. AAG32069, and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 249 or Genbank accession no. AAG32069;
dd. SEQ ID NO: 251, Genbank accession no. AAK83172, and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 251 or Genbank accession no. AAK83172;
ee. SEQ ID NO: 253, Genbank accession no. AAK83171 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 253 or Genbank accession no. AAK83171; and
ff. SEQ ID NO: 255, Genbank accession no. AAK83175, and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 255 or Genbank accession no. AAK83175;
and further determining whether the gene cluster detected is an everninomicin-type orthosomycin gene cluster, or an avilamycin-type orthosomycin gene cluster.
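Claim 2 adds a second tier of marker groups and a classification step. Claim 14 ties groups (r) to (z) to everninomicin-type clusters; the claims do not state outright which type groups (aa) to (ff) indicate, so the Python sketch below assumes they mark avilamycin-type clusters and reports "ambiguous" when both tiers are hit. Because claim 2 depends on claim 1, it classifies only when at least two of groups (a) to (q) were detected. The constant names and result strings are hypothetical.

```python
CORE_GROUPS = set("abcdefghijklmnopq")    # claim 1: groups (a) to (q)
EVERNINOMICIN_GROUPS = set("rstuvwxyz")   # claim 2(a) and claim 14: groups (r) to (z)
# Claim 2(b): groups (aa) to (ff); treated here as avilamycin-type markers (assumption).
SECOND_TIER_GROUPS = {"aa", "bb", "cc", "dd", "ee", "ff"}


def classify_cluster(detected: set[str]) -> str:
    """Classify a candidate cluster from the set of group labels detected."""
    if len(detected & CORE_GROUPS) < 2:
        return "no orthosomycin cluster detected"
    everninomicin = bool(detected & EVERNINOMICIN_GROUPS)
    avilamycin = bool(detected & SECOND_TIER_GROUPS)
    if everninomicin and avilamycin:
        return "ambiguous"
    if everninomicin:
        return "everninomicin-type"
    if avilamycin:
        return "avilamycin-type"
    return "undetermined"


if __name__ == "__main__":
    print(classify_cluster({"a", "c", "r"}))   # -> everninomicin-type
    print(classify_cluster({"a", "c", "bb"}))  # -> avilamycin-type
```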
3. The method of claim 1 or 2 further comprising the step of using the nucleic acid sequence detected to isolate an orthosomycin gene cluster from the sample containing genomic DNA.
4. The method of claim 1, 2 or 3 further comprising identifying an organism containing the nucleic acid sequence detected from the genomic DNA in the sample.
5. The method of any one of claims 1 to 4 wherein the sample containing DNA is biomass from an environmental source.
6. The method of claim 5 wherein the biomass is a mixed microbial culture.
7. The method of any one of claims 1 to 6 wherein the sample containing genomic DNA is obtained from a mixed population of organisms.
8. The method of any one of claims 1 to 7 wherein the sample containing genomic DNA is a genomic library containing a plurality of clones, and the genomic DNA for generating the clones is obtained from a mixed population of organisms.
9. The method of any one of claims 1 to 4, wherein the sample containing genomic DNA is obtained from a pure culture.
10. The method of any one of claims 1 to 4 wherein the sample containing genomic DNA is a genomic library containing a plurality of clones, and the DNA for generating the clones is obtained from a pure culture.
11. The method of any one of claims 1 to 10 wherein the presence in the genomic DNA sample of a nucleic acid sequence from at least 4 of the groups (a) to (q) is detected.
12. The method of any one of claims 1 to 11 wherein detecting the presence of a nucleic acid sequence coding for a polypeptide from groups (a) to (q) involves use of a hybridization probe or PCR primer derived from:
a. an isolated, purified, or enriched nucleic acid comprising one of the sequences of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208 or the sequences complementary thereto;
or b. an isolated, purified or enriched nucleic acid which encodes one of the polypeptides of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 99, 101, 103, 105, 107, 109, 111, 113, 115, 123, 125, 127, 129, 131, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 175, 177, 179, 193, 195, 197, 199, 201, 203, 205, 207.
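Claim 12 derives hybridization probes and PCR primers from fragments of a chosen minimum number (from 10 up to 500) of consecutive bases of the listed nucleotide sequences or of their complements. The Python sketch below enumerates such fragments on both strands, assuming plain a/c/g/t input and taking "complementary" to mean the reverse complement read 5' to 3'. The GC-fraction helper is a common primer-design screen added for illustration only; the claim does not require it, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def fragments(seq: str, length: int, step: int = 1):
    """Yield (start, fragment) for each run of `length` consecutive bases."""
    if length < 1:
        raise ValueError("fragment length must be positive")
    for start in range(0, len(seq) - length + 1, step):
        yield start, seq[start:start + length]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases (case-insensitive)."""
    if not seq:
        return 0.0
    return sum(base in "gcGC" for base in seq) / len(seq)


def candidate_probes(seq: str, length: int, step: int = 1):
    """Fragments from both strands with their GC fraction; start positions
    on the '-' strand are counted along the reverse complement."""
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for start, frag in fragments(s, length, step):
            yield strand, start, frag, gc_fraction(frag)


if __name__ == "__main__":
    seq = "atgaccagcgagacaccggc"  # first 20 bases of SEQ ID NO: 100
    print(reverse_complement("atgacc"))  # -> ggtcat
    for strand, start, frag, gc in candidate_probes(seq, 18, step=2):
        print(strand, start, frag, f"{gc:.2f}")
```

Real probe selection would add criteria such as melting temperature and uniqueness against the sample's genomic DNA; the sketch only covers the "consecutive bases of a sequence or its complement" part of the claim.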
13. An orthosomycin gene cluster obtained by the method of claim 3.
14. A method of identifying an everninomicin-type orthosomycin biosynthetic gene, gene fragment or gene cluster comprising the steps of providing a sample containing DNA, and detecting the presence of a nucleic acid sequence coding for a polypeptide from at least one of the groups consisting of:
r. SEQ ID NO: 209, SEQ ID NO: 211 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 209 or SEQ ID NO: 211;
s. SEQ ID NO: 213, SEQ ID NO: 215 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 213 or SEQ ID NO: 215;
t. SEQ ID NO: 217, SEQ ID NO: 219 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 217 or SEQ ID NO: 219;

u. SEQ ID NO: 221, SEQ ID NO: 223 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 221 or SEQ ID NO: 223;
v. SEQ ID NO: 225, SEQ ID NO: 227 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 225 or SEQ ID NO: 227;
w. SEQ ID NO: 229, SEQ ID NO: 231 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 229 or SEQ ID NO: 231;
x. SEQ ID NO: 233, SEQ ID NO: 235 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 233 or SEQ ID NO: 235;
y. SEQ ID NO: 237, SEQ ID NO: 239 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 237 or SEQ ID NO: 239;
and z. SEQ ID NO: 241, SEQ ID NO: 243 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 241 or SEQ ID NO: 243.
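Groups (r) to (z) are defined by "at least 65% homology". This section does not reproduce the specification's exact definition of homology. The Python sketch below therefore assumes percent identity over a Needleman-Wunsch global alignment with simple match, mismatch and gap scores. A production pipeline would more likely use BLAST or a substitution matrix such as BLOSUM62. The function names are hypothetical.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity over a Needleman-Wunsch global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns and total columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * matches / columns if columns else 0.0


def is_homolog(a: str, b: str, threshold: float = 65.0) -> bool:
    """Hypothetical cut-off test using the identity proxy above."""
    return global_identity(a, b) >= threshold


print(global_identity("MKTAYIAK", "MKTAHIAK"))  # 87.5
```

Identity is computed over all alignment columns, gaps included. Other conventions divide by the shorter sequence length instead, and the choice changes borderline calls near 65%.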
15. The method according to claim 14 further comprising the step of detecting in the sample the presence of a nucleic acid sequence coding for a polypeptide from at least two of groups (a) to (q) recited in claim 1.
16. A method according to claim 14 or 15 wherein detecting the presence of a nucleic acid sequence coding for a polypeptide from at least two of the groups (r) to (z) involves use of a hybridization probe or PCR primer derived from:
a. an isolated, purified, or enriched nucleic acid comprising one of the sequences of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244 or the sequences complementary thereto;
or b. an isolated, purified or enriched nucleic acid which encodes one of the polypeptides of SEQ ID NOS: 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 209, 211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243.
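Part b. covers any nucleic acid that encodes a fragment of at least 5 consecutive amino acids of a listed polypeptide. One way to test a candidate DNA sequence in silico is to translate it in all six reading frames and search for the peptide. The minimal Python sketch below assumes the standard genetic code (NCBI translation table 1) and exact peptide matching. The sequences are toy examples.

```python
BASES = "TCAG"
# NCBI translation table 1 (standard code); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate one reading frame; stops become '*', unknown codons 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame(dna: str) -> list[str]:
    """Translations of the three forward and three reverse-strand frames."""
    dna = dna.upper()
    rev = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[frame:]) for strand in (dna, rev) for frame in range(3)]


def encodes_fragment(dna: str, peptide: str, min_length: int = 5) -> bool:
    """True if any of the six frames contains `peptide` (at least `min_length` residues)."""
    if len(peptide) < min_length:
        raise ValueError(f"peptide must have at least {min_length} residues")
    return any(peptide.upper() in protein for protein in six_frame(dna))


dna = "ATGAAAACCGCGTATATTGCG"
print(translate(dna))                  # MKTAYIA
print(encodes_fragment(dna, "KTAYI"))  # True
```

Exact matching ignores synonymous codon choice, so it is deliberate: any DNA that encodes the peptide passes, whatever codons it uses.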
17. A method according to any one of claims 14 to 16 further comprising the step of using the detected nucleic acid sequence to isolate an everninomicin-type orthosomycin biosynthetic gene cluster from the sample containing the genomic DNA.
18. The method of any one of claims 14 to 17 further comprising identifying an organism containing the nucleic acid sequence detected from the genomic DNA in the sample.
19. The method of any one of claims 14 to 18 wherein the sample containing DNA is biomass from an environmental source.
20. The method of any one of claims 14 to 19 wherein the biomass is a mixed microbial culture.
21. The method of any one of claims 14 to 20 wherein the sample containing genomic DNA is obtained from a mixed population of organisms.
22. The method of any one of claims 14 to 21 wherein the sample containing genomic DNA is a genomic library containing a plurality of clones, and the genomic DNA for generating the clones is obtained from a mixed population of organisms.
23. The method of any one of claims 14 to 18 wherein the sample containing genomic DNA is obtained from a pure culture.
24. The method of any one of claims 14 to 18 wherein the sample containing genomic DNA is a genomic library containing a plurality of clones, and the DNA for generating the clones is obtained from a pure culture.
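When the sample is a genomic library of clones, the in silico counterpart of probing is to check each clone insert for a probe sequence on either strand. The minimal Python sketch below assumes the clone inserts have already been sequenced and are held as strings. The clone identifiers and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def screen_library(clones: dict[str, str], probe: str) -> list[str]:
    """IDs of clones whose insert contains the probe on either strand."""
    probe = probe.upper()
    targets = (probe, reverse_complement(probe))
    return sorted(
        clone_id for clone_id, insert in clones.items()
        if any(t in insert.upper() for t in targets)
    )


# Hypothetical clone inserts from a small library.
clones = {
    "clone_A": "GGGATGCGTCCC",
    "clone_B": "TTTTTTTT",
    "clone_C": "AAACGCATAAA",
}
print(screen_library(clones, "ATGCGT"))  # ['clone_A', 'clone_C']
```

Exact substring search stands in for hybridization only loosely, because a physical probe also binds imperfectly matched targets. An alignment-based search would be closer to that behaviour.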
25. The method of any one of claims 14 to 24 wherein the presence in the genomic DNA sample of at least two of the groups (r) to (z) is detected.
26. An everninomicin-type orthosomycin biosynthetic gene cluster obtained by the method of claim 17.
27. A method of identifying an avilamycin-type orthosomycin biosynthetic gene, gene fragment, or gene cluster comprising providing a sample containing genomic DNA, and detecting in the sample the presence of a nucleic acid sequence coding for a polypeptide from at least one of the groups consisting of:

aa. SEQ ID NO: 245, Genbank accession no. AAG32068 and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 245 or Genbank accession no. AAG32068;
bb. SEQ ID NO: 247, Genbank accession no. AAK83183, and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 247 or Genbank accession no. AAK83183;
cc. SEQ ID NO: 249, Genbank accession no. AAG32069, and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 249 or Genbank accession no. AAG32069;
dd. SEQ ID NO: 251, Genbank accession no. AAK83172, and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 251 or Genbank accession no. AAK83172;
ee. SEQ ID NO: 253, Genbank accession no. AAK83171, and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 253 or Genbank accession no. AAK83171;
and ff. SEQ ID NO: 255, Genbank accession no. AAK83175, and polypeptides having at least 65% homology to a polypeptide of SEQ ID NO: 255 or Genbank accession no. AAK83175.
28. The method of claim 27 further comprising the step of detecting in the DNA sample the presence of a nucleic acid sequence coding for a polypeptide from at least two of the groups (a) to (q) recited in claim 1.
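Claims 15 and 28 pair a type-specific group, (r) to (z) for everninomicin-type or (aa) to (ff) for avilamycin-type, with hits in at least two of the general groups (a) to (q). The Python sketch below expresses that combination rule. It assumes detected groups arrive as a set of labels, and the function name `classify` is hypothetical.

```python
import string

# Group labels as used in the claims.
CORE_GROUPS = set(string.ascii_lowercase[:17])           # (a) to (q)
EVERNINOMICIN_GROUPS = set(string.ascii_lowercase[17:])  # (r) to (z)
AVILAMYCIN_GROUPS = {"aa", "bb", "cc", "dd", "ee", "ff"}  # (aa) to (ff)


def classify(detected: set[str], min_core: int = 2) -> list[str]:
    """Hypothetical rule: a type-specific hit plus hits in >= min_core core groups."""
    if len(detected & CORE_GROUPS) < min_core:
        return []
    labels = []
    if detected & EVERNINOMICIN_GROUPS:
        labels.append("everninomicin-type")
    if detected & AVILAMYCIN_GROUPS:
        labels.append("avilamycin-type")
    return labels


print(classify({"a", "c", "r"}))  # ['everninomicin-type']
```

The function returns a list rather than a single label, so a mixed sample that carries markers of both types is reported as both instead of being forced into one.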
29. The method of claim 27 or 28 wherein detecting the presence of a nucleic acid sequence coding for a polypeptide from at least two of the groups (aa) to (ff) involves use of a hybridization probe or PCR primer derived from:

a. an isolated, purified, or enriched nucleic acid comprising one of the sequences of SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to Genbank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175; the sequences complementary thereto; or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400 or 500 consecutive bases of one of the sequences of SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to Genbank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175; or the sequences complementary thereto;
or b. an isolated, purified or enriched nucleic acid which encodes one of the polypeptides of SEQ ID NOS: 245, 247, 249, 251, 253 or Genbank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100 or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOS: 245, 247, 249, 251, 253 or Genbank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175.
30. The method of claim 27, 28 or 29 further comprising the step of using the detected nucleic acid sequences to isolate an avilamycin-type orthosomycin biosynthetic gene cluster from the sample containing the genomic DNA.
31. The method of any one of claims 27 to 30 further comprising identifying an organism containing the nucleic acid sequence detected from the genomic DNA in the sample.
32. The method of any one of claims 27 to 31 wherein the sample containing DNA is biomass from an environmental source.
33. The method of any one of claims 27 to 32 wherein the biomass is a mixed microbial culture.
34. The method of any one of claims 27 to 33 wherein the sample containing genomic DNA is obtained from a mixed population of organisms.
35. The method of any one of claims 27 to 34 wherein the sample containing genomic DNA is a genomic library containing a plurality of clones, and the genomic DNA for generating the clones is obtained from a mixed population of organisms.
36. The method of any one of claims 27 to 31 wherein the sample containing genomic DNA is obtained from a pure culture.
37. The method of any one of claims 27 to 31 wherein the sample containing genomic DNA is a genomic library containing a plurality of clones, and the DNA for generating the clones is obtained from a pure culture.
38. The method of any one of claims 27 to 37 wherein the presence in the genomic DNA sample of at least two of the groups (aa) to (ff) is detected.
39. An avilamycin-type orthosomycin biosynthetic gene cluster obtained by the method of claim 30.
40. A computer readable medium having stored thereon a sequence selected from the group consisting of:

a. SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208;
b. fragments of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208 comprising at least 10 consecutive nucleotides of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208;
c. sequences at least 70% identical to SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208, or at least 70% identical to fragments of SEQ ID NOS: 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 100, 102, 104, 106, 108, 110, 112, 114, 116, 124, 126, 128, 130, 132, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 194, 196, 198, 200, 202, 204, 206, 208;
and d. sequences complementary to the sequences of (a), (b) and (c);
for use in the detection of orthosomycin genes, orthosomycin gene fragments, orthosomycin gene clusters and orthosomycin-producing organisms.
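Claims 40 to 42 concern sequences stored on a computer readable medium. FASTA is a common plain-text format for this. The minimal Python sketch below writes and reads such a file. It assumes FASTA as the storage format, which the claims do not specify, and uses placeholder record names rather than actual SEQ ID NO contents.

```python
import os
import tempfile


def write_fasta(path: str, records: dict[str, str], width: int = 60) -> None:
    """Write name -> sequence records in FASTA format, wrapping lines at `width`."""
    with open(path, "w") as fh:
        for name, seq in records.items():
            fh.write(f">{name}\n")
            for i in range(0, len(seq), width):
                fh.write(seq[i:i + width] + "\n")


def read_fasta(path: str) -> dict[str, str]:
    """Parse a FASTA file back into a name -> sequence dict."""
    records: dict[str, str] = {}
    name, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    records[name] = "".join(chunks)
                name, chunks = line[1:], []
            else:
                chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def roundtrip(records: dict[str, str], width: int = 60) -> dict[str, str]:
    """Write records to a temporary file and read them back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probes.fasta")
        write_fasta(path, records, width)
        return read_fasta(path)


# Placeholder records, not actual SEQ ID NO contents.
records = {"example_probe_1": "ACGT" * 40, "example_probe_2": "TTGACA"}
print(roundtrip(records) == records)  # True
```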
41. A computer readable medium having stored thereon a sequence selected from the group consisting of:

a. SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244;
b. fragments of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244 comprising at least 10 consecutive nucleotides of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244;

c. sequences at least 70% identical to SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, or at least 70% identical to fragments of SEQ ID NOS: 210, 212, 214, 216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244;
and d. sequences complementary to the sequences of (a), (b) and (c);
for use in identifying everninomicin-type orthosomycin genes, gene fragments, gene clusters or everninomicin-type orthosomycin-producing organisms.
42. A computer readable medium having stored thereon a sequence selected from the group consisting of:
a. SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to Genbank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175;
b. fragments of SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to Genbank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175 comprising at least 10 consecutive nucleotides of SEQ ID NOS: 246, 248, 250, 252, 254, 256 and the nucleic acid sequences corresponding to Genbank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175;
c. sequences at least 70% identical to SEQ ID NOS: 246, 248, 250, 252, 254, 256 or the nucleic acid sequences corresponding to Genbank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175, or at least 70% identical to fragments of SEQ ID NOS: 246, 248, 250, 252, 254, 256 or the nucleic acid sequences corresponding to Genbank accession nos. AAG32068, AAK83183, AAG32069, AAK83172, AAK83171 and AAK83175;
and d. sequences complementary to the sequences of (a), (b) and (c);
for use in identifying avilamycin-type orthosomycin genes, gene fragments, gene clusters or avilamycin-type orthosomycin-producing organisms.
CA002375097A 2001-03-28 2002-03-28 Compositions and methods for identifying and distinguishing orthosomycin biosynthetic loci Abandoned CA2375097A1 (en)

Applications Claiming Priority (3)

Application Number            Priority Date    Filing Date
US60/279,095 (US27909501P)    2001-03-28       2001-03-28
US60/279,709 (US27970901P)    2001-03-30       2001-03-30
US60/285,214 (US28521401P)    2001-04-20       2001-04-20

Publications (1)

Publication Number Publication Date
CA2375097A1 (en) 2002-06-08

Family

ID=27403041

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002375097A Abandoned CA2375097A1 (en) 2001-03-28 2002-03-28 Compositions and methods for identifying and distinguishing orthosomycin biosynthetic loci

Country Status (5)

Country Link
EP (1) EP1373309A2 (en)
JP (1) JP2004532021A (en)
AU (1) AU2002245973A1 (en)
CA (1) CA2375097A1 (en)
WO (1) WO2002079505A2 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6861513B2 (en) * 2000-01-12 2005-03-01 Schering Corporation Everninomicin biosynthetic genes
US20030143666A1 (en) * 2000-01-27 2003-07-31 Alfredo Staffa Genetic locus for everninomicin biosynthesis
DE10109166A1 (en) * 2001-02-25 2002-09-12 Combinature Biopharm Ag Avilamycin derivatives
CA2352451C (en) * 2001-07-24 2003-04-08 Ecopia Biosciences Inc. High throughput method for discovery of gene clusters

Also Published As

Publication number Publication date
WO2002079505A2 (en) 2002-10-10
EP1373309A2 (en) 2004-01-02
JP2004532021A (en) 2004-10-21
AU2002245973A1 (en) 2002-10-15
WO2002079505A3 (en) 2003-10-09

Legal Events

Date Code Title Description
EEER Examination request
FZDC Correction of dead application (reinstatement)
FZDE Dead