CA2527557A1 - Genes encoding carotenoid compounds - Google Patents

Genes encoding carotenoid compounds

Info

Publication number
CA2527557A1
Authority
CA
Canada
Prior art keywords
nucleic acid
ala
leu
gene
carotenoid
Prior art date
Legal status
Abandoned
Application number
CA002527557A
Other languages
French (fr)
Inventor
Qiong Cheng
Natalia Sedkova
Luan Tao
Current Assignee
EIDP Inc
Original Assignee
Individual
Priority date
Filing date
Publication date
Application filed by Individual
Publication of CA2527557A1
Abandoned

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/90 Isomerases (5.)
    • C12N9/0004 Oxidoreductases (1.)
    • C12N9/0071 Oxidoreductases (1.) acting on paired donors with incorporation of molecular oxygen (1.14)
    • C12N9/0083 Miscellaneous (1.14.99)
    • C12N9/10 Transferases (2.)
    • C12N9/1048 Glycosyltransferases (2.4)
    • C12N9/1051 Hexosyltransferases (2.4.1)
    • C12N9/1085 Transferases (2.) transferring alkyl or aryl groups other than methyl groups (2.5)
    • C12Y ENZYMES
    • C12Y205/00 Transferases transferring alkyl or aryl groups, other than methyl groups (2.5)
    • C12Y205/01 Transferases transferring alkyl or aryl groups, other than methyl groups (2.5) transferring alkyl or aryl groups, other than methyl groups (2.5.1)
    • C12Y205/01029 Geranylgeranyl diphosphate synthase (2.5.1.29)
    • C12Y503/00 Intramolecular oxidoreductases (5.3)
    • C12Y503/03 Intramolecular oxidoreductases (5.3) transposing C=C bonds (5.3.3)
    • C12Y503/03002 Isopentenyl-diphosphate DELTA-isomerase (5.3.3.2)
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E50/00 Technologies for the production of fuel of non-fossil origin
    • Y02E50/30 Fuel from waste, e.g. synthetic alcohol or diesel

Abstract

A carotenogenic biosynthetic gene cluster has been isolated from Pantoea stewartii strain DC413, wherein the genetic organization of the cluster is crtE-idi-crtX-crtY-crtI-crtB-crtZ. The genes contained within this cluster encode geranylgeranyl pyrophosphate (GGPP) synthetase (CrtE), isopentenyl pyrophosphate isomerase (Idi), zeaxanthin glucosyl transferase (CrtX), lycopene cyclase (CrtY), phytoene desaturase (CrtI), phytoene synthase (CrtB), and β-carotene hydroxylase (CrtZ). The gene cluster, genes and their products are useful for the conversion of farnesyl pyrophosphate to carotenoids.
Vectors containing those DNA segments, host cells containing the vectors and methods for producing those enzymes by recombinant DNA technology in transformed host organisms are disclosed.

Description

TITLE
GENES ENCODING CAROTENOID COMPOUNDS
This application claims the benefit of U.S. Provisional Application No. 60/488,188 filed July 17, 2003 and U.S. Provisional Application No. 60/527,083 filed December 3, 2003.
FIELD OF THE INVENTION
The invention relates to the field of molecular biology and microbiology. More specifically, this invention pertains to nucleic acid fragments isolated from Pantoea stewartii encoding enzymes useful for microbial production of carotenoid compounds (e.g., lycopene, β-carotene, zeaxanthin, and zeaxanthin-β-glucosides).
BACKGROUND OF THE INVENTION
Carotenoids represent one of the most widely distributed and structurally diverse classes of natural pigments, producing pigment colors of light yellow to orange to deep red. Eye-catching examples of carotenogenic tissues include carrots, tomatoes, red peppers, and the petals of daffodils and marigolds. Carotenoids are synthesized by all photosynthetic organisms, as well as some bacteria and fungi. These pigments have important functions in photosynthesis, nutrition, and protection against photooxidative damage. Animals, for example, cannot synthesize carotenoids and must obtain these nutritionally important compounds from their diet.
Industrially, only a few carotenoids are used for food colors, animal feeds, pharmaceuticals, and cosmetics, despite the existence of more than 600 different carotenoids identified in nature. This is largely due to difficulties in production. Presently, most of the carotenoids used for industrial purposes are produced by chemical synthesis; however, these compounds are very difficult to make chemically (Nelis and Leenheer, Appl. Bacteriol. 70:181-191 (1991)). Natural carotenoids can be obtained either by extraction of plant material or by microbial synthesis, but only a few plants are widely used for commercial carotenoid production and the productivity of carotenoid synthesis in these plants is relatively low. As a result, carotenoids produced from these plants are very expensive. One way to increase the productive capacity of biosynthesis would be to apply recombinant DNA technology (reviewed in Misawa and Shimada, J. Biotech. 59:169-181 (1998)). Thus, it would be desirable to produce carotenoids in non-carotenogenic bacteria and yeasts, thereby permitting control over quality, quantity and selection of the most suitable and efficient producer organisms. The latter is especially important for commercial production economics (and therefore availability) to consumers.
Structurally, the most common carotenoids are 40-carbon (C40) terpenoids; however, carotenoids with only 30 carbon atoms (C30 diapocarotenoids) are detected in some species. Biosynthesis of each of these types of carotenoids is derived from the isoprene biosynthetic pathway and its five-carbon universal isoprene building block, isopentenyl pyrophosphate (IPP). This biosynthetic pathway can be divided into two portions: 1) the upper isoprene pathway, which leads to the formation of farnesyl pyrophosphate (FPP); and 2) the lower carotenoid biosynthetic pathway, comprising various crt genes which convert FPP into long C30 and C40 carotenogenic compounds. Both portions of this pathway are shown in Figure 1.
Typically, the formation of phytoene represents the first step unique to biosynthesis of C40 carotenoids (Figures 1 and 2). Phytoene itself is a colorless carotenoid; its synthesis begins with the isomerization of IPP to dimethylallyl pyrophosphate (DMAPP) by isopentenyl pyrophosphate isomerase (encoded by the gene idi). This reaction is followed by a sequence of three prenyltransferase reactions in which geranyl pyrophosphate (GPP), farnesyl pyrophosphate (FPP), and geranylgeranyl pyrophosphate (GGPP) are formed. The gene crtE, encoding GGPP synthetase, is responsible for the last of these reactions. Finally, two molecules of GGPP condense, via the intermediate prephytoene pyrophosphate (PPPP), to form phytoene. This reaction is catalyzed by phytoene synthase (encoded by the gene crtB).
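The carbon bookkeeping in the steps above can be checked with a short sketch. It assumes only what the text states: IPP and DMAPP are C5 units, each of the three prenyltransferase reactions adds one C5 unit from IPP, and phytoene arises from two GGPP molecules. The names `prenyl_carbons` and `PRENYL_DIPHOSPHATES` are hypothetical, chosen for illustration.

```python
# Carbon bookkeeping for the route from IPP to phytoene (illustrative sketch).
ISOPRENE_UNIT_CARBONS = 5  # IPP and DMAPP are both C5


def prenyl_carbons(ipp_additions: int) -> int:
    """Carbons in a prenyl diphosphate built from DMAPP plus n head-to-tail IPP additions."""
    return ISOPRENE_UNIT_CARBONS * (1 + ipp_additions)


# Three successive prenyltransferase reactions give GPP, FPP and GGPP;
# the last of these is the CrtE (GGPP synthetase) step.
PRENYL_DIPHOSPHATES = {
    "DMAPP": prenyl_carbons(0),
    "GPP": prenyl_carbons(1),
    "FPP": prenyl_carbons(2),
    "GGPP": prenyl_carbons(3),
}

# CrtB (phytoene synthase) condenses two GGPP molecules into phytoene.
PHYTOENE_CARBONS = 2 * PRENYL_DIPHOSPHATES["GGPP"]

if __name__ == "__main__":
    for name, carbons in PRENYL_DIPHOSPHATES.items():
        print(f"{name}: C{carbons}")
    print(f"phytoene: C{PHYTOENE_CARBONS}")
```

Running it lists DMAPP through GGPP as C5, C10, C15 and C20, and phytoene as C40, consistent with phytoene being the first committed C40 intermediate.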
Lycopene is a "colored" carotenoid produced from phytoene.
Lycopene imparts the characteristic red color of ripe tomatoes and has great utility as a food colorant. It is also an intermediate in the biosynthesis of other carotenoids in some bacteria, fungi and green plants. Lycopene is prepared biosynthetically from phytoene through four sequential dehydrogenation reactions that remove eight hydrogen atoms, catalyzed by phytoene desaturase (encoded by the gene crtI).
Intermediates in this reaction are phytofluene, ζ-carotene, and neurosporene.
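The statement that four dehydrogenations remove eight hydrogen atoms can be verified from molecular formulas. The sketch below assumes phytoene is C40H64 (a standard value that the text does not state) and that each desaturation introduces one double bond by removing two hydrogens; the function names are illustrative only.

```python
# Hydrogen bookkeeping for the four CrtI desaturations (illustrative sketch).
PHYTOENE_FORMULA = (40, 64)  # (C, H); assumed standard formula of phytoene, C40H64
DESATURATION_PRODUCTS = ("phytofluene", "zeta-carotene", "neurosporene", "lycopene")


def desaturation_series(start=PHYTOENE_FORMULA, products=DESATURATION_PRODUCTS):
    """Return (name, formula) for each product, assuming each step removes two H atoms."""
    carbons, hydrogens = start
    series = []
    for name in products:
        hydrogens -= 2  # one dehydrogenation introduces one C=C double bond
        series.append((name, f"C{carbons}H{hydrogens}"))
    return series


def hydrogens_removed(start=PHYTOENE_FORMULA, products=DESATURATION_PRODUCTS) -> int:
    """Hydrogen atoms lost between the starting compound and the final product."""
    final_formula = desaturation_series(start, products)[-1][1]
    final_h = int(final_formula.split("H")[1])
    return start[1] - final_h
```

`desaturation_series()` yields C40H62, C40H60, C40H58 and C40H56 for phytofluene, ζ-carotene, neurosporene and lycopene, and `hydrogens_removed()` returns 8.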
Lycopene cyclase (CrtY) converts lycopene to β-carotene.
β-Carotene is a typical carotene with a color spectrum ranging from yellow to orange. It is used as a colorant for margarine and butter, as a source for vitamin A production, and, more recently, as a compound with potential preventative effects against certain kinds of cancers.
β-Carotene is converted to zeaxanthin via a hydroxylation reaction catalyzed by β-carotene hydroxylase (encoded by the crtZ gene). Zeaxanthin is, for example, the yellow pigment present in the seeds of maize. Zeaxanthin is contained in feeds for hens and colored carp and is an important pigment source for their coloration. Finally, zeaxanthin can be converted to zeaxanthin-β-monoglucoside and zeaxanthin-β-diglucoside.
This reaction is catalyzed by zeaxanthin glucosyl transferase (encoded by the crtX gene).
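The sequence of conversions described above can be represented as a simple lookup from substrate to (enzyme, product), which makes it easy to read off which gene products a host would need to reach a given carotenoid. This is an illustrative data structure only: it simplifies CrtB to a one-substrate step (the text says two GGPP molecules condense) and treats CrtX as yielding the diglucoside directly, omitting the monoglucoside.

```python
# Substrate -> (enzyme, product) for the linear route described in the text
# (simplified: one substrate and one product per step).
CONVERSIONS = {
    "GGPP": ("CrtB", "phytoene"),
    "phytoene": ("CrtI", "lycopene"),
    "lycopene": ("CrtY", "beta-carotene"),
    "beta-carotene": ("CrtZ", "zeaxanthin"),
    "zeaxanthin": ("CrtX", "zeaxanthin-beta-diglucoside"),
}


def route(start: str, end: str):
    """Follow CONVERSIONS from start until end; return the (enzyme, product) steps."""
    steps, current = [], start
    while current != end:
        if current not in CONVERSIONS:
            raise ValueError(f"no known conversion from {current!r} toward {end!r}")
        enzyme, product = CONVERSIONS[current]
        steps.append((enzyme, product))
        current = product
    return steps
```

For example, `route("phytoene", "zeaxanthin")` returns the CrtI, CrtY and CrtZ steps, i.e. the crtI, crtY and crtZ gene products are needed to go from phytoene to zeaxanthin.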
In addition to the carotenoid biosynthetic genes and enzymes responsible for creation of phytoene, lycopene, β-carotene, zeaxanthin, and zeaxanthin-β-glucosides, various other crt genes are known which enable the intramolecular conversion of C40 compounds to produce numerous other functionalized carotenoid compounds by:
(i) hydrogenation, (ii) dehydrogenation, (iii) cyclization, (iv) oxidation, (v) esterification/glycosylation, or any combination of these processes.
Many of the bacteria within the family Enterobacteriaceae are naturally pigmented, indicating the ability of these organisms to produce carotenoids. Furthermore, C40 carotenoid biosynthesis has been particularly well studied within the genus Pantoea, a small group of organisms previously classified within the broad group of bacteria formerly known as the genus Erwinia [see Hauben et al., Syst. Appl. Microbiol. 21(3):384-397 (Aug. 1998), for details concerning the reclassification of the large former genus Erwinia into four phylogenetic groups comprising Erwinia, Pectobacterium, Brenneria gen. nov., and Pantoea]. Several reviews discuss the genetics of carotenoid pigment biosynthesis, such as those of G. Armstrong (J. Bact. 176:4795-4802 (1994); Annu. Rev. Microbiol. 51:629-659 (1997)). Gene sequences encoding crtEXYIBZ are available for Pantoea agglomerans (formerly known as E. herbicola EHO-10 (ATCC #39368)), P. ananatis (formerly known as E. uredovora 20D3 (ATCC #19321)), P. stewartii (formerly known as E. stewartii (ATCC #8200)), and P. agglomerans pv. milletiae (US 5,656,472; US 5,545,816; US 5,530,189; US 5,530,188; US 5,429,939; WO 02/079395 A2; see also GenBank® Accession Nos. M87280, D90087, AY166713, and AB076662, respectively). However, the existing literature provides limited information concerning the diversity of gene sequences encoding crtEXYIBZ and the genetic organization of these sequences in organisms that are related to these well-characterized Pantoea species.
The problem to be solved, therefore, is to identify more nucleic acid sequences encoding all or a portion of the carotenoid biosynthetic enzymes from organisms that are related to Pantoea agglomerans, P. ananatis, P. stewartii, and P. agglomerans pv. milletiae, to facilitate studies to better understand carotenoid biosynthetic pathways, provide genetic tools for the manipulation of those pathways, and provide a means to synthesize carotenoids in large amounts by introducing and expressing the appropriate gene(s) in an appropriate host. This will lead to carotenoid production superior to synthetic methods.
Applicants have solved the stated problem by isolating seven unique open reading frames (ORFs) in the carotenoid biosynthetic pathway encoding CrtE, Idi, CrtX, CrtY, CrtI, CrtB and CrtZ enzymes from a yellow-pigmented bacterium designated as Pantoea stewartii strain DC413. The gene sequences and the genetic organization of the gene cluster in P. stewartii DC413 are different from those of P. stewartii ATCC 8200.
SUMMARY OF THE INVENTION
The invention provides seven genes isolated from Pantoea stewartii strain DC413 that have been demonstrated to be involved in the synthesis of various carotenoids including lycopene, β-carotene, zeaxanthin, and zeaxanthin-β-glucosides. The genes are clustered on the same operon and include the crtE, idi, crtX, crtY, crtI, crtB and crtZ genes. The DNA sequences of the crtE, idi, crtX, crtY, crtI, crtB and crtZ genes correspond to ORFs 1-7 and SEQ ID NOs:1, 3, 5, 7, 9, 11 and 13, respectively.
Accordingly, the invention provides an isolated nucleic acid molecule encoding a carotenoid biosynthetic pathway enzyme, selected from the group consisting of:
(a) an isolated nucleic acid molecule encoding the amino acid sequence selected from the group consisting of SEQ ID
NOs:2, 4, 6, 8, 10, 12 and 14;
(b) an isolated nucleic acid molecule that hybridizes with (a) under the following hybridization conditions: 0.1X SSC, 0.1% SDS, 65°C and washed with 2X SSC, 0.1% SDS followed by 0.1X SSC, 0.1% SDS; and (c) an isolated nucleic acid molecule that is complementary to (a) or (b).
Similarly, the invention provides an isolated nucleic acid molecule as set forth in SEQ ID NO:20, comprising the crtE-idi-crtX-crtY-crtI-crtB-crtZ genes, or an isolated nucleic acid molecule having at least 95% identity to SEQ ID NO:20, wherein the isolated nucleic acid molecule encodes all of the polypeptides CrtE, Idi, CrtX, CrtY, CrtI, CrtB and CrtZ.
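The "at least 95% identity" criterion can be illustrated with a minimal percent-identity calculation over two sequences that are already aligned, with gaps written as '-'. The alignment step itself and the choice of dividing by the full alignment length are assumptions of this sketch; the text does not specify how identity is computed, and published alignment tools use different denominators.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical positions over the full alignment length.

    Gap columns ('-') count as mismatches. Assumes the inputs are already aligned.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(
        1 for x, y in zip(aligned_a.upper(), aligned_b.upper()) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 95.0) -> bool:
    """True if the aligned pair reaches the identity threshold (95% by default)."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

As a toy usage example, two 20-nt aligned sequences differing at a single position give 19/20 = 95.0%, which just meets the default threshold.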
The invention additionally provides polypeptides encoded by the instant genes and genetic chimeras comprising suitable regulatory regions for genetic expression of the genes in bacteria, yeast, filamentous fungi, algae, and plants, as well as transformed hosts comprising the same.
The invention provides a method of obtaining a nucleic acid molecule encoding a carotenoid biosynthetic pathway enzyme comprising:
(a) probing a genomic library with the present nucleic acid molecules;
(b) identifying a DNA clone that hybridizes with the present nucleic acid molecules; and (c) sequencing the genomic fragment that comprises the clone identified in step (b), wherein the sequenced genomic fragment encodes a carotenoid biosynthetic enzyme.
Similarly, the invention provides a method of obtaining a nucleic acid molecule encoding a carotenoid biosynthetic pathway enzyme comprising:
(a) synthesizing at least one oligonucleotide primer corresponding to a portion of the present nucleic acid sequences; and (b) amplifying an insert present in a cloning vector using the oligonucleotide primer of step (a);
wherein the amplified insert encodes a portion of the amino acid sequence of a carotenoid biosynthetic pathway enzyme.
In a preferred embodiment, the invention provides a method for the production of carotenoid compounds comprising:
(a) providing a transformed host cell comprising:
(i) suitable levels of farnesyl pyrophosphate; and (ii) a set of nucleic acid molecules encoding the present carotenoid enzymes under the control of suitable regulatory sequences;
(b) contacting the host cell of step (a) under suitable growth conditions with an effective amount of a fermentable carbon substrate whereby a carotenoid compound is produced.
In a specific preferred embodiment, the invention provides a method for the production of carotenoid compounds in a C1 metabolizing host, for example a high growth methanotrophic bacterial strain such as Methylomonas 16a (ATCC designation PTA 2402), where the C1 metabolizing host:
(a) grows on a C1 carbon substrate selected from the group consisting of methane and methanol; and (b) comprises a functional Embden-Meyerhof carbon pathway, said pathway comprising a gene encoding a pyrophosphate-dependent phosphofructokinase enzyme.
Additionally, the invention provides a method of regulating carotenoid biosynthesis in an organism comprising over-expressing at least one carotenoid gene selected from the group consisting of SEQ ID
NOs:1, 3, 5, 7, 9, 11 and 13 in an organism such that the carotenoid biosynthesis is altered in the organism.
In an alternate embodiment, the invention provides a mutated gene encoding a carotenoid biosynthetic pathway enzyme having an altered biological activity produced by a method comprising the steps of:
(i) digesting a mixture of nucleotide sequences with restriction endonucleases wherein said mixture comprises:
a) an isolated nucleic acid molecule encoding a carotenoid biosynthetic pathway enzyme selected from the group consisting of SEQ ID NOs:1, 3, 5, 7, 9, 11 and 13;
b) a first population of nucleotide fragments which will hybridize to said isolated nucleic acid molecules of step (a); and c) a second population of nucleotide fragments which will not hybridize to said isolated nucleic acid molecules of step (a);
wherein a mixture of restriction fragments is produced;
(ii) denaturing said mixture of restriction fragments;
(iii) incubating the denatured said mixture of restriction fragments of step (ii) with a polymerase; and (iv) repeating steps (ii) and (iii) wherein a mutated carotenoid gene is produced encoding a protein having an altered biological activity.
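Steps (i) to (iv) describe a shuffling procedure. As a rough in-silico picture of its outcome only, the sketch below builds a chimera from two aligned parental sequences of equal length by switching templates at randomly chosen fragment boundaries. This template-switching model, the function name `toy_shuffle` and its parameters are assumptions for illustration; it does not simulate restriction digestion, denaturation, hybridization or polymerase chemistry.

```python
import random


def toy_shuffle(parent_a: str, parent_b: str, n_fragments: int = 4, seed=None) -> str:
    """Build one chimera by cutting both aligned parents at the same random points
    and taking each fragment from a randomly chosen parent (toy model)."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to equal length")
    if not 1 <= n_fragments <= len(parent_a):
        raise ValueError("n_fragments must be between 1 and the sequence length")
    rng = random.Random(seed)
    # Choose n_fragments - 1 distinct internal cut points.
    cuts = sorted(rng.sample(range(1, len(parent_a)), n_fragments - 1))
    bounds = [0, *cuts, len(parent_a)]
    pieces = []
    for start, end in zip(bounds, bounds[1:]):
        template = rng.choice((parent_a, parent_b))  # template switch at each boundary
        pieces.append(template[start:end])
    return "".join(pieces)
```

Every position of the resulting chimera comes from one of the two parents at that position; repeating the call with different seeds produces a population of variants that, in the procedure described above, would then be screened for altered biological activity.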

In another embodiment, the invention provides a Pantoea stewartii strain DC413 comprising the 16S rDNA sequence as set forth in SEQ ID NO:18.
Additionally, the invention provides an isolated nucleic acid molecule encoding all of the amino acid sequences as set forth in SEQ ID NOs:2, 4, 6, 8, 10, 12, and 14, wherein the preferred isolated nucleic acid molecule of the invention is a nucleic acid molecule having the nucleic acid sequence as set forth in SEQ ID NO:20.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows the upper isoprenoid and lower carotenoid biosynthetic pathways.
Figure 2 shows a portion of the lower C40 carotenoid biosynthetic pathway, to illustrate the specific chemical conversions catalyzed by CrtE, CrtX, CrtY, CrtI, CrtB and CrtZ.
Figure 3 presents results of an HPLC analysis of the carotenoids contained within Pantoea stewartii strain DC413.
Figure 4 presents results of an HPLC analysis of the carotenoids contained within transformant E. coli comprising cosmid pWEB-413.
Figure 5 shows the Pantoea stewartii strain DC413 gene cluster containing the carotenoid biosynthetic genes crtE-idi-crtXYIBZ.
Figure 6 shows the HPLC analysis of the carotenoids from Methylomonas 16a MWM1000 (ald-/crtN1-) strain containing pDCQ332.
SEQUENCE DESCRIPTIONS AND BIOLOGICAL DEPOSITS
The invention can be more fully understood from the following detailed description and the accompanying sequence descriptions that form a part of this application.
The following sequences conform with 37 C.F.R. 1.821-1.825 ("Requirements for Patent Applications Containing Nucleotide Sequences and/or Amino Acid Sequence Disclosures - the Sequence Rules") and are consistent with World Intellectual Property Organization (WIPO) Standard ST.25 (1998) and the sequence listing requirements of the EPO and PCT (Rules 5.2 and 49.5(a-bis), and Section 208 and Annex C of the Administrative Instructions). The symbols and format used for nucleotide and amino acid sequence data comply with the rules set forth in 37 C.F.R. §1.822.
SEQ ID NOs:1-14 are full length genes or proteins as identified in Table 1.
TABLE 1
Summary of Pantoea stewartii Strain DC413 Gene and Protein SEQ ID Numbers

Description | ORF No. | Nucleic acid SEQ ID NO. | Peptide SEQ ID NO.
crtE | 1 | 1 | 2
idi | 2 | 3 | 4
crtX | 3 | 5 | 6
crtY | 4 | 7 | 8
crtI | 5 | 9 | 10
crtB | 6 | 11 | 12
crtZ | 7 | 13 | 14

SEQ ID NOs:15-17 and 19 are the nucleotide sequences of primers HK12, JCR14, JCR15, and TET-1 FP-1, respectively.
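The SEQ ID numbering for ORFs 1-7 follows a regular pattern: ORF n has nucleotide SEQ ID NO:(2n-1) and protein SEQ ID NO:2n, with the genes in the order crtE, idi, crtX, crtY, crtI, crtB, crtZ. A small lookup sketch, with a hypothetical function name, makes the pattern explicit.

```python
from typing import Tuple

# Gene order of ORFs 1-7 (ORF n is the n-th entry).
DC413_ORFS = ("crtE", "idi", "crtX", "crtY", "crtI", "crtB", "crtZ")


def seq_ids(gene: str) -> Tuple[int, int]:
    """Return (nucleotide SEQ ID NO, protein SEQ ID NO) for a DC413 gene.

    Raises ValueError for a gene not in DC413_ORFS.
    """
    orf = DC413_ORFS.index(gene) + 1  # ORFs are numbered from 1
    return 2 * orf - 1, 2 * orf
```

For instance, `seq_ids("crtB")` gives (11, 12), matching the crtB row of Table 1.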
SEQ ID NO:18 provides the 16S rRNA gene sequence of strain DC413.
SEQ ID NO:20 is the nucleotide sequence of a 9,127 bp fragment of DNA from strain DC413 encoding the crtE, idi, crtX, crtY, crtI, crtB and crtZ genes.
SEQ ID NO:21 is the nucleotide sequence of primer pWEB413F.
SEQ ID NO:22 is the nucleotide sequence of primer pWEB413R.
Applicants made the following biological deposit under the terms of the Budapest Treaty on the International Recognition of the Deposit of Micro-organisms for the Purposes of Patent Procedure:
Depositor Identification Reference | International Depository Designation | Date of Deposit
Methylomonas 16a | ATCC PTA 2402 | August 22, 2000
As used herein, "ATCC" refers to the American Type Culture Collection International Depository Authority located at ATCC, 10801 University Blvd., Manassas, VA 20110-2209, U.S.A. The "International Depository Designation" is the accession number to the culture on deposit with ATCC.
The listed deposit will be maintained in the indicated international depository for at least thirty (30) years and will be made available to the public upon the grant of a patent disclosing it. The availability of a deposit does not constitute a license to practice the subject invention in derogation of patent rights granted by government action.
DETAILED DESCRIPTION OF THE INVENTION
The genes of this invention and their expression products are useful for the creation of recombinant organisms that have the ability to produce various carotenoid compounds. Nucleic acid fragments encoding CrtE, Idi, CrtX, CrtY, CrtI, CrtB, and CrtZ have been isolated from Pantoea stewartii strain DC413 and identified by comparison to public databases containing nucleotide and protein sequences using the BLAST and FASTA algorithms, well known to those skilled in the art. The genes and gene products of the present invention may be used in a variety of ways for the enhancement or manipulation of carotenoid compounds. Further advantages may arise from the genetic organization of the gene cluster comprising these genes.
There is a general practical utility for microbial production of carotenoid compounds as these compounds are very difficult to make chemically (Nelis and Leenheer, supra). Most carotenoids have strong color and can be viewed as natural pigments or colorants. Furthermore, many carotenoids have potent antioxidant properties and thus inclusion of these compounds in the diet is thought to be healthful. Well-known examples are a-carotene, canthaxanthin, and astaxanthin. Additionally, carotenoids are required elements of aquaculture. Salmon and shrimp aquacultures are particularly useful applications for this invention as carotenoid pigmentation is critically important for the value of these organisms (Shahidi, F., and Brown, J.A., Critical Reviews in Food Science 38(1): 1-67 (1998)). Finally, carotenoids have utility as intermediates in the synthesis of steroids, flavors and fragrances and compounds with potential electro-optic applications.
The disclosure below provides a detailed description of the isolation of carotenoid synthesis genes from Pantoea stewartii strain DC413, modification of these genes by genetic engineering, and their insertion into compatible plasmids suitable for cloning and expression in E. coli, bacteria, yeasts, fungi and higher plants.
Definitions
In this disclosure, a number of terms and abbreviations are used.
The following definitions are provided.
"Open reading frame" is abbreviated ORF.
"Polymerase chain reaction" is abbreviated PCR.
"High Performance Liquid Chromatography" is abbreviated HPLC.
The term "isoprenoid compound" refers to compounds formally derived from isoprene (2-methylbuta-1,3-diene; CH2=C(CH3)CH=CH2), the skeleton of which can generally be discerned in repeated occurrence in the molecule. These compounds are produced biosynthetically via the isoprenoid pathway beginning with isopentenyl pyrophosphate (IPP) and formed by the head-to-tail condensation of isoprene units, leading to molecules which may be, for example, 5, 10, 15, 20, 30, or 40 carbons in length.
The term "carotenoid biosynthetic pathway" refers to those genes comprising members of the upper isoprenoid pathway and/or lower carotenoid biosynthetic pathway of the present invention, as illustrated in Figure 1.
The terms "upper isoprenoid pathway" and "upper pathway" will be used interchangeably and will refer to enzymes involved in converting pyruvate and glyceraldehyde-3-phosphate to farnesyl pyrophosphate (FPP). These enzymes include, but are not limited to: the "dxs" gene (encoding 1-deoxyxylulose-5-phosphate synthase); the "dxr" gene (encoding 1-deoxyxylulose-5-phosphate reductoisomerase); the "ispD" gene (encoding a 2C-methyl-D-erythritol cytidyltransferase enzyme; also known as ygbP); the "ispE" gene (encoding 4-diphosphocytidyl-2-C-methylerythritol kinase; also known as ychB); the "ispF" gene (encoding a 2C-methyl-D-erythritol 2,4-cyclodiphosphate synthase; also known as ygbB); the "pyrG" gene (encoding a CTP synthase); the "lytB" gene involved in the formation of dimethylallyl diphosphate; the "gcpE" gene involved in the synthesis of 2-C-methyl-D-erythritol 4-phosphate; the "idi" gene (responsible for the intramolecular conversion of IPP to dimethylallyl pyrophosphate); and the "ispA" gene (encoding geranyltransferase or farnesyl diphosphate synthase) in the isoprenoid pathway.
The term "Idi" refers to an isopentenyl diphosphate isomerase enzyme (E.C. 5.3.3.2) encoded by the idi gene. A representative idi gene is provided as SEQ ID NO:3.

The terms "lower carotenoid biosynthetic pathway" and "lower pathway" will be used interchangeably and refer to those enzymes which convert FPP to a suite of carotenoids. These include those genes and gene products that are involved in the immediate synthesis of either diapophytoene (whose synthesis represents the first step unique to biosynthesis of C30 carotenoids) or phytoene (whose synthesis represents the first step unique to biosynthesis of C40 carotenoids). All subsequent reactions leading to the production of various C30-C40 carotenoids are included within the lower carotenoid biosynthetic pathway. These genes and gene products comprise all of the "crt" genes including, but not limited to: crtM, crtN, crtN2, crtE, crtX, crtY, crtI, crtB, crtZ, crtW, crtO, crtA, crtC, crtD, crtF, and crtU. Finally, the term "lower carotenoid biosynthetic enzyme" is an inclusive term referring to any and all of the enzymes in the present lower pathway including, but not limited to: CrtM, CrtN, CrtN2, CrtE, CrtX, CrtY, CrtI, CrtB, CrtZ, CrtW, CrtO, CrtA, CrtC, CrtD, CrtF, and CrtU.
For the present application, the term "carotenoid compound" is defined as a class of hydrocarbons having a conjugated polyene carbon skeleton formally derived from isoprene. This class of molecules is composed of triterpenes (C30 diapocarotenoids) and tetraterpenes (C40 carotenoids) and their oxygenated derivatives; these molecules typically have strong light-absorbing properties and may range in length in excess of C200. Other "carotenoid compounds" are known which are C35, C50, C60, C70, and C80 in length, for example.
"C30 diapocarotenoids" consist of six isoprenoid units joined in such a manner that the arrangement of isoprenoid units is reversed at the center of the molecule so that the two central methyl groups are in a 1,6-positional relationship and the remaining nonterminal methyl groups are in a 1,5-positional relationship. All C30 carotenoids may be formally derived from the acyclic C30H42 structure, having a long central chain of conjugated double bonds, by: (i) hydrogenation, (ii) dehydrogenation, (iii) cyclization, (iv) oxidation, (v) esterification/glycosylation, or any combination of these processes.
"Tetraterpenes" or "C40 carotenoids" consist of eight isoprenoid units joined in such a manner that the arrangement of isoprenoid units is reversed at the center of the molecule so that the two central methyl groups are in a 1,6-positional relationship and the remaining nonterminal methyl groups are in a 1,5-positional relationship. All C40 carotenoids may be formally derived from the acyclic C40H56 structure (Formula I below), having a long central chain of conjugated double bonds, by (i) hydrogenation, (ii) dehydrogenation, (iii) cyclization, (iv) oxidation, (v) esterification/glycosylation, or any combination of these processes.
This class also includes certain compounds that arise from rearrangements of the carbon skeleton (Formula I), or by the (formal) removal of part of this structure.
Formula I
[Structural formula of the acyclic C40H56 carotenoid skeleton; the drawing is not recoverable from the source text.]
For convenience, carotenoid formulae are often written in a shorthand form as (Formula IA below):
Formula IA
[Shorthand structural formula; the drawing is not recoverable from the source text.]
where the broken lines indicate formal division into isoprenoid units.
The term "functionalized" or "functionalization" refers to the (i) hydrogenation, (ii) dehydrogenation, (iii) cyclization, (iv) oxidation, or (v) esterification/glycosylation of any portion of the carotenoid backbone.
This backbone is defined as the long central chain of conjugated double bonds. Functionalization may also occur by any combination of the above processes.
The term "CrtE" refers to a geranylgeranyl pyrophosphate synthetase enzyme encoded by the crtE gene and which converts trans-trans-farnesyl diphosphate and isopentenyl diphosphate to pyrophosphate and geranylgeranyl diphosphate. A representative crtE gene is provided as SEQ ID NO:1.
The term "CrtX" refers to a zeaxanthin glucosyl transferase enzyme encoded by the crtX gene and which converts zeaxanthin to zeaxanthin-β-diglucoside. A representative crtX gene is provided as SEQ ID NO:5.

The term "CrtY" refers to a lycopene cyclase enzyme encoded by the crtY gene which converts lycopene to β-carotene. A representative crtY gene is provided as SEQ ID NO:7.
The term "CrtI" refers to a phytoene desaturase enzyme encoded by the crtI gene. CrtI converts phytoene into lycopene via the intermediates phytofluene, ζ-carotene and neurosporene by the introduction of 4 double bonds. A representative crtI gene is provided as SEQ ID NO:9.
The term "CrtB" refers to a phytoene synthase enzyme encoded by the crtB gene which catalyzes the reaction from prephytoene diphosphate to phytoene. A representative crtB gene is provided as SEQ ID NO:11.
The term "CrtZ" refers to a β-carotene hydroxylase enzyme encoded by the crtZ gene which catalyzes a hydroxylation reaction from β-carotene to zeaxanthin. A representative crtZ gene is provided as SEQ ID NO:13.
In the present application, the genetic organization of three different DNA clusters is described, each of which is defined below:
1. The term "crtE-idi-crtY-crtI-crtB-crtZ" or "crtE-idi-crtYIBZ" refers to a molecule having the following genetic organization: the crtE, idi, crtY, crtI, crtB, and crtZ genes are clustered in the order stated, and crtZ is transcribed in the opposite orientation to crtE, idi, crtY, crtI, and crtB.
2. The term "crtE-crtX-crtY-crtI-crtB-crtZ" or "crtEXYIBZ" refers to a molecule having the following genetic organization: the crtE, crtX, crtY, crtI, crtB, and crtZ genes are clustered in the order stated, and crtZ is transcribed in the opposite orientation to crtE, crtX, crtY, crtI, and crtB.
3. The term "crtE-idi-crtX-crtY-crtI-crtB-crtZ" or "crtE-idi-crtXYIBZ" refers to a molecule having the following genetic organization: the crtE, idi, crtX, crtY, crtI, crtB, and crtZ genes are clustered in the order stated, and crtZ is transcribed in the opposite orientation to crtE, idi, crtX, crtY, crtI, and crtB.
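The three cluster names above appear to follow one abbreviation pattern: each uninterrupted run of crt genes is written as a single "crt" prefix followed by the gene letters, and runs are joined to non-crt genes (such as idi) with hyphens. The minimal Python sketch below encodes each cluster's gene order and strand and regenerates the short names. The `Gene` class and helper names are hypothetical, and the abbreviation rule is inferred from the three examples rather than stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    reverse: bool = False  # True if transcribed opposite to crtE


def cluster(*names, reverse_genes=("crtZ",)):
    """Ordered gene cluster; genes listed in reverse_genes are on the opposite strand."""
    return [Gene(n, n in reverse_genes) for n in names]


def shorthand(genes):
    """Collapse each run of consecutive crt genes, e.g. crtY, crtI -> crtYI."""
    tokens, run = [], []
    for g in genes:
        if g.name.startswith("crt"):
            run.append(g.name[3:])  # keep only the letter(s) after "crt"
        else:
            if run:
                tokens.append("crt" + "".join(run))
                run = []
            tokens.append(g.name)
    if run:
        tokens.append("crt" + "".join(run))
    return "-".join(tokens)


def opposite_strand(genes):
    """Names of genes transcribed in the reverse orientation."""
    return [g.name for g in genes if g.reverse]


clusters = [
    cluster("crtE", "idi", "crtY", "crtI", "crtB", "crtZ"),
    cluster("crtE", "crtX", "crtY", "crtI", "crtB", "crtZ"),
    cluster("crtE", "idi", "crtX", "crtY", "crtI", "crtB", "crtZ"),
]
for c in clusters:
    print(shorthand(c), "reverse:", opposite_strand(c))
```

Keeping orientation as a per-gene flag, rather than encoding it in the name, lets the same list drive both the short name and a strand map.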
The term "Embden-Meyerhof pathway" refers to the series of biochemical reactions for conversion of hexoses such as glucose and fructose to important cellular 3-carbon intermediates such as glyceraldehyde 3-phosphate, dihydroxyacetone phosphate, phosphoenolpyruvate and pyruvate. These reactions typically proceed with a net yield of biochemically useful energy in the form of ATP. The key enzymes unique to the Embden-Meyerhof pathway are phosphofructokinase and fructose 1,6-bisphosphate aldolase.
The term "Entner-Doudoroff pathway" refers to a series of biochemical reactions for conversion of hexoses such as glucose or fructose to the important 3-carbon cellular intermediates pyruvate and glyceraldehyde 3-phosphate without any net production of biochemically useful energy. The key enzymes unique to the Entner-Doudoroff pathway are 6-phosphogluconate dehydratase and a ketodeoxyphosphogluconate aldolase.
The term "C1 carbon substrate" or "single carbon substrate" refers to any carbon-containing molecule that lacks a carbon-carbon bond.
Examples are methane, methanol, formaldehyde, formic acid, formate, methylated amines (e.g., mono-, di-, and tri-methyl amine), methylated thiols, and carbon dioxide.
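The definition turns on the absence of a carbon-carbon bond, not on the number of carbon atoms, so trimethylamine (three carbons, each bonded only to nitrogen) still qualifies. The minimal Python sketch below makes that test explicit, assuming a hypothetical heavy-atom bond-list representation in which hydrogens are omitted.

```python
def is_c1_substrate(atoms, bonds):
    """True if the molecule contains carbon but no carbon-carbon bond.

    atoms: element symbols of heavy atoms; bonds: pairs of atom indices.
    """
    has_carbon = "C" in atoms
    has_cc_bond = any(atoms[i] == "C" and atoms[j] == "C" for i, j in bonds)
    return has_carbon and not has_cc_bond


# Heavy-atom skeletons only (hydrogens omitted); illustrative entries.
MOLECULES = {
    "methanol": (["C", "O"], [(0, 1)]),
    "carbon dioxide": (["C", "O", "O"], [(0, 1), (0, 2)]),
    "trimethylamine": (["N", "C", "C", "C"], [(0, 1), (0, 2), (0, 3)]),
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
}

for name, (atoms, bonds) in MOLECULES.items():
    print(f"{name}: {is_c1_substrate(atoms, bonds)}")
```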
The term "C1 metabolizer" refers to a microorganism that has the ability to use a single carbon substrate as its sole source of energy and biomass. C1 metabolizers will typically be methylotrophs and/or methanotrophs.
The term "methylotroph" means an organism capable of oxidizing organic compounds that do not contain carbon-carbon bonds. Where the methylotroph is able to oxidize CH4, the methylotroph is also a methanotroph.
The term "methanotroph" or "methanotrophic bacteria" means a prokaryote capable of utilizing methane as its primary source of carbon and energy. Complete oxidation of methane to carbon dioxide occurs by aerobic degradation pathways. Typical examples of methanotrophs useful in the present invention include (but are not limited to) the genera Methylomonas, Methylobacter, Methylococcus, and Methylosinus.
The term "high growth methanotrophic bacterial strain" refers to a bacterium capable of growth with methane or methanol as the sole carbon and energy source and which possesses a functional Embden-Meyerhof carbon flux pathway resulting in a high rate of growth and yield of cell mass per gram of C1 substrate metabolized. The specific "high growth methanotrophic bacterial strain" described herein is referred to as "Methylomonas 16a", "16a" or "Methylomonas sp. 16a", which terms are used interchangeably and which refer to the Methylomonas sp. 16a strain (ATCC PTA-2402) used in the present invention (US 6,689,601).

The term "crt gene cluster" in Methylomonas refers to an open reading frame comprising crtN1, ald, and crtN2 that is active in the native C30 carotenoid biosynthetic pathway of Methylomonas sp. 16a.
The term "CrtN1" refers to an enzyme encoded by the crtN1 gene, active in the native carotenoid biosynthetic pathway of Methylomonas sp. 16a. This gene is the first gene located on the crt gene cluster in Methylomonas.
The term "ALD" refers to an enzyme encoded by the ald gene, active in the native carotenoid biosynthetic pathway of Methylomonas sp. 16a. This gene is the second gene located on the crt gene cluster in Methylomonas.
The term "CrtN2" refers to an enzyme encoded by the crtN2 gene, active in the native carotenoid biosynthetic pathway of Methylomonas sp. 16a. This gene is the third gene located on the crt gene cluster in Methylomonas.
The term "CrtN3" refers to an enzyme encoded by the crtN3 gene, which affects the native carotenoid biosynthesis in Methylomonas sp. 16a. This gene is not located within the crt gene cluster; instead, it is present at a different locus within the Methylomonas genome (WO 02/18617).
The term "pigmentless" or "white mutant" or "non-pigmented strain" refers to a Methylomonas sp. 16a bacterium wherein the native pink pigment (e.g., a C30 carotenoid) is not produced. Thus, the bacterial cells appear white in color, as opposed to pink. Methylomonas sp. 16a white mutants have been engineered by deleting all or a portion of the native C30 carotenoid genes. For example, disruption of either the ald/crtN1 genes or the promoter driving the native crt gene cluster in Methylomonas sp. 16a creates a non-pigmented ("white") mutant better suited for C40 carotenoid production (WO 02/18617).
The term "Methylomonas sp. 16a MWM1000" or "MWM1000" refers to a non-pigmented methanotrophic bacterial strain created by deleting a portion of the ald and crtN1 genes native to Methylomonas sp. 16a (WO 02/18617). The deletion disrupted C30 carotenoid production in MWM1000. The ald/crtN1 deletion is denoted as "Δald/crtN1".
As used herein, an "isolated nucleic acid fragment" is a polymer of RNA or DNA that is single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases. An isolated nucleic acid fragment in the form of a polymer of DNA may be comprised of one or more segments of cDNA, genomic DNA or synthetic DNA.
A nucleic acid molecule is "hybridizable" to another nucleic acid molecule, such as a cDNA, genomic DNA, or RNA molecule, when a single-stranded form of the nucleic acid molecule can anneal to the other nucleic acid molecule under the appropriate conditions of temperature and solution ionic strength. Hybridization and washing conditions are well known and exemplified in Sambrook, J., Fritsch, E. F. and Maniatis, T. Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory: Cold Spring Harbor, NY (1989), particularly Chapter 11 and Table 11.1 therein (hereinafter "Maniatis"). The conditions of temperature and ionic strength determine the "stringency" of the hybridization.
Stringency conditions can be adjusted to screen for moderately similar fragments (such as homologous sequences from distantly related organisms) through highly similar fragments (such as genes that duplicate functional enzymes from closely related organisms). Post-hybridization washes determine stringency conditions. One set of preferred conditions uses a series of washes starting with 6X SSC, 0.5% SDS at room temperature for 15 min, then repeated with 2X SSC, 0.5% SDS at 45°C for 30 min, and then repeated twice with 0.2X SSC, 0.5% SDS at 50°C for 30 min. A more preferred set of stringent conditions uses higher temperatures, in which the washes are identical to those above except that the temperature of the final two 30 min washes in 0.2X SSC, 0.5% SDS is increased to 60°C. Another preferred set of highly stringent conditions uses two final washes in 0.1X SSC, 0.1% SDS at 65°C. An additional set of stringent conditions includes, for example, hybridization at 0.1X SSC, 0.1% SDS, 65°C and washing with 2X SSC, 0.1% SDS followed by 0.1X SSC, 0.1% SDS.
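In the first wash series above, stringency rises in two ways at once: salt concentration falls while temperature rises. The short Python sketch below tabulates the approximate sodium concentration of each wash. It assumes 1X SSC is 0.15 M NaCl plus 0.015 M trisodium citrate (about 0.195 M Na+) and ignores the sodium contributed by SDS. "Room temperature" is taken as 22°C purely for illustration.

```python
NA_PER_1X_SSC = 0.150 + 3 * 0.015  # mol/L Na+: NaCl plus trisodium citrate

# (SSC strength in X, temperature in C, minutes per wash, number of washes)
WASHES = [
    (6.0, 22, 15, 1),  # room temperature, assumed ~22 C
    (2.0, 45, 30, 1),
    (0.2, 50, 30, 2),
]


def sodium_molar(ssc_x):
    """Approximate Na+ molarity of an SSC buffer of the given strength."""
    return ssc_x * NA_PER_1X_SSC


def increasing_stringency(washes):
    """True if each wash has less salt and at least as high a temperature as the previous one."""
    return all(b[0] < a[0] and b[1] >= a[1] for a, b in zip(washes, washes[1:]))


for ssc, temp, minutes, repeats in WASHES:
    print(f"{ssc}X SSC: {sodium_molar(ssc):.3f} M Na+, {temp} C, {repeats} x {minutes} min")
print("stringency increases:", increasing_stringency(WASHES))
```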
Hybridization requires that the two nucleic acids contain complementary sequences, although depending on the stringency of the hybridization, mismatches between bases are possible. The appropriate stringency for hybridizing nucleic acids depends on the length of the nucleic acids and the degree of complementation, variables well known in the art. The greater the degree of similarity or homology between two nucleotide sequences, the greater the value of Tm for hybrids of nucleic acids having those sequences. The relative stability (corresponding to higher Tm) of nucleic acid hybridization decreases in the following order: RNA:RNA, DNA:RNA, DNA:DNA. For hybrids of greater than 100 nucleotides in length, equations for calculating Tm have been derived (see Maniatis, supra, 9.50-9.51). For hybridizations with shorter nucleic acids, i.e., oligonucleotides, the position of mismatches becomes more important, and the length of the oligonucleotide determines its specificity (see Maniatis, supra, 11.7-11.8). In one embodiment the length for a hybridizable nucleic acid is at least about 10 nucleotides.
Preferably a minimum length for a hybridizable nucleic acid is at least about 15 nucleotides; more preferably at least about 20 nucleotides; and most preferably the length is at least about 30 nucleotides. Furthermore, the skilled artisan will recognize that the temperature and wash solution salt concentration may be adjusted as necessary according to factors such as length of the probe.
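Two commonly used textbook approximations illustrate how Tm depends on salt, base composition and length. The sketch below is not taken from this document and is not claimed to reproduce the cited Maniatis pages exactly. The long-hybrid function uses the widely cited empirical form Tm = 81.5 + 16.6 log10[Na+] + 0.41(%G+C) - 0.63(%formamide) - 600/L. The oligonucleotide function uses the Wallace rule (2°C per A or T, 4°C per G or C), which is only a rough guide for short probes.

```python
import math


def tm_long_hybrid(na_molar, gc_percent, length_bp, formamide_percent=0.0):
    """Approximate Tm (C) of a DNA:DNA hybrid longer than ~100 bp."""
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.63 * formamide_percent - 600.0 / length_bp)


def tm_wallace(oligo):
    """Rule-of-thumb Tm (C) for short oligonucleotides."""
    s = oligo.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


# Same 500 bp, 50% G+C probe in 6X SSC vs 0.2X SSC (approx. 1.17 M vs 0.039 M Na+)
print(round(tm_long_hybrid(1.17, 50, 500), 1))
print(round(tm_long_hybrid(0.039, 50, 500), 1))
print(tm_wallace("ACGTACGTACGTACGTACGT"))  # 20-mer: 10 A/T and 10 G/C
```

Lowering the salt from 6X to 0.2X SSC drops the estimated Tm by roughly 16.6 x log10(30), about 24.5°C. This is why the low-salt final washes above remove imperfect hybrids.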
A "substantial portion" of an amino acid or nucleotide sequence is that portion comprising enough of the amino acid sequence of a polypeptide or the nucleotide sequence of a gene to putatively identify that polypeptide or gene, either by manual evaluation of the sequence by one skilled in the art, or by computer-automated sequence comparison and identification using algorithms such as BLAST (Basic Local Alignment Search Tool; Altschul, S. F., et al., J. Mol. Biol. 215:403-410 (1990)). In general, a sequence of ten or more contiguous amino acids or thirty or more nucleotides is necessary in order to putatively identify a polypeptide or nucleic acid sequence as homologous to a known protein or gene.
Moreover, with respect to nucleotide sequences, gene specific oligonucleotide probes comprising 20-30 contiguous nucleotides may be used in sequence-dependent methods of gene identification (e.g., Southern hybridization) and isolation (e.g., in situ hybridization of bacterial colonies or bacteriophage plaques). In addition, short oligonucleotides of 12-15 bases may be used as amplification primers in PCR in order to obtain a particular nucleic acid fragment comprising the primers.
Accordingly, a "substantial portion" of a nucleotide sequence comprises enough of the sequence to specifically identify and/or isolate a nucleic acid fragment comprising the sequence. The instant specification teaches partial or complete amino acid and nucleotide sequences encoding one or more particular microbial proteins. The skilled artisan, having the benefit of the sequences as reported herein, may now use all or a substantial portion of the disclosed sequences for purposes known to those skilled in this art. Accordingly, the instant invention comprises the complete sequences as reported in the accompanying Sequence Listing, as well as substantial portions of those sequences as defined above.
The term "complementary" is used to describe the relationship between nucleotide bases that are capable of hybridizing to one another. For example, with respect to DNA, adenine is complementary to thymine and cytosine is complementary to guanine. Accordingly, the instant invention also includes isolated nucleic acid fragments that are complementary to the complete sequences as reported in the accompanying Sequence Listing, as well as those substantially similar nucleic acid sequences.
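Complementary strands pair antiparallel, so the partner of a strand written 5'->3' is its reverse complement. The minimal Python sketch below is restricted to unambiguous A/C/G/T bases; IUPAC ambiguity codes are deliberately not handled.

```python
DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base (A<->T, C<->G), then reverse to read 5'->3'."""
    return seq.translate(DNA_COMPLEMENT)[::-1]


def fully_complementary(a, b):
    """True if strand b (5'->3') pairs base-for-base with strand a (5'->3')."""
    return len(a) == len(b) and reverse_complement(a).upper() == b.upper()


print(reverse_complement("ATGCGT"))
print(fully_complementary("ATGCGT", "ACGCAT"))
```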
The term "percent identity", as known in the art, is a relationship between two or more polypeptide sequences or two or more polynucleotide sequences, as determined by comparing the sequences.
In the art, "identity" also means the degree of sequence relatedness between polypeptide or polynucleotide sequences, as the case may be, as determined by the match between strings of such sequences. "Identity" and "similarity" can be readily calculated by known methods, including but not limited to those described in: 1.) Computational Molecular Biology (Lesk, A. M., Ed.) Oxford University: NY (1988); 2.) Biocomputing: Informatics and Genome Projects (Smith, D. W., Ed.) Academic: NY (1993); 3.) Computer Analysis of Sequence Data, Part I (Griffin, A. M., and Griffin, H. G., Eds.) Humana: NJ (1994); 4.) Sequence Analysis in Molecular Biology (von Heijne, G., Ed.) Academic (1987); and 5.) Sequence Analysis Primer (Gribskov, M. and Devereux, J., Eds.) Stockton: NY (1991). Preferred methods to determine identity are designed to give the best match between the sequences tested. Methods to determine identity and similarity are codified in publicly available computer programs. Sequence alignments and percent identity calculations may be performed using the Megalign program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, WI). Multiple alignment of the sequences is performed using the Clustal method of alignment (Higgins and Sharp, CABIOS 5:151-153 (1989)) with default parameters (GAP PENALTY=10, GAP LENGTH PENALTY=10).
Default parameters for pairwise alignments using the Clustal method are: KTUPLE=1, GAP PENALTY=3, WINDOW=5 and DIAGONALS SAVED=5.
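The programs named above each use their own alignment and scoring schemes. As a hedged illustration of what a percent identity figure measures, the Python sketch below swaps in a plain Needleman-Wunsch global alignment. It uses simple match/mismatch/gap scores (hypothetical values, not the Clustal or Megalign defaults) and reports identical aligned positions as a percentage of the alignment length. Other tools may divide by a different length, so their numbers will not match exactly.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a, b):
    """Identical aligned positions as a percentage of alignment length."""
    x, y = global_align(a.upper(), b.upper())
    if not x:
        return 0.0
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


print(global_align("ACGTACGT", "ACGACGT"))
print(percent_identity("ACGTACGT", "ACGACGT"))
```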
Suitable nucleic acid fragments (isolated polynucleotides of the present invention) encode polypeptides that are at least about 70% identical, preferably at least about 75% identical, and more preferably at least about 80% identical to the amino acid sequences reported herein. Preferred nucleic acid fragments encode amino acid sequences that are about 85% identical to the amino acid sequences reported herein. More preferred nucleic acid fragments encode amino acid sequences that are at least about 90% identical to the amino acid sequences reported herein. Most preferred are nucleic acid fragments that encode amino acid sequences that are at least about 95% identical to the amino acid sequences reported herein. Suitable nucleic acid fragments not only have the above homologies but typically encode a polypeptide having at least 50 amino acids, preferably at least 100 amino acids, more preferably at least 150 amino acids, still more preferably at least 200 amino acids, and most preferably at least 250 amino acids.
"Codon degeneracy" refers to the nature in the genetic code permitting variation of the nucleotide sequence without effecting the amino acid sequence of an encoded polypeptide. Accordingly, the instant invention relates to any nucleic acid fragment that encodes all or a substantial portion of the amino acid sequence encoding the instant microbial polypeptides as set forth in SEQ ID NOs:2, 4, 6, 8, 10, 12 and 14. The skilled artisan is well aware of the "codon-bias" exhibited by a specific host cell in usage of nucleotide codons to specify a given amino acid. Therefore, when synthesizing a gene for improved expression in a host cell, it is desirable to design the gene such that its frequency of codon usage approaches the frequency of preferred codon usage of the host cell.
"Synthetic genes" can be assembled from oligonucleotide building blocks that are chemically synthesized using procedures known to those skilled in the art. These building blocks are ligated and annealed to form gene segments that are then enzymatically assembled to construct the entire gene. "Chemically synthesized", as related to a sequence of DNA, means that the component nucleotides were assembled in vitro. Manual chemical synthesis of DNA may be accomplished using well-established procedures, or automated chemical synthesis can be performed using one of a number of commercially available machines. Accordingly, the genes can be tailored for optimal gene expression based on optimization of nucleotide sequence to reflect the codon bias of the host cell. The skilled artisan appreciates the likelihood of successful gene expression if codon usage is biased towards those codons favored by the host. Determination of preferred cod~ns can be based on a s~ar~ey ~f genes derived fr~rn the host cell where sequence inf~rmation is available.
"gene" refers t~ a nucleic acid fragment that expresses a specific protein, including regulatory sequences preceding (5' non-coding sequences) and following (~' non-coding sequences) the coding sequence. "Native gene" refers to a gene as found in nature with its own regulatory sequences. "Chimeric gene" refers to any gene that is not a native gene, comprising regulatory and coding sequences that are not found together in nature. Accordingly, a chimeric gene may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature. "Endogenous gene" refers to a native gene in its natural location in the genome of an organism. A "foreign" gene refers to a gene not normally found in the host organism, but that is introduced into the host organism by gene transfer. Foreign genes can comprise native genes inserted into a non-native organism, or chimeric genes. A
"transgene" is a gene that has been introduced into the genome by a transformation procedure.
"Coding sequence" refers to a DNA sequence that codes for a specific amino acid,sequence. "Suitable regulatory sequences" refer to nucleotide sequences located upstream (5' non-coding sequences), within, or downstream (3' non-coding sequences) of a coding sequence, and which influence the transcription, RNA processing or stability, or translation of the associated coding sequence. Regulatory sequences may include promoters, translation leader sequences, introns, polyadenylation recognition sequences, RNA processing sites, effector binding sites and stem-loop structures.
"Promoter" refers to a DNA sequence capable of controlling the expression of a coding sequence or functional RNA. In general, a coding sequence is located 3' to a promoter sequence. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental or physiological conditions. Promoters that cause a gene to be expressed in most cell types at most times are comm~nly referred tea as "eonstit~ati~,~e prom~ters". It is further reeogni~ed that since in m~st cases the r~isaet boundaries of regulat~ry sequences have n~t been completely defined, Di~A fragments of different lengths may have identical promoter activity.
The "3' non-coding sequences" refer to DNA sequences located downstream of a coding sequence and include polyadenylation recognition sequences and other sequences encoding regulatory signals capable of affecting mRNA processing or gene expression. The polyadenylation signal (normally limited to eukaryotes) is usually characterized by affecting the addition of polyadenylic acid tracts to the 3' end of the mRNA precursor.
"RNA transcript" refers to the product resulting from RNA
polymerase-catalyzed transcription of a DNA sequence. When the RNA
transcript is a perfect complementary copy of the DNA sequence, it is referred to as the primary transcript or it may be a RNA sequence derived from post-transcriptional processing of the primary transcript and is referred to as the mature RNA. "Messenger RNA" or "mRNA" refers to the RNA that is without introns and that can be translated into protein by the cell. "cDNA" refers to a double-stranded DNA that is complementary to and derived from mRNA. "Sense" RNA refers to RNA transcript that includes the mRNA and so can be translated into protein by the cell.
"Antisense RNA" refers to an RNA transcript that is complementary to all or part of a target primary transcript or mRNA and that blocks the expression of a target gene (US 5,107,065; WO 99/28508). The complementarity of an antisense RNA may be with any part of the specific gene transcript, i.e., at the 5' non-coding sequence, 3' non-coding sequence, or the coding sequence. "Functional RNA" refers to antisense RNA, ribozyme RNA, or other RNA that is not translated yet has an effect on cellular processes.
The term "operably linked" refers to the association of nucleic acid sequences on a single nucleic acid fragment so that the function of one is affected by the other. For example, a promoter is operably linked with a coding sequence when it is capable of affecting the expression of that coding sequence (i.e., the coding sequence is under the transcriptional control of the promoter). Coding sequences can be operably linked to regulatory sequences in sense or antisense orientation.
The term "expression", as used herein, refers to the transcription and stable accumulation of sense (mRNA) or antisense RNA derived from a nucleic acid fragment of the invention. Expression may also refer to translation of mRNA into a polypeptide.
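Translation, the second sense of "expression" above, reads an mRNA three bases at a time. The minimal Python sketch below assumes the standard genetic code and a reading frame that begins at a given offset. Real start-codon selection is not modeled.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks stop codons.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna, start=0):
    """Translate codon by codon from `start` until a stop codon or the end."""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODONS[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("AUGGCCUAA"))
```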
"Mature" protein refers to a post-translan~nally processed polypeptide; i.e., one from which any pre- or propeptides present in the primary translation product have been remo~eal. "Precursor" protein refers to the primary product of translation of mRNA; i.e., with pre- and propeptides still present. Pre- and propeptides may be (but are not limited to) intracellular localization signals.
The term "signal peptide" refers to an amino terminal polypeptide preceding the secreted mature protein. The signal peptide is cleaved from, and is therefore not present in, the mature protein. Signal peptides have the function of directing and translocating secreted proteins across cell membranes. A signal peptide is also referred to as a signal protein.
"Conjugation" refers to a particular type of transformation in which a unidirectional transfer of DNA (e.g., from a bacterial plasmid) occurs from one bacterium cell (i.e., the "donor") to another (i.e., the "recipient"). The process involves direct cell-to-cell contact. Sometimes another bacterial cell (i.e., the "helper") is present to facilitate the conjugation.
"Transformation" refers to the transfer of a nucleic acid fragment into the genome of a host organism, resulting in genetically stable inheritance. Host organisms containing the transformed nucleic acid fragments are referred to as "transgenic", "recombinant" or "transformed"
organisms.
The terms "plasmid", "vector" and "cassette" refer to an extrachromosomal element often carrying genes that are not part of the central metabolism of the cell, and usually in the form of circular double-stranded DNA fragments. Such elements may be autonomously replicating sequences, genome integrating sequences, phage or nucleotide sequences, linear or circular, of a single- or double-stranded DNA or RNA, derived from any source, in which a number of nucleotide sequences have been joined or recombined into a unique construction which is capable of introducing a promoter fragment and DNA sequence for a selected gene product along with appropriate 3' untranslated sequences into a cell. "Transformation cassette" refers to a specific vector containing a foreign gene(s) and having elements in addition to the foreign gene(s) that facilitate transformation of a particular host cell.
"Expression cassette" refers to a specific vector containing a foreign r~ene(s) and h~~,~ing elements in addition tea the foreign genes) that ~Ilo~,~
f~r enhanced e~zpression of that genes) in a foreign host.
The term "altered biological activity" will refer to an activity, associated with a protein encoded by a nucleotide sequence, which can be measured by an assay method, where that activity is either greater than or less than the activity associated with the native sequence. "Enhanced biological activity" refers to an altered activity that is greater than that associated with the native sequence. "Diminished biological activity" is an altered activity that is less than that associated with the native sequence.
The term "sequence analysis software" refers to any computer algorithm or software program that is useful for the analysis of nucleotide or amino acid sequences. "Sequence analysis software" may be commercially available or independently developed. Typical sequence analysis software will include, but is not limited to: 1.) the GCG suite of programs (Wisconsin Package Version 9.0, Genetics Computer Group (GCG), Madison, WI); 2.) BLASTP, BLASTN, BLASTX (Altschul et al., J. Mol. Biol. 215:403-410 (1990)); 3.) DNASTAR (DNASTAR, Inc., Madison, WI); 4.) the FASTA program incorporating the Smith-Waterman algorithm (W. R. Pearson, Comput. Methods Genome Res., [Proc. Int. Symp.] (1994), Meeting Date 1992, 111-20. Editor(s): Suhai, Sandor. Plenum: New York, NY); and 5.) the Vector NTI programs version 7.0 (Informax, Inc., Bethesda, MD). Within the context of this application it will be understood that where sequence analysis software is used for analysis, the results of the analysis will be based on the "default values" of the program referenced, unless otherwise specified. As used herein "default values" will mean any set of values or parameters (set by the software manufacturer) that originally load with the software when first initialized.
Standard recombinant DNA and molecular cloning techniques used herein are well known in the art and are described by Maniatis (supra); by Silhavy, T. J., Berman, M. L. and Enquist, L. W., Experiments with Gene Fusions, Cold Spring Harbor Laboratory: Cold Spring Harbor, NY (1984); and by Ausubel, F. M. et al., Current Protocols in Molecular Biology, published by Greene Publishing Assoc. and Wiley-Interscience (1987).
Genes Involved in Carotenoid Production
The enzyme pathway involved in the biosynthesis of carotenoid compounds can be conveniently viewed in two parts: the upper isoprenoid pathway, providing for the conversion of pyruvate and glyceraldehyde-3-phosphate to farnesyl pyrophosphate, and the lower carotenoid biosynthetic pathway, which provides for the synthesis of either diapophytoene or phytoene and all subsequently produced carotenoids (Figure 1). The upper pathway is ubiquitous in many microorganisms, and in these cases it may only be necessary to introduce genes that comprise the lower pathway for biosynthesis of the desired carotenoid. The division between the two pathways concerns the synthesis of farnesyl pyrophosphate (FPP). Where FPP is naturally present, only elements of the lower carotenoid biosynthetic pathway will be needed. However, for the lower pathway carotenoid genes to be effective in the production of carotenoids, the host cell must have suitable levels of FPP within the cell. Where FPP synthesis is not provided by the host cell, it will be necessary to introduce the genes necessary for the production of FPP. Each of these pathways is discussed in detail below.
The Upper Isoprenoid Pathway
Isoprenoid biosynthesis occurs through either of two pathways, generating the common C5 isoprene subunit, isopentenyl pyrophosphate (IPP). First, IPP may be synthesized through the well-known acetate/mevalonate pathway. However, recent studies have demonstrated that the mevalonate-dependent pathway does not operate in all living organisms. An alternate mevalonate-independent pathway for IPP biosynthesis has been characterized in bacteria and in green algae and higher plants (Horbach et al., FEMS Microbiol. Lett. 111:135-140 (1993); Rohmer et al., Biochem. J. 295:517-524 (1993); Schwender et al., Biochem. J. 316:73-80 (1996); Eisenreich et al., Proc. Natl. Acad. Sci. USA 93:6431-6436 (1996)).
Many steps in the mevalonate-independent isoprenoid pathway are known (Figure 1). For example, the initial steps of the alternate pathway leading to the production of IPP have been studied in Mycobacterium tuberculosis by Cole et al. (Nature 393:537-544 (1998)). The first step of the pathway involves the condensation of two 3-carbon molecules (pyruvate and D-glyceraldehyde 3-phosphate) to yield a 5-carbon compound known as D-1-deoxyxylulose-5-phosphate. This reaction is catalyzed by the Dxs enzyme, encoded by the dxs gene. Next, the isomerization and reduction of D-1-deoxyxylulose-5-phosphate yields 2-C-methyl-D-erythritol-4-phosphate. One of the enzymes involved in the isomerization and reduction process is D-1-deoxyxylulose-5-phosphate reductoisomerase (Dxr), encoded by the gene dxr. 2-C-methyl-D-erythritol-4-phosphate is subsequently converted into 4-diphosphocytidyl-2C-methyl-D-erythritol in a CTP-dependent reaction by the enzyme encoded by the non-annotated gene ygbP (Cole et al., supra). Recently, however, the ygbP gene was renamed ispD as a part of the isp gene cluster (SwissProtein Accession #46393).
Next, the 2nd position hydroxy group of 4-diphosphocytidyl-2C-methyl-D-erythritol can be phosphorylated in an ATP-dependent reaction by the enzyme encoded by the ychB gene. This product phosphorylates 4-diphosphocytidyl-2C-methyl-D-erythritol, resulting in 4-diphosphocytidyl-2C-methyl-D-erythritol 2-phosphate. The ychB gene was renamed as ispE, also as a part of the isp gene cluster (SwissProtein Accession #P24209). Finally, the product of the ygbB gene converts 4-diphosphocytidyl-2C-methyl-D-erythritol 2-phosphate to 2C-methyl-D-erythritol 2,4-cyclodiphosphate in a CTP-dependent manner. This gene has also been recently renamed, and belongs to the isp gene cluster.
Specifically, the new name for the ygbB gene is ispF (SwissProtein Accession #P36663). The product of the pyrG gene, a CTP synthase, is important in these reactions.
The enzymes encoded by the lytB and gcpE genes (and perhaps others) are thought to participate in the reactions leading to formation of isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP). IPP may be isomerized to DMAPP via isopentenyl diphosphate isomerase (or "IPP isomerase"), encoded by the idi gene; however, this enzyme is not essential for survival and may be absent in some bacteria using the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway. Recent evidence suggests that the MEP pathway branches before IPP and separately produces IPP and DMAPP via the lytB gene product. A lytB knockout mutation is lethal in E. coli except in media supplemented with both IPP and DMAPP.
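The gene renamings described above are easy to lose track of. The minimal Python sketch below uses a hypothetical table layout to record each MEP-pathway step named in the text, with its original and newer gene name, and checks that every product feeds the next step. The later steps involving lytB and gcpE are omitted because the text does not assign them specific reactions.

```python
# (substrate, product, gene name used in the text, newer isp name or None)
MEP_STEPS = [
    ("pyruvate + D-glyceraldehyde 3-phosphate",
     "D-1-deoxyxylulose 5-phosphate", "dxs", None),
    ("D-1-deoxyxylulose 5-phosphate",
     "2-C-methyl-D-erythritol 4-phosphate", "dxr", None),
    ("2-C-methyl-D-erythritol 4-phosphate",
     "4-diphosphocytidyl-2C-methyl-D-erythritol", "ygbP", "ispD"),
    ("4-diphosphocytidyl-2C-methyl-D-erythritol",
     "4-diphosphocytidyl-2C-methyl-D-erythritol 2-phosphate", "ychB", "ispE"),
    ("4-diphosphocytidyl-2C-methyl-D-erythritol 2-phosphate",
     "2C-methyl-D-erythritol 2,4-cyclodiphosphate", "ygbB", "ispF"),
]

SYNONYMS = {old: new for _, _, old, new in MEP_STEPS if new}


def current_name(gene):
    """Return the newer isp name if the gene was renamed, else the name unchanged."""
    return SYNONYMS.get(gene, gene)


def chain_is_continuous(steps):
    """True if each step's product is the next step's substrate."""
    return all(a[1] == b[0] for a, b in zip(steps, steps[1:]))


for substrate, product, gene, _ in MEP_STEPS:
    print(f"{current_name(gene):5} {substrate} -> {product}")
```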
The synthesis of FPP occurs via the isomerization of IPP to dimethylallyl pyrophosphate (DMAPP). This reaction is followed by a sequence of two prenyltransferase reactions catalyzed by the ispA gene product, leading to the creation of geranyl pyrophosphate (GPP; a 10-carbon molecule) and farnesyl pyrophosphate (FPP; a 15-carbon molecule), respectively.
The Lower Carotenoid Biosynthetic Pathway
The division between the upper isoprenoid pathway and the lower carotenoid pathway is somewhat subjective. Because FPP synthesis is common to both carotenogenic and non-carotenogenic bacteria, the applicants consider the first step in the lower carotenoid biosynthetic pathway to be the conversion of farnesyl pyrophosphate (FPP) to compounds of two divergent pathways, leading to the formation of either C30 diapocarotenoids or C40 carotenoids.
Within the C40 pathway, the first step begins with the prenyltransferase reaction converting farnesyl pyrophosphate (FPP) to a 20-carbon molecule known as geranylgeranyl pyrophosphate (GGPP) by the addition of IPP. The gene crtE (EC 2.5.1.29), encoding GGPP synthetase, is responsible for this prenyltransferase reaction. Then, a condensation reaction of two molecules of GGPP occurs to form phytoene (7,8,11,12,7',8',11',12'-octahydro-ψ,ψ-carotene; or PPPP), the first 40-carbon molecule of the lower carotenoid biosynthesis pathway. This enzymatic reaction is catalyzed by CrtB (phytoene synthase; EC 2.5.1.-).
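The chain lengths given here follow from simple carbon bookkeeping: each IPP or DMAPP unit contributes five carbons, and phytoene arises from two GGPP molecules. A minimal Python sketch of that arithmetic (the helper names are hypothetical); the C30 line assumes the diapocarotenoid branch condenses two FPP molecules, which is general background rather than something stated in the passage.

```python
CARBONS_PER_C5_UNIT = 5  # IPP and DMAPP are both 5-carbon units


def prenyl_carbons(units: int) -> int:
    """Carbons in a linear prenyl pyrophosphate built from `units` C5 units."""
    return CARBONS_PER_C5_UNIT * units


GPP = prenyl_carbons(2)   # DMAPP + 1 IPP
FPP = prenyl_carbons(3)   # GPP + 1 IPP
GGPP = prenyl_carbons(4)  # FPP + 1 IPP (CrtE)
PHYTOENE = 2 * GGPP       # condensation of two GGPP (CrtB) -> C40
DIAPO_C30 = 2 * FPP       # assumed FPP + FPP condensation -> C30 branch

if __name__ == "__main__":
    print(GPP, FPP, GGPP, PHYTOENE, DIAPO_C30)  # 10 15 20 40 30
```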
From the compound phytoene, a spectrum of C40 carotenoids is produced by subsequent hydrogenation, dehydrogenation, cyclization, oxidation, or any combination of these processes. For example, lycopene, which imparts a red color, is produced from phytoene through four sequential dehydrogenation reactions that remove eight hydrogen atoms, catalyzed by phytoene desaturase (encoded by crtI) (see Figure 2). Lycopene cyclase (CrtY) converts lycopene to β-carotene (β,β-carotene). β-Carotene is converted to zeaxanthin ((3R,3'R)-β,β-carotene-3,3'-diol) via a hydroxylation reaction resulting from the activity of β-carotene hydroxylase (encoded by the crtZ gene). Zeaxanthin can be converted to zeaxanthin-β-glucosides by zeaxanthin glucosyl transferase (EC 2.4.1.-; encoded by the crtX gene).
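The hydrogen count in the desaturation step can be checked with a small formula calculator. The sketch below is a hypothetical helper, not part of the disclosure. It assumes the standard formula C40H64 for phytoene and the following rules: each desaturation removes H2, cyclization leaves the formula unchanged, each hydroxylation adds one oxygen, and each glucosylation adds C6H10O5 (glucose minus the water released on forming the glycosidic bond).

```python
import re
from collections import Counter


def parse(formula: str) -> Counter:
    """Parse a simple formula such as 'C40H56O2' into element counts."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts


def fmt(counts: Counter) -> str:
    """Format C/H/O counts in Hill order (C, H, then O)."""
    return "".join(f"{el}{counts[el] if counts[el] != 1 else ''}"
                   for el in ("C", "H", "O") if counts[el])


def desaturate(c: Counter, steps: int = 1) -> Counter:
    # Each dehydrogenation removes two hydrogen atoms.
    return c - Counter({"H": 2 * steps})


def hydroxylate(c: Counter, n: int = 1) -> Counter:
    # C-H -> C-OH: one extra oxygen per hydroxyl, hydrogen count unchanged.
    return c + Counter({"O": n})


def glucosylate(c: Counter, n: int = 1) -> Counter:
    # R-OH + C6H12O6 -> R-O-Glc + H2O, i.e. a net gain of C6H10O5 per glucose.
    return c + Counter({"C": 6 * n, "H": 10 * n, "O": 5 * n})


PHYTOENE = parse("C40H64")
LYCOPENE = desaturate(PHYTOENE, steps=4)             # CrtI: four dehydrogenations
BETA_CAROTENE = Counter(LYCOPENE)                    # CrtY: cyclization, same formula
ZEAXANTHIN = hydroxylate(BETA_CAROTENE, 2)           # CrtZ: 3- and 3'-hydroxyls
ZEAXANTHIN_DIGLUCOSIDE = glucosylate(ZEAXANTHIN, 2)  # CrtX

if __name__ == "__main__":
    for name, c in [("phytoene", PHYTOENE), ("lycopene", LYCOPENE),
                    ("beta-carotene", BETA_CAROTENE), ("zeaxanthin", ZEAXANTHIN),
                    ("zeaxanthin diglucoside", ZEAXANTHIN_DIGLUCOSIDE)]:
        print(f"{name}: {fmt(c)}")
```

Going from C40H64 to C40H56 accounts for the eight hydrogen atoms removed by the four dehydrogenations described above.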
In addition to crtE, crtX, crtY, crtI, crtB, and crtZ, which can be used in combination to create phytoene, lycopene, β-carotene, zeaxanthin, and zeaxanthin-β-glucosides, various other crt genes are known that enable the intramolecular conversion of linear C40 compounds to produce numerous other functionalized carotenoid compounds. One skilled in the art will be able to identify various other crt genes from publicly available literature (e.g., GenBank®), the patent literature, and experimental analysis of microorganisms having the ability to produce carotenoids. For example:
• β-Carotene can be converted to canthaxanthin by β-carotene ketolases encoded by crtW (e.g., GenBank® Accession #s AF218415, D45881, D58420, D58422, X86782, Y15112), crtO (e.g., GenBank® Accession #s X86782 and Y15112) or bkt. Echinenone is an intermediate in this reaction.
• Canthaxanthin can be converted to astaxanthin by β-carotene hydroxylase encoded by the crtZ gene. Adonirubin is an intermediate in this reaction.
• Zeaxanthin can be converted to astaxanthin by β-carotene ketolases encoded by crtW, crtO, or bkt. Adonixanthin is an intermediate in this reaction.
• Spheroidene can be converted to spheroidenone by spheroidene monooxygenase encoded by crtA (e.g., GenBank® Accession #s AJ010302, Z11165, and X52291).
• Neurosporene can be converted to spheroidene and lycopene can be converted to spirilloxanthin by the sequential actions of hydroxyneurosporene synthase, methoxyneurosporene desaturase and hydroxyneurosporene-O-methyltransferase encoded by the crtC (e.g., GenBank® Accession #s AB034704, AF195122, AJ010302, AF287480, U73944, X52291, Z11165, Z21955), crtD (e.g., GenBank® Accession #s AJ010302, X63204, U73944, X52291, Z11165) and crtF (e.g., GenBank® Accession #s AB034704, AF288602, AJ010302, X52291, and Z11165) genes, respectively.
• β-Carotene can be converted to isorenieratene by β-carotene desaturase encoded by crtU (e.g., GenBank® Accession #s AF047490, AF121947, AF139916, AF195507, AF272737, AF372617, AJ133724, AJ224683, D26095, U38550, X89897, and Y15115).
These examples are not limiting and many other carotenoid genes and products exist within this C40 lower carotenoid biosynthetic pathway.
Thus, by using various combinations of the crtE, crtX, crtY, crtI, crtB, and crtZ genes presented herein, optionally together with any other known crt gene(s) isolated from plant, animal, and/or bacterial sources, numerous different carotenoids and carotenoid derivatives could be made using the methods of the present invention, provided sufficient sources of FPP are available in the host organism.
It is envisioned that useful products of the present invention will include any carotenoid compound as defined herein including, but not limited to: antheraxanthin, adonirubin, adonixanthin, astaxanthin, canthaxanthin, capsorubin, β-cryptoxanthin, α-carotene, β-carotene, epsilon-carotene, echinenone, 3-hydroxyechinenone, 3'-hydroxyechinenone, γ-carotene, 4-keto-γ-carotene, ζ-carotene, α-cryptoxanthin, deoxyflexixanthin, diatoxanthin, 7,8-didehydroastaxanthin, fucoxanthin, fucoxanthinol, isorenieratene, lactucaxanthin, lutein, lycopene, myxobactone, neoxanthin, neurosporene, hydroxyneurosporene, peridinin, phytoene, rhodopin, rhodopin glucoside, 4-keto-rubixanthin, siphonaxanthin, spheroidene, spheroidenone, spirilloxanthin, 4-keto-torulene, 3-hydroxy-4-keto-torulene, uriolide, uriolide acetate, violaxanthin, zeaxanthin-β-diglucoside, and zeaxanthin.
Additionally, the invention encompasses derivatization of these molecules to create hydroxy-, methoxy-, oxo-, epoxy-, carboxy-, or aldehydic functional groups, glycoside esters, or sulfates.
Interaction between the Upper Isoprenoid Pathway and the Lower Carotenoid Biosynthetic Pathway
A variety of studies have attempted to enhance carotenoid production by enhancing overall isoprenoid biosynthesis. The up-regulation of idi, in particular, has been demonstrated to dramatically affect carotenoid production. For example, Kajiwara et al. (Biochem. J. 324:421-426 (1997)) first demonstrated that "IPP isomerase forms an influential step in isoprenoid biosynthesis of the prokaryote E. coli, with potential for the efficient production of industrially useful isoprenoids by metabolic engineering". Specifically, exogenously expressed IPP isomerases permitted 3.6-4.5 fold greater levels of lycopene production in E. coli comprising an Erwinia carotenoid biosynthesis gene cluster, as compared to the control; likewise, 1.5-2.7 fold greater levels of β-carotene and 1.7-2.1 fold greater levels of phytoene were produced.
Subsequent work by Wang et al. (Biotech. Bioengineering 62(2):235-241 (1999)) resulted in 50 times greater astaxanthin production in E. coli transformed with the E. coli idi gene, the Archaeoglobus fulgidus gps gene, and the Agrobacterium aurantiacum crtBIYZW gene cluster. It was concluded that the last step in GGPP synthesis is the first rate-controlling step in carotenoid production, while the second rate-controlling step is IPP isomerization. Finally, Albrecht et al. (Biotech. Letters 21:791-795 (1999)) discovered that over-expression of the endogenous dxs and dxr genes and an exogenous idi gene (from Phaffia rhodozyma) in E. coli could stimulate carotenogenesis up to 3.5 fold.
Thus, metabolic engineering methods directed toward maximizing the production of industrially valuable carotenoids in E. coli and other bacteria should carefully consider the flux and rate-limiting steps in the upper isoprenoid pathway, as well as expression levels within the lower carotenoid biosynthetic pathway. Over-expression of rate-limiting genes of the upper isoprenoid pathway (e.g., idi) can dramatically increase carotenogenesis.
Sequence Identification of P. stewartii strain DC413 Carotenoid Biosynthetic Genes and Enzymes
A variety of nucleotide sequences encoding gene products involved in the C40 carotenoid biosynthetic pathway have been isolated from strain DC413. ORFs 1 and 3-7, for example, encode the crtE, crtX, crtY, crtI, crtB and crtZ genes in the lower carotenoid biosynthetic pathway (see Figures 1 and 2), and their enzymatic products lead to the production of the pigmented carotenoids lycopene, β-carotene, zeaxanthin, and zeaxanthin-β-glucosides. ORF 2 encodes the idi gene in the upper isoprenoid pathway. These 7 ORFs are contained in a single nucleic acid fragment (SEQ ID NO:20) having the following genetic organization: crtE-idi-crtX-crtY-crtI-crtB-crtZ. The crtE-idi-crtX-crtY-crtI-crtB genes appear to be operably linked in an operon, whereas the crtZ gene is transcribed in the opposite orientation.
The entire set of genes (crtE-idi-crtX-crtY-crtI-crtB-crtZ) isolated from strain DC413 is disclosed herein in a single sequence (SEQ ID NO:20). This gene cluster has been placed on a vector and expressed in microbial hosts for the production of carotenoid compounds. The skilled person will recognize that minor nucleic acid substitutions, additions and deletions (such as the substitution of preferred codons for specific host cell expression) may be made to such a gene cluster without affecting its utility, provided that all of the encoded polypeptides are expressed and are enzymatically active. Accordingly, it is within the scope of the invention to provide an isolated nucleic acid molecule as set forth in SEQ ID NO:20, comprising the crtE-idi-crtX-crtY-crtI-crtB-crtZ genes, or an isolated nucleic acid molecule having at least 95% identity to SEQ ID NO:20, wherein the isolated nucleic acid molecule encodes all of the polypeptides CrtE, Idi, CrtX, CrtY, CrtI, CrtB, and CrtZ.
Comparison of the crtE nucleotide base and deduced amino acid sequences (ORF 1) to public databases reveals that the most similar known sequences are about 66% identical to the amino acid sequence of CrtE reported herein over a length of 302 amino acids using a Smith-Waterman alignment algorithm (W. R. Pearson, supra). More preferred amino acid fragments are at least about 70%-80% identical to the sequences herein, where those sequences that are 85%-90% identical are particularly suitable and those sequences that are about 95% identical are most preferred. Similarly, preferred crtE encoding nucleic acid sequences corresponding to the instant ORFs are those encoding active proteins and which are at least about 70%-80% identical to the nucleic acid sequences of crtE reported herein, where those sequences that are 85%-90% identical are particularly suitable and those sequences that are about 95% identical are most preferred.
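Percent identity of the kind quoted here comes from a local alignment. The Python sketch below is a minimal Smith-Waterman implementation for illustration only. It uses a simple match/mismatch score and a linear gap penalty (assumed values), whereas Pearson's tools use substitution matrices and affine gap penalties, so it will not reproduce the figures above. Identity is computed as identical positions divided by aligned length, including gap columns; other conventions exist.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = H[i][j]
        if s == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif s == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap columns as a percentage of the alignment length."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a) if aligned_a else 0.0


if __name__ == "__main__":
    score, x, y = smith_waterman("MKTAYIAK", "MKTAHIAK")
    print(score, x, y, percent_identity(x, y))
```

The same function applies to nucleotide strings; only the scoring parameters would normally change.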
Comparison of the idi nucleotide base and deduced amino acid sequences (ORF 2) to public databases reveals that the most similar known sequences are about 65% identical to the amino acid sequence of Idi reported herein over a length of 344 amino acids using a Smith-Waterman alignment algorithm (W. R. Pearson, supra). More preferred amino acid fragments are at least about 70%-80% identical to the sequences herein, where those sequences that are 85%-90% identical are particularly suitable and those sequences that are about 95% identical are most preferred. Similarly, preferred idi encoding nucleic acid sequences corresponding to the instant ORFs are those encoding active proteins and which are at least about 70%-80% identical to the nucleic acid sequences of idi reported herein, where those sequences that are 85%-90% identical are particularly suitable and those sequences that are about 95% identical are most preferred.
Comparison of the crtX nucleotide base and deduced amino acid sequences (ORF 3) to public databases reveals that the most similar known sequences are about 59% identical to the amino acid sequence of CrtX reported herein over a length of 429 amino acids using a Smith-Waterman alignment algorithm (W. R. Pearson, supra). More preferred amino acid fragments are at least about 70%-80% identical to the sequences herein, where those sequences that are 85%-90% identical are particularly suitable and those sequences that are about 95% identical are most preferred. Similarly, preferred crtX encoding nucleic acid sequences corresponding to the instant ORFs are those encoding active proteins and which are at least about 70%-80% identical to the nucleic acid sequences of crtX reported herein, where those sequences that are 85%-90% identical are particularly suitable and those sequences that are about 95% identical are most preferred.

Comparison of the crtY nucleotide base and deduced amino acid sequences (ORF 4) to public databases reveals that the most similar known sequences are about 54% identical to the amino acid sequence of CrtY reported herein over a length of 387 amino acids using a Smith-Waterman alignment algorithm (W. R. Pearson, supra). More preferred amino acid fragments are at least about 70%-80% identical to the sequences herein, where those sequences that are 85%-90% identical are particularly suitable and those sequences that are about 95% identical are most preferred. Similarly, preferred crtY encoding nucleic acid sequences corresponding to the instant ORFs are those encoding active proteins and which are at least about 70%-80% identical to the nucleic acid sequences of crtY reported herein, where those sequences that are 85%-90% identical are particularly suitable and those sequences that are about 95% identical are most preferred.
Comparison of the crtI nucleotide base and deduced amino acid sequences (ORF 5) to public databases reveals that the most similar known sequences are about 81% identical to the amino acid sequence of CrtI reported herein over a length of 493 amino acids using a Smith-Waterman alignment algorithm (W. R. Pearson, supra). Preferred amino acid fragments are at least about 70%-80% identical to the sequences herein, where those sequences that are 85%-90% identical are particularly suitable and those sequences that are about 95% identical are most preferred. Similarly, preferred crtI encoding nucleic acid sequences corresponding to the instant ORFs are those encoding active proteins and which are at least about 70%-80% identical to the nucleic acid sequences of crtI reported herein, where those sequences that are 85%-90% identical are particularly suitable and those sequences that are about 95% identical are most preferred.
Comparison of the crtB nucleotide base and deduced amino acid sequences (ORF 6) to public databases reveals that the most similar known sequences are about 67% identical to the amino acid sequence of CrtB reported herein over a length of 309 amino acids using a Smith-Waterman alignment algorithm (W. R. Pearson, supra). More preferred amino acid fragments are at least about 70%-80% identical to the sequences herein, where those sequences that are 85%-90% identical are particularly suitable and those sequences that are about 95% identical are most preferred. Similarly, preferred crtB encoding nucleic acid sequences corresponding to the instant ORFs are those encoding active proteins and which are at least about 70%-80% identical to the nucleic acid sequences of crtB reported herein, where those sequences that are 85%-90% identical are particularly suitable and those sequences that are about 95% identical are most preferred.
Comparison of the crtZ nucleotide base and deduced amino acid sequences (ORF 7) to public databases reveals that the most similar known sequences are about 82% identical to the amino acid sequence of CrtZ reported herein over a length of 177 amino acids using a Smith-Waterman alignment algorithm (W. R. Pearson, supra). Preferred amino acid fragments are at least about 70%-80% identical to the sequences herein, where those sequences that are 85%-90% identical are particularly suitable and those sequences that are about 95% identical are most preferred. Similarly, preferred crtZ encoding nucleic acid sequences corresponding to the instant ORFs are those encoding active proteins and which are at least about 70%-80% identical to the nucleic acid sequences of crtZ reported herein, where those sequences that are 85%-90% identical are particularly suitable and those sequences that are about 95% identical are most preferred.
Isolation of Homologs
Each of the nucleic acid fragments of the C40 carotenoid biosynthetic pathway of the instant invention may be used to isolate genes encoding homologous proteins from the same or other microbial (or plant) species. Isolation of homologous genes using sequence-dependent protocols is well known in the art. Examples of sequence-dependent protocols include, but are not limited to: 1.) methods of nucleic acid hybridization; 2.) methods of DNA and RNA amplification, as exemplified by various uses of nucleic acid amplification technologies [e.g., polymerase chain reaction (PCR), Mullis et al., US 4,683,202; ligase chain reaction (LCR), Tabor, S. et al., Proc. Natl. Acad. Sci. USA 82:1074 (1985); or strand displacement amplification (SDA), Walker, et al., Proc. Natl. Acad. Sci. USA, 89:392 (1992)]; and 3.) methods of library construction and screening by complementation.
For example, genes encoding proteins or polypeptides similar to those of the C40 carotenoid biosynthetic pathway, as described herein, could be isolated directly by using all or a portion of the instant nucleic acid fragments as DNA hybridization probes to screen libraries from any desired bacteria using methodology well known to those skilled in the art (wherein bacteria producing C40 carotenoids would be preferred).

Specific oligonucleotide probes based upon the instant nucleic acid sequences can be designed and synthesized by methods known in the art (Maniatis, supra). Moreover, the entire sequences can be used directly to synthesize DNA probes by methods known to the skilled artisan (e.g., random primers DNA labeling, nick translation, or end-labeling techniques), or RNA probes using available in vitro transcription systems.
In addition, specific primers can be designed and used to amplify part or all of the instant sequences. The resulting amplification products can be labeled directly during amplification reactions or labeled after amplification reactions, and used as probes to isolate full-length DNA fragments under conditions of appropriate stringency.
Typically, in PCR-type amplification techniques, the primers have different sequences and are not complementary to each other.
Depending on the desired test conditions, the sequences of the primers should be designed to provide for both efficient and faithful replication of the target nucleic acid. Methods of PCR primer design are common and well known in the art (Thein and Wallace, "The use of oligonucleotides as specific hybridization probes in the Diagnosis of Genetic Disorders", in Human Genetic Diseases: A Practical Approach, K. E. Davis Ed., (1986) pp 33-50, IRL: Herndon, VA; and Rychlik, W., In Methods in Molecular Biology, White, B. A. Ed., (1993) Vol. 15, pp 31-39, PCR Protocols: Current Methods and Applications. Humana: Totowa, NJ).
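Two simple first-pass checks can be sketched in code: a rough melting temperature from the Wallace rule (about 2 °C per A/T plus 4 °C per G/C, a crude estimate suited only to short oligonucleotides) and a screen for 3'-end complementarity between the two primers. The latter relates to the requirement that primers not be complementary to each other. The Python sketch below is hypothetical. The function names, the 4-base window, and the example sequences are assumptions for illustration, and it is not a substitute for the primer-design methods cited above.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(primer: str) -> float:
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough Tm in deg C: 2 per A/T plus 4 per G/C (short oligos only)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def three_prime_dimer(p1: str, p2: str, window: int = 4) -> bool:
    """True if the last `window` bases of p1 could pair antiparallel with p2."""
    tail = p1.upper()[-window:]
    return reverse_complement(tail) in p2.upper()


if __name__ == "__main__":
    fwd, rev = "ATGCGTACGT", "TTGACCAGGA"  # hypothetical example primers
    for p in (fwd, rev):
        print(p, f"GC={gc_fraction(p):.2f}", f"Tm~{wallace_tm(p)} C")
    # Check both orientations, since either 3' end can prime on the other oligo.
    print("dimer risk:", three_prime_dimer(fwd, rev) or three_prime_dimer(rev, fwd))
```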
Generally two short segments of the instant sequences may be used in polymerase chain reaction protocols to amplify longer nucleic acid fragments encoding homologous genes from DNA or RNA. The polymerase chain reaction may also be performed on a library of cloned nucleic acid fragments wherein the sequence of one primer is derived from the instant nucleic acid fragments, and the sequence of the other primer takes advantage of the presence of the polyadenylic acid tracts to the 3' end of the mRNA precursor encoding microbial genes.
Alternatively, the second primer sequence may be based upon sequences derived from the cloning vector. For example, the skilled artisan can follow the RACE protocol (Frohman et al., Proc. Natl. Acad. Sci. USA 85:8998 (1988)) to generate cDNAs by using PCR to amplify copies of the region between a single point in the transcript and the 3' or 5' end. Primers oriented in the 3' and 5' directions can be designed from the instant sequences. Using commercially available 3' RACE or 5' RACE systems (BRL, Gaithersburg, MD), specific 3' or 5' cDNA fragments can be isolated (Ohara et al., Proc. Natl. Acad. Sci. USA 86:5673 (1989); Loh et al., Science 243:217 (1989)).
Alternatively, the instant sequences of the C40 carotenoid biosynthetic pathway may be employed as hybridization reagents for the identification of homologs. The basic components of a nucleic acid hybridization test include a probe, a sample suspected of containing the gene or gene fragment of interest, and a specific hybridization method.
Probes of the present invention are typically single-stranded nucleic acid sequences that are complementary to the nucleic acid sequences to be detected. Probes are "hybridizable" to the nucleic acid sequence to be detected. The probe length can vary from 5 bases to tens of thousands of bases and will depend upon the specific test to be done. Typically, a probe length of about 15 bases to about 30 bases is suitable. Only part of the probe molecule need be complementary to the nucleic acid sequence to be detected. In addition, the complementarity between the probe and the target sequence need not be perfect. Hybridization does occur between imperfectly complementary molecules, with the result that a certain fraction of the bases in the hybridized region are not paired with the proper complementary base.
Hybridization methods are well defined. Typically the probe and sample must be mixed under conditions that will permit nucleic acid hybridization. This involves contacting the probe and sample in the presence of an inorganic or organic salt under the proper concentration and temperature conditions. The probe and sample nucleic acids must be in contact for a long enough time that any possible hybridization between the probe and sample nucleic acid may occur. The concentration of probe or target in the mixture will determine the time necessary for hybridization to occur: the higher the probe or target concentration, the shorter the hybridization incubation time needed. Optionally, a chaotropic agent may be added. The chaotropic agent stabilizes nucleic acids by inhibiting nuclease activity. Furthermore, the chaotropic agent allows sensitive and stringent hybridization of short oligonucleotide probes at room temperature (Van Ness and Chen, Nucl. Acids Res. 19:5143-5151 (1991)).
Suitable chaotropic agents include guanidinium chloride, guanidinium thiocyanate, sodium thiocyanate, lithium tetrachloroacetate, sodium perchlorate, rubidium tetrachloroacetate, potassium iodide, and cesium trifluoroacetate, among others. Typically, the chaotropic agent will be present at a final concentration of about 3 M. If desired, one can add formamide to the hybridization mixture, typically 30-50% (v/v).
Various hybridization solutions can be employed. Typically, these comprise from about 20 to 60% volume, preferably 30%, of a polar organic solvent. A common hybridization solution employs about 30-50% v/v formamide, about 0.15 to 1 M sodium chloride, about 0.05 to 0.1 M buffers (e.g., sodium citrate, Tris-HCl, PIPES or HEPES (pH range about 6-9)), about 0.05 to 0.2% detergent (e.g., sodium dodecylsulfate), or between 0.5-20 mM EDTA, FICOLL (Pharmacia Inc.) (about 300-500 kdal), polyvinylpyrrolidone (about 250-500 kdal), and serum albumin. Also included in the typical hybridization solution will be unlabeled carrier nucleic acids from about 0.1 to 5 mg/mL, fragmented nucleic DNA (e.g., calf thymus or salmon sperm DNA, or yeast RNA), and optionally from about 0.5 to 2% wt/vol glycine. Other additives may also be included, such as volume exclusion agents that include a variety of polar water-soluble or swellable agents (e.g., polyethylene glycol), anionic polymers (e.g., polyacrylate or polymethylacrylate), and anionic saccharidic polymers (e.g., dextran sulfate).
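How the salt and formamide concentrations above shift stringency is often estimated with a widely used empirical melting-temperature approximation for DNA-DNA hybrids: Tm = 81.5 + 16.6 log10(M) + 0.41(%GC) - 0.61(% formamide) - 500/L. Here M is the monovalent cation molarity and L the hybrid length in base pairs. That formula does not come from the text above. The Python sketch below simply evaluates it under those assumptions, and the function name is hypothetical. Treat the result as a rough guide for long probes, not for short oligonucleotides.

```python
import math


def hybrid_tm(na_molar: float, gc_percent: float, formamide_percent: float,
              length_bp: int) -> float:
    """Approximate Tm (deg C) of a DNA-DNA hybrid; empirical, long probes only."""
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("salt concentration and hybrid length must be positive")
    return (81.5
            + 16.6 * math.log10(na_molar)   # higher salt stabilizes the duplex
            + 0.41 * gc_percent             # G/C pairs are more stable
            - 0.61 * formamide_percent      # formamide destabilizes the duplex
            - 500.0 / length_bp)            # short hybrids melt lower


if __name__ == "__main__":
    # Compare 30% and 50% formamide at 0.15 M NaCl for a 500 bp, 50% GC probe.
    for form in (30, 50):
        print(form, round(hybrid_tm(0.15, 50, form, 500), 1))
```

Under this approximation, each additional 1% formamide lowers the estimated Tm by about 0.6 °C, which is why formamide lets stringent hybridization run at lower temperatures.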
Nucleic acid hybridization is adaptable to a variety of assay formats. One of the most suitable is the sandwich assay format. The sandwich assay is particularly adaptable to hybridization under non-denaturing conditions. A primary component of a sandwich-type assay is a solid support. The solid support has adsorbed to it or covalently coupled to it immobilized nucleic acid probe that is unlabeled and complementary to one portion of the sequence.
Availability of the instant nucleotide and deduced amino acid sequences facilitates immunological screening of DNA expression libraries. Synthetic peptides representing portions of the instant amino acid sequences may be synthesized. These peptides can be used to immunize animals to produce polyclonal or monoclonal antibodies with specificity for peptides or proteins comprising the amino acid sequences.
These antibodies can then be used to screen DNA expression libraries to isolate full-length DNA clones of interest (Lerner, R. A., Adv. Immunol. 36:1 (1984); Maniatis, supra).
Genetic Organization
Although a variety of gene sequences encoding idi and crtE, crtX, crtY, crtI, crtB, and crtZ are available from various species within the genus Pantoea, the instant nucleic acid fragment disclosed as SEQ ID NO:20 appears to have a particularly useful genetic organization of crtE-idi-crtX-crtY-crtI-crtB-crtZ, wherein:
• crtE (SEQ ID NO:1) is located at nucleotides 1772-2680 and translated in a direct orientation;
• idi (SEQ ID NO:3) is located at nucleotides 2715-3749 and translated in a direct orientation;
• crtX (SEQ ID NO:5) is located at nucleotides 3746-5035 and translated in a direct orientation;
• crtY (SEQ ID NO:7) is located at nucleotides 5019-6182 and translated in a direct orientation;
• crtI (SEQ ID NO:9) is located at nucleotides 6179-7660 and translated in a direct orientation;
• crtB (SEQ ID NO:11) is located at nucleotides 7653-8582 and translated in a direct orientation; and
• crtZ (SEQ ID NO:13) is located at nucleotides 8521-9054 and translated in the orientation opposite to crtE-idi-crtX-crtY-crtI-crtB.
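The coordinates above can be cross-checked against the protein lengths quoted in the sequence-comparison paragraphs (302, 344, 429, 387, 493, 309 and 177 amino acids). The Python sketch below assumes the coordinates are 1-based, inclusive, and include the stop codon. The `Orf` class and `spacing` helper are hypothetical names introduced only for this illustration. Under those assumptions it also reports the gap or overlap between neighboring ORFs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    name: str
    start: int   # 1-based, inclusive, lower coordinate on SEQ ID NO:20
    end: int     # 1-based, inclusive, higher coordinate
    strand: str  # "+" direct orientation, "-" opposite orientation

    @property
    def nt_length(self) -> int:
        return self.end - self.start + 1

    @property
    def aa_length(self) -> int:
        # Assumes the coordinates include the stop codon, which is not translated.
        if self.nt_length % 3:
            raise ValueError(f"{self.name}: {self.nt_length} nt is not whole codons")
        return self.nt_length // 3 - 1


CLUSTER = [
    Orf("crtE", 1772, 2680, "+"),
    Orf("idi", 2715, 3749, "+"),
    Orf("crtX", 3746, 5035, "+"),
    Orf("crtY", 5019, 6182, "+"),
    Orf("crtI", 6179, 7660, "+"),
    Orf("crtB", 7653, 8582, "+"),
    Orf("crtZ", 8521, 9054, "-"),
]


def spacing(upstream: Orf, downstream: Orf) -> int:
    """Nucleotides between two adjacent ORFs; negative values are overlaps."""
    return downstream.start - upstream.end - 1


if __name__ == "__main__":
    for orf in CLUSTER:
        print(f"{orf.name:5s} {orf.strand} {orf.nt_length:5d} nt {orf.aa_length:4d} aa")
    for a, b in zip(CLUSTER, CLUSTER[1:]):
        print(f"{a.name} -> {b.name}: {spacing(a, b):+d} nt")
```

Every computed length matches the corresponding figure in the comparison paragraphs, which supports the stop-codon-inclusive reading. Note that the crtB/crtZ overlap lies on opposite strands.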
The most "common" genetic organization of crt genes is that observed in P. ananatis (GenBank® Accession No. D90087), P. stewartii (GenBank® Accession No. AY166713), and Pantoea agglomerans pv. milletiae (GenBank® Accession No. AB076662), wherein the carotenogenic cluster comprises crtEXYIBZ (also notated as "crtE-crtX-crtY-crtI-crtB-crtZ").
P. agglomerans EHO-10 (GenBank® Accession No. M87280) is annotated as comprising a carotenogenic cluster of crtE-hypothetical protein-crtX-crtY-crtI-crtB-crtZ; however, the Applicants' bioinformatic analysis of the "hypothetical protein" determined that the P. agglomerans EHO-10 cluster should be considered as comprising crtE-idi-crtX-crtY-crtI-crtB-crtZ. Thus, P. agglomerans EHO-10 and P. stewartii DC413 share the same genetic organization.
The genetic organization disclosed herein may convey a significant advantage during metabolic engineering useful for maximizing the production of industrially valuable carotenoids in E. coli and other bacteria.
Specifically, since idi (encoding isopentenyl pyrophosphate isomerase) has been demonstrated to dramatically affect carotenoid production (Kajiwara et al., supra; Wang et al., supra; Albrecht et al., supra), and since this gene is directly incorporated into the carotenogenic crtE-idi-crtX-crtY-crtI-crtB-crtZ cluster described herein, it is possible that expression of the operon will lead to increased isoprenoid flux into the lower carotenoid biosynthetic pathway, thereby leading to increased carotenoid production and titer.
Recombinant Expression in Microorganisms
The genes and gene products of the instant sequences may be produced in heterologous host cells, particularly in the cells of microbial hosts. Expression in recombinant microbial hosts may be useful for the expression of various pathway intermediates, and/or for the modulation of pathways already existing in the host for the synthesis of new products heretofore not possible using the host.
Methods for introduction of genes encoding the appropriate upper isoprene pathway genes and various combinations of the lower carotenoid biosynthetic pathway genes of the instant invention (optionally with other crt genes) into a suitable microbial host are common. As will be obvious to one skilled in the art, the particular functionalities required to be introduced into a host organism for production of a particular carotenoid product will depend on the host cell (and its native production of isoprenoid compounds), the availability of substrate, and the desired end product(s).
It will be appreciated that for the present carotenoid biosynthetic pathway genes to be effective in the production of carotenoids, it will be necessary for the host cell to have suitable levels of FPP within the cell.
FPP may be supplied exogenously, or may be produced endogenously by the cell, either through native or introduced genetic pathways. It is contemplated, therefore, that where a specific host cell does not have the genetic machinery to produce suitable levels of FPP, it is well within the grasp of the skilled person in the art to obtain any necessary genes of the upper isoprenoid pathway and engineer these genes into the host to produce FPP as the starting material for carotenoid biosynthesis through the lower pathway. As a precursor of FPP, IPP may be synthesized through the well-known acetate/mevalonate pathway. Alternatively, recent studies have demonstrated that the mevalonate-dependent pathway does not operate in all living organisms; an alternate mevalonate-independent pathway for IPP biosynthesis has been characterized in bacteria and in green algae and higher plants (Horbach et al., FEMS Microbiol. Lett. 111:135-140 (1993); Rohmer et al., Biochem. J. 295:517-524 (1993); Schwender et al., Biochem. J. 316:73-80 (1996); Eisenreich et al., Proc. Natl. Acad. Sci. USA 93:6431-6436 (1996)).

It is expected, for example, that introduction of chimeric genes encoding one or more of the instant lower C40 carotenoid biosynthetic pathway crtEXYIBZ sequences will lead to production of carotenoid compounds in the host microbe of choice. With an appropriate genetic transformation system, it should be possible to genetically engineer a variety of non-carotenogenic hosts. This has been shown, for example, using Erwinia herbicola crt genes, to produce various carotenoids in the hosts E. coli, Agrobacterium tumefaciens, Saccharomyces cerevisiae, Pichia pastoris (yeast), Aspergillus nidulans (fungi), Rhodobacter sphaeroides, and higher plants (U.S. 5,656,472). Thus, as described previously herein, antheraxanthin, adonirubin, adonixanthin, astaxanthin, canthaxanthin, capsorubin, β-cryptoxanthin, α-carotene, β-carotene, epsilon-carotene, echinenone, 3-hydroxyechinenone, 3'-hydroxyechinenone, γ-carotene, 4-keto-γ-carotene, ζ-carotene, α-cryptoxanthin, deoxyflexixanthin, diatoxanthin, 7,8-didehydroastaxanthin, fucoxanthin, fucoxanthinol, isorenieratene, lactucaxanthin, lutein, lycopene, myxobactone, neoxanthin, neurosporene, hydroxyneurosporene, peridinin, phytoene, rhodopin, rhodopin glucoside, 4-keto-rubixanthin, siphonaxanthin, spheroidene, spheroidenone, spirilloxanthin, 4-keto-torulene, 3-hydroxy-4-keto-torulene, uriolide, uriolide acetate, violaxanthin, zeaxanthin-β-diglucoside, and zeaxanthin may all be produced in microbial hosts using the teachings herein, by introducing various combinations of the following crt enzyme functionalities (for example): CrtE, CrtX, CrtY, CrtI, CrtB, CrtZ, CrtW, CrtO, CrtA, CrtC, CrtD, CrtF, and CrtU. Specifically, formation of phytoene from FPP requires CrtE and CrtB; the carotenoid-specific genes necessary for the synthesis of lycopene from FPP include crtE, crtB and crtI; and genes required for β-carotene production from FPP include crtE, crtB, crtI, and crtY.
Given this understanding of the relationship between the crt genes, it will be possible to select appropriate microbial host cells and crt genes for expression of any desired carotenoid product. In a similar manner, expression may be amplified by up-regulation of upper isoprene pathway genes, e.g., idi.
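The gene-to-product reasoning above can be expressed as reachability over a small reaction graph. The sketch below is hypothetical. It encodes only the conversions named earlier in this section and collapses multi-step conversions into single edges; for example, the echinenone, adonirubin, and adonixanthin intermediates are omitted. It also ignores host-specific factors such as FPP supply, enzyme activity, and expression level, so a compound being "reachable" here does not mean it will accumulate in a given host.

```python
from typing import Iterable, Set

# (gene, substrate, product), simplified from the conversions described above.
REACTIONS = [
    ("crtE", "FPP", "GGPP"),
    ("crtB", "GGPP", "phytoene"),
    ("crtI", "phytoene", "lycopene"),
    ("crtY", "lycopene", "beta-carotene"),
    ("crtZ", "beta-carotene", "zeaxanthin"),
    ("crtX", "zeaxanthin", "zeaxanthin-beta-glucosides"),
    ("crtU", "beta-carotene", "isorenieratene"),
    ("crtZ", "canthaxanthin", "astaxanthin"),
]
# Any of the three ketolase genes performs the same two conversions.
for ketolase in ("crtW", "crtO", "bkt"):
    REACTIONS.append((ketolase, "beta-carotene", "canthaxanthin"))
    REACTIONS.append((ketolase, "zeaxanthin", "astaxanthin"))


def reachable(genes: Iterable[str], start: str = "FPP") -> Set[str]:
    """Compounds producible from `start` using only the given genes."""
    available = set(genes)
    found = {start}
    changed = True
    while changed:  # repeat until no new compound is added (fixed point)
        changed = False
        for gene, substrate, product in REACTIONS:
            if gene in available and substrate in found and product not in found:
                found.add(product)
                changed = True
    return found


if __name__ == "__main__":
    for genes in ({"crtE", "crtB"}, {"crtE", "crtB", "crtI", "crtY"},
                  {"crtE", "crtB", "crtI", "crtY", "crtZ", "crtW"}):
        print(sorted(genes), "->", sorted(reachable(genes) - {"FPP"}))
```

Iterating to a fixed point keeps the logic independent of the order in which reactions are listed.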
Microbial expression systems and expression vectors containing regulatory sequences that direct high level expression of foreign proteins are well known to those skilled in the art. Any of these could be used to construct chimeric genes for production of any of the gene products of the instant sequences. These chimeric genes could then be introduced into appropriate microorganisms via transformation to provide high level expression of the enzymes.
Vectors or cassettes useful for the transformation of suitable host cells are well known in the art. Typically the vector or cassette contains sequences directing transcription and translation of the relevant gene(s), a selectable marker, and sequences allowing autonomous replication or chromosomal integration. Suitable vectors comprise a region 5' of the gene which harbors transcriptional initiation controls and a region 3' of the DNA fragment which controls transcriptional termination. It is most preferred when both control regions are derived from genes homologous to the transformed host cell, although it is to be understood that such control regions need not be derived from the genes native to the specific species chosen as a production host.
Initiation control regions or promoters which are useful to drive expression of the instant ORFs in the desired host cell are numerous and familiar to those skilled in the art. Virtually any promoter capable of driving these genes is suitable for the present invention including, but not limited to: CYC1, HIS3, GAL1, GAL10, ADH1, PGK, PHO5, GAPDH, ADC1, TRP1, URA3, LEU2, ENO, TPI (e.g., useful for expression in Saccharomyces); AOX1 (e.g., useful for expression in Pichia); and lac, ara, tet, trp, λPL, λPR, T7, tac, and trc (e.g., useful for expression in Escherichia coli), as well as the amy, apr, npr promoters and various phage promoters useful for expression in, e.g., Bacillus. Additionally, the deoxy-xylulose phosphate synthase or methanol dehydrogenase operon promoter (Springer et al., FEMS Microbiol Lett 160:119-124 (1998)), the promoter for polyhydroxyalkanoic acid synthesis (Foellner et al., Appl. Microbiol. Biotechnol. 40:284-291 (1993)), promoters identified from native plasmids in methylotrophs (EP 296484), Plac (Toyama et al., Microbiology 143:595-602 (1997); EP 62971), Ptrc (Brosius et al., Gene 27:161-172 (1984)), promoters identified from methanotrophs (PCT/US03/33698), and promoters associated with antibiotic resistance [e.g., kanamycin (Springer et al., FEMS Microbiol Lett 160:119-124 (1998); Ueda et al., Appl. Environ. Microbiol. 57:924-926 (1991)) or tetracycline (US 4,824,786)] are suitable for expression in C1 metabolizers.
It is necessary to include an artificial ribosomal binding site ("RBS") upstream of a gene to be expressed when the RBS is not provided by the vector. This is frequently required for the second, third, etc. gene(s) of an operon to be expressed, when a single promoter is driving the expression of a first, second, third, etc. group of genes. Methodology to determine the preferred sequence of an RBS in a particular host organism will be familiar to one of skill in the art, as are means for creation of this synthetic site.
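As an illustration of placing an RBS in front of every gene of a single-promoter operon, the Python sketch below assembles an operon as a plain DNA string. It assumes the commonly cited E. coli Shine-Dalgarno consensus AGGAGG as the default RBS; the spacer, the promoter placeholder, the toy ORFs, and the function name `build_operon` are hypothetical, and the preferred RBS sequence and spacing must still be determined for the actual host.

```python
# Minimal sketch: give every ORF in a single-promoter operon its own RBS.
# ASSUMPTIONS: AGGAGG (a commonly cited E. coli Shine-Dalgarno consensus) and
# the 6-nt spacer are illustrative defaults, not a host-specific optimum.

START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial start codons


def build_operon(promoter: str, orfs: list[str],
                 rbs: str = "AGGAGG", spacer: str = "ACAGCT") -> str:
    """Return promoter + (RBS + spacer + ORF) for each ORF, in order."""
    parts = [promoter]
    for orf in orfs:
        if orf[:3] not in START_CODONS:
            raise ValueError(f"ORF does not begin with a start codon: {orf[:12]}")
        # Each ORF gets its own RBS so that downstream genes can be translated.
        parts.append(rbs + spacer + orf)
    return "".join(parts)


if __name__ == "__main__":
    # Placeholder sequences (hypothetical), not real crt genes.
    print(build_operon("TTGACA", ["ATGAAATAA", "ATGCCCTAA"]))
```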
Termination control regions may also be derived from various genes native to the preferred hosts. Optionally, a termination site may be unnecessary; however, it is most preferred if included.
Merely inserting a gene into a cloning vector does not ensure that it will be successfully expressed at the level needed. In response to the need for a high expression rate, many specialized expression vectors have been created by manipulating a number of different genetic elements that control aspects of transcription, translation, protein stability, oxygen limitation, and secretion from the host cell. More specifically, the molecular features that have been manipulated to control gene expression include: 1.) the nature of the relevant transcriptional promoter and terminator sequences; 2.) the strength of the ribosome binding site; 3.) the number of copies of the cloned gene and whether the gene is plasmid-borne or integrated into the genome of the host cell; 4.) the final cellular location of the synthesized foreign protein; 5.) the efficiency of translation in the host organism; 6.) the intrinsic stability of the cloned gene protein within the host cell; and 7.) the codon usage within the cloned gene, such that its frequency approaches the frequency of preferred codon usage of the host cell. Each of these types of modifications is encompassed in the present invention, as a means to further optimize expression of C40 carotenoids.
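Item 7 above (matching codon usage to the host) can be sketched as follows. This Python example, offered as an illustration rather than as the method of the invention, counts codons in a set of host reference sequences supplied by the user and then replaces each codon of the cloned gene with the synonymous codon most frequent in the host, leaving the encoded protein unchanged. Picking the single most frequent codon is only one simple strategy; the function names are hypothetical.

```python
from collections import Counter

# Standard genetic code, codons ordered TCAG x TCAG x TCAG ("*" = stop).
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip(CODONS, AMINO_ACIDS))


def codons_of(seq: str) -> list[str]:
    """Split an in-frame, uppercase DNA sequence into complete codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def codon_counts(host_genes: list[str]) -> Counter:
    """Count codon occurrences across host reference genes."""
    counts = Counter()
    for seq in host_genes:
        counts.update(codons_of(seq))
    return counts


def translate(seq: str) -> str:
    """Translate with the standard code (used here to check the protein is unchanged)."""
    return "".join(CODE[c] for c in codons_of(seq))


def optimize(gene: str, host_counts: Counter) -> str:
    """Swap each codon for the host's most frequent synonymous codon."""
    out = []
    for codon in codons_of(gene):
        synonyms = [c for c in CODONS if CODE[c] == CODE[codon]]
        best = max(synonyms, key=lambda c: host_counts[c])
        # Keep the original codon if the host data contain no synonym at all.
        out.append(best if host_counts[best] > 0 else codon)
    return "".join(out)
```

Because only synonymous codons are considered, `translate(optimize(gene, counts))` always equals `translate(gene)`.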
Finally, to promote accumulation of C40 carotenoids, it may be necessary to reduce or eliminate the expression of certain genes in the target pathway or in competing pathways that may serve as sinks for energy or carbon. Alternatively, it may be useful to over-express various genes upstream of desired carotenoid intermediates to enhance production. Methods of manipulating genetic pathways for the purposes described above are common and well known in the art.
Once a key genetic pathway has been identified and sequenced, specific genes may be up-regulated to increase the output of the pathway. For example, additional copies of the targeted genes may be introduced into the host cell on multicopy plasmids such as pBR322.
Alternatively, the target genes may be modified so as to be under the control of non-native promoters. Where it is desired that a pathway operate at a particular point in a cell cycle or during a fermentation run, regulated or inducible promoters may be used to replace the native promoter of the target gene. Similarly, in some cases the native or endogenous promoter may be modified to increase gene expression. For example, endogenous promoters can be altered in vivo by mutation, deletion, and/or substitution (see, US 5,565,350; Zarling et al., PCT/US93/03868).
Alternatively, where the sequence of the gene to be disrupted is known, one of the most effective methods for gene down-regulation is targeted gene disruption, where foreign DNA is inserted into a structural gene so as to disrupt transcription. This can be effected by the creation of genetic cassettes comprising the DNA to be inserted (often a genetic marker) flanked by sequences having a high degree of homology to a portion of the gene to be disrupted. Introduction of the cassette into the host cell results in insertion of the foreign DNA into the structural gene via the native DNA replication mechanisms of the cell. (See, for example:
Hamilton et al., J. Bacteriol. 171:4617-4622 (1989); Balbas et al., Gene 136:211-213 (1993); Gueldener et al., Nucleic Acids Res. 24:2519-2524 (1996); and Smith et al., Methods Mol. Cell. Biol. 5:270-277 (1996)).
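The cassette design described above, a marker flanked by sequences homologous to the target gene, can be sketched in Python with DNA as plain strings. This is a simplified model, assuming an exact-match double crossover at the two homology arms; real recombination efficiency, arm lengths, and marker choice depend on the host, and the function names and toy sequences are hypothetical.

```python
def disruption_cassette(gene: str, marker: str, insert_at: int, arm: int) -> str:
    """Return [arm bases before insert_at] + marker + [arm bases after insert_at]."""
    if not arm <= insert_at <= len(gene) - arm:
        raise ValueError("insertion point leaves too little gene sequence for the arms")
    return gene[insert_at - arm:insert_at] + marker + gene[insert_at:insert_at + arm]


def integrate(chromosome: str, cassette: str, marker: str, arm: int) -> str:
    """Model a double crossover: insert the marker between the matched arms."""
    left, right = cassette[:arm], cassette[arm + len(marker):]
    start = chromosome.find(left)
    cut = start + arm
    if start == -1 or chromosome.find(right, cut) != cut:
        return chromosome  # arms not found side by side: no integration
    return chromosome[:cut] + marker + chromosome[cut:]


if __name__ == "__main__":
    gene = "ATGAAACCCGGGTTTTAA"            # toy target gene (hypothetical)
    chromosome = "CCCC" + gene + "GGGG"
    cassette = disruption_cassette(gene, "NNNN", insert_at=9, arm=6)
    print(integrate(chromosome, cassette, "NNNN", arm=6))
```

In the recombinant, the marker sits inside the structural gene, splitting its coding sequence, which is the disruption the text describes.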
Antisense technology is another method of down-regulating genes where the sequence of the target gene is known. To accomplish this, a nucleic acid segment from the desired gene is cloned and operably linked to a promoter such that the antisense strand of RNA will be transcribed.
This construct is then introduced into the host cell and the antisense strand of RNA is produced. Antisense RNA inhibits gene expression by preventing the accumulation of mRNA encoding the protein of interest.
The person skilled in the art will know that special considerations are associated with the use of antisense technologies in order to reduce expression of particular genes. For example, the proper level of expression of antisense genes may require the use of different chimeric genes utilizing different regulatory elements known to the skilled artisan.
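Cloning a gene segment in reverse orientation behind a promoter makes the transcript the reverse complement of the corresponding stretch of sense mRNA, which is what lets it base-pair with that mRNA. A minimal Python sketch of that relationship, using a hypothetical toy sequence and hypothetical function names:

```python
# Watson-Crick complements for DNA; sequences are written 5'->3'.
DNA_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_dna(sense_segment: str) -> str:
    """Reverse complement: the antisense strand written 5'->3'."""
    return sense_segment.translate(DNA_COMPLEMENT)[::-1]


def transcript(dna_coding_strand: str) -> str:
    """RNA made from a template has the coding-strand sequence, with U for T."""
    return dna_coding_strand.replace("T", "U").replace("t", "u")


def pairs_with(rna_a: str, rna_b: str) -> bool:
    """True if uppercase rna_a is fully complementary (antiparallel) to rna_b."""
    pair = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return len(rna_a) == len(rna_b) and all(
        pair[x] == y for x, y in zip(rna_a, reversed(rna_b)))


if __name__ == "__main__":
    segment = "ATGAAACCC"                    # hypothetical sense segment
    sense_mrna = transcript(segment)         # AUGAAACCC
    antisense_rna = transcript(antisense_dna(segment))
    print(antisense_rna, pairs_with(antisense_rna, sense_mrna))  # GGGUUUCAU True
```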
Although targeted gene disruption and antisense technology offer effective means of down-regulating genes where the sequence is known, other less specific methodologies have been developed that are not sequence-based. For example, cells may be exposed to UV radiation and then screened for the desired phenotype. Mutagenesis with chemical agents is also effective for generating mutants, and commonly used substances include chemicals that affect nonreplicating DNA (e.g., HNO2 and NH2OH), as well as agents that affect replicating DNA (e.g., acridine dyes, notable for causing frameshift mutations). Specific methods for creating mutants using radiation or chemical agents are well documented in the art. See, for example: Thomas D. Brock in Biotechnology: A Textbook of Industrial Microbiology, 2nd ed. (1989) Sinauer Associates: Sunderland, MA; or Deshpande, Mukund V., Appl. Biochem. Biotechnol. 36:227-234 (1992).
Another non-specific method of gene disruption is the use of transposable elements or transposons. Transposons are genetic elements that insert randomly in DNA but can be later retrieved on the basis of sequence to determine where the insertion has occurred. Both in vivo and in vitro transposition methods are known. Both methods involve the use of a transposable element in combination with a transposase enzyme. When the transposable element or transposon is contacted with a nucleic acid fragment in the presence of the transposase, the transposable element will randomly insert into the nucleic acid fragment. The technique is useful for random mutagenesis and for gene isolation, since the disrupted gene may be identified on the basis of the sequence of the transposable element.
Kits for in vitro transposition are commercially available (see, for example:
The Primer Island Transposition Kit, available from Perkin Elmer Applied Biosystems, Branchburg, NJ, based upon the yeast Ty1 element; The Genome Priming System, available from New England Biolabs, Beverly, MA, based upon the bacterial transposon Tn7; and the EZ::TN Transposon Insertion Systems, available from Epicentre Technologies, Madison, WI, based upon the Tn5 bacterial transposable element).
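The logic of transposon mutagenesis (random insertion, then recovery of the insertion site from the known transposon sequence) can be modeled with strings. In this Python sketch the transposon is a run of N characters chosen only so that it cannot occur in the toy target; real mapping reads the genomic sequence flanking the transposon ends, and all names here are hypothetical.

```python
import random


def insert_randomly(target: str, transposon: str,
                    rng: random.Random) -> tuple[str, int]:
    """Insert the transposon at a uniformly random position; return mutant and position."""
    pos = rng.randrange(len(target) + 1)
    return target[:pos] + transposon + target[pos:], pos


def map_insertion(mutant: str, transposon: str,
                  flank: int = 4) -> tuple[int, str, str]:
    """Locate the transposon and return its position plus the flanking target bases."""
    pos = mutant.find(transposon)
    if pos == -1:
        raise ValueError("transposon not found")
    end = pos + len(transposon)
    return pos, mutant[max(0, pos - flank):pos], mutant[end:end + flank]


if __name__ == "__main__":
    rng = random.Random(42)                  # fixed seed for reproducibility
    target = "ATGGCTAGCTTACGGATCCTAA"        # hypothetical target gene
    mutant, true_pos = insert_randomly(target, "N" * 10, rng)
    found_pos, left, right = map_insertion(mutant, "N" * 10)
    print(true_pos == found_pos, left, right)
```

The flanking bases returned by `map_insertion` play the role of the sequence used to identify which gene was disrupted.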
Within the context of the present invention, it may be useful to modulate the expression of the carotenoid biosynthetic pathway by any one of the methods described above. For example, the present invention provides a number of isolated genes (crtE, idi, crtX, crtY, crtI, crtB, and crtZ) encoding enzymes in the carotenoid biosynthetic pathway and methods leading to the production of C40 carotenoids. Thus, in addition to over-expressing various combinations of the crtE, idi, crtX, crtY, crtI, crtB, and crtZ genes herein to promote increased production of C40 carotenoids, it may also be useful to up-regulate the initial condensation of 3-carbon compounds (pyruvate and D-glyceraldehyde 3-phosphate) to increase the yield of the 5-carbon compound D-1-deoxyxylulose-5-phosphate (mediated by the dxs gene). This would increase the flux of carbon entering the lower carotenoid biosynthetic pathway and permit increased production of C40 carotenoids. Alternatively (or in addition), it may be desirable to knock out the crtM/crtN genes leading to the synthesis of C30 carotenoids, if the microbial host is capable of synthesizing these types of compounds. Or, in systems having native functional crtE, idi, crtX, crtY, crtI, crtB, and crtZ genes, the accumulation of β-carotene or zeaxanthin may be effected by the disruption of downstream genes (e.g., crtZ or crtX) by any one of the methods described above.
Preferred Microbial Hosts
Preferred heterologous host cells for expression of the instant genes and nucleic acid fragments of the carotenoid biosynthetic pathway are microbial hosts that can be found broadly within the fungal or bacterial families and which grow over a wide range of temperature, pH values, and solvent tolerances. For example, it is contemplated that any bacteria, yeast, and filamentous fungi will be suitable hosts for expression of the present nucleic acid fragments. Because transcription, translation and the protein biosynthetic apparatus are the same irrespective of the cellular feedstock, functional genes are expressed irrespective of carbon feedstock used to generate cellular biomass. Large-scale microbial growth and functional gene expression may utilize a wide range of simple or complex carbohydrates, organic acids and alcohols, and/or saturated hydrocarbons (e.g., methane or carbon dioxide, in the case of photosynthetic or chemoautotrophic hosts). However, the functional genes may be regulated, repressed or derepressed by specific growth conditions, which may include the form and amount of nitrogen, phosphorus, sulfur, oxygen, carbon or any trace micronutrient including small inorganic ions. In addition, the regulation of functional genes may be achieved by the presence or absence of specific regulatory molecules that are added to the culture and are not typically considered nutrient or energy sources. Growth rate may also be an important regulatory factor in gene expression.
Examples of suitable host strains include, but are not limited to:
fungal or yeast species such as Aspergillus, Trichoderma, Saccharomyces, Pichia, Candida, Hansenula, Yarrowia, Rhodosporidium, and Lipomyces, or bacterial species such as Salmonella, Bacillus, Acinetobacter, Zymomonas, Agrobacterium, Flavobacterium, Rhodobacter, Rhodococcus, Streptomyces, Brevibacterium, Corynebacteria, Mycobacterium, Escherichia, Pantoea, Pseudomonas, Methylomonas, Methylobacter, Methylococcus, Methylosinus, Methylomicrobium, Methylocystis, Alcaligenes, Synechocystis, Synechococcus, Anabaena, Thiobacillus, Methanobacterium, Klebsiella, Methylophilus, Methylobacillus, Methylobacterium, Hyphomicrobium, Xanthobacter, Paracoccus, Nocardia, Arthrobacter, Rhodopseudomonas, Torulopsis, Phaffia, and Rhodotorula.
Methylotrophs and Methylomonas sp. 16a as Microbial Hosts
Although a number of carotenoids have been produced from recombinant microbial sources [e.g., E. coli and Candida utilis for production of lycopene (Farmer, W.R. and Liao, J.C., Biotechnol. Prog. 17:57-61 (2001); Wang, C. et al., Biotechnol. Prog. 16:922-926 (2000); Misawa, N. and Shimada, N., J. Biotechnol. 59:169-181 (1998); Shimada, H. et al., Appl. Environm. Microbiol. 64:2676-2680 (1998)); E. coli, Candida utilis and Phaffia rhodozyma for production of β-carotene (Albrecht, M. et al., Biotechnol. Lett. 21:791-795 (1999); Miura, Y. et al., Appl. Environm. Microbiol. 64:1226-1229 (1998); US 5,691,190); E. coli and Candida utilis for production of zeaxanthin (Albrecht, M. et al., supra; Miura, Y. et al., supra); E. coli and Phaffia rhodozyma for production of astaxanthin (US 5,466,599; US 6,015,684; US 5,182,208; US 5,972,642); see also: US 5,656,472, US 5,545,816, US 5,530,189, US 5,530,188, US 5,429,939, and US 6,124,113], these methods of producing carotenoids using various combinations of different crt genes suffer from low yields and reliance on relatively expensive feedstocks. Thus, it would be desirable to identify a method that produces higher yields of carotenoids in a microbial host from an inexpensive feedstock.
There are a number of microorganisms that utilize single carbon substrates as their sole energy source. Such microorganisms are referred to herein as "C1 metabolizers". These organisms are characterized by the ability to use carbon substrates lacking carbon to carbon bonds as a sole source of energy and biomass. These carbon substrates include, but are not limited to: methane, methanol, formate, formaldehyde, formic acid, methylated amines (e.g., mono-, di- and tri-methyl amine), methylated thiols, carbon dioxide, and various other reduced carbon compounds which lack any carbon-carbon bonds.
All C1 metabolizing microorganisms are generally classified as methylotrophs. Methylotrophs may be defined as any organism capable of oxidizing organic compounds that do not contain carbon-carbon bonds.
Facultative methylotrophs, obligate methylotrophs, and obligate methanotrophs are all subsets of methylotrophs. Specifically:

• Facultative methylotrophs have the ability to oxidize organic compounds that do not contain carbon-carbon bonds, but may also use other carbon substrates such as sugars and complex carbohydrates for energy and biomass. Facultative methylotrophic bacteria are found in many environments, but are isolated most commonly from soil, landfill and waste treatment sites. Many facultative methylotrophs are members of the β and γ subgroups of the Proteobacteria (Hanson et al., Microb. Growth C1 Compounds, [Int. Symp.], 7th (1993), pp 285-302. Murrell, J. Colin and Don P. Kelly, Eds. Intercept: Andover, UK; Madigan et al., Brock Biology of Microorganisms, 8th ed., Prentice Hall: Upper Saddle River, NJ (1997)).
• Obligate methylotrophs are those organisms that are limited to the use of organic compounds that do not contain carbon-carbon bonds for the generation of energy.
• Obligate methanotrophs are those obligate methylotrophs that have the distinct ability to oxidize methane.
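The definitions above nest: every obligate methanotroph is an obligate methylotroph, and both are methylotrophs. A small Python sketch that returns the most specific category from three yes/no traits makes that hierarchy explicit; the `Isolate` fields, the `classify` function, and the strain names are hypothetical, and real classification rests on physiological testing.

```python
from dataclasses import dataclass


@dataclass
class Isolate:
    name: str
    oxidizes_c1: bool        # oxidizes compounds lacking carbon-carbon bonds
    uses_multicarbon: bool   # also grows on sugars, complex carbohydrates, etc.
    oxidizes_methane: bool


def classify(isolate: Isolate) -> str:
    """Return the most specific category defined in the text."""
    if not isolate.oxidizes_c1:
        return "not a methylotroph"
    if isolate.uses_multicarbon:
        return "facultative methylotroph"
    if isolate.oxidizes_methane:
        return "obligate methanotroph"   # a subset of obligate methylotrophs
    return "obligate methylotroph"


if __name__ == "__main__":
    for iso in (Isolate("strain A", True, True, False),
                Isolate("strain B", True, False, False),
                Isolate("strain C", True, False, True)):
        print(iso.name, "->", classify(iso))
```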
Additionally, the ability to utilize single carbon substrates is not limited to bacteria but extends also to yeasts and fungi. A number of yeast genera are able to use single carbon substrates as energy sources in addition to more complex materials (i.e., the methylotrophic yeasts).
Although a large number of these methylotrophic organisms are known, few of these microbes have been successfully harnessed in industrial processes for the synthesis of materials. And, although single carbon substrates are cost-effective energy sources, difficulty in genetic manipulation of these microorganisms as well as a dearth of information about their genetic machinery has limited their use primarily to the synthesis of native products.
Despite these difficulties, many methanotrophs contain an inherent isoprenoid pathway which enables these organisms to synthesize pigments and provides the potential for engineering these microorganisms for production of various non-endogenous isoprenoid compounds. Since methanotrophs can use single carbon substrates (i.e., methane or methanol) as an energy source, it could be possible to produce carotenoids at low cost in these organisms. One such example wherein a methanotroph is engineered for production of β-carotene is described in WO 02/18617.

In the present invention, methods are provided for the expression of genes involved in the biosynthesis of carotenoid compounds in microorganisms that are able to use single carbon substrates as a sole energy source. The host microorganism may be any C1 metabolizer that has the ability to synthesize farnesyl pyrophosphate (FPP) as a metabolic precursor for carotenoids. More specifically, facultative methylotrophic bacteria suitable in the present invention include, but are not limited to:
Methylophilus, Methylobacillus, Methylobacterium, Hyphomicrobium, Xanthobacter, Bacillus, Paracoccus, Nocardia, Arthrobacter, Rhodopseudomonas, and Pseudomonas. Specific methylotrophic yeasts useful in the present invention include, but are not limited to: Candida, Hansenula, Pichia, Torulopsis, and Rhodotorula. And, exemplary methanotrophs are included in, but not limited to, the genera Methylomonas, Methylobacter, Methylococcus, Methylosinus, Methylocystis, Methylomicrobium, and Methanomonas.
Of particular interest in the present invention are high growth obligate methanotrophs having an energetically favorable carbon flux pathway. For example, Applicants have discovered a specific strain of methanotroph having several pathway features that make it particularly useful for carbon flux manipulation. This strain is known as Methylomonas sp. 16a (ATCC PTA 2402) (US 6,689,601); and, this particular strain and other related methylotrophs are preferred microbial hosts for expression of the gene products of this invention, useful for the production of C40 carotenoids (WO 02/18617).
Methylomonas sp. 16a naturally produces C30 carotenoids. Odom et al. have reported that expression of C40 carotenoid genes in Methylomonas 16a produced a mixture of C30 and C40 carotenoids (WO 02/18617). Several of the genes involved in C30 carotenoid production in this strain have been identified, including (but not limited to) the crtN1, ald, crtN2, and crtN3 genes. Disruption of the crtN1/ald genes or of the promoter driving expression of the crtN1/ald/crtN2 gene cluster created various non-pigmented mutants ("white mutants") more suitable for C40 carotenoid production (US SN 60/527083, hereby incorporated by reference). For example, non-pigmented Methylomonas sp. 16a strain MWM1000 was created by disrupting the ald and crtN1 genes.
The Methylomonas sp. 16a strain contains several anomalies in the carbon utilization pathway. For example, based on genome sequence data, the strain is shown to contain genes for two pathways of hexose metabolism. The Entner-Doudoroff Pathway (which utilizes the keto-deoxy phosphogluconate aldolase enzyme) is present in the strain. It is generally well accepted that this is the operative pathway in obligate methanotrophs.
Also present, however, is the Embden-Meyerhof Pathway (which utilizes the fructose bisphosphate aldolase enzyme). It is well known that this pathway is either not present, or not operative, in obligate methanotrophs.
Energetically, the latter pathway is more favorable and allows greater yield of biologically useful energy, ultimately resulting in greater production of cell mass and other cell mass-dependent products in Methylomonas 16a. The activity of this pathway in the Methylomonas 16a strain has been confirmed through microarray data and biochemical evidence measuring the reduction of ATP. Although the Methylomonas 16a strain has been shown to possess both the Embden-Meyerhof and the Entner-Doudoroff pathway enzymes, the data suggest that the Embden-Meyerhof pathway enzymes are more strongly expressed than the Entner-Doudoroff pathway enzymes. This result is surprising and counter to existing beliefs concerning the glycolytic metabolism of methanotrophic bacteria.
Applicants have discovered other methanotrophic bacteria having this characteristic, including for example, Methylomonas clara and Methylosinus sporium. It is likely that this activity has remained undiscovered in methanotrophs due to the lack of activity of the enzyme with ATP, the typical phosphoryl donor for the enzyme in most bacterial systems.
A particularly novel and useful feature of the Embden-Meyerhof pathway in Methylomonas 16a is that the key phosphofructokinase step is pyrophosphate-dependent instead of ATP-dependent. This feature adds to the energy yield of the pathway by using pyrophosphate instead of ATP.
In methanotrophic bacteria, methane is converted to biomolecules via a cyclic set of reactions known as the ribulose monophosphate pathway or RUMP cycle. This pathway comprises three phases, each phase being a series of enzymatic steps. The first phase is "fixation" or incorporation of C-1 (formaldehyde) into a pentose to form a hexose or six-carbon sugar. This occurs via a condensation reaction between a 5-carbon sugar (pentose) and formaldehyde and is catalyzed by hexulose monophosphate synthase. The second phase is termed "cleavage" and results in splitting of that hexose into two 3-carbon molecules. One of those 3-carbon molecules is recycled back through the RUMP pathway and the other 3-carbon fragment is utilized for cell growth.
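The carbon bookkeeping behind the fixation and cleavage phases can be written as a balance. The stoichiometry below is the standard textbook net balance for the RUMP cycle, not a figure reported in this application: because the pentose acceptors must be regenerated, three formaldehyde molecules are fixed for every one net 3-carbon molecule that leaves the cycle for biosynthesis.

```latex
\underbrace{3\,\mathrm{HCHO}}_{3\,\mathrm{C}}
+ \underbrace{3\,\mathrm{C_5}}_{15\,\mathrm{C}}
\;\longrightarrow\;
\underbrace{3\,\mathrm{C_6}}_{18\,\mathrm{C}}
\;\longrightarrow\;
\underbrace{3\,\mathrm{C_5}}_{15\,\mathrm{C,\ regenerated}}
+ \underbrace{1\,\mathrm{C_3}}_{3\,\mathrm{C,\ to\ biomass}}
```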

In methanotrophs and methylotrophs the RUMP pathway may occur as one of three variants. However, only two of these variants are commonly found: the FBP/TA (fructose bisphosphate/transaldolase) pathway or the KDPG/TA (keto deoxy phosphogluconate/transaldolase) pathway (Dijkhuizen, L. and Devries, G.E., "The physiology and biochemistry of aerobic methanol-utilizing gram negative and gram positive bacteria". In: Methane and Methanol Utilizers; Colin Murrell and Howard Dalton, Eds.; Plenum: NY, 1990).
The Methylomonas 16a strain is unique in the way it handles the "cleavage" steps: genes were found that carry out this conversion via fructose bisphosphate as a key intermediate. The genes for fructose bisphosphate aldolase and transaldolase were found clustered together on one piece of DNA. Secondly, the genes for the other variant involving the keto deoxy phosphogluconate intermediate were also found clustered together. Available literature teaches that these organisms (obligate methylotrophs and methanotrophs) rely solely on the KDPG pathway and that the FBP-dependent fixation pathway is utilized by facultative methylotrophs (Dijkhuizen et al., supra). Therefore the latter observation is expected, whereas the former is not. The finding of the FBP genes in an obligate methane-utilizing bacterium is both surprising and suggestive of utility. The FBP pathway is energetically favorable to the host microorganism because it yields more energy (ATP) than the KDPG pathway. Thus, organisms that utilize the FBP pathway may have an energetic advantage and growth advantage over those that utilize the KDPG pathway. This advantage may also be useful for energy-requiring production pathways in the strain. By using this pathway, a methane-utilizing bacterium may have an advantage over other methane-utilizing organisms as production platforms for either single cell protein or for any other product derived from the flow of carbon through the RUMP pathway (e.g., carotenoids).
Accordingly, the present invention provides a method for the production of a carotenoid compound in a high growth, energetically favorable Methylomonas strain which:
(a) grows on a C1 carbon substrate selected from the group consisting of methane and methanol; and (b) comprises a functional Embden-Meyerhof carbon pathway, said pathway comprising a gene encoding a pyrophosphate-dependent phosphofructokinase enzyme.

Transformation of C1 Metabolizing Bacteria
Techniques for the transformation of C1 metabolizing bacteria are not well developed, although general methodology that is utilized for other bacteria, which is well known to those of skill in the art, may be applied.
Electroporation has been used successfully for the transformation of: Methylobacterium extorquens AM1 (Toyama, H., et al., FEMS Microbiol. Lett. 166:1-7 (1998)), Methylophilus methylotrophus AS1 (Kim, C.S., and Wood, T.K., Appl. Microbiol. Biotechnol. 48:105-108 (1997)), and Methylobacillus sp. strain 12S (Yoshida, T., et al., Biotechnol. Lett. 23:787-791 (2001)). Extrapolation of specific electroporation parameters from one C1 metabolizing organism to another may be difficult, however, as is well known to those of skill in the art.
Bacterial conjugation, relying on the direct contact of donor and recipient cells, is frequently more readily amenable for the transfer of genes into C1 metabolizing bacteria. Simplistically, this bacterial conjugation process involves mixing together "donor" and "recipient" cells in close contact with one another. Conjugation occurs by formation of cytoplasmic connections between donor and recipient bacteria, with direct transfer of newly synthesized donor DNA into the recipient cells. As is well known in the art, the recipient in a conjugation is defined as any cell that can accept DNA through horizontal transfer from a donor bacterium.
The donor in conjugative transfer is a bacterium that contains a conjugative plasmid, conjugative transposon, or mobilizable plasmid. The physical transfer of the donor plasmid can occur in one of two fashions, as described below:
1. In some cases, only a donor and recipient are required for conjugation.
This occurs when the plasmid to be transferred is a self-transmissible plasmid that is both conjugative and mobilizable (i.e., carrying both tra genes and genes encoding the Mob proteins). In general, the process involves the following steps: 1.) Double-strand plasmid DNA is nicked at a specific site in oriT; 2.) A single-strand DNA is released to the recipient through a pore or pilus structure; 3.) A DNA relaxase enzyme cleaves the double-strand DNA at oriT and binds to a released 5' end (forming a relaxosome as the intermediate structure); and 4.) Subsequently, a complex of auxiliary proteins assembles at oriT to facilitate the process of DNA transfer.
2. Alternatively, a "triparental" conjugation is required for transfer of the donor plasmid to the recipient. In this type of conjugation, donor cells, recipient cells, and a "helper" plasmid participate. The donor cells carry a mobilizable plasmid or conjugative transposon. Mobilizable vectors contain an oriT, a gene encoding a nickase, and have genes encoding the Mob proteins; however, the Mob proteins alone are not sufficient to achieve the transfer of the genome. Thus, mobilizable plasmids are not able to promote their own transfer unless an appropriate conjugation system is provided by a helper plasmid (located within the donor or within a "helper" cell). The conjugative plasmid is needed for the formation of the mating pair and DNA transfer, since the plasmid encodes proteins for transfer (Tra) that are involved in the formation of the pore or pilus.
Examples of successful conjugations involving C1 metabolizing bacteria include the work of: Stolyar et al. (Mikrobiologiya 64(5):686-691 (1995)); Motoyama et al. (Appl. Micro. Biotech. 42(1):67-72 (1994)); Lloyd et al. (Archives of Microbiology 171(6):364-370 (1999)); and Odom et al. (WO 02/18617).
In vitro Bio-Conversion of Carotenoids
Alternatively, it is possible to carry out the bioconversions of the present application in vitro. Where substrates for CrtE, CrtX, CrtY, CrtI, CrtB, and CrtZ are not synthesized endogenously by the host cell, it will be possible to add the substrate exogenously. In this embodiment the suitable carotenoid substrate may be solubilized with a mild detergent or solvent (e.g., DMSO) or mixed with phospholipid vesicles. To assist in transport into the cell, the host cell may optionally be permeabilized with a suitable solvent such as toluene. Methods for this type of in vitro bio-conversion of carotenoid substrates have basis in the art (see for example: Hundle, B. S., et al., FEBS Lett. 315:329-334 (1993); and Bramley, P. M., et al., Phytochemistry, 26:1935-1939 (1987)).
Industrial Production using Recombinant Microorganisms
Where commercial production of the instant proteins is desired, a variety of culture methodologies may be applied. For example, a specific gene product over-expressed from a recombinant microbial host may be produced on a large scale by both batch and continuous culture methodologies.
A classical batch culturing method is a closed system where the composition of the media is set at the beginning of the culture and not subject to artificial alterations during the culturing process. Thus, at the beginning of the culturing process the media is inoculated with the desired organism or organisms and growth or metabolic activity is permitted to occur while adding nothing to the system. Typically, however, a "batch" culture is batch with respect to the addition of carbon source, and attempts are often made at controlling factors such as pH and oxygen concentration. In batch systems the metabolite and biomass compositions of the system change constantly up to the time the culture is terminated. Within batch cultures cells moderate through a static lag phase to a high growth log phase and finally to a stationary phase where growth rate is diminished or halted. If untreated, cells in the stationary phase will eventually die. Cells in log phase are often responsible for the bulk of production of end product or intermediate in some systems.
Stationary or post-exponential phase production can be obtained in other systems.
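The transition from log phase to stationary phase in a closed batch follows from substrate depletion. As a sketch, the Python code below integrates a Monod growth model for a batch culture with a simple Euler step. It assumes Monod kinetics with a constant biomass yield and deliberately omits the lag and death phases; the function name and all parameter values are hypothetical.

```python
def simulate_batch(x0: float, s0: float, mu_max: float, ks: float,
                   yield_xs: float, hours: float,
                   dt: float = 0.01) -> list[tuple[float, float, float]]:
    """Return (time, biomass, substrate) points for a closed batch culture.

    Monod kinetics: mu = mu_max * S / (Ks + S); substrate consumed = growth / Y.
    Nothing is added after inoculation, so X + Y*S stays (numerically) constant.
    """
    t, x, s = 0.0, x0, s0
    points = [(t, x, s)]
    for _ in range(int(round(hours / dt))):
        mu = mu_max * s / (ks + s)
        dx = min(mu * x * dt, yield_xs * s)   # cannot grow on more substrate than remains
        x += dx
        s = max(s - dx / yield_xs, 0.0)
        t += dt
        points.append((t, x, s))
    return points


if __name__ == "__main__":
    # Hypothetical parameters: mu_max in 1/h, concentrations in g/L, Y in g/g.
    run = simulate_batch(x0=0.01, s0=10.0, mu_max=0.5, ks=0.1, yield_xs=0.5, hours=48)
    for t, x, s in run[::600]:  # print every 6 h
        print(f"t={t:5.1f} h  X={x:6.3f} g/L  S={s:6.3f} g/L")
```

Biomass rises roughly exponentially while S is much larger than Ks, then levels off at about X0 + Y*S0 once the substrate is exhausted, which is the stationary phase of this simplified model.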
A variation on the standard batch system is the Fed-Batch system.
Fed-Batch culture processes are also suitable in the present invention and comprise a typical batch system with the exception that the substrate is added in increments as the culture progresses. Fed-Batch systems are useful when catabolite repression is apt to inhibit the metabolism of the cells and where it is desirable to have limited amounts of substrate in the media. The actual substrate concentration in Fed-Batch systems is difficult to measure and is therefore estimated on the basis of the changes of measurable factors such as pH, dissolved oxygen and the partial pressure of waste gases such as CO2. Batch and Fed-Batch culturing methods are common and well known in the art and examples may be found in Brock (supra) or Deshpande (supra).
Commercial production of the instant proteins may also be accomplished with a continuous culture. A continuous culture is an open system in which defined culture medium is added continuously to a bioreactor and an equal amount of conditioned medium is removed simultaneously for processing. Continuous cultures generally maintain the cells at a constant high liquid-phase density where cells are primarily in log phase growth. Alternatively, continuous culture may be practiced with immobilized cells, where carbon and nutrients are continuously added and valuable products, by-products or waste products are continuously removed from the cell mass. Cell immobilization may be performed using a wide range of solid supports composed of natural and/or synthetic materials.

Continuous or semi-continuous culture allows for the modulation of one factor or any number of factors that affect cell growth or end product concentration. For example, one method will maintain a limiting nutrient such as the carbon source or nitrogen level at a fixed rate and allow all other parameters to moderate. In other systems a number of factors affecting growth can be altered continuously while the cell concentration, measured by media turbidity, is kept constant. Continuous systems strive to maintain steady state growth conditions, and thus the cell loss due to media being drawn off must be balanced against the cell growth rate in the culture. Methods of modulating nutrients and growth factors for continuous culture processes, as well as techniques for maximizing the rate of product formation, are well known in the art of industrial microbiology, and a variety of methods are detailed by Brock, supra.
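The balance between cell loss and growth rate described above can be illustrated with the classic single-substrate chemostat model with Monod kinetics, a standard textbook idealization rather than anything specified in this document. At steady state the specific growth rate equals the dilution rate D = F/V, which fixes the residual limiting nutrient S* = Ks·D/(mu_max − D) and the biomass X* = Y·(S_in − S*). The class name and all parameter values below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chemostat:
    """Ideal single-substrate continuous culture with Monod growth kinetics."""
    mu_max: float    # maximum specific growth rate (1/h)
    ks: float        # Monod half-saturation constant for the limiting nutrient (g/L)
    yield_xs: float  # biomass formed per gram of limiting nutrient consumed (g/g)
    s_in: float      # limiting-nutrient concentration in the incoming medium (g/L)

    def washout_dilution_rate(self) -> float:
        """Dilution rate above which cells leave faster than they can grow."""
        return self.mu_max * self.s_in / (self.ks + self.s_in)

    def steady_state(self, flow_l_per_h: float, volume_l: float):
        """Return (D, S*, X*): dilution rate, residual nutrient and biomass at steady state."""
        d = flow_l_per_h / volume_l          # at steady state the growth rate mu equals D
        if d >= self.washout_dilution_rate():
            return d, self.s_in, 0.0         # washout: no cells remain, nutrient unused
        s_star = self.ks * d / (self.mu_max - d)
        x_star = self.yield_xs * (self.s_in - s_star)
        return d, s_star, x_star


# Hypothetical parameters: mu_max 0.5 1/h, Ks 0.1 g/L, Y 0.5 g/g, 10 g/L carbon source
reactor = Chemostat(mu_max=0.5, ks=0.1, yield_xs=0.5, s_in=10.0)
print(reactor.steady_state(flow_l_per_h=1.0, volume_l=5.0))
```

Here the medium flow rate is the single factor being modulated: raising it raises D and the growth rate, until D passes the washout limit and the culture can no longer keep up.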
Fermentation media in the present invention must contain suitable carbon substrates. Suitable substrates may include, but are not limited to:
monosaccharides (e.g., glucose and fructose), disaccharides (e.g., lactose or sucrose), polysaccharides (e.g., starch or cellulose or mixtures thereof) and unpurified mixtures from renewable feedstocks (e.g., cheese whey permeate, corn steep liquor, sugar beet molasses, and barley malt).
Additionally, the carbon substrate may be a one-carbon substrate such as carbon dioxide, methane or methanol, for which metabolic conversion into key biochemical intermediates has been demonstrated. In addition to one- and two-carbon substrates, methylotrophic organisms are also known to utilize a number of other carbon-containing compounds such as methylamine, glucosamine and a variety of amino acids for metabolic activity. For example, methylotrophic yeasts are known to utilize the carbon from methylamine to form trehalose or glycerol (Bellion et al., Microb. Growth C1 Compd., [Int. Symp.], 7th (1993), 415-32. Murrell, J. Collin and Kelly, Don P, eds. Intercept: Andover, UK). Similarly, various species of Candida will metabolize alanine or oleic acid (Sulter et al., Arch. Microbiol. 153:485-489 (1990)). Hence it is contemplated that the source of carbon utilized in the present invention may encompass a wide variety of carbon-containing substrates and will be limited only by the choice of organism.
Recombinant Production in Plants
Plants and algae are also known to produce carotenoid compounds. The crtE, idi, crtX, crtY, crtI, crtB and crtZ nucleic acid fragments of the instant invention may be used to create transgenic plants having the ability to express the microbial protein(s). Preferred plant hosts will be any variety that will support a high production level of the instant proteins. Suitable green plants will include, but are not limited to:
soybean, rapeseed (Brassica napus, B. campestris), sunflower (Helianthus annuus), cotton (Gossypium hirsutum), corn, tobacco (Nicotiana tabacum), alfalfa (Medicago sativa), wheat (Triticum sp.), barley (Hordeum vulgare), oats (Avena sativa, L), sorghum (Sorghum bicolor), rice (Oryza sativa), Arabidopsis, cruciferous vegetables (broccoli, cauliflower, cabbage, parsnips, etc.), melons, carrots, celery, parsley, tomatoes, potatoes, strawberries, peanuts, grapes, grass seed crops, sugar beets, sugar cane, beans, peas, rye, flax, hardwood trees, softwood trees, and forage grasses. Algal species include, but are not limited to, commercially significant hosts such as Spirulina, Haematococcus, and Dunaliella.
Over-expression of preferred carotenoid compounds may be accomplished by first constructing chimeric genes of the present invention in which the coding region(s) are operably linked to promoters capable of directing expression of the gene(s) in the desired tissues at the desired stage of development. For reasons of convenience, the chimeric genes may comprise promoter sequences and translation leader sequences derived from the same genes. 3' non-coding sequences encoding transcription termination signals must also be provided. The instant chimeric genes may also comprise one or more introns in order to facilitate gene expression.
Any combination of any promoter and any terminator capable of inducing expression of a coding region may be used in the chimeric genetic sequence. Some suitable examples of promoters and terminators include those from nopaline synthase (nos), octopine synthase (ocs) and cauliflower mosaic virus (CaMV) genes. One type of efficient plant promoter that may be used is a high-level plant promoter. Such promoters, in operable linkage with the genetic sequences of the present invention, should be capable of promoting expression of the present gene product. High-level plant promoters that may be used in this invention include, for example: 1.) the promoter of the small subunit (ss) of the ribulose-1,5-bisphosphate carboxylase from soybean (Berry-Lowe et al., J. Molecular and App. Gen., 1:483-498 (1982)); and 2.) the promoter of the chlorophyll a/b binding protein. These two promoters are known to be light-induced in plant cells (see, for example, Genetic Engineering of Plants, an Agricultural Perspective, A. Cashmore, Ed. Plenum: NY (1983), pp 29-38; Coruzzi, G. et al., J. Biol. Chem., 258:1399 (1983); and Dunsmuir, P. et al., J. Mol. Appl. Genet., 2:285 (1983)).
Plasmid vectors comprising the instant chimeric genes can then be constructed. The choice of plasmid vector depends upon the method that will be used to transform host plants. The skilled artisan is well aware of the genetic elements that must be present on the plasmid vector in order to successfully transform, select and propagate host cells containing the chimeric gene(s). The skilled artisan will also recognize that different independent transformation events will result in different levels and patterns of expression (Jones et al., EMBO J. 4:2411-2418 (1985); De Almeida et al., Mol. Gen. Genetics 218:78-86 (1989)), and thus multiple events must be screened in order to obtain lines displaying the desired expression level and pattern. Such screening may be accomplished by Southern analysis of DNA blots (Southern, J. Mol. Biol. 98:503 (1975)), Northern analysis of mRNA expression (Kroczek, J. Chromatogr. Biomed. Appl., 618 (1-2):133-145 (1993)), Western analysis of protein expression, or phenotypic analysis.
For some applications it will be useful to direct the instant proteins to different cellular compartments. It is thus envisioned that the chimeric genes described above may be further supplemented by altering the coding sequences to encode enzymes with appropriate intracellular targeting sequences added and/or with targeting sequences that are already present removed, such as: 1.) transit sequences (Keegstra, K., Cell 56:247-253 (1989)); 2.) signal sequences; or 3.) sequences encoding endoplasmic reticulum localization (Chrispeels, J.J., Ann. Rev. Plant Phys. Plant Mol. Biol. 42:21-53 (1991)) or nuclear localization signals (Raikhel, N., Plant Phys. 100:1627-1632 (1992)). While the references cited give examples of each of these, the list is not exhaustive, and additional targeting signals useful in the invention may be discovered in the future.
Protein Engineering
It is contemplated that the present crtE, idi, crtX, crtY, crtI, crtB, and crtZ nucleotide sequences may be used to produce gene products having enhanced or altered activity. Various methods are known for mutating a native gene sequence to produce a gene product with altered or enhanced activity including, but not limited to: 1.) error prone PCR (Melnikov et al., Nucleic Acids Research, 27(4):1056-1062 (February 15, 1999)); 2.) site-directed mutagenesis (Coombs et al., Proteins (1998), pp 259-311, Angeletti, Ruth Hogue, Ed., Academic: San Diego, CA); and 3.) "gene shuffling" (US 5,505,798; US 5,811,238; US 5,830,721; and US 5,837,~~58, hereby incorporated by reference).
The method of gene shuffling is particularly attractive due to its facile implementation, high rate of mutagenesis and ease of screening. The process of gene shuffling involves the restriction endonuclease cleavage of a gene of interest into fragments of specific size in the presence of additional populations of DNA fragments having regions of either similarity or difference to the gene of interest. This pool of fragments is then denatured and reannealed to create a mutated gene, which is then screened for altered activity.
The instant microbial sequences of the present invention may be mutated and screened for altered or enhanced activity by this method.
The sequences should be double-stranded and can be of various lengths ranging from 50 bp to 10 kB. The sequences may be randomly digested into fragments ranging from about 10 bp to 1000 bp, using restriction endonucleases well known in the art (Maniatis, supra). In addition to the instant microbial sequences, populations of fragments that are hybridizable to all or portions of the microbial sequence may be added.
Similarly, a population of fragments which are not hybridizable to the instant sequence may also be added. Typically these additional fragment populations are added in about a 10- to 20-fold excess by weight as compared to the total nucleic acid. Generally, if this process is followed, the number of different specific nucleic acid fragments in the mixture will be about 100 to about 1000. The mixed population of random nucleic acid fragments is denatured to form single-stranded nucleic acid fragments and then reannealed. Only those single-stranded nucleic acid fragments having regions of homology with other single-stranded nucleic acid fragments will reanneal. The random nucleic acid fragments may be denatured by heating. One skilled in the art could determine the conditions necessary to completely denature the double-stranded nucleic acid. Preferably the temperature is from about 80°C to 100°C.
The nucleic acid fragments may be reannealed by cooling. Preferably the temperature is from about 20°C to 75°C. Renaturation can be accelerated by the addition of polyethylene glycol ("PEG") or salt. A suitable salt concentration may range from 0 mM to 200 mM. The annealed nucleic acid fragments are then incubated in the presence of a nucleic acid polymerase and dNTPs (i.e., dATP, dCTP, dGTP and dTTP). The nucleic acid polymerase may be the Klenow fragment, the Taq polymerase or any other DNA polymerase known in the art. The polymerase may be added to the random nucleic acid fragments prior to annealing, simultaneously with annealing or after annealing. The cycle of denaturation, renaturation and incubation in the presence of polymerase is repeated for a desired number of times. Preferably the cycle is repeated from about ~ to 50 times; more preferably, it is repeated from 10 to 40 times.
The resulting nucleic acid is a larger double-stranded polynucleotide ranging from about 50 bp to about 100 kB and may be screened for expression and altered activity by standard cloning and expression protocols (Maniatis, supra).
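Why crossovers in shuffled genes tend to fall in regions of sequence identity can be illustrated with a toy in-silico model. The sketch below treats reassembly as template switching between pre-aligned homologous parents, allowing a switch only where both parents share an identical stretch of at least `min_overlap` bases (a stand-in for the homology needed for fragments to reanneal). It is a simplified illustration under those assumptions, not a model of the wet-lab protocol above; `shuffle_parents` and its parameters are hypothetical.

```python
import random


def shuffle_parents(parents, n_switch_attempts, min_overlap=10, rng=None):
    """Toy model of DNA shuffling among pre-aligned, equal-length homologous parents.

    The growing chimera copies one parent, then may jump to another parent at a
    position where both parents share an identical stretch of at least
    `min_overlap` bases immediately upstream of the switch point.
    """
    rng = rng or random.Random()
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    current = rng.randrange(len(parents))
    pieces, pos = [], 0
    for _ in range(n_switch_attempts):
        other = rng.randrange(len(parents))
        if other == current:
            continue
        # Candidate switch points: the preceding min_overlap bases are identical in both parents
        sites = [i for i in range(pos + min_overlap, length)
                 if parents[current][i - min_overlap:i] == parents[other][i - min_overlap:i]]
        if not sites:
            continue
        switch = rng.choice(sites)
        pieces.append(parents[current][pos:switch])
        current, pos = other, switch
    pieces.append(parents[current][pos:])
    return "".join(pieces)


# Two hypothetical parents that differ at two positions
parent_a = "ACGTACGTAC" * 6
parent_b = parent_a[:20] + "T" + parent_a[21:40] + "G" + parent_a[41:]
print(shuffle_parents([parent_a, parent_b], n_switch_attempts=5, min_overlap=8, rng=random.Random(0)))
```

Every base of the chimera comes from one of the parents at the same position, so diversity arises only from new combinations of the parental differences; screening such a library for altered activity remains the experimental step.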
Furthermore, a hybrid protein can be assembled by fusion of functional domains using the gene shuffling (exon shuffling) method (Nixon et al., Proc. Natl. Acad. Sci. USA, 94:1069-1073 (1997)). The functional domain of the instant gene can be combined with the functional domain of other genes to create novel enzymes with desired catalytic function. A hybrid enzyme may be constructed using PCR overlap extension methods and cloned into various expression vectors using techniques well known to those skilled in the art.
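The PCR overlap extension step mentioned above can be sketched as sequence bookkeeping. In this minimal sketch, assuming a symmetric design in which each inner (chimeric) primer anneals `half` bases to its own template and carries a `half`-base 5' tail from the other domain, the two first-round products overlap by 2 × half bases and prime each other to give the fused domain. The helper names, the symmetric design and the example sequences are hypothetical; a real design would also check primer Tm, secondary structure and reading frame.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def overlap_extension_fusion(domain_a: str, domain_b: str, half: int = 18):
    """Sketch of PCR overlap extension joining domain_a (upstream) to domain_b.

    Returns (forward inner primer, reverse inner primer, fused product).
    """
    if len(domain_a) < half or len(domain_b) < half:
        raise ValueError("each domain must be at least `half` bases long")
    junction = domain_a[-half:] + domain_b[:half]
    fwd_inner = junction                       # anneals to domain_b, 5' tail from domain_a
    rev_inner = reverse_complement(junction)   # anneals to domain_a, 5' tail from domain_b
    # First-round products, assuming outer primers sit at the far ends of each domain
    product_a = domain_a + domain_b[:half]     # domain_a amplified with rev_inner
    product_b = domain_a[-half:] + domain_b    # domain_b amplified with fwd_inner
    overlap = 2 * half
    assert product_a[-overlap:] == product_b[:overlap]  # shared overlap region
    fused = product_a + product_b[overlap:]    # second round extends across the junction
    return fwd_inner, rev_inner, fused


# Hypothetical 30-base "domains"
fwd, rev, fused = overlap_extension_fusion("ATGAAAGCTCTGGGTATTGATCTGGGCACC",
                                           "GGCGAAACCCTGTTTACCCGTCAGGCGTAA", half=12)
print(fwd, rev, fused, sep="\n")
```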
EXAMPLES
The present invention is further defined in the following Examples.
It should be understood that these Examples, while indicating preferred embodiments of the invention, are given by way of illustration only. From the above discussion and these Examples, one skilled in the art can ascertain the essential characteristics of this invention, and without departing from the spirit and scope thereof, can make various changes and modifications of the invention to adapt it to various usages and conditions.
GENERAL METHODS
Standard recombinant DNA and molecular cloning techniques used in the Examples are well known in the art and are described by: Maniatis (supra), Silhavy et al. (supra), and Ausubel et al. (supra).
Materials and methods suitable for the maintenance and growth of bacterial cultures are well known in the art. Techniques suitable for use in the following examples may be found in: Manual of Methods for General Bacteriology (Phillipp Gerhardt, R. G. E. Murray, Ralph N. Costilow, Eugene W. Nester, Willis A. Wood, Noel R. Krieg and G. Briggs Phillips, Eds), American Society for Microbiology: Washington, D.C. (1994); or Brock (supra). All reagents, restriction enzymes and materials used for the growth and maintenance of bacterial cells were obtained from Aldrich Chemicals (Milwaukee, WI), DIFCO Laboratories (Detroit, MI), GIBCO/BRL (Gaithersburg, MD), or Sigma Chemical Company (St. Louis, MO) unless otherwise specified.
Sequence data were generated on an ABI Automatic sequencer using dye terminator technology (US 5,366,860; EP 272,007) with a combination of vector and insert-specific primers. Sequence editing and assembly were performed in Sequencher™ version 4.0.5 (Gene Codes Corp., Ann Arbor, MI). All sequences represent at least two-fold coverage in both directions. Manipulations of genetic sequences were accomplished using Vector NTI programs version 7.0 (Informax, Inc., Bethesda, MD). Pairwise comparisons were performed using the default values in Vector NTI. BLAST analysis was performed using the default values set by the National Center for Biotechnology Information (NCBI).
The meaning of abbreviations is as follows: "sec" means second(s), "min" means minute(s), "h" means hour(s), "d" means day(s), "µL" means microliter(s), "mL" means milliliter(s), "L" means liter(s), "µM" means micromolar, "mM" means millimolar, "M" means molar, "mmol" means millimole(s), "µmol" means micromole(s), "g" means gram(s), "µg" means microgram(s), "ng" means nanogram(s), "U" means unit(s), "bp" means base pair(s), and "kB" means kilobase(s).

Isolation of Carotenoid-Producing Strain Pantoea stewartii DC413
The present Example describes the isolation and identification of a yellow-pigmented bacterial strain, Pantoea stewartii DC413. Analysis of the native carotenoids produced in this organism confirmed production of zeaxanthin, in addition to various zeaxanthin precursors and zeaxanthin derivatives.
Strain isolation and 16S rRNA typing: To isolate novel carotenoid-producing bacterial strains, pigmented microbes were isolated from a collection of environmental samples. A soil sample from Florida was collected and resuspended in Luria broth (LB). A 10 µL loopful of cell suspension was streaked onto LB plates, and the plates were incubated at 30°C. Pigmented bacteria with diverse colony appearances were picked, streaked twice to homogeneity on LB plates, and incubated at 30°C.

From these colonies, one which formed shiny yellow colonies was designated as "strain DC413".
16S rRNA gene sequencing was performed to type strain DC413. Specifically, the 16S rRNA gene of the strain was amplified by PCR using primers HK12 (SEQ ID NO:15) and JCR14 (SEQ ID NO:16). The amplified 16S rRNA genes were purified using a QIAquick PCR Purification Kit according to the manufacturer's instructions (Qiagen) and sequenced on an automated ABI sequencer. The sequencing reactions were initiated with primers HK12, JCR14, and JCR15 (SEQ ID NO:17).
The assembled 1351 bp 16S rRNA gene sequence (SEQ ID NO:18) was used as the query sequence for a BLASTN search (Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997)) against GenBank®.
BLAST analysis indicated that strain DC413 belonged to the Enterobacteriaceae family. Its 16S rDNA showed 98% sequence identity with the 16S rDNA sequences of strains typed as Pantoea stewartii. This strain was thus designated as Pantoea stewartii DC413.
Carotenoid analysis of DC413: The yellow pigment in Pantoea stewartii DC413 was extracted and analyzed by HPLC. The strain was grown in 100 mL LB at 30°C for 2 days, and the cells were then harvested by centrifugation at 4000 g for 30 min. The cell pellet was extracted with 10 mL acetone. The solvent was dried under nitrogen and the carotenoids were resuspended in 0.5 mL acetone. The extract was filtered with an Acrodisc® CR 25 mm syringe filter (Pall Corporation, Ann Arbor, MI) and then concentrated in 0.1 mL 10% acetone + 90% acetonitrile for HPLC analysis using an Agilent Series 1100 LC/MSD SI (Agilent, Foster City, CA).
Sample (20 µL) was loaded onto a 150 mm x 4.6 mm ZORBAX C18 (3.5 µm particles) column (Agilent Technologies, Inc.). The column temperature was kept at 40°C. The flow rate was 1 mL/min, and the solvent program was:
• 0 - 2 min: 95% buffer A and 5% buffer B;
• 2 - 10 min: linear gradient from 95% buffer A and 5% buffer B to 60% buffer A and 40% buffer B;
• 10 - 12 min: linear gradient from 60% buffer A and 40% buffer B to 50% buffer A and 50% buffer B;
• 12 - 18 min: 50% buffer A and 50% buffer B; and,
• 18 - 20 min: 95% buffer A and 5% buffer B.

Buffer A was 95% acetonitrile and 5% dH2O; buffer B was 100% tetrahydrofuran.
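The gradient program and buffer compositions above can be expressed as a small lookup that returns the programmed mobile phase at any time point. This is a sketch of the programmed (pump) composition only: it ignores the instrument's gradient delay volume, and it assumes the 18 - 20 min return to 5% buffer B is a step change, since the text does not call that segment a linear gradient. The names `PROGRAM`, `percent_b` and `solvent_composition` are hypothetical.

```python
# (start min, end min, %B at start, %B at end); buffer A = 95% acetonitrile / 5% water,
# buffer B = 100% tetrahydrofuran
PROGRAM = [
    (0.0, 2.0, 5.0, 5.0),      # isocratic, 95% A / 5% B
    (2.0, 10.0, 5.0, 40.0),    # linear gradient to 60% A / 40% B
    (10.0, 12.0, 40.0, 50.0),  # linear gradient to 50% A / 50% B
    (12.0, 18.0, 50.0, 50.0),  # isocratic, 50% A / 50% B
    (18.0, 20.0, 5.0, 5.0),    # back to 95% A / 5% B (assumed to be a step change)
]


def percent_b(t_min: float) -> float:
    """Programmed %B at time t_min, by linear interpolation within each segment."""
    for start, end, b_start, b_end in PROGRAM:
        if start <= t_min < end or t_min == end == PROGRAM[-1][1]:
            return b_start + (b_end - b_start) * (t_min - start) / (end - start)
    raise ValueError("time is outside the 0-20 min program")


def solvent_composition(t_min: float) -> dict:
    """Approximate volume fractions (%) of each solvent in the programmed mobile phase."""
    b = percent_b(t_min)
    a = 100.0 - b
    return {"tetrahydrofuran": b, "acetonitrile": 0.95 * a, "water": 0.05 * a}


# Programmed mobile phase at the 6.27 min zeaxanthin retention time reported for DC413
print(solvent_composition(6.27))
```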
HPLC analysis (Figure 3) indicated that strain DC413 produced zeaxanthin (6.27 min peak) and β-carotene (13.01 min peak), by comparison with authentic standards of zeaxanthin (CaroteNature, Lupsingen, Switzerland) and β-carotene (Sigma, St. Louis, MO). MS analysis confirmed that the molecular weight of the zeaxanthin peak was 569 and that of the β-carotene peak was 537. The predominant peak, which eluted at 3.24 min, was most likely zeaxanthin monoglucoside, as suggested by its molecular weight of 731.
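The molecular weights quoted above can be checked against the expected values computed from molecular formulas. The sketch assumes the standard formulas C40H56 (β-carotene) and C40H56O2 (zeaxanthin), which are not stated in the text, and a monoglucoside formed by attaching one glucose through a single glycosidic bond (zeaxanthin + C6H12O6 − H2O = C46H66O7). Standard atomic weights are rounded to three decimals; the function name is hypothetical.

```python
import re

# Standard atomic weights (g/mol), rounded to three decimals
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999}


def molecular_weight(formula: str) -> float:
    """Average molecular weight of a simple formula such as 'C40H56O2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[element] * (int(count) if count else 1)
    return total


COMPOUNDS = {
    "beta-carotene": "C40H56",
    "zeaxanthin": "C40H56O2",
    # zeaxanthin + glucose (C6H12O6) - H2O for one glycosidic bond
    "zeaxanthin monoglucoside": "C46H66O7",
}

for name, formula in COMPOUNDS.items():
    print(f"{name:26s} {formula:9s} {molecular_weight(formula):7.1f}")
```

Rounded to the nearest integer, the computed values (537, 569 and 731) agree with the masses reported for the three peaks, consistent with the assignments above.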

Identification of Pigmented Cosmid Clones of DC413
Example 2 describes the construction of an E. coli cosmid clone capable of expressing an ~40 kB fragment of genomic DNA from Pantoea stewartii DC413. This transformant produced zeaxanthin, in addition to zeaxanthin derivatives (predominantly zeaxanthin monoglucoside).
Chromosomal DNA preparation: Strain DC413 was grown in 25 mL LB medium at 30°C overnight with aeration. Bacterial cells were centrifuged at 4,000 g for 10 min. The cell pellet was gently resuspended in 5 mL of 50 mM Tris-10 mM EDTA (pH 8), and lysozyme was added to a final concentration of 2 mg/mL. The suspension was incubated at 37°C for 1 h. Sodium dodecyl sulfate was then added to a final concentration of 1% and proteinase K was added at 100 µg/mL. The suspension was incubated at 55°C for 2 h, after which it became clear. The clear lysate was extracted twice with an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1) and once with chloroform:isoamyl alcohol (24:1). After centrifuging at 4,000 rpm for 20 min, the aqueous phase was carefully removed and transferred to a new tube. Two volumes of ethanol were added, and the DNA was gently spooled with a sealed glass Pasteur pipette. The DNA was dipped into a tube containing 70% ethanol. After air drying, the DNA was resuspended in 400 µL of TE (10 mM Tris-1 mM EDTA, pH 8) with RNaseA (100 µg/mL) and stored at 4°C. The concentration and purity of the DNA were determined spectrophotometrically by OD260/OD280.
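The spectrophotometric step can be sketched with the conventional conversion factor for double-stranded DNA (an A260 of 1.0 in a 1 cm cell corresponds to about 50 µg/mL) and the usual A260/A280 purity benchmark of about 1.8. Neither figure is given in the text, and the absorbance readings and 1:50 dilution below are hypothetical.

```python
DSDNA_UG_PER_ML_PER_A260 = 50.0  # conventional factor for double-stranded DNA, 1 cm path


def dsdna_concentration(a260: float, dilution_factor: float = 1.0) -> float:
    """Double-stranded DNA concentration (ug/mL) in the undiluted sample."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratio(a260: float, a280: float) -> float:
    """A260/A280; roughly 1.8 is the usual benchmark for protein-free DNA."""
    return a260 / a280


# Hypothetical readings for a 1:50 dilution of the 400 uL TE stock
conc = dsdna_concentration(a260=0.25, dilution_factor=50)
total_ug = conc * 0.400  # 400 uL expressed in mL
print(f"{conc:.0f} ug/mL, {total_ug:.0f} ug total, A260/A280 = {purity_ratio(0.25, 0.135):.2f}")
```

A ratio well below about 1.8 would suggest residual protein or phenol carried over from the extractions.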
Cosmid library construction: A cosmid library of Pantoea stewartii DC413 was constructed using the pWEB cosmid cloning kit from Epicentre Technologies (Madison, WI) following the manufacturer's instructions. Genomic DNA was sheared by passing it through a syringe needle. The sheared DNA was end-repaired and size-selected on low-melting-point agarose by comparison with a 40 kB standard. DNA fragments approximately 40 kB in size were purified and ligated into the blunt-ended cloning-ready pWEB cosmid vector. The library was packaged using ultra-high efficiency MaxPlax Lambda Packaging Extracts and plated on EP1100 E. coli cells. Two yellow colonies were identified from the cosmid library clones. Since cosmid DNA from the two clones had similar restriction digestion patterns, further analysis was performed on a single clone (i.e., cosmid clone pWEB-413).
Carotenoid analysis of the yellow cosmid clone: The carotenoids in E. coli EP1100 containing cosmid pWEB-413 were analyzed by LC-MS, as described in EXAMPLE 1. The HPLC result is shown in Figure 4. The 6.25 min peak was identified as zeaxanthin, based on its UV spectrum, molecular weight and comparison with the authentic standard. Neither β-carotene nor β-cryptoxanthin intermediates accumulated in significant amounts. The predominant peak, which eluted at 3.22 min, was most likely zeaxanthin monoglucoside, as suggested by LC-MS analysis.

Identification of Carotenoid Biosynthesis Genes
This Example describes the identification of the Pantoea stewartii strain DC413 crtE, idi, crtX, crtY, crtI, crtB, and crtZ genes in cosmid pWEB-413 and compares the relatedness of these genes with that of other known Pantoea crt genes.
HPLC analysis suggested that cosmid pWEB-413 contained genes for the synthesis of zeaxanthin and its derivatives. To sequence the carotenoid synthesis genes, cosmid DNA pWEB-413 was subjected to in vitro transposition using the EZ::TN <TET-1> kit from Epicentre (Madison, WI) following the manufacturer's instructions. Two hundred tetracycline-resistant transposon insertions were sequenced from the end of the transposon using the TET-1 FP-1 Forward primer (SEQ ID NO:19).
Sequence assembly was performed with the Sequencher™ program (Gene Codes Corp., Ann Arbor, MI). A 9127 bp contig (SEQ ID NO:20) containing 7 genes of the carotenoid biosynthesis pathway from Pantoea stewartii DC413 was assembled (Figure 5).
Genes encoding crtE, idi, crtX, crtY, crtI, crtB, and crtZ were identified by conducting BLAST (Basic Local Alignment Search Tool; Altschul, S. F., et al., J. Mol. Biol. 215:403-410 (1990)) searches for similarity to sequences contained in the BLAST "nr" database (comprising all non-redundant GenBank® CDS translations, sequences derived from the 3-dimensional structure Brookhaven Protein Data Bank, the SWISS-PROT protein sequence database, EMBL, and DDBJ databases). Each sequence was analyzed for similarity to all publicly available DNA sequences contained in the "nr" database using the BLASTN algorithm provided by the National Center for Biotechnology Information (NCBI). The DNA sequence was translated in all reading frames and compared for similarity to all publicly available protein sequences contained in the "nr" database using the BLASTX algorithm (Gish, W. and States, D. J., Nature Genetics 3:266-272 (1993)) provided by the NCBI.
All comparisons were done using either the BLASTNnr or BLASTXnr algorithm. The results of the BLAST comparisons are given in Table 2, which summarizes the sequences to which each gene has the most similarity. Table 2 displays data based on the BLASTXnr algorithm, with values reported as Expect values. The Expect value estimates the statistical significance of the match, specifying the number of matches with a given score that are expected by chance alone in a search of a database of this size.
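The Expect value described above follows Karlin-Altschul statistics: a raw alignment score S is converted to a bit score S' = (λS − ln K)/ln 2, and E = m·n·2^(−S'), which is the same as K·m·n·e^(−λS) for a search space of m query residues against n database residues. The sketch below evaluates those formulas. The λ and K values are ones commonly quoted for BLOSUM62 with gap costs 11/1, the raw score and database size are made up, and real BLAST additionally applies effective-length (edge) corrections and composition-based statistics that this sketch omits.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits: float, query_len: int, db_len: int) -> float:
    """E = m * n * 2^(-S'): hits scoring at least this well expected by chance
    in a search space of query_len * db_len residues."""
    return query_len * db_len * 2.0 ** (-bits)


# Illustrative values only: a 302-residue query (the length of SEQ ID NO:2),
# a hypothetical 5 x 10^8-residue database and a hypothetical raw score of 150
lam, k = 0.267, 0.041
s_bits = bit_score(raw_score=150, lam=lam, k=k)
e = expect_value(s_bits, query_len=302, db_len=500_000_000)
print(f"bit score {s_bits:.1f}, E = {e:.2e}")
```

Because E scales with the search space, the same alignment receives a larger (less significant) Expect value as the "nr" database grows.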
The nucleotide and amino acid sequences were also compared with those from other Pantoea strains, using the FASTA program incorporating the Smith-Waterman algorithm (W. R. Pearson, supra).
Table 3 summarizes the percent identities obtained in these pairwise comparisons.
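The pairwise identities in Table 3 come from Smith-Waterman local alignments run through the FASTA package. The sketch below shows the underlying dynamic-programming idea with a simplified scoring scheme: a linear gap penalty and arbitrary match and mismatch scores, whereas FASTA's implementation uses substitution matrices and affine gap penalties. It also assumes one common definition of percent identity (identical columns divided by alignment length, gaps included); other definitions exist. The function names and example sequences are hypothetical.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Local alignment with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b), with '-' marking gaps.
    """
    def s(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(0,
                          h[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # (mis)match
                          h[i - 1][j] + gap,                        # gap in b
                          h[i][j - 1] + gap)                        # gap in a
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the highest-scoring cell until the local alignment starts (score 0)
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical columns as a percentage of alignment length (gap columns included)."""
    if not aligned_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


score, x, y = smith_waterman("ACGTTCGT", "ACGTACGT")
print(score, x, y, percent_identity(x, y))
```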

TABLE 2 [BLASTX (nr) comparison results for the Pantoea stewartii DC413 carotenoid biosynthesis genes, reported as Expect values; the table is not legible in the scanned source]
TABLE 3
Pairwise Comparison of the Carotenoid Biosynthesis Genes from Pantoea stewartii DC413 with Those from Other Pantoea Strains

                Pantoea ananatis(a)   Pantoea agglomerans(b)   Pantoea stewartii(c)
Source/gene     DNA    Amino acid     DNA    Amino acid        DNA    Amino acid
DC413, crtE     68%    69%            62%    51%               68%    69%
DC413, idi      NA     NA             67%    55%               NA     NA
DC413, crtX     66%    62%            58%    48%               64%    61%
DC413, crtY     64%    65%            62%    56%               64%    63%
DC413, crtI     77%    87%            74%    75%               77%    88%
DC413, crtB     70%    77%            69%    65%               67%    74%
DC413, crtZ     71%    76%            70%    67%               72%    74%

(a) Pantoea ananatis, GenBank® Accession Number D90087
(b) Pantoea agglomerans, GenBank® Accession Number M87280
(c) Pantoea stewartii, GenBank® Accession Number AY166713
NA = Not applicable

Expression of the crtEidiXYIB Gene Cluster of Pantoea stewartii DC413 in Methylomonas sp. 16a
The following Example describes the introduction of the crt gene cluster comprising the crtEidiXYIB genes from Pantoea stewartii DC413 (Example 3) into Methylomonas 16a (ATCC PTA 2402) to enable the synthesis of desirable 40-carbon carotenoids, such as β-carotene.
First, primers pWEB413F: 5'-GAATTCTGCAAGTAAGGACTGCCATTATG-3' (SEQ ID NO:21) and pWEB413R: 5'-GAATTCTAACGCGGACGCTGCCAGAGCT-3' (SEQ ID NO:22) were used to amplify a fragment from DC413 containing the crtEidiXYIB genes by PCR. Cosmid DNA pWEB-413 was used as the template with Pfu Turbo polymerase (Stratagene, La Jolla, CA) and the following thermocycler conditions: 92°C (5 min); 94°C (1 min), 60°C (1 min), 72°C (9 min) for 25 cycles; and 72°C (10 min). A single product of approximately 6.8 kB was observed following gel electrophoresis. Taq polymerase (Perkin Elmer) was used in a ten-minute 72°C reaction to add 3' adenosine overhangs to the fragment for TOPO cloning into pTrcHis2-TOPO (Invitrogen, Carlsbad, CA). Following transformation into E. coli TOP10 cells, several colonies appeared yellow, indicating that they were producing a carotenoid compound. The gene cluster was then subcloned into the broad host range vector pBHR1 (MoBiTec, LLC, Marco Island, FL) and electroporated into E. coli 10G cells (Lucigen, Middleton, WI). Transformants containing the resulting plasmid pDCQ332 were selected on LB medium containing 50 µg/mL kanamycin.
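As a quick sanity check on the primer pair and the 60°C annealing step, the sketch below computes length, GC content and a rough melting temperature for the two primers listed above. It uses the simple GC-content formula Tm = 64.9 + 41·(nGC − 16.4)/N, which ignores salt, primer concentration and nearest-neighbor effects, so the values are only approximate. The formula choice and function names are assumptions; the primer sequences are copied from the text.

```python
PRIMERS = {
    "pWEB413F": "GAATTCTGCAAGTAAGGACTGCCATTATG",  # SEQ ID NO:21
    "pWEB413R": "GAATTCTAACGCGGACGCTGCCAGAGCT",   # SEQ ID NO:22
}


def gc_count(seq: str) -> int:
    """Number of G and C bases in the primer."""
    return sum(base in "GC" for base in seq.upper())


def basic_tm(seq: str) -> float:
    """Rough melting temperature (deg C) from the GC-content formula
    Tm = 64.9 + 41 * (nGC - 16.4) / N (no salt or nearest-neighbor terms)."""
    return 64.9 + 41.0 * (gc_count(seq) - 16.4) / len(seq)


for name, seq in PRIMERS.items():
    gc_pct = 100.0 * gc_count(seq) / len(seq)
    print(f"{name}: {len(seq)} nt, GC {gc_pct:.1f}%, Tm ~ {basic_tm(seq):.1f} C")
```

On this crude estimate the two primers differ by several degrees in Tm, which is one reason a nearest-neighbor calculator is usually preferred when choosing an annealing temperature.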
Plasmid pDCQ332 was transferred into Methylomonas 16a by triparental conjugal mating. The E. coli helper strain containing pRK2013 (ATCC No. 37159) and the E. coli 10G donor strain containing pDCQ332 were grown overnight in LB medium containing kanamycin (50 µg/mL), washed three times in LB, and resuspended in a volume of LB representing approximately a 60-fold concentration of the original culture volume.
The Methylomonas 16a MWM1000 (Δald/crtN1) strain contained a single crossover knockout of the ald/crtN1 genes, which disrupted the synthesis of the native C30 carotenoids (US SN 60/527,083). This (Δald/crtN1) strain was grown as the recipient using the general conditions described in WO 02/18617. Briefly, Methylomonas 16a MWM1000 strain was grown in serum-stoppered Wheaton bottles (Wheaton Scientific, Wheaton, IL) using a gas/liquid ratio of at least 8:1 (i.e., 20 mL of Nitrate liquid "BTZ-3" media in 160 mL total volume) at 30°C with constant shaking.
Nitrate liquid medium, also referred to herein as "defined medium" or "BTZ-3" medium, was composed of various salts mixed with Solution 1 as indicated below (Tables 4 and 5), or, where specified, with the nitrate replaced by 15 mM ammonium chloride. Solution 1 is a 100-fold concentrated stock solution of trace minerals.

TABLE 4
Solution 1*
Component             MW        Conc. (mM)   g per L
Nitriloacetic acid    191.1     66.9         12.8
CuCl2 x 2H2O          170.48    0.15         0.0254
FeCl2 x 4H2O          198.81    1.5          0.3
MnCl2 x 4H2O          197.91    0.5          0.1
CoCl2 x 6H2O          237.9     1.31         0.312
ZnCl2                 136.29    0.73         0.1
H3BO3                 61.83     0.16         0.01
Na2MoO4               241.95    0.04         0.01
NiCl2 x 6H2O          237.7     0.77         0.184

*Mix the gram amounts designated above in 900 mL of H2O, adjust to pH 7, and add H2O to an end volume of 1 L. Keep refrigerated.

TABLE 5
Nitrate liquid medium (BTZ-3)**
Component             MW        Conc. (mM)   g per L
NaNO3                 84.99     10           0.85
KH2PO4                136.09    3.67         0.5
Na2SO4                142.04    3.52         0.5
MgCl2 x 6H2O          203.3     0.98         0.2
CaCl2 x 2H2O          147.02    0.68         0.1
1 M HEPES (pH 7)      238.3                  50 mL
Solution 1                                   10 mL

**Dissolve in 900 mL H2O. Adjust to pH 7, and add H2O to give 1 L. For agar plates: add 15 g of agarose to 1 L of medium, autoclave, let cool to 50°C, mix, and pour plates.
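The "g per L" columns of the two media tables follow from g/L = MW (g/mol) × concentration (mM) / 1000. The sketch below recomputes each entry from the tabulated values and measures how closely it matches, and it also shows the dilution implied by adding 10 mL of the 100-fold Solution 1 per liter of BTZ-3. The list names and helper functions are hypothetical; the numbers are copied from Tables 4 and 5.

```python
# (component, MW in g/mol, concentration in mM, tabulated g per L)
SOLUTION_1 = [
    ("Nitriloacetic acid", 191.1, 66.9, 12.8),
    ("CuCl2 x 2H2O", 170.48, 0.15, 0.0254),
    ("FeCl2 x 4H2O", 198.81, 1.5, 0.3),
    ("MnCl2 x 4H2O", 197.91, 0.5, 0.1),
    ("CoCl2 x 6H2O", 237.9, 1.31, 0.312),
    ("ZnCl2", 136.29, 0.73, 0.1),
    ("H3BO3", 61.83, 0.16, 0.01),
    ("Na2MoO4", 241.95, 0.04, 0.01),
    ("NiCl2 x 6H2O", 237.7, 0.77, 0.184),
]
BTZ_3_SALTS = [
    ("NaNO3", 84.99, 10.0, 0.85),
    ("KH2PO4", 136.09, 3.67, 0.5),
    ("Na2SO4", 142.04, 3.52, 0.5),
    ("MgCl2 x 6H2O", 203.3, 0.98, 0.2),
    ("CaCl2 x 2H2O", 147.02, 0.68, 0.1),
]


def grams_per_liter(mw: float, conc_mm: float) -> float:
    """g/L = (g/mol) x (mmol/L) / 1000."""
    return mw * conc_mm / 1000.0


def relative_discrepancy(mw: float, conc_mm: float, tabulated_g: float) -> float:
    """Fractional difference between the computed and the tabulated g/L."""
    return abs(grams_per_liter(mw, conc_mm) - tabulated_g) / tabulated_g


for table in (SOLUTION_1, BTZ_3_SALTS):
    for name, mw, mm, g in table:
        print(f"{name:20s} calc {grams_per_liter(mw, mm):.4f} g/L vs table {g} g/L")

# 10 mL of the 100-fold Solution 1 per liter of BTZ-3 is a 1:100 dilution,
# so nitriloacetic acid ends up at 66.9 / 100 = 0.669 mM in the medium.
final_nta_mm = 66.9 * 10.0 / 1000.0
```

All recomputed values agree with the tabulated grams to within a few percent, the differences reflecting rounding of the gram amounts.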
The standard gas phase for cultivation contains 25% methane in air. The MWM1000 recipient was cultured under these conditions for 48 h in BTZ-3 medium, washed three times in BTZ-3, and resuspended in a volume of BTZ-3 representing a 150-fold concentration of the original culture volume.
The donor, helper, and recipient cell pastes were then combined in ratios of 1:1:2, respectively, on the surface of BTZ-3 agar plates containing 0.5% (w/v) yeast extract. Plates were maintained at 30°C in 25% methane for 16-72 h to allow conjugation to occur, after which the cell pastes were collected and resuspended in BTZ-3. Dilutions were plated on BTZ-3 agar containing kanamycin (50 µg/mL) and incubated at 30°C in 25% methane for up to 1 week. Yellow transconjugants were streaked onto BTZ-3 agar with kanamycin (50 µg/mL).

For analysis of carotenoid composition, transconjugants were cultured in ~5 mL BTZ-3 containing kanamycin (50 µg/mL) and incubated at 30°C in 25% methane as the sole carbon source for 3-4 days. The cells were harvested by centrifugation and frozen at -~0°C. After thawing, the pellets were extracted and carotenoid content was analyzed by HPLC, as described in Example 1.
HPLC analysis of extracts from Methylomonas 16a containing pDCQ332 showed almost exclusive production of β-carotene (Figure 6).
The retention time, UV spectrum and molecular weight of the 14 min peak match those of the authentic β-carotene standard (Sigma, St. Louis, MO). This confirmed the synthesis of C40 carotenoids in this methanotrophic host using the crtEidiXYIB gene cluster from Pantoea stewartii DC413.

SEQUENCE LISTING <110> E.I. du Pont de Nemours and Co., Inc.
<120> GENES ENCODING CAROTENOID COMPOUNDS
<130> CL2385 PCT <150> US 60/488,183 <151> 2003-07-17 <150> US 60/527,083 <151> 2003-12-03 <160> 22 <170> PatentIn version 3.2 <210> 1 <211> 909 <212> DNA

<213> Pantoea stewartii DC413 <400> 1 atgaccatttttgctgaaagagactctactctcatctacagcgatcctctgatgttactg 60 gcgattattgaacagcgtcttgaccgactgctgccggtagaaagcgaacgagactgcgtg 120 gggctcgccatgcgcgaaggcgcgctggcaccgggcaaacgcatccggccggtactgctg 180 atgctggccgctcacgaccttggctatcgcgacgaactcagcgggctgctcgacttcgcc 240 tgcgccgtcgagatggtgcatgccgcctcgctgatactcgacgatattccctgcatggac 300 gatgccgaactgcggcgcggccggccgacaatccatcgccagttcggcgagccggtggcg 360 attctcgccgccgtcgccctgctgagccgcgccttcggcgtgattgcgctggcggacggc 420 atcagcagccaggcgaagacccaggccgtggcggagctttcccattcagtcggcattcag 480 gggctggtgcagggacagtttctcgatctgaccgaaggcggccagccgcgcagcgccgac 540 gccattcagctgaccaaccactttaaaaccagcgcgctgttcagcgcggcgatgcagatg 600 gccgccatcatcgccggcgcgccgctggcgtcgcgtgaaaagctgcaccgcttcgcgcgg 660 gatctcggccaggcctttcagctgctggacgacctgaccgacggccagagcgacacggga 720 aaagatgcccatcaggacgtggggaaatcgacgctggtgaacatgctgggcagcaaagcg 780 gtagaaaagcgcctgcgcgaccatctgcgacgcgccgatcgccacctcgcttcggcctgc 840 gacagcggctacgccacccggcacttcgtgcaggcctggttcgataaaaaactcgctatg 900 gtcggctga <210> 2 <211> 302 <212> PRT

SUBSTITUTE SHEET (RULE 26) <213> Pantoea stewartii DC413 <400> 2 Met Thr Ile Phe Ala Glu Arg Asp Ser Thr Leu Ile Tyr Ser Asp Pro Leu Met Leu Leu Ala Ile Ile Glu Gln Arg Leu Asp Arg Leu Leu Pro Val Glu Ser Glu Arg Asp Cys Val Gly Leu Ala Met Arg Glu Gly Ala Leu Ala Pro Gly Lys Arg Ile Arg Pro Val Leu Leu Met Leu Ala Ala His Asp Leu Gly Tyr Arg Asp Glu Leu Ser Gly Leu Leu Asp Phe Ala Cys Ala Val Glu Met Val His Ala Ala Ser Leu Ile Leu Asp Asp Ile Pro Cys Met Asp Asp Ala Glu Leu Arg Arg Gly Arg Pro Thr Ile His 100 105 110 Arg Gln Phe Gly Glu Pro Val Ala Ile Leu Ala Ala Val Ala Leu Leu Ser Arg Ala Phe Gly Val Ile Ala Leu Ala Asp Gly Ile Ser Ser Gln Ala Lys Thr Gln Ala Val Ala Glu Leu Ser His Ser Val Gly Ile Gln 145 150 155 160 Gly Leu Val Gln Gly Gln Phe Leu Asp Leu Thr Glu Gly Gly Gln Pro Arg Ser Ala Asp Ala Ile Gln Leu Thr Asn His Phe Lys Thr Ser Ala Leu Phe Ser Ala Ala Met Gln Met Ala Ala Ile Ile Ala Gly Ala Pro Leu Ala Ser Arg Glu Lys Leu His Arg Phe Ala Arg Asp Leu Gly Gln Ala Phe Gln Leu Leu Asp Asp Leu Thr Asp Gly Gln Ser Asp Thr Gly
225 230 235 240
Lys Asp Ala His Gln Asp Val Gly Lys Ser Thr Leu Val Asn Met Leu Gly Ser Lys Ala Val Glu Lys Arg Leu Arg Asp His Leu Arg Arg Ala Asp Arg His Leu Ala Ser Ala Cys Asp Ser Gly Tyr Ala Thr Arg His Phe Val Gln Ala Trp Phe Asp Lys Lys Leu Ala Met Val Gly <210> 3 <211> 1035 <212> DNA
<213> Pantoea stewartii DC413 <400> 3

atgaaggacacggacctgacgaagcgcaaaaacgatcatctggacattgttctgcgtaat 60 accgcgccggcgtcgggcagcttcgcccgctggcactttacccactgcgccctgccggag 120 ctgcacctggatcagatcgatctgcgcacgcggctgttcgatcgccccatgcaggcgccc 180 tttcttattagctcaatgaccggcggcgcggcgcgcgccctctcgattaatcatcatctt 240 gccgaagcggcgcagacgctgggtctggcgctgggggtcggttcgcagcgcgtggcgctg 300 gaaagcgacaacgattctggcctgacgcgcgatttacgccgtatcgccccggatattccg 360 ctgctggcgaacctcggcgcggcgcagattctgggcgaacagggccgcaggctggcgcga 420 aatgcggtaagcatgatcgaggcggatgcgctgatcgtccatcttaatccgctgcaggaa 480 gcgctgcagcgcggcggcgatcgcgactggcgcggcgtactgcaggcgattgcgcagctg 540 gtgaagtcgctggaggtgccggtggtggtgaaagaggttggcgcgggcatctcggccgag 600 gttgcgcagcggctcgccgaggcgggcgtcagcatgatcgatatcgcaggtgcgggcggc 660 accagctgggcggcggtagagggcgaacgcgccagcaccccgcagcagcgcgcggtggcg 720 atggcctttgccagctggggtattcccacagatgaagccttacgcgcggtgcgcgacagg 780 ctgcctgccataccgcttatcgcctcaggcggcatccgcgacggcatcgacgcggcgaag 840 gcgctgcggctcggcgcggatatcgttggccaggcggcggcggtgctcagcagcgccctg 900 cactctacggatgcggtggtcgcgcactttaacacgctgattgaacagctgcgcgtcgcc 960 tgtttctgcaccggcagcgctaatctgcgccagctgcgccttgcgccgctgcatcgcgcc 1020 ggagaaacgc tatga 1035 <210> 4 <211> 344
<212> PRT
<213> Pantoea stewartii DC413 <400> 4 Met Lys Asp Thr Asp Leu Thr Lys Arg Lys Asn Asp His Leu Asp Ile Val Leu Arg Asn Thr Ala Pro Ala Ser Gly Ser Phe Ala Arg Trp His 20 25 30 Phe Thr His Cys Ala Leu Pro Glu Leu His Leu Asp Gln Ile Asp Leu Arg Thr Arg Leu Phe Asp Arg Pro Met Gln Ala Pro Phe Leu Ile Ser Ser Met Thr Gly Gly Ala Ala Arg Ala Leu Ser Ile Asn His His Leu Ala Glu Ala Ala Gln Thr Leu Gly Leu Ala Leu Gly Val Gly Ser Gln Arg Val Ala Leu Glu Ser Asp Asn Asp Ser Gly Leu Thr Arg Asp Leu Arg Arg Ile Ala Pro Asp Ile Pro Leu Leu Ala Asn Leu Gly Ala Ala Gln Ile Leu Gly Glu Gln Gly Arg Arg Leu Ala Arg Asn Ala Val Ser Met Ile Glu Ala Asp Ala Leu Ile Val His Leu Asn Pro Leu Gln Glu Ala Leu Gln Arg Gly Gly Asp Arg Asp Trp Arg Gly Val Leu Gln Ala Ile Ala Gln Leu Val Lys Ser Leu Glu Val Pro Val Val Val Lys Glu Val Gly Ala Gly Ile Ser Ala Glu Val Ala Gln Arg Leu Ala Glu Ala Gly Val Ser Met Ile Asp Ile Ala Gly Ala Gly Gly Thr Ser Trp Ala 210 215 220
Ala Val Glu Gly Glu Arg Ala Ser Thr Pro Gln Gln Arg Ala Val Ala Met Ala Phe Ala Ser Trp Gly Ile Pro Thr Asp Glu Ala Leu Arg Ala Val Arg Asp Arg Leu Pro Ala Ile Pro Leu Ile Ala Ser Gly Gly Ile Arg Asp Gly Ile Asp Ala Ala Lys Ala Leu Arg Leu Gly Ala Asp Ile Val Gly Gln Ala Ala Ala Val Leu Ser Ser Ala Leu His Ser Thr Asp Ala Val Val Ala His Phe Asn Thr Leu Ile Glu Gln Leu Arg Val Ala Cys Phe Cys Thr Gly Ser Ala Asn Leu Arg Gln Leu Arg Leu Ala Pro Leu His Arg Ala Gly Glu Thr Leu <210> 5 <211> 1290 <212> DNA
<213> Pantoea stewartii DC413 <400> 5

atgagccatttcgccgcgatcgcccctcccttttacagccacgtgcgcgcgcttcaggcg 60 ctggcgcagagcctgatagcgcgcggccatcgggtgacctttattcagcaggcggaggtt 120 gccaccctgctcagcgacgccgctatcggctttcacgccatcggcctggaaacgcatcct 180 gtcggcacgctcgaccgtacgctggcgctggcggcccatcccggcggcctgggcattctg 240 cgcctgatccgcgatatggccagcagcaccgatatgctgtgccgcgagctgccggaggcg 300 ctgcgggcgctggcggtagatggcgtgatcgtcgatcagatggcgccagcgggcgggctg 360 gtggcggaggcgctgcggctgcccttcgtttcggtcgcctgcgccctgccggtcaatcgt 420 gaagcccattttccattgccggtcatgccttttttgtggggtactagcagcgccgcgcgc 480 gagcggttcgcctccagcgaaaaaatttatgactggctgatgcgcagccacgatcgcgtg 540 ctggcgcgccatgccgacgcctttggccttgccgaccgccgtcagccgcaccagtgcctg 600 tcgccgctggcgcaaatcagccagctgccgcacgccctcgactttccgcgccgcgagctg 660 ccggcccatttccacgccaccggcccgctgcgcgaaccgcccgccgctgccgcagcgccg 720
ctgttcagtaaccgcggccagccgcgcattttcgcctcgctcggcacgctgcagggcggc 780 cgttacgggctgtttaaaacgctggcaaaagcctgccgcgaactggaggcggagctgctg 840 atcgcccactgcggcggcctgagcgattttcaggcgcgtaaactgctgcgcgccggggcg 900 gcgcaggtagccgcctttgtcaatcagcgcgccgcgctggcgcaggcggacgtggccatt 960 acccacggcggcttaaatacggtgctcgacgccgtaacctatggcacgccgctgctggcg 1020 attccgctggcattcgatcagcccggcattgccgcgcggctggcgcaccatggcctgggg 1080 atgcgcgcgtcgcgcttctccaccagccatcagattgcgcgtcgcctgcgtcgcctgctg 1140 gacgatggtgcggttaagcagcgcatgacgcgcctgcagccgcagctggccgcctgcggc 1200 ggcgtcgagcgcgcggctgagattaccgagcgcgcgctgctgacgcgccagccggtgcgc 1260 gcggagaagtactatgacatcgcagtatga 1290 <210> 6 <211> 429 <212> PRT
<213> Pantoea stewartii DC413 <400> 6 Met Ser His Phe Ala Ala Ile Ala Pro Pro Phe Tyr Ser His Val Arg Ala Leu Gln Ala Leu Ala Gln Ser Leu Ile Ala Arg Gly His Arg Val Thr Phe Ile Gln Gln Ala Glu Val Ala Thr Leu Leu Ser Asp Ala Ala Ile Gly Phe His Ala Ile Gly Leu Glu Thr His Pro Val Gly Thr Leu Asp Arg Thr Leu Ala Leu Ala Ala His Pro Gly Gly Leu Gly Ile Leu Arg Leu Ile Arg Asp Met Ala Ser Ser Thr Asp Met Leu Cys Arg Glu Leu Pro Glu Ala Leu Arg Ala Leu Ala Val Asp Gly Val Ile Val Asp Gln Met Ala Pro Ala Gly Gly Leu Val Ala Glu Ala Leu Arg Leu Pro Phe Val Ser Val Ala Cys Ala Leu Pro Val Asn Arg Glu Ala His Phe Pro Leu Pro Val Met Pro Phe Leu Trp Gly Thr Ser Ser Ala Ala Arg Glu Arg Phe Ala Ser Ser Glu Lys Ile Tyr Asp Trp Leu Met Arg Ser His Asp Arg Val Leu Ala Arg His Ala Asp Ala Phe Gly Leu Ala Asp Arg Arg Gln Pro His Gln Cys Leu Ser Pro Leu Ala Gln Ile Ser Gln 195 200 205 Leu Pro His Ala Leu Asp Phe Pro Arg Arg Glu Leu Pro Ala His Phe His Ala Thr Gly Pro Leu Arg Glu Pro Pro Ala Ala Ala Ala Ala Pro Leu Phe Ser Asn Arg Gly Gln Pro Arg Ile Phe Ala Ser Leu Gly Thr Leu Gln Gly Gly Arg Tyr Gly Leu Phe Lys Thr Leu Ala Lys Ala Cys Arg Glu Leu Glu Ala Glu Leu Leu Ile Ala His Cys Gly Gly Leu Ser Asp Phe Gln Ala Arg Lys Leu Leu Arg Ala Gly Ala Ala Gln Val Ala Ala Phe Val Asn Gln Arg Ala Ala Leu Ala Gln Ala Asp Val Ala Ile Thr His Gly Gly Leu Asn Thr Val Leu Asp Ala Val Thr Tyr Gly Thr Pro Leu Leu Ala Ile Pro Leu Ala Phe Asp Gln Pro Gly Ile Ala Ala
340 345 350 Arg Leu Ala His His Gly Leu Gly Met Arg Ala Ser Arg Phe Ser Thr 355 360 365 Ser His Gln Ile Ala Arg Arg Leu Arg Arg Leu Leu Asp Asp Gly Ala Val Lys Gln Arg Met Thr Arg Leu Gln Pro Gln Leu Ala Ala Cys Gly Gly Val Glu Arg Ala Ala Glu Ile Thr Glu Arg Ala Leu Leu Thr Arg Gln Pro Val Arg Ala Glu Lys Tyr Tyr Asp Ile Ala Val <210> 7 <211> 1164 <212> DNA
<213> Pantoea stewartii DC413 <400> 7

atgacatcgcagtatgatctgctgttgctcggcgccggtctggcgaacgggctgctggcg 60 ctgcggctgaaagcgctccagccgcagctgcgcgtgctggtgcttgatgcccacgcccac 120 gccggtggcaatcacacctggtgctttcacgaagaggatctcagcgccgcgcagcatcag 180 tggattgcgccgctggtggcgcaccgctggccgcactacgaggtacgctttcccgcgctg 240 acgcgccagcttaacagcggctatttttgcgtcacttccgcgcgctttgacgaggtgctg 300 cgcgcgacgctcggcgacgcgctgcggcttaaccagaccgtcgccagcagcggccccgat 360 cacgtgcagctcgccagcggcgaagtgctgcgcgcgcgcgccgtcattgacggccgcggc 420 tatcagcccgacgccgccctgcagattggctttcagtcttttgtcggtcaggagtggcgc 480 ctgagccagccgcatcagctggaggggccgattctgatggacgcggccgtggatcagcag 540 gggggctatcgcttcgtctataccctgccgctctcgccgacgcgtctgctgattgaagat 600 acccactatattaacgacgcctcgctggcgacggcgcaggcgcggcagaatatctgcgac 660 tacgccacccgccagggctggcagctggagacgctgctgcgcgaagagcgcggcgcgctg 720 ccgattacgctggcgggcgatttcgaccgcttctggcatcatcgcgccccctgcgtcggc 780 ctgcgcgccgggctttttcaccccacgaccggctactccctgccgctggcggcgacgctg 840 gcggacgcgctcgccgcagaggcggacttctcccctgaggcgctcgcgccgcgtattcac 900 cgctttgcgcaggcagcgtggcgtaaacagggctttttccgcatgcttaaccgcatgctg 960 ttcctggcggccgagggcgatcggcgctggcgcgtaatgcagcgcttttacggcctgccc 1020 gaggggctgatcgcccggttttacgccggacggctgacgctggccgaccgcgcgcgcatt 1080 cttagcggcaagccgccggtcccggtgctggcggcgctgcaggctattctcacccaccct 1140 tctggacgaagagcatcacgatga 1164 <210> 8 <211> 387 <212> PRT
<213> Pantoea stewartii DC413 <400> 8 Met Thr Ser Gln Tyr Asp Leu Leu Leu Leu Gly Ala Gly Leu Ala Asn Gly Leu Leu Ala Leu Arg Leu Lys Ala Leu Gln Pro Gln Leu Arg Val Leu Val Leu Asp Ala His Ala His Ala Gly Gly Asn His Thr Trp Cys Phe His Glu Glu Asp Leu Ser Ala Ala Gln His Gln Trp Ile Ala Pro 50 55 60 Leu Val Ala His Arg Trp Pro His Tyr Glu Val Arg Phe Pro Ala Leu Thr Arg Gln Leu Asn Ser Gly Tyr Phe Cys Val Thr Ser Ala Arg Phe Asp Glu Val Leu Arg Ala Thr Leu Gly Asp Ala Leu Arg Leu Asn Gln Thr Val Ala Ser Ser Gly Pro Asp His Val Gln Leu Ala Ser Gly Glu Val Leu Arg Ala Arg Ala Val Ile Asp Gly Arg Gly Tyr Gln Pro Asp Ala Ala Leu Gln Ile Gly Phe Gln Ser Phe Val Gly Gln Glu Trp Arg Leu Ser Gln Pro His Gln Leu Glu Gly Pro Ile Leu Met Asp Ala Ala Val Asp Gln Gln Gly Gly Tyr Arg Phe Val Tyr Thr Leu Pro Leu Ser Pro Thr Arg Leu Leu Ile Glu Asp Thr His Tyr Ile Asn Asp Ala Ser Leu Ala Thr Ala Gln Ala Arg Gln Asn Ile Cys Asp Tyr Ala Thr Arg Gln Gly Trp Gln Leu Glu Thr Leu Leu Arg Glu Glu Arg Gly Ala Leu 225 230 235 240 Pro Ile Thr Leu Ala Gly Asp Phe Asp Arg Phe Trp His His Arg Ala Pro Cys Val Gly Leu Arg Ala Gly Leu Phe His Pro Thr Thr Gly Tyr Ser Leu Pro Leu Ala Ala Thr Leu Ala Asp Ala Leu Ala Ala Glu Ala Asp Phe Ser Pro Glu Ala Leu Ala Pro Arg Ile His Arg Phe Ala Gln Ala Ala Trp Arg Lys Gln Gly Phe Phe Arg Met Leu Asn Arg Met Leu Phe Leu Ala Ala Glu Gly Asp Arg Arg Trp Arg Val Met Gln Arg Phe Tyr Gly Leu Pro Glu Gly Leu Ile Ala Arg Phe Tyr Ala Gly Arg Leu Thr Leu Ala Asp Arg Ala Arg Ile Leu Ser Gly Lys Pro Pro Val Pro
Val Leu Ala Ala Leu Gln Ala Ile Leu Thr His Pro Ser Gly Arg Arg Ala Ser Arg <210> 9 <211> 1482 <212> DNA <213> Pantoea stewartii DC413 <400> 9 atgaagcacaccacggtaattggcgcaggatttggcgggctggcactggcaattcgcctc 60 caggcagcaggcgttccaacgcggctgctggagcagcgcgacaagccgggcggccgcgcc 120 tatgtttatcaggatcagggctttacctttgacgcgggcccgacggtgatcaccgatccg 180 tccgctattgaagagctgttcgccctggcgggaaaatcgatgcgcgactatgtcgagctg 240 ctgccggtgacccctttttaccggctctgctgggagacgggcgaggtgtttaactacgat 300 aacgatcaggcgcgactggaagcggagatccgcaaatttaatccagccgacgtggcgggc 360 tatcagcgcttcctcgactattcgcgcgccgtgttcgccgaaggctacctgaagctcggc 420 accgtgccctttttgtcgttccgcgatatgctgcgcgccgcaccgcagctggcgcgcctg 480 caggcgtggcgcagcgtttacagcaaggtggcgagctttatcgaggatgataagctgcgg 540 caggccttttcgtttcactcgctgctggtcggcggcaaccccttcgccacctcgtcgatc 600 tatacgctgatccacgcgctggagcgcgaatggggcgtctggtttccgcgcggcggcacc 660 ggcgcgctggtgcagggcatgctgaagctgttccaggatttaggcggcacgctggagctg 720 aacgcgcgcgtcagccatatcgaggcgaaagaggccgcgatttccgccgtgcatctggag 780 gatggtcgggtatttgaaacccgcgcggtcgcctctaacgccgatgtggtgcatacctat 840 ggcgatctgctcggcaggcaccccgccgccgccgcgcaggccaaaaagctgaaaggcaag 900 cgcatgagcaactcgctgtttgtgctctattttggcctgaaccatcatcacgatcagctg 960 gcgcaccacaccgtctgcttcgggccgcgctaccgtgagctgattgacgagatctttaac 1020 cgcgacgggctggcggaagatttctcgctctatctccatgcgccctgcgtgaccgatccc 1080 tcgctggcgccgccgggctgcggcagctactacgtgctggcaccggttccccatcttggc 1140 accgccgatctcgactggaacgttgaggggccgcgcctgcgcgatcgcattttcgcctat 1200 ctcgaagagcactatatgcccggcctgcgcagccagctggtcactcaccgcatcttcacg 1260 ccgttcgatttccgcgaccagcttaatgcctatcagggctctgcgttttccgttgagccg 1320 attttgcgcc agagcgcctg gttccggccc cataaccgcg acagccatat ccgcaatctc 1380 tatctggtcg gcgcgggtac gcacccaggc gcgggcattc ccggcgtgat cggttccgcc 1440 aaagccaccg caagcctgat gctggaggat ctgcatgcat aa 1482 <210> 10 <211> 493 <212> PRT
<213> Pantoea stewartii DC413 <400> 10 Met Lys His Thr Thr Val Ile Gly Ala Gly Phe Gly Gly Leu Ala Leu Ala Ile Arg Leu Gln Ala Ala Gly Val Pro Thr Arg Leu Leu Glu Gln Arg Asp Lys Pro Gly Gly Arg Ala Tyr Val Tyr Gln Asp Gln Gly Phe Thr Phe Asp Ala Gly Pro Thr Val Ile Thr Asp Pro Ser Ala Ile Glu Glu Leu Phe Ala Leu Ala Gly Lys Ser Met Arg Asp Tyr Val Glu Leu Leu Pro Val Thr Pro Phe Tyr Arg Leu Cys Trp Glu Thr Gly Glu Val Phe Asn Tyr Asp Asn Asp Gln Ala Arg Leu Glu Ala Glu Ile Arg Lys Phe Asn Pro Ala Asp Val Ala Gly Tyr Gln Arg Phe Leu Asp Tyr Ser 115 120 125 Arg Ala Val Phe Ala Glu Gly Tyr Leu Lys Leu Gly Thr Val Pro Phe 130 135 140 Leu Ser Phe Arg Asp Met Leu Arg Ala Ala Pro Gln Leu Ala Arg Leu Gln Ala Trp Arg Ser Val Tyr Ser Lys Val Ala Ser Phe Ile Glu Asp Asp Lys Leu Arg Gln Ala Phe Ser Phe His Ser Leu Leu Val Gly Gly Asn Pro Phe Ala Thr Ser Ser Ile Tyr Thr Leu Ile His Ala Leu Glu Arg Glu Trp Gly Val Trp Phe Pro Arg Gly Gly Thr Gly Ala Leu Val Gln Gly Met Leu Lys Leu Phe Gln Asp Leu Gly Gly Thr Leu Glu Leu Asn Ala Arg Val Ser His Ile Glu Ala Lys Glu Ala Ala Ile Ser Ala 245 250 255 Val His Leu Glu Asp Gly Arg Val Phe Glu Thr Arg Ala Val Ala Ser Asn Ala Asp Val Val His Thr Tyr Gly Asp Leu Leu Gly Arg His Pro Ala Ala Ala Ala Gln Ala Lys Lys Leu Lys Gly Lys Arg Met Ser Asn Ser Leu Phe Val Leu Tyr Phe Gly Leu Asn His His His Asp Gln Leu Ala His His Thr Val Cys Phe Gly Pro Arg Tyr Arg Glu Leu Ile Asp Glu Ile Phe Asn Arg Asp Gly Leu Ala Glu Asp Phe Ser Leu Tyr Leu His Ala Pro Cys Val Thr Asp Pro Ser Leu Ala Pro Pro Gly Cys Gly Ser Tyr Tyr Val Leu Ala Pro Val Pro His Leu Gly Thr Ala Asp Leu Asp Trp Asn Val Glu Gly Pro Arg Leu Arg Asp Arg Ile Phe Ala Tyr Leu Glu Glu His Tyr Met Pro Gly Leu Arg Ser Gln Leu Val Thr His Arg Ile Phe Thr Pro Phe Asp Phe Arg Asp Gln Leu Asn Ala Tyr Gln Gly Ser Ala Phe Ser Val Glu Pro Ile Leu Arg Gln Ser Ala Trp Phe Arg Pro His Asn Arg Asp Ser His Ile Arg Asn Leu
Tyr Leu Val Gly Ala Gly Thr His Pro Gly Ala Gly Ile Pro Gly Val Ile Gly Ser Ala Lys Ala Thr Ala Ser Leu Met Leu Glu Asp Leu His Ala <210> 11 <211> 930 <212> DNA
<213> Pantoea stewartii DC413 <400> 11 atgcataatc cgacgctgct gcaccatgcc gtagagacga tggaagtcgg ttcgaagagt 60 ttcgccaccgcctcaaagctgttcgacgcgaaaacgcgccgcagcgtgctgatgctctac 120 gcctggtgccgccactgtgatgatgtgatcgatgaccagcagcttggctttccaggcgag 180 gttccttcggcgcagaccccgcagcagcgtctggcaaatctggagcgcaaaacccgccag 240 gcctacgcgggcgcgcaaatgcatgaacccgccttcgccgcctttcaggaggtggcgatc 300 gcccacgatatctctcccgcttacgctttcgaccatctggaagggtttgcaatggacgtc 360 cgcggcgcgcgttatgaaacctttcaggatacgctgcgctactgctaccacgtggcgggc 420 gtggtgggattaatgatggcgcagattatgggggtgcgcgacgaggcggtgctggatcgc 480 gcctgcgatctcggcctcgcctttcagctgaccaatattgcacgcgatatcgttgaggat 540 gcgcgagtcggccgctgctatttgccggaaagctggctggaggaggccgggctggatcgt 600 cttcactttgccgatcgcgctcatcgcccggcgctggcgaatctggcgcggcggctggtg 660 agcgaggcggagccctactacgcctctgcgtcggccgggctggccgggctgccgctgcgc 720 tctgcgtgggcgatcgccacggcgaaagaggtttatcgccgcattggggttaaggtctac 780 ggcgcgggggaaacggcctgggatcgccgccagtccaccagcaagcaggagaagcttctg 840 ctgctggcggcgggggcggcgcaggcgatcaggtctcgggcggctgcttctccgccgcgt 900 cctgccgagctctggcagcgtccgcgttag 930 <210> 12 <211> 309 <212> PRT
<213> Pantoea stewartii DC413 <400> 12 Met His Asn Pro Thr Leu Leu His His Ala Val Glu Thr Met Glu Val Gly Ser Lys Ser Phe Ala Thr Ala Ser Lys Leu Phe Asp Ala Lys Thr Arg Arg Ser Val Leu Met Leu Tyr Ala Trp Cys Arg His Cys Asp Asp 35 40 45 Val Ile Asp Asp Gln Gln Leu Gly Phe Pro Gly Glu Val Pro Ser Ala Gln Thr Pro Gln Gln Arg Leu Ala Asn Leu Glu Arg Lys Thr Arg Gln Ala Tyr Ala Gly Ala Gln Met His Glu Pro Ala Phe Ala Ala Phe Gln Glu Val Ala Ile Ala His Asp Ile Ser Pro Ala Tyr Ala Phe Asp His Leu Glu Gly Phe Ala Met Asp Val Arg Gly Ala Arg Tyr Glu Thr Phe Gln Asp Thr Leu Arg Tyr Cys Tyr His Val Ala Gly Val Val Gly Leu Met Met Ala Gln Ile Met Gly Val Arg Asp Glu Ala Val Leu Asp Arg Ala Cys Asp Leu Gly Leu Ala Phe Gln Leu Thr Asn Ile Ala Arg Asp Ile Val Glu Asp Ala Arg Val Gly Arg Cys Tyr Leu Pro Glu Ser Trp Leu Glu Glu Ala Gly Leu Asp Arg Leu His Phe Ala Asp Arg Ala His Arg Pro Ala Leu Ala Asn Leu Ala Arg Arg Leu Val Ser Glu Ala Glu Pro Tyr Tyr Ala Ser Ala Ser Ala Gly Leu Ala Gly Leu Pro Leu Arg Ser Ala Trp Ala Ile Ala Thr Ala Lys Glu Val Tyr Arg Arg Ile Gly Val Lys Val Tyr Gly Ala Gly Glu Thr Ala Trp Asp Arg Arg Gln Ser Thr Ser Lys Gln Glu Lys Leu Leu Leu Leu Ala Ala Gly Ala Ala Gln Ala Ile Arg Ser Arg Ala Ala Ala Ser Pro Pro Arg Pro Ala Glu Leu Trp Gln Arg Pro Arg <210> 13 <211> 534 <212> DNA
<213> Pantoea stewartii DC413 <400> 13 atgctgtggttgtggaatgctgggatcgtattactgaccgtcgtagcgatggagattacc 60 gccgcgctgtcgcataaatatattatgcacggctggggatggggctggcaccggtcgcat 120 catgaaccgcacagcggctggtttgaagtgaacgatctctatgctgtggtgttcgccggg 180 ctggcgattctgttgatctacctgggcagccgcggcgtctggccgctacagtggataggc 240 gcaggcatgacgctttacggcctgctctattttattgtgcatgacgggctggtacaccag 300 cgctggccttttaagtacataccgcgtcgcggctactttaaacgactctacatggcgcac 360 cggctgcaccatgcggtgcgcggccgggaagactgcgtctccttcggcttcctctatgcg 420 ccgccgctggagaaattacaggcgacgctgcgtcagcgtcacggacgtcggcctaacgcg 480 gacgctgccagagctcggcaggacgcggcggagaagcagccgcccgagacctga 534 <210> 14 <211> 177 <212> PRT

<213> Pantoea stewartii <400> 14 Met Leu Trp Leu Trp Asn Ala Gly Ile Val Leu Leu Thr Val Val Ala Met Glu Ile Thr Ala Ala Leu Ser His Lys Tyr Ile Met His Gly Trp Gly Trp Gly Trp His Arg Ser His His Glu Pro His Ser Gly Trp Phe Glu Val Asn Asp Leu Tyr Ala Val Val Phe Ala Gly Leu Ala Ile Leu Leu Ile Tyr Leu Gly Ser Arg Gly Val Trp Pro Leu Gln Trp Ile Gly Ala Gly Met Thr Leu Tyr Gly Leu Leu Tyr Phe Ile Val His Asp Gly 85 90 Leu Val His Gln Arg Trp Pro Phe Lys Tyr Ile Pro Arg Arg Gly Tyr Phe Lys Arg Leu Tyr Met Ala His Arg Leu His His Ala Val Arg Gly Arg Glu Asp Cys Val Ser Phe Gly Phe Leu Tyr Ala Pro Pro Leu Glu Lys Leu Gln Ala Thr Leu Arg Gln Arg His Gly Arg Arg Pro Asn Ala Asp Ala Ala Arg Ala Arg Gln Asp Ala Ala Glu Lys Gln Pro Pro Glu Thr <210> 15 <211> 19 <212> DNA
<213> Artificial Sequence <220>
<223> Primer HK12 <400> 15 gagtttgatc ctggctcag 19 <210> 16 <211> 15 <212> DNA
<213> Artificial Sequence <220>
<223> Primer JCR14 <400> 16 acgggcggtg tgtac <210> 17 <211> 16 <212> DNA
<213> Artificial Sequence <220>
<223> Primer JCR15 <400> 17 gccagcagcc gcggta 16 <210> 18 <211> 1352 <212> DNA
<213> Pantoea stewartii DC413 <400> 18

atgacgctggcggcaggcctaacacatgcaagtcgaacggtagcacagaggagcttgctc 60 ctcgggtgacgagtggcggacgggtgagtaatgtctgggaaactgcccgatggaggggga 120 taactactggaaacggtagctaataccgcataacgtcgcaagaccaaagtgggggacctt 180 cgggcctcacaccatcggatgtgcccagatgggattagctagtaggtggggtaacggctc 240 acctaggcgacgatccctagctggtctgagaggatgaccagccacactggaactgagaca 300 cggtccagactcctacgggaggcagcagtggggaatattgcacaatgggcgcaagcctga 360 tgcagccatgccgcgtgtatgaagaaggccttcgggttgtaaagtactttcagcggggag 420 gaaggcgacgcggttaataaccgcgtcgattgacgttacccgcagaagaagcaccggcta 480 actccgtgccagcagccgcggtaatacggagggtgcaagcgttaatcggaattactgggc 540 gtaaagcgcacgcaggcggtctgtcaagtcggatgtgaaatccccgggcttaacctggga 600 actgcattcgaaactggcaggctagagtcttgtagaggggggtagaattccaggtgtagc 660 ggtgaaatgcgtagagatctggaggaataccggtggcgaaggcggccccctggacaaaga 720 ctgacgctcaggtgcgaaagcgtggggagcaaacaggattagataccctggtagtccacg 780 ccgtaaacgatgtcgacttggaggctgttcccttgaggagtggcttccggagctaacgcg 840 ttaagtcgaccgcctggggagtacggccgcaaggttaaaactcaaatgaattgacggggg 900 cccgcacaagcggtggagcatgtggtttaattcgatgcaacgcgaagaaccttacctact 960 cttgacatccagagaacttggcagagatgcattggtgccttcgggaactctgagacaggt 1020 gctgcatggctgtcgtcagctcgtgttgtgaaatgttgggttaagtcccgcaacgagcgc 1080 aacccttatcctttgttgccagcgattcggtcgggaactcaaaggagactgccggtgata 1140 aaccggaggaaggtggggatgacgtcaagtcatcatggcccttacgagtagggctacaca 1200 cgtgctacaatggcgcatacaaagagaagcgacctcgcgagagcaagcggacctcataaa 1260 gtgcgtcgtagtccggatcggagtctgcaactcgactccgtgaagtcggaatcgctagta 1320 atcgtggatcagaatgccacggtgaatacgt 1352 <210> 19 <211> 23 <212> DNA
<213> Artificial Sequence <220>
<223> Primer TET-1FP-1 <400> 19 gggtgcgcat gatcctctag agt 23 <210> 20 <211> 9127 <212> DNA
<213> Pantoea stewartii DC413 <400> 20 cgcggctcggtgcagcataccgacccgcagggcgagcgcacgccgctgtcggctaaacgg 60 gtgatggtggtaaatgcgggtgatggtctgcggcatgaagagtccgcgccgctgattgaa 120 gctgaactgttgcaggcgtttatccgcccggcgcgcgccggaggcgaagcgaaaaccctg 180 aagctgctgcgcgcttgcgtccgtcgcccataacgcctggacatggctggcggggccgga 240 aggcagtgccgcgccgctgacgctgcgtcaggcggtgcatatctacgacgtacggctgga 300 gcgtggcaaaacgctggcggtgcccgcgctgcccggcttcgcgccctggctggcggtgct 360 ggacggcgtggtgaatctggagggacagcgtctgcataaaggggacgtggccggtgacgc 420 cgacgcattgccagaggtaaccgccgagcgcgacgccacgctgatctgttttctggtgga 480 tgcggaagcgaccggcgtgatgagcggaaccgttagcggcagttgataataagggaaggt 540 aaataccttcccttttttatcaggcgtggtgacgggcgcgctgtagtcggtgccggaact 600 gacgtacctcttcgcgcgaccagcagcgttctctgaacgctacgttttgcagcggcgccg 660 gcagtggtaccccatccagcgatccgcgattgcagtaggcacgcaccagcctcagcggag 720 agacccccaggtagcgcgcaacatcagacagcgtcatcaattcacttttcatacccatct 780 ccttttgcgagcgattcactgcatcttcgattcctttcaccgcgcgttgtttgatctggc 840 gcattgaaccagtgattaaattttgagcaaggcgtaaaaattcatcatttcttaactggc 900 gaagcggcagattttcacccaggcttgaggctctttccttctgcgtgcgaggatgcgtga 960 aactctacccactgctgcttatcagcctgctgtttcctctccttgccgctgcacacagcg 1020 ttgaaacgggaaaacctcttccgccggtctttatcgccaacgagggcgagctggtgatgc 1080 agcatgagatgctgggctatcagcgctggagcagccagtcgctgaccggcaaagtccgca 1140 tggtgatccacgtcgccgggcggctgtcggccaaagagcaaaacgcgccgctgatcgccg 1200 ccgtccagcaggcaaactttccgcgcagccgctttcagaccaccactatcgtcaataccg 1260 acc~acgccattccasggcac~cgcc~.tttttgtcc~ac~cgcagcattaaatcgagcaagcgcg 1320 acgcgccctgccagcatttcgtgatcgacagcagcggtgtggcgcgcaacagctggcagc 1380 tgacgccgcacggctccgccgtgatgctgctggacgcacagggggtggtgcgcttce~caa 1440

gagatggcgcgctgacgccgcaggaggtcc'3ggcaggctattgctttgcttar,'~cca~ctgc 1500

tttccgctgctgcggcgccatccgacccggcacttagttcgtgaataaaattattatttt 1560 attatcacttatctccgttttgcccgtcagcaggcgaattctcgactacgctttatttca 1620 ccgtctcgccaaaaccaaacaacaatgctgatctgcgacgacgctaaaaataacaggttc 1680 gacgttaattattagatggctctttctgcgccactttgttcatttgcaattacgacaggc 1740 cgacgctcacctgcaagtaaggactgccattatgaccatttttgctgaaagagactctac 1800 tctcatctacagcgatcctctgatgttactggcgattattgaacagcgtcttgaccgact 1860 gctgccggtagaaagcgaacgagactgcgtggggctcgccatgcgcgaaggcgcgctggc 1920 accgggcaaacgcatccggccggtactgctgatgctggccgctcacgaccttggctatcg 1980 cgacgaactcagcgggctgctcgacttcgcctgcgccgtcgagatggtgcatgccgcctc 2040 gctgatactcgacgatattccctgcatggacgatgccgaactgcggcgcggccggccgac 2100 aatccatcgccagttcggcgagccggtggcgattctcgccgccgtcgccctgctgagccg 2160 cgccttcggcgtgattgcgctggcggacggcatcagcagccaggcgaagacccaggccgt 2220 ggcggagctttcccattcagtcggcattcaggggctggtgcagggacagtttctcgatct 2280 gaccgaaggcggccagccgcgcagcgccgacgccattcagctgaccaaccactttaaaac 2340 cagcgcgctgttcagcgcggcgatgcagatggccgccatcatcgccggcgcgccgctggc 2400 gtcgcgtgaaaagctgcaccgcttcgcgcgggatctcggccaggcctttcagctgctgga 2460 cgacctgaccgacggccagagcgacacgggaaaagatgcccatcaggacgtggggaaatc 2520 gacgctggtgaacatgctgggcagcaaagcggtagaaaagcgcctgcgcgaccatctgcg 2580 acgcgccgatcgccacctcgcttcggcctgcgacagcggctacgccacccggcacttcgt 2640 gcaggcctggttcgataaaaaactcgctatggtcggctgaccgcgcgtttcctgtctgag 2700 tatatggagcagcaatgaaggacacggacctgacgaagcgcaaaaacgatcatctggaca 2760 ttgttctgcgtaataccgcgccggcgtcgggcagcttcgcccgctggcactttacccact 2820 gcgccctgccggagctgcacctggatcagatcgatctgcgcacgcggctgttcgatcgcc 2880 ccatgcaggcgccctttcttattagctcaatgaccggcggcgcggcgcgcgccctctcga 2940 ttaatcatcatcttgccgaagcggcgcagacgctgggtctggcgctgggggtcggttcgc 3000 agcgcgtggcgctggaaagcgacaacgattctggcctgacgcgcgatttacgccgtatcg 3060 ccccggatattccgctgctggcgaacctcggcgcggcgcagattctgggcgaacagggcc 3120 gcaggctggcgcgaaatgcggtaagcatgatcgaggcggatgcgctgatcgtccatctta 3180 atccgctgcaggaagcgctgcagcgcggcggcgatcgcgactggcgcggcgtactgcagg 3240
cgattgcgcagctggtgaagtcgctggaggtgccggtggtggtgaaagaggttggcgcgg 3300 gcatctcggccgaggttgcgcagcggctcgccgaggcgggcgtcagcatgatcgatatcg 3360 caggtgcgggcggcaccagctgggcggcggtagagggcgaacgcgccagcaccccgcagc 3420 agcgcgcggtggcgatggcctttgccagctggggtattcccacagatgaagccttacgcg 3480 cggtgcgcgacaggctgcctgccataccgcttatcgcctcaggcggcatccgcgacggca 3540 tcgacgcggcgaaggcgctgcggctcggcgcggatatcgttggccaggcggcggcggtgc 3600 tcagcagcgccctgcactctacggatgcggtggtcgcgcactttaacacgctgattgaac 3660

agctgcgcgtcgcctgtttctgcaccggcagcgctaatctgcgccagctgcgccttgcgc 3720 cgctgcatcgcgccggagaaacgctatgagccatttcgccgcgatcgcccctccctttta 3780 cagccacgtgcgcgcgcttcaggcgctggcgcagagcctgatagcgcgcggccatcgggt 3840 gacctttattcagcaggcggaggttgccaccctgctcagcgacgccgctatcggctttca 3900 cgccatcggcctggaaacgcatcctgtcggcacgctcgaccgtacgctggcgctggcggc 3960 ccatcccggcggcctgggcattctgcgcctgatccgcgatatggccagcagcaccgatat 4020 gctgtgccgcgagctgccggaggcgctgcgggcgctggcggtagatggcgtgatcgtcga 4080 tcagatggcgccagcgggcgggctggtggcggaggcgctgcggctgcccttcgtttcggt 4140 cgcctgcgccctgccggtcaatcgtgaagcccattttccattgccggtcatgcctttttt 4200 gtggggtactagcagcgccgcgcgcgagcggttcgcctccagcgaaaaaatttatgactg 4260 gctgatgcgcagccacgatcgcgtgctggcgcgccatgccgacgcctttggccttgccga 4320 ccgccgtcagccgcaccagtgcctgtcgccgctggcgcaaatcagccagctgccgcacgc 4380 cctcgactttccgcgccgcgagctgccggcccatttccacgccaccggcccgctgcgcga 4440 accgcccgccgctgccgcagcgccgctgttcagtaaccgcggccagccgcgcattttcgc 4500 ctcgctcggcacgctgcagggcggccgttacgggctgtttaaaacgctggcaaaagcctg 4560 ccgcgaactggaggcggagctgctgatcgcccactgcggcggcctgagcgattttcaggc 4620 gcgtaaactgctgcgcgccggggcggcgcaggtagccgcctttgtcaatcagcgcgccgc 4680 gctggcgcaggcggacgtggccattacccacggcggcttaaatacggtgctcgacgccgt 4740 aacctatggcacgccgctgctggcgattccgctggcattcgatcagcccggcattgccgc 4800 gcggctggcgcaccatggcctggggatgcgcgcgtcgcgcttctccaccagccatcagat 4860 tgcgcgtcgcctgcgtcgcctgctggacgatggtgcggttaagcagcgcatgacgcgcct 4920
gcagccgcag ctggccgcct gcggcggcgt cgagcgcgcg gctgagatta ccgagcgcgc 4980 gctgctgacg cgccagccgg tgcgcgcgga gaagtactat gacatcgcag tatgatctgc 5040 tgttgctcgg cgccggtctg gcgaacgggc tgctggcgct gcggctgaaa gcgctccagc 5100 cgcagctgcg cgtgctggtg cttgatgccc acgcccacgc cggtggcaat cacacctggt 5160 gctttcacga agaggatctc agcgccgcgc agcatcagtg gattgcgccg ctggtggcgc 5220 accgctggcc gcactacgag gtacgctttc ccgcgctgac gcgccagctt aacagcggct 5280 atttttgcgt cacttccgcg cgctttgacg aggtgctgcg cgcgacgctc ggcgacgcgc 5340 tgcggcttaa ccagaccgtc gccagcagcg gccccgatca cgtgcagctc gccagcggcg 5400 aagtgctgcg cgcgcgcgcc gtcattgacg gccgcggcta tcagcccgac gccgccctgc 5460 agattggctt tcagtctttt gtcggtcagg agtggcgcct gagccagccg catcagctgg 5520 aggggccgat tctgatggac gcggccgtgg atcagcaggg gggctatcgc ttcgtctata 5580 ccctgccgct ctcgccgacg cgtctgctga ttgaagatac ccactatatt aacgacgcct 5640 cgctggcgac ggcgcaggcg cggcagaata tctgcgacta cgccacccgc cagggctggc 5700 agctggagac gctgctgcgc gaagagcgcg gcgcgctgcc gattacgctg gcgggcgatt 5760 tcgaccgctt ctggcatcat cgcgccccct gcgtcggcct gcgcgccggg ctttttcacc 5820 ccacgaccgg ctactccctg ccgctggcgg cgacgctggc ggacgcgctc gccgcagagg 5880 cggacttctc ccctgaggcg ctcgcgccgc gtattcaccg ctttgcgcag gcagcgtggc 5940 gtaaacaggg ctttttccgc atgcttaacc gcatgctgtt cctggcggcc gagggcgatc 6000 ggcgctggcg cgtaatgcag cgcttttacg gcctgcccga ggggctgatc gcccggtttt 6060 acgccggacggctgacgctggccgaccgcgcgcgcattcttagcggcaagccgccggtcc 6120 cggtgctggcggcgctgcaggctattctcacccacccttctggacgaagagcatcacgat 6180 gaagcacaccacggtaattggcgcaggatttggcgggctggcactggcaattcgcctcca 6240 ggcagcaggcgttccaacgcggctgctggagcagcgcgacaagccgggcggccgcgccta 6300 tgtttatcaggatcagggctttacctttgacgcgggcccgacggtgatcaccgatccgtc 6360 cgctattgaagagctgttcgccctggcgggaaaatcgatgcgcgactatgtcgagctgct 6420

gccggtgacccctttttaccggctctgctgggagacgggcgaggtgtttaactacgataa 6480 cgatcaggcgcgactggaagcggagatccgcaaatttaatccagccgacgtggcgggcta 6540 tcagcgcttcctcgactattcgcgcgccgtgttcgccgaaggctacctgaagctcggcac 6600 cgtgccctttttgtcgttccgcgatatgctgcgcgccgcaccgcagctggcgcgcctgca 6660 ggcgtggcgcagcgtttacagcaaggtggcgagctttatcgaggatgataagctgcggca 6720 ggccttttcgtttcactcgctgctggtcggcggcaaccccttcgccacctcgtcgatcta 6780 tacgctgatccacgcgctggagcgcgaatggggcgtctggtttccgcgcggcggcaccgg 6840 cgcgctggtgcagggcatgctgaagctgttccaggatttaggcggcacgctggagctgaa 6900 cgcgcgcgtcagccatatcgaggcgaaagaggccgcgatttccgccgtgcatctggagga 6960 tggtcgggtatttgaaacccgcgcggtcgcctctaacgccgatgtggtgcatacctatgg 7020 cgatctgctcggcaggcaccccgccgccgccgcgcaggccaaaaagctgaaaggcaagcg 7080 catgagcaactcgctgtttgtgctctattttggcctgaaccatcatcacgatcagctggc 7140 gcaccacaccgtctgcttcgggccgcgctaccgtgagctgattgacgagatctttaaccg 7200 cgacgggctggcggaagatttctcgctctatctccatgcgccctgcgtgaccgatccctc 7260 gctggcgccgccgggctgcggcagctactacgtgctggcaccggttccccatcttggcac 7320 cgccgatctcgactggaacgttgaggggccgcgcctgcgcgatcgcattttcgcctatct 7380 cgaagagcactatatgcccggcctgcgcagccagctggtcactcaccgcatcttcacgcc 7440 gttcgatttccgcgaccagcttaatgcctatcagggctctgcgttttccgttgagccgat 7500 tttgcgccagagcgcctggttccggccccataaccgcgacagccatatccgcaatctcta 7560 tctggtcggcgcgggtacgcacccaggcgcgggcattcccggcgtgatcggttccgccaa 7620 agccaccgcaagcctgatgctggaggatctgcatgcataatccgacgctgctgcaccatg 7680 ccgtagagacgatggaagtcggttcgaagagtttcgccaccgcctcaaagctgttcgacg 7740 cgaaaacgcgccgcagcgtgctgatgctctacgcctggtgccgccactgtgatgatgtga 7800 tcgatgaccagcagcttggctttccaggcgaggttccttcggcgcagaccccgcagcagc 7860 gtctggcaaatctggagcgcaaaacccgccaggcctacgcgggcgcgcaaatgcatgaac 7920 ccgccttcgccgcctttcaggaggtggcgatcgcccacgatatctctcccgcttacgctt 7980 tcgaccatctggaagggtttgcaatggacgtccgcggcgcgcgttatgaaacctttcagg 8040 atacgctgcgctactgctaccacgtggcgggcgtggtgggattaatgatggcgcagatta 8100 tgggggtgcgcgacgaggcggtgctggatcgcgcctgcgatctcggcctcgcctttcagc 8160 tgaccaatattgcacgcgatatcgttgaggatgcgcgagtcggccgctgctatttgccgg 8220 aaagctggct ggaggaggcc gggctggatc gtcttcactt tgccgatcgc gctcatcgcc 8280 cggcgctggc gaatctggcg cggcggctgg tgagcgaggc ggagccctac tacgcctctg 8340 cgtcggccgg gctggccggg ctgccgctgc gctctgcgtg ggcgatcgcc acggcgaaag 8400 aggtttatcg ccgcattggg gttaaggtct acggcgcggg ggaaacggcc tgggatcgcc 8460 gccagtccaccagcaagcaggagaagcttctgctgctggcggcgggggcggcgcaggcga 8520 tcaggtctcgggcggctgcttctccgccgcgtcctgccgagctctggcagcgtccgcgtt 8580 aggccgacgtccgtgacgctgacgcagcgtcgcctgtaatttctccagcggcggcgcata 8640 gaggaagccgaaggagacgcagtcttcccggccgcgcaccgcatggtgcagccggtgcgc 8700

catgtagagtcgtttaaagtagccgcgacgcggtatgtacttaaaaggccagcgctggtg 8760 taccagcccgtcatgcacaataaaatagagcaggccgtaaagcgtcatgcctgcgcctat 8820 ccactgtagcggccagacgccgcggctgcccaggtagatcaacagaatcgccagcccggc 8880

gaacaccacagcatagagatcgttcacttcaaaccagccgctgtgcggttcatgatgcga 8940
ccggtgccagccccatccccagccgtgcataatatatttatgcgacagcgcggcggtaat 9000
ctccatcgctacgacggtcagtaatacgatcccagcattccacaaccacagcatatcttc 9060
tcccgtcagtgcatcctgccagccagcgcaggctggccatcatcagctgcggcacgccgc 9120
aggcgaa 9127
<210> 21
<211> 29
<212> DNA
<213> artificial sequence
<220>
<223> Primer pWEB413F
<400> 21
gaattctgca agtaaggact gccattatg 29
<210> 22
<211> 28
<212> DNA
<213> artificial sequence
<220>
<223> Primer pWEB413R
<400> 22
gaattctaac gcggacgctg ccagagct 28
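The entries above use the WIPO ST.25 sequence-listing layout: each data line of a nucleotide sequence carries up to 60 bases followed by the running base count, and each `<211>` field declares the length of the sequence. Both primers, pWEB413F and pWEB413R, begin with `gaattc`, the EcoRI recognition site. The Python sketch below checks those invariants on pasted lines. The function names, and the requirement that the caller supply the number of bases preceding the first line (`start`), are assumptions of this illustration and not part of the application.

```python
import re

ECORI_SITE = "gaattc"  # EcoRI recognition sequence (5'-GAATTC-3')

# A data line: bases (optionally in space-separated blocks of 10), then a count.
_LINE = re.compile(r"([acgtn ]+?)\s*(\d+)")


def check_positions(lines, start=0):
    """Compare each line's trailing number with the running base count.

    `start` is the number of bases preceding the first line passed in
    (hypothetical parameter; a partial excerpt cannot infer it).
    Returns (line_index, expected, found) tuples. Unparseable lines are
    reported as (line_index, None, raw_line) and do not advance the count.
    """
    problems = []
    total = start
    for idx, raw in enumerate(lines):
        m = _LINE.fullmatch(raw.strip().lower())
        if m is None:
            problems.append((idx, None, raw))
            continue
        total += len(m.group(1).replace(" ", ""))
        found = int(m.group(2))
        if found != total:
            problems.append((idx, total, found))
    return problems


def check_primer(seq, declared_length):
    """Return (length matches the <211> value, starts with an EcoRI site)."""
    bases = seq.replace(" ", "").lower()
    return len(bases) == declared_length, bases.startswith(ECORI_SITE)


if __name__ == "__main__":
    tail = [
        "tcccgtcagtgcatcctgccagccagcgcaggctggccatcatcagctgcggcacgccgc 9120",
        "aggcgaa 9127",
    ]
    print(check_positions(tail, start=9060))
    print(check_primer("gaattctgca agtaaggact gccattatg", 29))
    print(check_primer("gaattctaac gcggacgctg ccagagct", 28))
```

A line whose position field was damaged in extraction is reported as unparseable rather than as a wrong count, which keeps extraction errors separate from genuine length mismatches.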

Claims (35)

1. An isolated nucleic acid molecule encoding a carotenoid biosynthetic pathway enzyme, selected from the group consisting of:
(a) an isolated nucleic acid molecule encoding the amino acid sequence selected from the group consisting of SEQ ID NOs:2, 4, 6, 8, 10, 12 and 14;
(b) an isolated nucleic acid molecule that hybridizes with (a) under the following hybridization conditions: 0.1X SSC, 0.1% SDS, 65°C and washed with 2X SSC, 0.1% SDS followed by 0.1X SSC, 0.1% SDS; and
(c) an isolated nucleic acid molecule that is complementary to (a) or (b).
2. The isolated nucleic acid molecule of Claim 1 selected from the group consisting of SEQ ID NOs:1, 3, 5, 7, 9, 11 and 13.
3. An isolated nucleic acid fragment of Claim 1 isolated from Pantoea stewartii strain DC413.
4. A polypeptide encoded by the isolated nucleic acid molecule of Claim 1.
5. The polypeptide of Claim 4 selected from the group consisting of SEQ ID NOs:2, 4, 6, 8, 10, 12 and 14.
6. An isolated nucleic acid molecule as set forth in SEQ ID NO:20, comprising the crtE, idi, crtX, crtY, crtI, crtB, and crtZ genes, or an isolated nucleic acid molecule having at least 95% identity to SEQ ID NO:20, wherein the isolated nucleic acid molecule encodes all of the polypeptides crtE, idi, crtX, crtY, crtI, crtB, and crtZ.
7. An isolated nucleic acid molecule comprising a first nucleotide sequence encoding a geranylgeranyl pyrophosphate synthetase enzyme of at least 302 amino acids that has at least 70% identity based on the Smith-Waterman method of alignment when compared to a polypeptide having the sequence as set forth in SEQ ID NO:2;
or a second nucleotide sequence comprising the complement of the first nucleotide sequence.
8. An isolated nucleic acid molecule comprising a first nucleotide sequence encoding an isopentenyl pyrophosphate isomerase enzyme of at least 344 amino acids that has at least 70% identity based on the Smith-Waterman method of alignment when compared to a polypeptide having the sequence as set forth in SEQ ID NO:4;

or a second nucleotide sequence comprising the complement of the first nucleotide sequence.
9. An isolated nucleic acid molecule comprising a first nucleotide sequence encoding a crtX enzyme of at least 429 amino acids that has at least 70% identity based on the Smith-Waterman method of alignment when compared to a polypeptide having the sequence as set forth in SEQ ID NO:6;
or a second nucleotide sequence comprising the complement of the first nucleotide sequence.
10. An isolated nucleic acid molecule comprising a first nucleotide sequence encoding a lycopene cyclase enzyme of at least 387 amino acids that has at least 70% identity based on the Smith-Waterman method of alignment when compared to a polypeptide having the sequence as set forth in SEQ ID NO:8;
or a second nucleotide sequence comprising the complement of the first nucleotide sequence.
11. An isolated nucleic acid molecule comprising a first nucleotide sequence encoding a phytoene desaturase enzyme of at least 493 amino acids that has at least 81% identity based on the Smith-Waterman method of alignment when compared to a polypeptide having the sequence as set forth in SEQ ID NO:10;
or a second nucleotide sequence comprising the complement of the first nucleotide sequence.
12. An isolated nucleic acid molecule comprising a first nucleotide sequence encoding a phytoene synthase enzyme of at least 309 amino acids that has at least 70% identity based on the Smith-Waterman method of alignment when compared to a polypeptide having the sequence as set forth in SEQ ID NO:12;
or a second nucleotide sequence comprising the complement of the first nucleotide sequence.
13. An isolated nucleic acid molecule comprising a first nucleotide sequence encoding a β-carotene hydroxylase enzyme of at least 177 amino acids that has at least 82% identity based on the Smith-Waterman method of alignment when compared to a polypeptide having the sequence as set forth in SEQ ID NO:14;
or a second nucleotide sequence comprising the complement of the first nucleotide sequence.
14. A chimeric gene comprising the isolated nucleic acid molecule of any one of Claims 1 or 7-13 operably linked to suitable regulatory sequences.
15. A vector comprising the isolated nucleic acid molecule of Claim 6.
16. A transformed host cell comprising the chimeric gene of Claim 14.
17. A transformed host cell comprising the isolated nucleic acid molecule of claim 6.
18. The transformed host cell of Claim 16 or 17 wherein the host cell is selected from the group consisting of bacteria, yeast, filamentous fungi, algae, and green plants.
19. The transformed host cell of Claim 16 or 17 wherein the host cell is selected from the group consisting of Aspergillus, Trichoderma, Saccharomyces, Pichia, Candida, Hansenula, Yarrowia, Rhodosporidium, Lipomyces, Salmonella, Bacillus, Acinetobacter, Zymomonas, Agrobacterium, Flavobacterium, Rhodobacter, Rhodococcus, Streptomyces, Brevibacterium, Corynebacteria, Mycobacterium, Escherichia, Pantoea, Pseudomonas, Methylomonas, Methylobacter, Methylococcus, Methylosinus, Methylomicrobium, Methylocystis, Alcaligenes, Synechocystis, Synechococcus, Anabaena, Thiobacillus, Methanobacterium, Klebsiella, Methylophilus, Methylobacillus, Methylobacterium, Hyphomicrobium, Xanthobacter, Paracoccus, Nocardia, Arthrobacter, Rhodopseudomonas, Torulopsis, Phaffia, and Rhodotorula.
20. A method for the production of carotenoid compounds comprising:
(a) providing a transformed host cell comprising:
(i) suitable levels of farnesyl pyrophosphate; and
(ii) a set of nucleic acid molecules encoding the enzymes selected from the group consisting of SEQ ID NOs:2, 4, 6, 8, 10, 12 and 14 under the control of suitable regulatory sequences;
(b) contacting the host cell of step (a) under suitable growth conditions with an effective amount of a fermentable carbon substrate whereby a carotenoid compound is produced.
21. A method for the production of carotenoid compounds comprising:
(a) providing a transformed host cell comprising:
(i) suitable levels of farnesyl pyrophosphate; and
(ii) the isolated nucleic acid molecule of claim 6 under the control of suitable regulatory sequences;
(b) contacting the host cell of step (a) under suitable growth conditions with an effective amount of a fermentable carbon substrate whereby a carotenoid compound is produced.
22. A method according to Claim 20 or 21 wherein the transformed host cell is selected from the group consisting of C1 metabolizing hosts, bacteria, yeast, filamentous fungi, algae, and green plants.
23. A method according to Claim 22 wherein the C1 metabolizing host is a methanotroph and the fermentable carbon substrate is selected from the group consisting of methane, methanol, formaldehyde, formic acid, methylated amines, methylated thiols, and carbon dioxide.
24. A method according to Claim 23 wherein the C1 metabolizing host:
(a) grows on a C1 carbon substrate selected from the group consisting of methane and methanol; and (b) comprises a functional Embden-Meyerhof carbon pathway, said pathway comprising a gene encoding a pyrophosphate-dependent phosphofructokinase enzyme.
25. A method according to Claim 24 wherein the C1 metabolizing host cell is a high growth methanotrophic bacterial strain, known as Methylomonas 16a and having the ATCC designation PTA 2402.
26. A method according to Claim 20 or 21 wherein the transformed host cell is selected from the group consisting of Aspergillus, Trichoderma, Saccharomyces, Pichia, Candida, Hansenula, Yarrowia, Rhodosporidium, Lipomyces, Salmonella, Bacillus, Acinetobacter, Zymomonas, Agrobacterium, Flavobacterium, Rhodobacter, Rhodococcus, Streptomyces, Brevibacterium, Corynebacteria, Mycobacterium, Escherichia, Pantoea, Pseudomonas, Methylomonas, Methylobacter, Methylococcus, Methylosinus, Methylomicrobium, Methylocystis, Alcaligenes, Synechocystis, Synechococcus, Anabaena, Thiobacillus, Methanobacterium, Klebsiella, Methylophilus, Methylobacillus, Methylobacterium, Hyphomicrobium, Xanthobacter, Paracoccus, Nocardia, Arthrobacter, Rhodopseudomonas, Torulopsis, Phaffia, and Rhodotorula.
27. A method according to Claim 20 or 21, wherein the carotenoid compound produced is selected from the group consisting of antheraxanthin, adonirubin, adonixanthin, astaxanthin, canthaxanthin, capsorubin, β-cryptoxanthin, α-carotene, β-carotene, ε-carotene, echinenone, 3-hydroxyechinenone, 3'-hydroxyechinenone, γ-carotene, 4-keto-γ-carotene, ζ-carotene, α-cryptoxanthin, deoxyflexixanthin, diatoxanthin, 7,8-didehydroastaxanthin, fucoxanthin, fucoxanthinol, isorenieratene, lactucaxanthin, lutein, lycopene, myxobactone, neoxanthin, neurosporene, hydroxyneurosporene, peridinin, phytoene, rhodopin, rhodopin glucoside, 4-keto-rubixanthin, siphonaxanthin, spheroidene, spheroidenone, spirilloxanthin, 4-keto-torulene, 3-hydroxy-4-keto-torulene, uriolide, uriolide acetate, violaxanthin, zeaxanthin-β-diglucoside, and zeaxanthin.
28. A method of regulating carotenoid biosynthesis in an organism comprising over-expressing at least one carotenoid biosynthetic pathway gene selected from the group consisting of SEQ ID NOs: 1, 3, 5, 7, 9, 11, 13 and 20 in an organism such that the carotenoid biosynthesis is altered in the organism.
29. A method according to Claim 28 wherein said carotenoid gene is over-expressed on a multicopy plasmid.
30. A method according to Claim 28 wherein said carotenoid gene is operably linked to an inducible or regulated promoter.
31. A method according to Claim 28 wherein said carotenoid gene is expressed in antisense orientation.
32. A method according to Claim 28 wherein said carotenoid gene is disrupted by insertion of foreign DNA into the coding region.
33. A strain DC413 comprising the 16S rDNA sequence as set forth in SEQ ID NO:18.
34. An isolated nucleic acid molecule encoding all of the amino acid sequences as set forth in SEQ ID NOs:2, 4, 6, 8, 10, 12, and 14.
35. The isolated nucleic acid molecule of claim 34 having the nucleic acid sequence as set forth in SEQ ID NO:20.
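Clause (c) of Claim 1 and the second alternative of Claims 7-13 cover complementary sequences, and Claims 6-13 define scope by percent identity, with Claims 7-13 naming the Smith-Waterman method of alignment. The Python sketch below illustrates both ideas under stated assumptions. The complement is written as the reverse complement (5'->3'). The match, mismatch, and gap scores are arbitrary placeholders, because the claims do not give scoring parameters (production aligners usually apply a substitution matrix such as BLOSUM62 with affine gap penalties). Identity is taken as identical positions divided by all aligned columns, which is only one of several conventions. `THRESHOLDS` is transcribed from Claims 7-13; `meets_claim` and the other names are hypothetical, and the code does not reproduce how identity was actually computed for this application.

```python
from dataclasses import dataclass

# Watson-Crick pairing; IUPAC ambiguity codes (n, r, y, ...) are not handled.
_PAIR = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5'->3'."""
    return seq.translate(_PAIR)[::-1]


@dataclass(frozen=True)
class ClaimThreshold:
    enzyme: str
    reference_seq_id: int  # SEQ ID NO of the reference polypeptide
    min_length: int        # minimum length in amino acids
    min_identity: float    # minimum percent identity


# Values transcribed from Claims 7-13.
THRESHOLDS = {
    7: ClaimThreshold("geranylgeranyl pyrophosphate synthetase", 2, 302, 70.0),
    8: ClaimThreshold("isopentenyl pyrophosphate isomerase", 4, 344, 70.0),
    9: ClaimThreshold("crtX", 6, 429, 70.0),
    10: ClaimThreshold("lycopene cyclase", 8, 387, 70.0),
    11: ClaimThreshold("phytoene desaturase", 10, 493, 81.0),
    12: ClaimThreshold("phytoene synthase", 12, 309, 70.0),
    13: ClaimThreshold("beta-carotene hydroxylase", 14, 177, 82.0),
}


def smith_waterman_identity(a: str, b: str, match: int = 2,
                            mismatch: int = -1, gap: int = -2) -> float:
    """Best local alignment (linear gap penalty); percent identity over its columns."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the highest-scoring cell until the score drops to zero.
    i, j = best_pos
    matches = columns = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:      # aligned pair
            if a[i - 1] == b[j - 1]:
                matches += 1
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:      # residue of a against a gap
            i -= 1
        else:                                   # residue of b against a gap
            j -= 1
        columns += 1
    return 100.0 * matches / columns if columns else 0.0


def meets_claim(query: str, reference: str, claim: int) -> bool:
    """Hypothetical check of a query polypeptide against one claim's length and identity floors."""
    t = THRESHOLDS[claim]
    return (len(query) >= t.min_length
            and smith_waterman_identity(query, reference) >= t.min_identity)
```

For example, `reverse_complement("gaattc")` returns `"gaattc"` because the EcoRI site is palindromic. The dynamic-programming table is quadratic in memory, which is acceptable for polypeptides in the 177- to 493-residue range named in the claims.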
CA002527557A 2003-07-17 2004-07-14 Genes encoding carotenoid compounds Abandoned CA2527557A1 (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US48818303P 2003-07-17 2003-07-17
US60/488,183 2003-07-17
US52708303P 2003-12-03 2003-12-03
US60/527,083 2003-12-03
PCT/US2004/022747 WO2005010156A2 (en) 2003-07-17 2004-07-14 Genes encoding carotenoid compounds

Publications (1)

Publication Number Publication Date
CA2527557A1 true CA2527557A1 (en) 2005-02-03

Family

ID=34107764

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002527557A Abandoned CA2527557A1 (en) 2003-07-17 2004-07-14 Genes encoding carotenoid compounds

Country Status (4)

Country Link
EP (1) EP1646713A2 (en)
AU (1) AU2004260061A1 (en)
CA (1) CA2527557A1 (en)
WO (1) WO2005010156A2 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7851199B2 (en) 2005-03-18 2010-12-14 Microbia, Inc. Production of carotenoids in oleaginous yeast and fungi
US8691555B2 (en) 2006-09-28 2014-04-08 Dsm Ip Assests B.V. Production of carotenoids in oleaginous yeast and fungi
EP3630943A4 (en) * 2017-06-01 2021-06-09 Knipbio, Inc. Heterologous carotenoid production in microorganisms
CN115807026B (en) * 2022-08-01 2023-09-01 深圳大学 Construction method and application of astaxanthin synthesis path in Chlamydomonas reinhardtii
CN117264865B (en) * 2023-11-21 2024-03-12 清华大学 Recombinant rhodococcus erythropolis and application thereof in compound synthesis

Also Published As

Publication number Publication date
EP1646713A2 (en) 2006-04-19
AU2004260061A1 (en) 2005-02-03
WO2005010156A3 (en) 2007-05-03
WO2005010156A2 (en) 2005-02-03

Similar Documents

Publication Publication Date Title
EP1778855B1 (en) Carotenoid hydroxylase enzymes
US6929928B2 (en) Genes encoding carotenoid compounds
US7105634B2 (en) Genetic constructs encoding carotenoid biosynthetic enzymes
US7232666B2 (en) Optimized bacterial host strains of Methylomonas sp. 16a
JP5624974B2 (en) New carotenoid ketolase
US7064196B2 (en) Genes encoding carotenoid compounds
US7288387B2 (en) Genes of strain DC413 encoding enzymes involved in biosynthesis of carotenoid compounds
US20060234334A1 (en) Bacterial hemoglobin genes and their use to increase carotenoid production
CA2527557A1 (en) Genes encoding carotenoid compounds

Legal Events

Date Code Title Description
FZDE Dead