WO1998054331A2 - High level expression of glycosyltransferases

Info

Publication number: WO1998054331A2
Application number: PCT/IB1998/000975
Authority: WO
Inventors: Warren W. Wakarchuk; N. Martin Young
Original assignee: National Research Council Of Canada
Priority date: 1997-05-27
Filing date: 1998-05-26
Publication date: 1998-12-03
Also published as: AU7671298A; WO1998054331A3

Abstract

This invention provides for methods for obtaining high level expression of glycosyltransferases. The methods involve expression of glycosyltransferases in host cells that are deficient in proteolytic enzymes that recognize sites having two or more adjacent basic amino acid residues. In another embodiment, the methods involve expression of glycosyltransferases using modified glycosyltransferase genes from which one or more such proteolytic recognition sites have been eliminated.

Description

HIGH LEVEL EXPRESSION OF GLYCOSYLTRANSFERASES

BACKGROUND OF THE INVENTION

Field of the Invention This invention pertains to the field of expression of glycosyltransferases in prokaryotic host cells.

Background

Carbohydrates are now recognized as being of major importance in many cell-cell recognition events, notably the adhesion of bacteria and viruses to mammalian cells in pathogenesis and leukocyte-endothelial cell interaction through selectins in inflammation (Varki (1993) Glycobiology 3: 97-130). The oligosaccharide structures involved are potential therapeutic agents, but they are time-consuming and expensive to make by traditional chemical means.

A very promising route to production of specific oligosaccharide structures is through the use of the enzymes which make them in vivo, the glycosyltransferases. Such enzymes can be used as regio- and stereoselective catalysts for the in vitro synthesis of oligosaccharides (Ichikawa et al. (1992) Anal. Biochem. 202: 215-238). Large scale enzymatic synthesis of oligosaccharides depends on the availability of sufficient quantities of the required glycosyltransferases. However, production of glycosyltransferases in sufficient quantities for use in preparing oligosaccharide structures has been problematic. Many mammalian glycosyltransferases have been expressed in eukaryotic hosts, which can require expensive tissue culture media and give only moderate yields of protein (Kleene et al. (1994) Biochem. Biophys. Res. Commun. 201: 160-167; Williams et al. (1995) Glycoconjugate J. 12: 755-761). Expression in E. coli has been achieved for mammalian glycosyltransferases, but these attempts have produced mainly insoluble forms of the enzyme from which it has been difficult to recover active enzyme in large amounts (Aoki et al. (1990) EMBO J. 9: 3171-3178; Nishiu et al. (1995) Biosci. Biotech. Biochem. 59 (9): 1750-1752).

Therefore, a need exists for efficient, relatively inexpensive methods for producing glycosyltransferases on a large scale. The present invention fulfils this and other needs.

SUMMARY OF THE INVENTION

The claimed invention provides, in a first embodiment, a method of expressing a glycosyltransferase in a host cell by introducing into the host cell a nucleic acid encoding the glycosyltransferase and incubating the host cell under conditions appropriate for expression of the glycosyltransferase, wherein the host cell substantially lacks a protease that cleaves polypeptides between two consecutive positively charged amino acid residues. In another embodiment, the invention provides a composition comprising a glycosyltransferase polypeptide wherein the composition does not contain significant amounts of proteolytically cleaved fragments of the glycosyltransferase polypeptide. The glycosyltransferase polypeptide is produced by expressing the glycosyltransferase in a host cell that substantially lacks protease activity that cleaves at sites having two or more adjacent basic amino acid residues.

Another embodiment of the invention provides methods of expressing a glycosyltransferase in a host cell by introducing into the host cell a nucleic acid that comprises a nucleotide sequence encoding the glycosyltransferase, wherein the nucleic acid has been modified so as to eliminate from the nucleotide sequence one or more occurrences of two or more adjacent codons that each specify a positively charged amino acid residue. In a typical embodiment, the two adjacent positively charged amino acid residues are a cleavage site for a protease that is present in a desired host cell.

The invention also provides recombinant nucleic acids that include a nucleotide sequence encoding a polypeptide having glycosyltransferase activity, wherein the nucleic acid has been modified so as to eliminate from the nucleotide sequence one or more occurrences of two adjacent codons that each specify a positively charged amino acid residue. Polypeptides encoded by the claimed recombinant nucleic acids are also provided.

In another embodiment, the claimed invention provides a method for the transfer of a monosaccharide from a donor substrate to an acceptor substrate. The transfer is effected by: (a) providing a reaction medium comprising at least one glycosyltransferase, a donor substrate, an acceptor substrate and a soluble divalent metal cation; and (b) incubating the reaction medium for a period of time sufficient to complete said transfer, wherein the glycosyltransferase is prepared by expressing the glycosyltransferase in a host cell that is deficient in a protease that cleaves the glycosyltransferase polypeptide at a recognition site that includes two or more adjacent basic amino acid residues. Alternatively, the glycosyltransferase can be expressed from a nucleic acid that encodes a glycosyltransferase and has been modified to remove one or more occurrences of such recognition sites from the glycosyltransferase coding region.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 shows expression levels of β1,4-galactosyltransferase lgtB and deletion mutants -15, -25, and -30 examined by SDS-PAGE of whole cell extracts. Lane 1, lgtB; Lane 2, lgtB-15; Lane 3, lgtB-25; Lane 4, lgtB-30. The arrow denotes where full length protein migrates.

Figure 2 shows expression levels of α1,4-galactosyltransferase lgtC and deletion mutants -19, -22, -25, and -28 examined by SDS-PAGE of whole cell extracts. Lane 1, lgtC; Lane 2, lgtC-19; Lane 3, lgtC-22; Lane 4, lgtC-25; Lane 5, lgtC-28. The arrow denotes where full length protein migrates.

Figure 3 shows the results of an electrospray mass-spectrometry analysis of lgtC and lgtB. The upper panel shows a deconvoluted spectrum obtained for lgtC, where the major mass corresponds to 19 C-terminal residues missing. The lower panel shows the deconvoluted spectrum for the lgtB proteolysis fragments, corresponding to 42 and 28 C-terminal residues missing.

Figure 4 shows the degradation of lgtB from MC58 and 406Y cells by the ompT protease as a function of time. Lane 1 is a sample of an extract from the ompT strain WA834 expressing lgtB from MC58, taken after 6 h; lane 2 is a sample taken 6 h after the extract from lgtB (MC58) in LG90 was prepared; lane 3 is a sample taken after 24 h from the lgtB (MC58) LG90 extract; lane 4 is a sample taken after 48 h. Lane 5 is a sample taken 6 h after the crude extract of 406Y derived lgtB from LG90 was prepared; lane 6 is a sample taken 24 h after the lgtB (406Y) LG90 extract was prepared; lane 7 is a sample taken 48 h after the lgtB (406Y) LG90 extract was prepared.

Figure 5 shows expression of lgtB as demonstrated by SDS-PAGE of lgtB extracts. Lane 1, CHAPS extract; Lane 2, CHAPS pellet; Lane 3, 100,000 x g supernatant; Lane 4, 100,000 x g pellet.

Figure 6 shows the synthetic glycosyltransferase acceptors used for lgtB and lgtC assays. The upper structure is FCHASE-aminophenyl-GlcNAc, the acceptor used for lgtB assays; the lower structure is FCHASE-aminophenyl-lactose, the acceptor used for lgtC assays.

Figure 7 shows the effect of DTT preincubation on activity of lgtC. The solid line is the reaction progress after pre-treatment of purified lgtC with DTT; the dashed line is the reaction progress without pre-treatment of purified lgtC with DTT. Pre-treatment was performed on enzyme at 200 μg/ml for 30 minutes at room temperature. Both reactions were performed in the presence of 5 mM DTT, with FCHASE-AP-Lac as the acceptor.

Figure 8 presents a basic residue map for the C-terminal 50 residues of: lgtA, lgtB, lgtC, and lgtE (from Neisseria meningitidis GenBank U25839 and N. gonorrhoeae GenBank U14554) and lgtD (from N. gonorrhoeae), lgtC, lgtD, lpsA, and lic2a (from Haemophilus influenzae Rd strain database from http://www.tigr.org/tdb/mdb/mdb.html), lpsA (from Pasteurella haemolytica GenBank PHU15958), rfaI/J (from Salmonella typhimurium EMBL X53847; G47884), FucT (from Helicobacter pylori GenBank AF008596), and cps14I and J (from Streptococcus pneumoniae CPS type 14, GenBank X85787). The arrows indicate ompT cleavage points observed in lgtC_Nm and lgtB_Nm, basic amino acids are in italics, and potential ompT cleavage sites are underlined.

DETAILED DESCRIPTION

Definitions

Glycosyltransferases produced using the claimed methods are useful for transferring a monosaccharide from a donor substrate to an acceptor sugar. The addition generally takes place at the non-reducing end of an oligosaccharide or carbohydrate moiety on a biomolecule. Biomolecules as defined here include but are not limited to biologically significant molecules such as proteins (e.g., glycoproteins), and lipids (e.g., glycolipids, phospholipids, sphingolipids and gangliosides).

The following abbreviations are used herein:

Ara = arabinosyl;

Fru = fructosyl;

Fuc = fucosyl;

Gal = galactosyl;

GalNAc = N-acetylgalacto;

Glc = glucosyl;

GlcNAc = N-acetylgluco;

Man = mannosyl; and

NeuAc = sialyl (N-acetylneuraminyl).

Oligosaccharides are considered to have a reducing end and a non-reducing end, whether or not the saccharide at the reducing end is in fact a reducing sugar. In accordance with accepted nomenclature, oligosaccharides are depicted herein with the non-reducing end on the left and the reducing end on the right.

All oligosaccharides described herein are described with the name or abbreviation for the non-reducing saccharide (e.g., Gal), followed by the configuration of the glycosidic bond (α or β), the ring bond, the ring position of the reducing saccharide involved in the bond, and then the name or abbreviation of the reducing saccharide (e.g., GlcNAc). The linkage between two sugars may be expressed, for example, as 2,3, 2->3, or (2,3). Each saccharide is a pyranose.

Much of the nomenclature and general laboratory procedures required in this application can be found in Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed.), Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 1989. The manual is hereinafter referred to as "Sambrook et al."

The term "nucleic acid" refers to a deoxyribonucleotide or ribonucleotide polymer in either single- or double-stranded form, and unless otherwise limited, encompasses known analogues of natural nucleotides that hybridize to nucleic acids in a manner similar to naturally occurring nucleotides. Unless otherwise indicated, a particular nucleic acid sequence includes the complementary sequence thereof.

The term "operably linked" refers to functional linkage between a nucleic acid expression control sequence (such as a promoter, signal sequence, or array of transcription factor binding sites) and a second nucleic acid sequence, wherein the expression control sequence affects transcription and/or translation of the nucleic acid corresponding to the second sequence.

The term "recombinant" when used with reference to a cell indicates that the cell replicates a heterologous nucleic acid, or expresses a peptide or protein encoded by a heterologous nucleic acid. Recombinant cells can contain genes that are not found within the native (non-recombinant) form of the cell. Recombinant cells can also contain genes found in the native form of the cell wherein the genes are modified and re-introduced into the cell by artificial means. The term also encompasses cells that contain a nucleic acid endogenous to the cell that has been modified without removing the nucleic acid from the cell; such modifications include those obtained by gene replacement, site-specific mutation, and related techniques.

A "heterologous sequence" or a "heterologous nucleic acid", as used herein, is one that originates from a source foreign to the particular host cell, or, if from the same source, is modified from its original form. Thus, a heterologous glycosyltransferase gene in a prokaryotic host cell includes a glycosyltransferase gene that is endogenous to the particular host cell that has been modified. Modification of the heterologous sequence may occur, e.g., by treating the DNA with a restriction enzyme to generate a DNA fragment that is capable of being operably linked to the promoter. Techniques such as site-directed mutagenesis are also useful for modifying a heterologous sequence.

A "subsequence" refers to a sequence of nucleic acids or amino acids that comprises a part of a longer sequence of nucleic acids or amino acids (e.g., polypeptide), respectively.

A "recombinant expression cassette" or simply an "expression cassette" is a nucleic acid construct, generated recombinantly or synthetically, with nucleic acid elements that are capable of affecting expression of a structural gene in hosts compatible with such sequences. Expression cassettes include at least promoters and optionally, transcription termination signals. Typically, the recombinant expression cassette includes a nucleic acid to be transcribed (e.g., a nucleic acid encoding a desired polypeptide), and a promoter. Additional factors necessary or helpful in effecting expression may also be used as described herein. For example, an expression cassette can also include nucleotide sequences that encode a signal sequence that directs secretion of an expressed protein from the host cell. Transcription termination signals, enhancers, and other nucleic acid sequences that influence gene expression, can also be included in an expression cassette.

The term "isolated" is meant to refer to material which is substantially or essentially free from components which normally accompany the enzyme as found in its native state. Thus, when purified, the enzymes of the invention do not include materials normally associated with their in situ environment. Typically, isolated proteins of the invention are at least about 40% pure, usually at least about 60%, and preferably at least about 80% pure as measured by band intensity on a silver stained gel or other method for determining purity. Protein purity or homogeneity can be indicated by a number of means well known in the art, such as polyacrylamide gel electrophoresis of a protein sample, followed by visualization upon staining. For certain purposes high resolution will be needed and HPLC or a similar means for purification utilized.

The term "identical" in the context of two nucleic acids or polypeptide sequences refers to the residues in the two sequences which are the same when aligned for maximum correspondence. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith and Waterman (1981) Adv. Appl. Math. 2: 482, by the homology alignment algorithm of Needleman and Wunsch (1970) J. Mol. Biol. 48: 443, by the search for similarity method of Pearson and Lipman (1988) Proc. Natl. Acad. Sci. USA 85: 2444, by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by inspection.

An additional algorithm that is suitable for determining sequence similarity is the BLAST algorithm, which is described in Altschul et al. (1990) J. Mol. Biol. 215: 403-410. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence that either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T and X determine the sensitivity and speed of the alignment. The BLAST program uses as defaults a word length (W) of 11, the BLOSUM62 scoring matrix (see Henikoff and Henikoff (1992) Proc. Natl. Acad. Sci. USA 89: 10915-10919), alignments (B) of 50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands.
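The three stopping rules for word-hit extension described above can be illustrated with a minimal sketch. This is not the BLAST implementation: it assumes an ungapped, rightward-only extension of an exact seed, and the function name `extend_right` and its argument names are hypothetical. The default match reward and mismatch penalty follow the M=5 and N=-4 values quoted in the text.

```python
def extend_right(query, subject, q_start, s_start, word_len, x_drop,
                 match=5, mismatch=-4):
    """Extend a seed word hit to the right, without gaps.

    Returns (best_score, best_length), where best_length counts aligned
    positions from the start of the seed. Illustrative sketch only.
    """
    def pair_score(i):
        return match if query[q_start + i] == subject[s_start + i] else mismatch

    # Score the seed word itself.
    score = sum(pair_score(i) for i in range(word_len))
    best_score, best_len = score, word_len

    i = word_len
    # Rule 3: stop when the end of either sequence is reached.
    while q_start + i < len(query) and s_start + i < len(subject):
        score += pair_score(i)
        i += 1
        if score > best_score:
            best_score, best_len = score, i
        # Rule 1: score has fallen by X from its maximum achieved value.
        # Rule 2: cumulative score has gone to zero or below.
        if best_score - score >= x_drop or score <= 0:
            break
    return best_score, best_len


if __name__ == "__main__":
    # Seed of length 4 at position 0 in both sequences; X = 10.
    print(extend_right("ACGTACGTTTTT", "ACGTACGTAAAA", 0, 0, 4, 10))
```

Extension to the left would mirror this loop with decreasing indices. A larger `x_drop` lets the extension ride through longer mismatched stretches, which is why X trades speed against sensitivity.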

The BLAST algorithm performs a statistical analysis of the similarity between two sequences; see, e.g., Karlin and Altschul (1993) Proc. Nat'l. Acad. Sci. USA 90: 5873-5877. One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a nucleic acid is considered similar to a glycosyltransferase gene or cDNA if the smallest sum probability in a comparison of the test nucleic acid to a glycosyltransferase nucleic acid is less than about 1, preferably less than about 0.1, more preferably less than about 0.01, and most preferably less than about 0.001.

The term "substantial identity" or "substantial similarity" in the context of a polypeptide indicates that a polypeptide comprises a sequence with at least 70% sequence identity to a reference sequence, or preferably 80%, or more preferably 85% sequence identity to the reference sequence, or most preferably 90% identity over a comparison window of about 10-20 amino acid residues. An indication that two polypeptide sequences are substantially identical is that one peptide is immunologically reactive with antibodies raised against the second peptide. Thus, a polypeptide is substantially identical to a second polypeptide, for example, where the two peptides differ only by a conservative substitution. An indication that two nucleic acid sequences are substantially identical is that the polypeptide which the first nucleic acid encodes is immunologically cross reactive with the polypeptide encoded by the second nucleic acid.

Another indication that two nucleic acid sequences are substantially identical is that the two molecules hybridize to each other under stringent conditions.

"Bind(s) substantially" refers to complementary hybridization between a probe nucleic acid and a target nucleic acid and embraces minor mismatches that can be accommodated by reducing the stringency of the hybridization media to achieve the desired detection of the target polynucleotide sequence. The phrase "hybridizing specifically to" refers to the binding, duplexing, or hybridizing of a molecule only to a particular nucleotide sequence under stringent conditions when that sequence is present in a complex mixture (e.g., total cellular) DNA or RNA.

The term "stringent conditions" refers to conditions under which a probe will hybridize to its target subsequence, but to no other sequences. Stringent conditions are sequence-dependent and will be different in different circumstances. Longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target sequence hybridize to the target sequence at equilibrium. (As the target sequences are generally present in excess, at Tm, 50% of the probes are occupied at equilibrium.) Typically, stringent conditions will be those in which the salt concentration is less than about 1.0 M Na ion, typically about 0.01 to 1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., 10 to 50 nucleotides) and at least about 60°C for long probes (e.g., greater than 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide.

The phrases "specifically binds to a protein" or "specifically immunoreactive with", when referring to an antibody, refer to a binding reaction which is determinative of the presence of the protein in the presence of a heterogeneous population of proteins and other biologics. Thus, under designated immunoassay conditions, the specified antibodies bind preferentially to a particular protein and do not bind in a significant amount to other proteins present in the sample. Specific binding to a protein under such conditions requires an antibody that is selected for its specificity for a particular protein. A variety of immunoassay formats may be used to select antibodies specifically immunoreactive with a particular protein. For example, solid-phase ELISA immunoassays are routinely used to select monoclonal antibodies specifically immunoreactive with a protein. See Harlow and Lane (1988) Antibodies, A Laboratory Manual, Cold Spring Harbor Publications, New York, for a description of immunoassay formats and conditions that can be used to determine specific immunoreactivity.

A "conservative substitution", when describing a protein refers to a change in the amino acid composition of the protein that does not substantially alter the protein's activity. Thus, "conservatively modified variations" of a particular amino acid sequence refers to amino acid substitutions of those amino acids that are not critical for protein activity or substitution of amino acids with other amino acids having similar properties (e.g., acidic, basic, positively or negatively charged, polar or non-polar, etc.) such that the substitutions of even critical amino acids do not substantially alter activity. Conservative substitution tables providing functionally similar amino acids are well known in the art. The following six groups each contain amino acids that are conservative substitutions for one another:

1) Alanine (A), Serine (S), Threonine (T);

2) Aspartic acid (D), Glutamic acid (E);

3) Asparagine (N), Glutamine (Q);

4) Arginine (R), Lysine (K);

5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); and

6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W).

See also, Creighton (1984) Proteins, W.H. Freeman and Company. In addition, individual substitutions, deletions or additions which alter, add or delete a single amino acid or a small percentage of amino acids in an encoded sequence are also "conservatively modified variations".
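The six substitution groups listed above lend themselves to a simple lookup. The following is a minimal sketch, assuming one-letter amino acid codes; the names `CONSERVATIVE_GROUPS` and `is_conservative` are hypothetical and are not part of the original disclosure.

```python
# The six conservative substitution groups, as one-letter codes.
CONSERVATIVE_GROUPS = [
    set("AST"),   # 1) Alanine, Serine, Threonine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
]


def is_conservative(original, replacement):
    """Return True if both residues differ and fall in the same group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    print(is_conservative("R", "K"))  # same group (4)
    print(is_conservative("A", "D"))  # different groups
```

Note that Arg-to-Lys is conservative under this table, so such a swap would not by itself remove a pair of adjacent basic residues.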

The practice of this invention involves the construction of recombinant nucleic acids and the expression of genes in transfected host cells. Molecular cloning techniques to achieve these ends are known in the art. A wide variety of cloning and in vitro amplification methods suitable for the construction of recombinant nucleic acids such as expression vectors are well-known to persons of skill. Examples of these techniques and instructions sufficient to direct persons of skill through many cloning exercises are found in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology volume 152 Academic Press, Inc., San Diego, CA (Berger); and Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (1994 Supplement) (Ausubel).

Examples of protocols sufficient to direct persons of skill through in vitro amplification methods, including the polymerase chain reaction (PCR), the ligase chain reaction (LCR), Qβ-replicase amplification and other RNA polymerase mediated techniques are found in Berger, Sambrook, and Ausubel, as well as Mullis et al. (1987) U.S. Patent No. 4,683,202; PCR Protocols: A Guide to Methods and Applications (Innis et al. eds) Academic Press Inc. San Diego, CA (1990) (Innis); Arnheim & Levinson (October 1, 1990) C&EN 36-47; The Journal Of NIH Research (1991) 3: 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86: 1173; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87: 1874; Lomell et al. (1989) J. Clin. Chem. 35: 1826; Landegren et al. (1988) Science 241: 1077-1080; Van Brunt (1990) Biotechnology 8: 291-294; Wu and Wallace (1989) Gene 4: 560; and Barringer et al. (1990) Gene 89: 117. Improved methods of cloning in vitro amplified nucleic acids are described in Wallace et al., U.S. Pat. No. 5,426,039.

Abbreviations used herein include: APTS, 8-aminopyrene-1,3,6-trisulfonic acid; CE, capillary electrophoresis; CNBr, cyanogen bromide; LIF, laser induced fluorescence; FCHASE, 6-(5-fluorescein-carboxamido)-hexanoic acid succinimidyl ester; IMAC, immobilized metal affinity chromatography; IPTG, isopropyl-1-thio-β-D-galactopyranoside; LOS, lipo-oligosaccharide; ESI-MS, electrospray ionization mass spectrometry; Neu5Ac, N-acetyl-neuraminic acid; Neu5Gc, N-glycolyl-neuraminic acid; Neu5Pr, N-propionyl-neuraminic acid; PCR, polymerase chain reaction; PVDF, polyvinylidene difluoride; TMR, tetramethylrhodamine.

Description of the Preferred Embodiments

The present invention provides methods for producing a glycosyltransferase at higher yields than were previously obtainable. Glycosyltransferases produced using the claimed methods are also provided, as are nucleic acids, recombinant host cells, and expression vectors that are useful for producing glycosyltransferases in high yields. The invention is based in part on the discovery that glycosyltransferase yields are often adversely affected by proteolysis that occurs at specific sites within glycosyltransferase polypeptides, in particular, between two adjacent basic amino acid residues. Because such proteolysis sites often occur relatively near the carboxyl terminus of glycosyltransferase polypeptides, proteolysis at these sites was not recognized as an explanation for the failure of previous attempts to produce glycosyltransferases by recombinant expression. Thus, in one embodiment, the invention provides methods for producing a glycosyltransferase by expression of a nucleic acid encoding the glycosyltransferase in a host cell that lacks a protease that cleaves between two adjacent basic amino acid residues. In another embodiment, glycosyltransferases are produced by expressing a glycosyltransferase gene that has been modified to eliminate one or more occurrences of two adjacent codons that each specify a positively charged amino acid residue.
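To make the second embodiment concrete, the sketch below scans a coding sequence for two adjacent codons that each specify lysine or arginine, using the standard genetic code. It is only an illustration under stated assumptions: the reading frame is taken to start at the first nucleotide, histidine codons are left out for simplicity, and the names `BASIC_CODONS` and `adjacent_basic_codons` as well as the example sequences are hypothetical.

```python
# Standard genetic code: codons for lysine (K) and arginine (R).
BASIC_CODONS = {
    "AAA": "K", "AAG": "K",
    "CGT": "R", "CGC": "R", "CGA": "R", "CGG": "R",
    "AGA": "R", "AGG": "R",
}


def adjacent_basic_codons(cds):
    """List (codon_number, codon, next_codon) for adjacent K/R codon pairs.

    codon_number is the 1-based index of the first codon of each pair.
    Assumes the reading frame begins at position 0 of `cds`.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    hits = []
    for idx in range(len(codons) - 1):
        first, second = codons[idx], codons[idx + 1]
        if first in BASIC_CODONS and second in BASIC_CODONS:
            hits.append((idx + 1, first, second))
    return hits


if __name__ == "__main__":
    # Hypothetical fragment: Met-Lys-Arg-Gly-stop.
    print(adjacent_basic_codons("ATGAAACGTGGCTAA"))
    # Same fragment after changing the Arg codon to a Gly codon.
    print(adjacent_basic_codons("ATGAAAGGTGGCTAA"))
```

Which replacement codon is acceptable depends on whether the resulting polypeptide retains activity, which this scan cannot assess.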

A. Glycosyltransferases

The claimed methods are useful for producing any glycosyltransferase that includes as part of its amino acid sequence two or more adjacent basic amino acid residues. Such amino acid residues, which are positively charged at physiological pH, include lysine (K), arginine (R) and histidine (H), most preferably K or R. Glycosyltransferases having two or more adjacent basic amino acid residues can be identified either experimentally or by inspection of the amino acid sequence of the particular glycosyltransferase. Many glycosyltransferases are known, as are their amino acid sequences. See, e.g., "The WWW Guide To Cloned Glycosyltransferases" (http://bellatrix.pcl.ox.ac.uk people/iain glycosyltransferase.html). Glycosyltransferase amino acid sequences and nucleotide sequences encoding glycosyltransferases from which the amino acid sequences can be deduced are also found in various publicly available databases, including GenBank, Swiss-Prot, EMBL, and others.
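Identification "by inspection of the amino acid sequence" can be approximated programmatically. The sketch below reports each position at which two adjacent basic residues occur, optionally limited to a C-terminal window such as the 50 residues mapped in Figure 8. The function name `dibasic_sites`, its parameters, and the example sequence are hypothetical, and a flagged pair is only a candidate site, not evidence of cleavage.

```python
def dibasic_sites(protein, basic="KR", c_terminal_window=None):
    """Return (position, pair) for each pair of adjacent basic residues.

    position is the 1-based index of the first residue of the pair.
    basic: one-letter codes counted as basic ("KR" by default; pass "KRH"
        to include histidine).
    c_terminal_window: if given, only pairs starting within the last N
        residues are reported.
    """
    protein = protein.upper()
    start = 0
    if c_terminal_window is not None:
        start = max(0, len(protein) - c_terminal_window)
    sites = []
    for i in range(start, len(protein) - 1):
        if protein[i] in basic and protein[i + 1] in basic:
            sites.append((i + 1, protein[i:i + 2]))
    return sites


if __name__ == "__main__":
    seq = "MSTKRLAGHHKKQ"  # hypothetical sequence, for illustration only
    print(dibasic_sites(seq))
    print(dibasic_sites(seq, basic="KRH"))
    print(dibasic_sites(seq, c_terminal_window=5))
```

Overlapping pairs are reported separately, so a run of three basic residues yields two entries.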

Glycosyltransferases for which the claimed methods are useful include, but are not limited to, galactosyltransferases, fucosyltransferases, glucosyltransferases, N-acetylgalactosaminyltransferases, N-acetylglucosaminyltransferases, glucuronyltransferases, sialyltransferases, mannosyltransferases, and oligosaccharyltransferases. These glycosyltransferases include those from both eukaryotes and prokaryotes. Many mammalian glycosyltransferases have been cloned and expressed, and the recombinant proteins have been characterized in terms of donor and acceptor specificity; they have also been investigated through site directed mutagenesis in attempts to define residues involved in either donor or acceptor specificity (Aoki et al. (1990) EMBO J. 9: 3171-3178; Harduin-Lepers et al. (1995) Glycobiology 5(8): 741-758; Natsuka and Lowe (1994) Current Opinion in Structural Biology 4: 683-691; Zu et al. (1995) Biochem. Biophys. Res. Comm. 206(1): 362-369; Seto et al. (1995) Eur. J. Biochem. 234: 323-328; Seto et al. (1997) J. Biol. Chem. 272: 14133-14138).

Prokaryotic glycosyltransferases are also amenable to production using the claimed methods. Such glycosyltransferases include enzymes involved in synthesis of lipooligosaccharides (LOS), which are produced by many gram negative bacteria. The LOS typically have terminal glycan sequences that mimic glycoconjugates found on the surface of human epithelial cells or in host secretions (Preston et al. (1996) Critical Reviews in Microbiology 23(3): 139-180). Such enzymes include, but are not limited to, the proteins of the rfa operons of species such as E. coli and Salmonella typhimurium, which include an 1,6 galactosyltransferase and an α1,6 galactosyltransferase (see, e.g., EMBL Accession Nos. M80599 and M86935 (E. coli); EMBL Accession No. S56361 (S. typhimurium)), a glucosyltransferase (Swiss-Prot Accession No. P25740 (E. coli)), an 1,2-glucosyltransferase (rfaJ) (Swiss-Prot Accession No. P27129 (E. coli) and Swiss-Prot Accession No. P19817 (S. typhimurium)), and an α1,2-N-acetylglucosaminyltransferase (rfaK) (EMBL Accession No. U00039 (E. coli)). Other glycosyltransferases for which amino acid sequences are known include those that are encoded by operons such as rfaB, which have been characterized in organisms such as Klebsiella pneumoniae, E. coli, Salmonella typhimurium, Salmonella enterica, Yersinia enterocolitica, Mycobacterium leprosum, and the rhl operon of Pseudomonas aeruginosa.

The claimed methods are also useful for producing glycosyltransferases that are involved in producing structures containing lacto-N-neotetraose, D-galactosyl-β-1,4-N-acetyl-D-glucosaminyl-β-1,3-D-galactosyl-β-1,4-D-glucose, and the P^k blood group trisaccharide sequence, D-galactosyl-α-1,4-D-galactosyl-β-1,4-D-glucose, which have been identified in the LOS of the mucosal pathogens Neisseria gonorrhoeae and N. meningitidis (Scholten et al. (1994) J. Med. Microbiol. 41: 236-243). The genes from N. meningitidis and N. gonorrhoeae that encode the glycosyltransferases involved in the biosynthesis of these structures have been identified from N. meningitidis immunotypes L3 and L1 (Jennings et al. (1995) Mol. Microbiol. 18: 729-740) and the N. gonorrhoeae mutant F62 (Gotschlich (1994) J. Exp. Med. 180: 2181-2190). In N. meningitidis, a locus consisting of 3 genes, lgtA, lgtB and lgtE, encodes the glycosyltransferase enzymes required for addition of the last three of the sugars in the lacto-N-neotetraose chain (Jennings et al., supra; Wakarchuk et al. (1996) J. Biol. Chem. 271: 19166-73). Recently the enzymatic activity of the lgtB and lgtA gene products was demonstrated, providing the first direct evidence for their proposed glycosyltransferase function (Wakarchuk et al. (1996) J. Biol. Chem. 271 (45): 28271-276). In N. gonorrhoeae, there are two additional genes, lgtD, which adds β-D-GalNAc to the 3 position of the terminal galactose of the lacto-N-neotetraose structure, and lgtC, which adds a terminal α-D-Gal to the lactose element of a truncated LOS, thus creating the P^k blood group antigen structure (Gotschlich (1994), supra). In N. meningitidis, a separate immunotype L1 also expresses the P^k blood group antigen and has been shown to carry an lgtC gene (Jennings et al. (1995), supra). Neisseria glycosyltransferases and associated genes are also described in USPN 5,545,553 (Gotschlich).

B. Expression of Glycosyltransferases in Protease-Deficient Host Cells

In one embodiment the invention provides methods of expressing a glycosyltransferase in a host cell that lacks a protease that is capable of cleaving polypeptides between two adjacent basic amino acid residues. This method involves the use of a protease-deficient host cell to express the glycosyltransferase. For example, a nucleic acid encoding the glycosyltransferase can be introduced into the host cell, which is then incubated under conditions appropriate for expression of the glycosyltransferase. Alternatively, a cell that naturally produces a desired glycosyltransferase can be made protease deficient and used for production and purification of greater amounts of the glycosyltransferase than were otherwise achievable. By expressing the glycosyltransferase in a protease-deficient host cell, proteolysis of the glycosyltransferase polypeptide at adjacent basic amino acid residues is eliminated, thus permitting recovery of active glycosyltransferase polypeptide. The host cells used in this embodiment of the invention substantially lack proteolytic enzymes for which the proteolysis recognition site is two adjacent basic amino acid residues. Typically, these proteolytic enzymes cleave the peptide bond between two basic amino acid residues. One example of a protease that cleaves between two adjacent basic amino acid residues, and thus is not present in significant amounts in host cells used in the claimed methods, is omptin, which is the product of ompT in E. coli. The nucleotide sequence of ompT is described in Grodberg et al. (1988) Nucleic Acids Res. 16: 1209. See also, Sugimura and Nishihara (1988) J. Bacteriol. 170: 5625-5632. The Kex2 proteinase of Saccharomyces cerevisiae is another example of a proteolytic enzyme that cleaves at two adjacent basic amino acid residues (Fuller et al. (1991) In Advances in Protein Sequence Analysis, Jornvall et al., ed.). Human proteases that cleave at two adjacent basic amino acid residues include furin and PC2.
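As an illustration only, the recognition motif described above (two adjacent basic amino acid residues) can be located in a candidate glycosyltransferase sequence with a short scan. The sketch below is not part of the disclosed methods; the function name, the one-letter sequence encoding, the example sequence, and the choice to treat only lysine (K) and arginine (R) as basic are assumptions made for the example.

```python
# Minimal sketch: locate pairs of adjacent basic residues in a protein
# sequence given in one-letter code. Treating only Lys (K) and Arg (R) as
# basic is an assumption; pass basic="KRH" to include histidine as well.
BASIC_RESIDUES = frozenset("KR")


def find_dibasic_sites(protein, basic=BASIC_RESIDUES):
    """Return 0-based indices i such that residues i and i+1 are both basic."""
    seq = protein.upper()
    basic = frozenset(basic)
    return [
        i
        for i in range(len(seq) - 1)
        if seq[i] in basic and seq[i + 1] in basic
    ]


if __name__ == "__main__":
    # Hypothetical sequence, not taken from any real glycosyltransferase.
    example = "MKRLLAKKG"
    print(find_dibasic_sites(example))
```

Each reported index marks the first residue of a pair, so the peptide bond of interest lies between positions i and i+1.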

Protease deficient host cells that are suitable for use in the claimed methods are known to those of skill in the art for several species. For example, ompT-deficient E. coli strains are commercially available (Novagen, Inc., Madison WI). Suitable E. coli hosts include, for example, BL21, BL21(DE3), and other ompT-deficient E. coli strains. If an otherwise desirable host strain contains a protease that cleaves at two adjacent basic amino acid residues, one can eliminate the protease from the host strain by mutagenesis techniques, or can employ an inhibitor of the particular protease. To identify host cells that do not contain a protease which cleaves at a site that comprises two adjacent basic amino acid residues, one can test a cell extract for proteolytic activity using assays that are known to those of skill in the art. The extract can be from the extracellular medium, the periplasmic space, or can be an intracellular extract or a membrane preparation, or a combination thereof. Suitable assays for proteolytic enzymes are described in, for example, Barrett, Ed., Methods in Enzymology, Vol. 248, Academic Press, 1995 and Sarath et al. (1989) In: Proteolytic Enzymes, A Practical Approach, Beynon, R.J. and Bond, J.S., eds., IRL Press, Oxford.

One example of a sensitive assay for proteolytic enzymes is the PepTag Protease Assay™ (Promega, Madison WI), which uses charged, dye-linked peptides as proteolytic substrates. Proteolysis of the peptides alters their size and charge in a manner that is amenable to monitoring with agarose gel electrophoresis. For detection of a protease activity that cleaves between two positively charged amino acid residues, this assay uses the "A1 Peptide," which has the sequence (Dye)-Leu-Arg-Arg-Ala-Ser-Leu-Gly. The intact peptide has a net charge of +1. Cleavage between the second arginine and the C-terminus glycine yields a FT peptide that still has a net charge of +1, but will differ in electrophoretic mobility from the intact peptide due to a decrease in molecular weight. Cleavage between the arginine residues, which is indicative of the presence of a protease that is not desirable in host cells used in this embodiment of the invention, yields a neutral F1 species, while further proteolysis results in a negatively charged F2 species. The PepTag Protease Assay™ can be carried out as follows to determine whether a host cell contains a protease that cleaves between two adjacent basic amino acid residues. The A1 peptide substrate is added to a sample containing an extract from the prospective host cell. The assay mixture is initially incubated at room temperature for 30 minutes. Incubations can be extended for 18 hours or longer in order to detect very low levels of protease. The sample is then loaded onto a 0.8% agarose gel with the wells located in the center of the gel. The different peptide species are rapidly separated by electrophoresis in 15-30 minutes. The dye-linked peptides appear pink under normal lighting conditions and results of the assay can be observed directly. For increased sensitivity, a UV transilluminator can be used. Detection using Peptide A1 depends more on a change in size than on a change in charge. Further details regarding this assay are available from the manufacturer of the PepTag Protease Assay™ kit.

Suitable host cells for this embodiment of the invention substantially lack proteolytic enzymes having the described activity. A cell is said to "substantially lack" a protease if a polypeptide produced by the cell, or placed in a cellular extract (cell supernatant fluid, periplasmic extract, membrane fraction, intracellular extract, or a combination thereof), is not appreciably cleaved at occurrences of two adjacent basic amino acid residues. Generally, less than about 10% of these sites will be cleaved by the cell extract under suitable assay conditions; more preferably, less than about 5% of recognition sites that include two adjacent basic amino acid residues are cleaved. Most preferably, less than about 1% of the potential cleavage sites for the particular protease are cleaved; in preferred embodiments none of these potential protease cleavage sites are cleaved.
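The graded definition of "substantially lack" given above can be restated as a small worked calculation. The following sketch is illustrative only: the function name and tier labels are hypothetical, and the text's "less than about" limits are treated here as hard cutoffs, which is a simplifying assumption.

```python
# Minimal sketch: map an observed fraction of cleaved dibasic sites onto the
# preference tiers stated in the text (<10%, <5%, <1%, none). The hard
# cutoffs and the tier labels are assumptions made for illustration.
def protease_deficiency_tier(sites_cleaved, sites_total):
    """Classify a host cell extract by the fraction of dibasic sites cleaved."""
    if sites_total <= 0:
        raise ValueError("sites_total must be positive")
    if not 0 <= sites_cleaved <= sites_total:
        raise ValueError("sites_cleaved must lie between 0 and sites_total")
    fraction = sites_cleaved / sites_total
    if fraction == 0:
        return "none cleaved"
    if fraction < 0.01:
        return "most preferred (<1%)"
    if fraction < 0.05:
        return "more preferred (<5%)"
    if fraction < 0.10:
        return "substantially lacks (<10%)"
    return "not substantially lacking"


if __name__ == "__main__":
    # Hypothetical counts, e.g. from quantified gel bands.
    print(protease_deficiency_tier(3, 100))
```

How the counts are obtained (for example, by densitometry of the separated peptide species) is outside the scope of this sketch.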

C. Glycosyltransferases Modified to Remove Protease Cleavage Sites

In another embodiment, the invention provides glycosyltransferases, and genes encoding glycosyltransferases, that do not contain one or more occurrences of two adjacent positively charged amino acid residues that are present in the corresponding naturally occurring glycosyltransferase polypeptide. Typically, the two adjacent positively charged amino acid residues that are eliminated are a protease cleavage site. Also provided are methods of expressing a glycosyltransferase in a host cell by introducing into the host cell a nucleic acid that encodes a glycosyltransferase polypeptide that lacks proteolysis sites consisting of two adjacent basic amino acid residues. The glycosyltransferase-encoding nucleic acids used in this embodiment are modified so as to eliminate from the nucleotide sequence one or more occurrences of two adjacent codons for positively charged amino acid residues. Glycosyltransferase nucleic acids that can be modified according to the claimed methods, and methods of obtaining such nucleic acids, are known to those of skill in the art. Glycosyltransferase nucleic acids (e.g., cDNA, genomic, or subsequences (probes)) can be cloned, or amplified by in vitro methods such as the polymerase chain reaction (PCR), the ligase chain reaction (LCR), the transcription-based amplification system (TAS), and the self-sustained sequence replication system (SSR). A wide variety of cloning and in vitro amplification methodologies are well-known to persons of skill. Examples of these techniques and instructions sufficient to direct persons of skill through many cloning exercises are found in Berger and Kimmel, Guide to Molecular Cloning Techniques, Methods in Enzymology 152 Academic Press, Inc., San Diego, CA (Berger); Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY, (Sambrook et al.); Current Protocols in Molecular Biology, F.M. Ausubel et al., eds., Current Protocols, a joint venture between Greene Publishing Associates, Inc. and John Wiley & Sons, Inc., (1994 Supplement) (Ausubel); Cashion et al., U.S. patent number 5,017,478; and Carr, European Patent No. 0,246,864. Examples of techniques sufficient to direct persons of skill through in vitro amplification methods are found in Berger, Sambrook, and Ausubel, as well as Mullis et al., (1987) U.S. Patent No. 4,683,202; PCR Protocols A Guide to Methods and Applications (Innis et al. eds) Academic Press Inc. San Diego, CA (1990) (Innis); Arnheim & Levinson (October 1, 1990) C&EN 36-47; The Journal Of NIH Research (1991) 3: 81-94; Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86: 1173; Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87, 1874; Lomell et al. (1989) J. Clin. Chem., 35: 1826; Landegren et al., (1988) Science, 241: 1077-1080; Van Brunt (1990) Biotechnology, 8: 291-294; Wu and Wallace, (1989) Gene, 4: 560; and Barringer et al. (1990) Gene, 89: 117.

DNA encoding the glycosyltransferase proteins or subsequences of this invention can be prepared by any suitable method as described above, including, for example, cloning and restriction of appropriate sequences or direct chemical synthesis by methods such as the phosphotriester method of Narang et al. (1979) Meth. Enzymol. 68: 90-99; the phosphodiester method of Brown et al. (1979) Meth. Enzymol. 68: 109-151; the diethylphosphoramidite method of Beaucage et al. (1981) Tetra. Lett, 22: 1859-1862; and the solid support method of U.S. Patent No. 4,458,066. In one preferred embodiment, a nucleic acid encoding a glycosyltransferase can be isolated by routine cloning methods. A nucleotide sequence of a glycosyltransferase as provided in, for example, GenBank or other sequence database can be used to provide probes that specifically hybridize to a glycosyltransferase gene in a genomic DNA sample, or to a glycosyltransferase mRNA in a total RNA sample (e.g., in a Southern or Northern blot). Once the target glycosyltransferase nucleic acid is identified, it can be isolated according to standard methods known to those of skill in the art (see, e.g., Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed, Vols. 1-3, Cold Spring Harbor Laboratory; Berger and Kimmel (1987) Methods in Enzymology, Vol. 152: Guide to Molecular Cloning Techniques, San Diego: Academic Press, Inc.; or Ausubel et al. (1987) Current Protocols in Molecular Biology, Greene Publishing and Wiley-Interscience, New York).

A glycosyltransferase nucleic acid can also be cloned by detecting its expressed product by means of assays based on the physical, chemical, or immunological properties. For example, one can identify a cloned glycosyltransferase nucleic acid by the ability of a polypeptide encoded by the nucleic acid to catalyze the transfer of a monosaccharide from a donor to an acceptor moiety. In a preferred method, capillary electrophoresis is employed to detect the reaction products. This highly sensitive assay involves using either monosaccharide or disaccharide aminophenyl derivatives which are labeled with fluorescein as described in Fig. 6 and Wakarchuk et al. (1996) J. Biol. Chem. 271 (45): 28271-276. For example, to assay for a Neisseria lgtC enzyme, either FCHASE-AP-Lac or FCHASE-AP-Gal can be used, whereas for the Neisseria lgtB enzyme an appropriate reagent is FCHASE-AP-GlcNAc (Fig. 6). Chemical synthesis produces a single stranded oligonucleotide. This can be converted into double stranded DNA by hybridization with a complementary sequence, or by polymerization with a DNA polymerase using the single strand as a template. One of skill would recognize that while chemical synthesis of DNA is limited to sequences of about 100 bases, longer sequences can be obtained by the ligation of shorter sequences.

Alternatively, subsequences can be cloned and the appropriate subsequences cleaved using appropriate restriction enzymes. The fragments can then be ligated to produce the desired DNA sequence.

In one embodiment, glycosyltransferase nucleic acids can be cloned using DNA amplification methods such as polymerase chain reaction (PCR). Thus, for example, the nucleic acid sequence or subsequence is PCR amplified, using a sense primer containing one restriction site (e.g., NdeI) and an antisense primer containing another restriction site (e.g., HindIII). This will produce a nucleic acid encoding the desired glycosyltransferase sequence or subsequence and having terminal restriction sites. This nucleic acid can then be easily ligated into a vector containing a nucleic acid encoding the second molecule and having the appropriate corresponding restriction sites. Suitable PCR primers can be determined by one of skill in the art using the sequence information provided in GenBank or other sources. Appropriate restriction sites can also be added to the nucleic acid encoding the glycosyltransferase protein or protein subsequence by site-directed mutagenesis. The plasmid containing the glycosyltransferase sequence or subsequence is cleaved with the appropriate restriction endonuclease and then ligated into an appropriate vector for amplification and/or expression according to standard methods.
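The primer arrangement described above (a sense primer carrying an NdeI site and an antisense primer carrying a HindIII site) can be sketched as follows. This is a hedged illustration rather than a primer-design protocol: the function names, the 4-base 5' clamp, the default annealing length, and the example coding sequence are all assumptions, and no melting-temperature or specificity checks are performed. The recognition sequences CATATG (NdeI) and AAGCTT (HindIII) are the standard ones; the sketch lets the ATG within the NdeI site serve as the start codon.

```python
# Minimal sketch: build sense/antisense PCR primers that append NdeI and
# HindIII sites to a coding sequence. Clamp, annealing length, and the
# example sequence are hypothetical choices made for illustration.
NDEI_SITE = "CATATG"     # the ATG inside this site doubles as the start codon
HINDIII_SITE = "AAGCTT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper- or lower-case DNA string."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def design_primers(coding_seq, anneal_len=18, clamp="GCGC"):
    """Return (sense_primer, antisense_primer), both written 5' to 3'."""
    seq = coding_seq.upper()
    if not seq.startswith("ATG"):
        raise ValueError("coding sequence must begin with an ATG start codon")
    if len(seq) < anneal_len + 3:
        raise ValueError("coding sequence is shorter than the annealing region")
    # Sense primer: clamp + NdeI site + bases following the start codon.
    sense = clamp + NDEI_SITE + seq[3:3 + anneal_len]
    # Antisense primer: clamp + HindIII site + reverse complement of the 3' end.
    antisense = clamp + HINDIII_SITE + reverse_complement(seq[-anneal_len:])
    return sense, antisense


if __name__ == "__main__":
    hypothetical_orf = "ATGAAACGTCTGGCTAAAAAGGGTTAA"
    fwd, rev = design_primers(hypothetical_orf, anneal_len=9)
    print(fwd)
    print(rev)
```

In practice the coding sequence would also need to be checked for internal NdeI or HindIII sites before these enzymes are chosen.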

Other physical properties of a polypeptide expressed from a particular nucleic acid can be compared to properties of known glycosyltransferases to provide another method of identifying glycosyltransferase-encoding nucleic acids. Alternatively, the putative lgt gene can be mutated, and its role as a glycosyltransferase established by detecting a variation in the structure of the oligosaccharide of LOS. As an alternative to cloning a glycosyltransferase gene, a glycosyltransferase nucleic acid can be chemically synthesized from a known sequence that encodes a glycosyltransferase. The glycosyltransferase nucleic acids used in this embodiment of the claimed invention are altered to remove from the nucleotide sequence encoding the glycosyltransferase one or more occurrences of two or more adjacent codons that each specify a positively charged amino acid residue. For example, one can change a nucleotide in one or more of the codons so that the modified codon or codons specify an amino acid residue that is not basic. Alternatively, or in addition, one or more of the codons can be deleted from the nucleotide sequence. Preferably, the substitution, deletion, or other modification will not substantially disrupt the activity of the glycosyltransferase expressed from the modified nucleotide sequence.
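The codon-level alteration described above can be sketched in a few lines. The example below is an assumption-laden illustration, not the procedure used in the specification: it treats only lysine and arginine codons as basic, always replaces the second codon of an adjacent pair, and substitutes an alanine codon (GCT) chosen arbitrarily. Whether any particular substitution preserves enzyme activity must be determined experimentally.

```python
# Minimal sketch: remove adjacent basic codons from an in-frame coding
# sequence by replacing the second codon of each pair. The replacement
# codon (GCT, alanine) and the Lys/Arg-only definition of "basic" are
# assumptions made for illustration.
BASIC_CODONS = frozenset({
    "AAA", "AAG",                               # lysine
    "CGT", "CGC", "CGA", "CGG", "AGA", "AGG",   # arginine
})


def remove_dibasic_codons(coding_seq, replacement="GCT"):
    """Return (modified_sequence, indices_of_replaced_codons)."""
    seq = coding_seq.upper()
    replacement = replacement.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if len(replacement) != 3 or replacement in BASIC_CODONS:
        raise ValueError("replacement must be a single non-basic codon")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    replaced = []
    for i in range(1, len(codons)):
        # Compare against the already-modified previous codon so that a run
        # of three or more basic codons is broken up without over-editing.
        if codons[i] in BASIC_CODONS and codons[i - 1] in BASIC_CODONS:
            codons[i] = replacement
            replaced.append(i)
    return "".join(codons), replaced


if __name__ == "__main__":
    hypothetical_orf = "ATGAAACGTCTGGCTAAAAAGGGTTAA"
    print(remove_dibasic_codons(hypothetical_orf))
```

Deleting codons rather than substituting them, which the text also contemplates, would follow the same scan with the matched codon dropped from the list.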

One of skill will recognize many ways of generating alterations in a given nucleic acid construct. Such well-known methods include site-directed mutagenesis, PCR amplification using degenerate oligonucleotides, exposure of cells containing the nucleic acid to mutagenic agents or radiation, chemical synthesis of a desired oligonucleotide (e.g., in conjunction with ligation and/or cloning to generate large nucleic acids) and other well-known techniques. See, e.g., Gillam and Smith (1979) Gene 8:81-97, Roberts et al. (1987) Nature 328: 731-734 and Sambrook, Innis, Ausubel, Berger, and Mullis (all supra). Modified glycosyltransferase nucleic acids can be tested to determine whether the encoded glycosyltransferase lacks a particular protease site by expressing the glycosyltransferase and assaying for susceptibility to proteolysis by a protease using methods described herein or otherwise known to those of skill in the art.

Glycosyltransferase nucleic acids that have been modified so as to eliminate from the glycosyltransferase-encoding nucleotide sequence one or more occurrences of two adjacent codons that each specify a positively charged amino acid residue are also provided by the invention. Glycosyltransferases expressed using these modified nucleic acids, by virtue of the modification or modifications, lack at least one protease recognition site that is present in the polypeptide encoded by the unmodified native glycosyltransferase gene. In another embodiment, the nucleic acids are modified to delete from the glycosyltransferase-encoding nucleotide sequence the codon for the second of two adjacent positively charged amino acid residues, along with some or all of the codons that are to the carboxy terminus from the second basic amino acid residue. For example, the invention provides a modified nucleic acid encoding a Neisseria meningitidis lgtC, wherein the nucleic acid does not include codons for the carboxyl terminal 19 amino acids of a polypeptide encoded by an unmodified, native lgtC gene. Also provided by the invention are expression cassettes that include the claimed glycosyltransferase nucleic acids as well as a promoter that is functional in the desired host cell. The expression cassettes can also include other sequences involved in transcription, translation, and posttranslational modification of the glycosyltransferase. Such sequences are described in more detail below.

Expression vectors, and host cells that comprise the claimed recombinant nucleic acids, are also provided by the invention.

The invention also provides glycosyltransferase polypeptides encoded by the claimed recombinant nucleic acids. Such polypeptides, and those produced using protease-deficient host cells as described above, typically are devoid of significant amounts of proteolytically cleaved fragments of the glycosyltransferase polypeptide.

D. General Methods for Expression of Glycosyltransferases

In a preferred embodiment, the glycosyltransferase proteins or subsequences thereof, are synthesized using recombinant DNA methodology. Generally this involves creating a DNA sequence that encodes the glycosyltransferase, modified as desired, placing the DNA in an expression cassette under the control of a particular promoter, expressing the protein in a host, isolating the expressed protein and, if required, renaturing the protein.

Glycosyltransferases can be expressed in a variety of host cells, including E. coli, other bacterial hosts, yeasts, filamentous fungi, and various higher eukaryotic cells such as the COS, CHO and HeLa cell lines and myeloma cell lines. Techniques for gene expression in microorganisms are described in, for example, Smith, Gene Expression in Recombinant Microorganisms (Bioprocess Technology, Vol. 22), Marcel Dekker, 1994.

Examples of useful bacteria include, but are not limited to, Escherichia, Enterobacter, Azotobacter, Erwinia, Bacillus, Pseudomonas, Klebsiella, Proteus, Salmonella, Serratia, Shigella, Rhizobia, Vitreoscilla, and Paracoccus. Filamentous fungi that are useful as expression hosts include, for example, the following genera: Aspergillus, Trichoderma, Neurospora, Penicillium, Cephalosporium, Achlya, Podospora, Mucor, Cochliobolus, and Pyricularia. See, e.g., US Patent No. 5,679,543 and Stahl and Tudzynski, Eds., Molecular Biology in Filamentous Fungi, John Wiley & Sons, 1992. Synthesis of heterologous proteins in yeast is well known and described in the literature. Methods in Yeast Genetics, Sherman, F., et al., Cold Spring Harbor Laboratory, (1982) is a well recognized work describing the various methods available to produce the enzymes in yeast.

The recombinant protein gene will be operably linked to appropriate expression control sequences for each host. For E. coli this includes a promoter such as the T7, trp, or lambda promoters, a ribosome binding site and preferably a transcription termination signal. For eukaryotic cells, the control sequences will include a promoter and preferably an enhancer, a polyadenylation sequence, and may include splice donor and acceptor sequences. Suitable expression control sequences for use in mammalian, fungal, and yeast expression are known to those of skill in the art. Examples of promoters for use in yeast include GAL1,10 (Johnson, M., and Davies, R.W., 1984, Mol. and Cell. Biol., 4:1440-1448), ADH2 (Russell, D., et al., 1983, J. Biol. Chem., 258:2674-2682), PHO5 (EMBO J. 6:675-680, 1982), and MFα1 (Herskowitz, I. and Oshima, Y., 1982, in THE MOLECULAR BIOLOGY OF THE YEAST SACCHAROMYCES, eds. Strathern, J.N., Jones, E.W., and Broach, J.R., Cold Spring Harbor Lab., Cold Spring Harbor, N.Y., pp. 181-209). A multicopy plasmid with a selective marker such as Leu-2, URA-3, Trp-1, and His-3 is also desirable. For filamentous fungi such as, for example, strains of the fungi Aspergillus (McKnight et al., U.S. Patent No. 4,935,349), examples of useful promoters include those derived from Aspergillus nidulans glycolytic genes, such as the ADH3 promoter (McKnight et al., EMBO J. 4: 2093-2099 (1985)) and the tpiA promoter. An example of a suitable terminator is the ADH3 terminator (McKnight et al.).

The plasmids of the invention can be transferred into the chosen host cell by well-known methods such as calcium chloride transformation for E. coli and calcium phosphate treatment or electroporation for mammalian cells. Cells transformed by the plasmids can be selected by resistance to antibiotics conferred by genes contained on the plasmids, such as the amp, gpt, neo and hyg genes. Techniques for transforming fungi are well known in the literature, and have been described, for instance, by Beggs, Hinnen et al. (Proc. Natl. Acad. Sci. USA 75: 1929-1933 (1978)), Yelton et al. (Proc. Natl. Acad. Sci. USA 81: 1740-1747 (1984)), and Russell (Nature 301: 167-169 (1983)). At least two procedures are used in transforming yeast cells. In one case, yeast cells are first converted into protoplasts using zymolyase, lyticase or glusulase, followed by addition of DNA and polyethylene glycol (PEG). The PEG-treated protoplasts are then regenerated in a 3% agar medium under selective conditions. Details of this procedure are given in the papers by J.D. Beggs, 1978, Nature (London), 275:104-109; and Hinnen, A., et al., 1978, Proc. Natl. Acad. Sci. USA, 75:1929-1933. The second procedure does not involve removal of the cell wall. Instead the cells are treated with lithium chloride or acetate and PEG and put on selective plates (Ito, H., et al., 1983, J. Bact., 153:163-168).

Once expressed, the recombinant glycosyltransferase proteins can be used for synthesis of carbohydrates, in either purified or unpurified states. For example, the recombinantly modified organisms can be used in fermentation, or cell lysates that contain the glycosyltransferase can be used for carbohydrate synthesis. In some embodiments, the recombinant glycosyltransferases are purified, either partially or substantially to homogeneity, according to standard procedures of the art, such as, for example, ammonium sulfate precipitation, affinity columns, column chromatography, gel electrophoresis and the like (see, generally, R. Scopes, Protein Purification, Springer-Verlag, N.Y. (1982), Deutscher, Methods in Enzymology Vol. 182: Guide to Protein Purification, Academic Press, Inc. N.Y. (1990)). Once purified, partially or to homogeneity as desired, the polypeptides may then be used (e.g., for carbohydrate synthesis or as immunogens for antibody production).

In some embodiments, host cells that lack a protease that cleaves a glycosyltransferase polypeptide, or that contain a glycosyltransferase gene that lacks codons for a protease cleavage site that is found in the native glycosyltransferase, are used to produce a desired oligosaccharide by fermentation. The host cells are typically grown in the presence of suitable media which provides for growth and proliferation of the host cells (e.g., water, oxygen (for aerobic fermentations), carbon source, nitrogen, minerals, necessary vitamins, and the like), and which also contains necessary substrates, or precursors of substrates, for biosynthesis of the desired oligosaccharide.

One of skill in the art would recognize that after chemical synthesis, biological expression, or purification, the glycosyltransferase protein(s) may possess a conformation substantially different than the native conformations of the constituent polypeptides. In this case, it may be necessary to denature and reduce the polypeptide and then to cause the polypeptide to re-fold into the preferred conformation. Methods of reducing and denaturing proteins and inducing re-folding are well known to those of skill in the art (See, Debinski et al. (1993) J. Biol. Chem., 268: 14065-14070; Kreitman and Pastan (1993) Bioconjug. Chem., 4: 581-585; and Buchner, et al., (1992) Anal. Biochem., 205: 263-270). Debinski et al., for example, describe the denaturation and reduction of inclusion body proteins in guanidine-DTE. The protein is then refolded in a redox buffer containing oxidized glutathione and L-arginine.

One of skill would recognize that modifications can be made to the glycosyltransferase proteins without diminishing their biological activity. Some modifications may be made to facilitate the cloning, expression, or incorporation of the targeting molecule into a fusion protein. Such modifications are well known to those of skill in the art and include, for example, a methionine added at the amino terminus to provide an initiation site, or additional amino acids (e.g., poly His) placed on either terminus to create conveniently located restriction sites or termination codons or purification sequences.

The vectors used for expression of glycosyltransferases according to the claimed methods can also contain a nucleic acid sequence that enables the vector to replicate independently in one or more selected host cells. Generally, this sequence is one that enables the vector to replicate independently of the host chromosomal DNA, and includes origins of replication or autonomously replicating sequences. Such sequences are well known for a variety of bacteria. For instance, the origin of replication from the plasmid pBR322 is suitable for most Gram-negative bacteria. Alternatively, the vector can replicate by becoming integrated into the host cell genomic complement and being replicated as the cell undergoes DNA replication.

The vectors can also comprise selectable marker genes to allow selection of host cells bearing the desired construct. These genes can encode a gene product, such as a protein, necessary for the survival or growth of transformed host cells grown in a selective culture medium. Host cells not transformed with the vector containing the selection gene will not survive in the culture medium. Typical selection genes encode proteins that confer resistance to antibiotics or other toxins, such as ampicillin, neomycin, kanamycin, chloramphenicol, or tetracycline. Alternatively, selectable markers may encode proteins that complement auxotrophic deficiencies or supply critical nutrients not available from complex media, e.g., the gene encoding D-alanine racemase for Bacilli. A number of selectable markers are known to those of skill in the art and are described for instance in Sambrook et al, supra. A preferred selectable marker for use in bacterial cells is a kanamycin resistance marker (Vieira and Messing, Gene 19: 259 (1982)). Use of kanamycin selection is advantageous over, for example, ampicillin selection because ampicillin is quickly degraded by β-lactamase in culture medium, thus removing selective pressure and allowing the culture to become overgrown with cells that do not contain the vector.

Construction of suitable vectors containing one or more of the above listed components employs standard ligation techniques as described in the reference cited above. Isolated plasmids or DNA fragments are cleaved, tailored, and re-ligated in the form desired to generate the plasmids required. To confirm correct sequences in plasmids constructed, the plasmids can be analyzed by standard techniques such as by restriction endonuclease digestion, and/or sequencing according to known methods.

E. Uses of Glycosyltransferases

The invention also provides methods of using glycosyltransferases produced using the methods described herein to prepare oligosaccharides (which are composed of two or more saccharides). The glycosyltransferase reactions of the invention take place in a reaction medium comprising at least one glycosyltransferase, a donor substrate, an acceptor sugar and typically a soluble divalent metal cation. The methods rely on the use of a glycosyltransferase to catalyze the addition of a saccharide to a substrate saccharide. For example, the invention provides methods for adding GalNAc or GlcNAc to Gal, in a β1,3 linkage, by contacting a reaction mixture comprising an activated GalNAc or GlcNAc with an acceptor moiety that includes a Gal residue in the presence of a GalNAc transferase or GlcNAc transferase that has been prepared according to the methods described herein. Also provided are methods of using galactosyltransferases prepared as described herein to add Gal in a β1,4 linkage, an α1,4 linkage, or a β1,3 linkage to a saccharide that includes a GlcNAc or Glc residue. The methods involve contacting a reaction mixture that contains a galactosyltransferase and an activated Gal with an acceptor moiety that has a GlcNAc or Glc residue. One example of such an oligosaccharide for which the invention provides a method of synthesis is lacto-N-neotetraose, Galβ(1-4)-GlcNAcβ(1-3)-Galβ(1-4)-Glc (formula I). See, e.g., Min-Yuan Chou et al. (1996) J. Biol. Chem. 271 (32): 19166-19173.

Formula I

In some embodiments, the glycosyltransferase is a fucosyltransferase. A number of fucosyltransferases are known to those of skill in the art. Briefly, fucosyltransferases include any of those enzymes which transfer L-fucose from GDP-fucose to a hydroxy position of an acceptor sugar. In some embodiments, for example, the acceptor sugar is a GlcNAc in a βGal(1→4)βGlcNAc group in an oligosaccharide glycoside. Suitable fucosyltransferases for this reaction include the known βGal(1→3,4)βGlcNAc α(1→3,4)fucosyltransferase (FTIII E.C. No. 2.4.1.65) which is obtained from human milk (see, Palcic, et al., Carbohydrate Res. 190:1-11 (1989); Prieels, et al., J. Biol. Chem. 256:10456-10463 (1981); and Nunez, et al., Can. J. Chem. 59:2086-2095 (1981)) and the βGal(1→4)βGlcNAc α(1→3)fucosyltransferases (FTIV, FTV, FTVI, and FTVII, E.C. No. 2.4.1.65) which are found in human serum. A recombinant form of βGal(1→3,4)βGlcNAc α(1→3,4)fucosyltransferase is also available (see, Dumas, et al., Bioorg. Med. Letters 1:425-428 (1991) and Kukowska-Latallo, et al., Genes and Development 4:1288-1303 (1990)). Other exemplary fucosyltransferases include α1,2 fucosyltransferase (E.C. No. 2.4.1.69). Enzymatic fucosylation may be carried out by the methods described in Mollicone, et al., Eur. J. Biochem. 191:169-176 (1990) or U.S. Patent No. 5,374,655.

In another group of embodiments, the glycosyltransferase is a galactosyltransferase. When a galactosyltransferase is used, the reaction medium will preferably contain, in addition to the galactosyltransferase, donor substrate, acceptor sugar and divalent metal cation, a donor substrate recycling system comprising at least one mole of glucose-1-phosphate per each mole of acceptor sugar, a phosphate donor, a kinase capable of transferring phosphate from the phosphate donor to nucleoside diphosphates, and a pyrophosphorylase capable of forming UDP-glucose from UTP and glucose-1-phosphate and catalytic amounts of UDP and a UDP-galactose-4-epimerase. Exemplary galactosyltransferases include α(1,3) galactosyltransferase (E.C. No. 2.4.1.151, see, e.g., Dabkowski et al., Transplant Proc. 25:2921 (1993) and Yamamoto et al. Nature 345:229-233 (1990)) and α(1,4) galactosyltransferase (E.C. No. 2.4.1.38).

In another group of embodiments, the glycosyltransferase prepared as described herein can be used in combination with an additional glycosyltransferase. For example, one can use a combination of a sialyltransferase and a galactosyltransferase. In this group of embodiments, the enzymes and substrates can be combined in an initial reaction mixture, or preferably the enzymes and reagents for a second glycosyltransferase reaction can be added to the reaction medium once the first glycosyltransferase reaction has neared completion. By conducting two glycosyltransferase reactions in sequence in a single vessel, overall yields are improved over procedures in which an intermediate species is isolated. Moreover, cleanup and disposal of extra solvents and by-products are reduced.

Glycosyltransferases produced using the claimed methods are also useful in glycosyltransferase cycles. A number of glycosyltransferase cycles (for example, sialyltransferase cycles, galactosyltransferase cycles, and fucosyltransferase cycles) are described in U.S. Patent No. 5,374,541 and WO 9425615 A. Other glycosyltransferase cycles for which the glycosyltransferases of the invention are useful are described in Ichikawa et al. J. Am. Chem. Soc. 114:9283 (1992), Wong et al. J. Org. Chem. 57:4343 (1992), DeLuca et al., J. Am. Chem. Soc. 117:5869-5870 (1995), and Ichikawa et al. In Carbohydrates and Carbohydrate Polymers. Yaltami, ed. (ATL Press, 1993). One of skill in the art will understand that other glycosyltransferases can be substituted into similar transferase cycles as have been described in detail for the sialyltransferases, galactosyltransferases, and fucosyltransferases. In particular, the glycosyltransferase can also be, for instance, glucosyltransferases, e.g., Alg8 (Stagljov et al., Proc. Natl. Acad. Sci. USA 91:5977 (1994)) or Alg5 (Heesen et al. Eur. J. Biochem. 224:71 (1994)), N-acetylgalactosaminyltransferases such as, for example, α(1,3) N-acetylgalactosaminyltransferase, β(1,4) N-acetylgalactosaminyltransferases (Nagata et al. J. Biol. Chem. 267:12082-12089 (1992) and Smith et al. J. Biol. Chem. 269:15162 (1994)) and polypeptide N-acetylgalactosaminyltransferase (Homa et al. J. Biol. Chem. 268:12609 (1993)). Suitable N-acetylglucosaminyltransferases include GnTI (2.4.1.101, Hull et al., BBRC 176:608 (1991)), GnTII, and GnTIII (Ihara et al. J. Biochem. 113:692 (1993)), GnTV (Shoreiban et al. J. Biol. Chem. 268:15381 (1993)), O-linked N-acetylglucosaminyltransferase (Bierhuizen et al. Proc. Natl. Acad. Sci. USA 89:9326 (1992)), N-acetylglucosamine-1-phosphate transferase (Rajput et al. Biochem. J. 285:985 (1992)), and hyaluronan synthase. Suitable mannosyltransferases include α(1,2) mannosyltransferase, α(1,3) mannosyltransferase, β(1,4) mannosyltransferase, Dol-P-Man synthase, OCh1, and Pmt1.

For the above glycosyltransferase cycles, the concentrations or amounts of the various reactants used in the processes depend upon numerous factors including reaction conditions such as temperature and pH value, and the choice and amount of acceptor saccharides to be glycosylated. Because the glycosylation process permits regeneration of activating nucleotides, activated donor sugars and scavenging of produced PPi in the presence of catalytic amounts of the enzymes, the process is limited by the concentrations or amounts of the stoichiometric substrates discussed before. The upper limit for the concentrations of reactants that can be used in accordance with the method of the present invention is determined by the solubility of such reactants. Preferably, the concentrations of activating nucleotides, phosphate donor, the donor sugar and enzymes are selected such that glycosylation proceeds until the acceptor is consumed. The considerations discussed below, while in the context of a sialyltransferase, are generally applicable to other glycosyltransferase cycles.

Each of the enzymes is present in a catalytic amount. The catalytic amount of a particular enzyme varies according to the concentration of that enzyme's substrate as well as to reaction conditions such as temperature, time and pH value. Means for determining the catalytic amount for a given enzyme under preselected substrate concentrations and reaction conditions are well known to those of skill in the art.

Enzyme amounts or concentrations are expressed in activity Units. One activity Unit catalyzes the formation of one μmol of product per minute at a given temperature (typically 37°C) and pH value (typically 7.5). Thus, 10 Units of an enzyme is a catalytic amount of that enzyme where 10 μmol of substrate are converted to 10 μmol of product in one minute at a temperature of 37°C and a pH value of 7.5.
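The Unit definition above amounts to a simple rate calculation. The following is a minimal sketch of that arithmetic, not part of the described method; the function names are hypothetical and the numbers simply restate the 10 Unit example.

```python
def activity_units(product_umol: float, minutes: float) -> float:
    """Activity in Units, i.e. micromoles of product formed per minute."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return product_umol / minutes


def minutes_to_convert(substrate_umol: float, units: float) -> float:
    """Idealized time for `units` of enzyme to convert `substrate_umol`.

    Assumes the enzyme works at its full rated activity throughout,
    which a real reaction will not do as substrate is depleted.
    """
    if units <= 0:
        raise ValueError("units must be positive")
    return substrate_umol / units


# 10 umol of product formed in 1 minute corresponds to 10 Units.
print(activity_units(10.0, 1.0))
# At 10 Units, 50 umol of substrate would ideally need 5 minutes.
print(minutes_to_convert(50.0, 10.0))
```

Because activity depends on temperature and pH, such a figure is only meaningful alongside the conditions under which it was measured.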

The above ingredients are combined by admixture in an aqueous reaction medium (solution). That medium has a pH value of about 6 to about 8.5. The medium is devoid of chelators that bind enzyme cofactors such as Mg²⁺ or Mn²⁺. The selection of a medium is based on the ability of the medium to maintain pH value at the desired level. Thus, in some embodiments, the medium is buffered to a pH value of about 7.5, preferably with HEPES. If a buffer is not used, the pH of the medium should be maintained at about 6 to 8.5, preferably about 7.2 to 7.8, by the addition of base. A suitable base is NaOH, preferably 6 M NaOH.

The reaction medium can also contain solubilizing detergents (e.g., Triton or SDS) and organic solvents such as methanol or ethanol, if necessary. In addition, the enzymes are preferably utilized free in solution but can be bound to a support such as a polymer. The reaction mixture is thus substantially homogeneous at the beginning, although some precipitate can form during the reaction.

The temperature at which an above process is carried out can range from just above freezing to the temperature at which the most sensitive enzyme denatures. That temperature range is preferably about 0°C to about 45°C, and more preferably about 20°C to about 30°C. The reaction mixture so formed is maintained for a period of time sufficient for the donor saccharide to be added to the acceptor. Some of the product can often be detected after a few hours, with recoverable amounts usually being obtained within 24 hours. It is preferred to optimize the yield of the process, and the maintenance time is usually about 36 to about 240 hours.

The products produced by the above processes can be used without purification. However, it is usually preferred to recover the product. Standard, well known techniques for recovery of glycosylated saccharides such as thin or thick layer chromatography, ion exchange chromatography, or membrane filtration can be used. It is preferred to use membrane filtration, more preferably utilizing a reverse osmotic membrane, or one or more column chromatographic techniques for the recovery as is discussed hereinafter and in the literature cited herein. For instance, membrane filtration wherein the membranes have a molecular weight cutoff of about 3000 to about 10,000 can be used to remove proteins. Nanofiltration or reverse osmosis can then be used to remove salts. Nanofilter membranes are a class of reverse osmosis membranes which pass monovalent salts but retain polyvalent salts and uncharged solutes larger than about 100 to about 700 Daltons, depending upon the membrane used. Thus, in a typical application, saccharides prepared by the methods of the present invention will be retained by the membrane and contaminating salts will pass through. Using such techniques, the saccharides (e.g., sialyl lactose) can be produced at essentially 100% purity, as determined by proton NMR and TLC.

In another aspect, the present invention provides methods for the preparation of compounds having the formula:

NeuAcα(2→3)Galβ(1→4)(Fucα1→3)GlcN(R')β(1→3)Galβ-OR

In this formula, R is a hydrogen, a saccharide, an oligosaccharide or an aglycon group having at least one carbon atom. R' can be either acetyl or allyloxycarbonyl (Alloc).

The term "aglycon group having at least one carbon atom" refers to a group -A-Z, in which A represents an alkylene group of from 1 to 18 carbon atoms optionally substituted with halogen, thiol, hydroxy, oxygen, sulfur, amino, imino, or alkoxy; and Z is hydrogen, -OH, -SH, -NH₂, -NHR¹, -N(R¹)₂, -CO₂H, -CO₂R¹, -CONH₂, -CONHR¹, -CON(R¹)₂, -CONHNH₂, or -OR¹, wherein each R¹ is independently alkyl of from 1 to 5 carbon atoms. In addition, R can be

(CH₂)ₙCH[(CH₂)ₘCH₃][(CH₂)ₒCH₃],

where n, m, o = 1-18; or (CH₂)ₙ-R² (in which n = 0-18), wherein R² is a variously substituted aromatic ring, preferably a phenyl group, being substituted with one or more alkoxy groups, preferably methoxy or O(CH₂)ₘCH₃ (in which m = 0-18), or a combination thereof.

The steps for these methods include:

(a) galactosylating a compound of the formula GlcNR'β(1→3)Galβ-OR with a galactosyltransferase in the presence of a UDP-galactose under conditions sufficient to form the compound: Galβ(1→4)GlcNR'β(1→3)Galβ-OR;

(b) sialylating the compound formed in (a) with an α(2,3)sialyltransferase in the presence of a CMP derivative of a sialic acid under conditions in which sialic acid is transferred to the non-reducing sugar to form the compound: NeuAcα(2→3)Galβ(1→4)GlcNR'β(1→3)Galβ-OR; and

(c) fucosylating the compound formed in (b) to provide the NeuAcα(2→3)Galβ(1→4)(Fucα1→3)GlcNR'β(1→3)Galβ-OR.

Additionally, for the present method, at least one of the galactosylating and sialylating steps is conducted in a reaction medium containing a divalent metal cation and the medium is periodically or continually supplemented with the divalent metal cation to maintain the metal ion concentration between about 2 mM and about 75 mM. The galactosylating and sialylating steps are carried out enzymatically, preferably under the general conditions described above for the methods of forming glycosidic linkages. In a preferred embodiment, the galactosylating and sialylating steps are carried out in a single vessel. The galactosylating step is preferably carried out as part of a galactosyltransferase cycle and the sialylating step is preferably carried out as part of a sialyltransferase cycle. Preferred conditions and descriptions of other species and enzymes in each of these cycles have been described above.

The fucosylating step can be carried out either chemically or enzymatically. Enzymatic fucosylation can be carried out by contacting the appropriate oligosaccharide with an α(1→3)fucosyltransferase and a compatible GDP-derivative of L-fucose under conditions wherein the fucose is transferred onto the oligosaccharide. The term "α(1→3)fucosyltransferase" refers to any fucosyltransferase which transfers L-fucose from GDP-fucose to a hydroxy position of a GlcNAc in a βGal(1→4)βGlcNAc group in an oligosaccharide glycoside. Suitable fucosyltransferases have been described above and include the known βGal(1→3,4)βGlcNAc α(1→3,4)fucosyltransferase and the βGal(1→4)βGlcNAc α(1→3)fucosyltransferase.

Suitable conditions, known to those of skill in the art, include the addition of the α(1→3)fucosyltransferase to an appropriate mixture of the oligosaccharide and GDP-fucose in an appropriate buffer such as 0.1 M sodium cacodylate in appropriate conditions of pH and temperature such as at a pH of 6.5 to 7.5 and a temperature of from 0°C to 50°C, preferably between 25°C and 45°C, more preferably between 35°C and 40°C, for 12 hours to 4 days. The resulting fucosylated product can be isolated and purified using conventional techniques including membrane filtration, HPLC and gel-, reverse phase-, ion exchange-, or adsorption chromatography. Alternatively, fucosylation of the oligosaccharide produced in step (b) is carried out chemically according to methods described in USSN 08/063,181, the disclosure of which is incorporated herein by reference. In a particularly preferred embodiment, R is ethyl, the fucosylation step is carried out chemically, and the galactosylation and sialylation steps are carried out in a single vessel.

In yet another aspect, the present invention provides methods for the preparation of compounds as described in WO 94/26760. Generally these compounds have the formula:

NeuAcα(2→3)Galβ(1→4)(Fucα1→3)GlcN(R")β-OR²

In this formula, R" is alkyl or acyl from 1-18 carbons; 5,6,7,8-tetrahydro-2-naphthamido; benzamido; 2-naphthamido; 4-aminobenzamido; or 4-nitrobenzamido. R² may be the same as R as described above or may be Galβ-OR (R is as described above).

In the above descriptions, the terms are generally used according to their standard meanings. The term "alkyl" as used herein means a branched or unbranched, saturated or unsaturated, monovalent or divalent, hydrocarbon radical having from 1 to 20 carbons, including lower alkyls of 1-8 carbons such as methyl, ethyl, n-propyl, butyl, n-hexyl, and the like, cycloalkyls (3-7 carbons), cycloalkylmethyls (4-8 carbons), and arylalkyls.

The term "aryl" refers to a radical derived from an aromatic hydrocarbon by the removal of one atom, e.g., phenyl from benzene. The aromatic hydrocarbon may have more than one unsaturated carbon ring, e.g., naphthyl. The term "alkoxy" refers to alkyl radicals attached to the remainder of the molecule by an oxygen, e.g., ethoxy, methoxy, or n-propoxy. The term "alkylthio" refers to alkyl radicals attached to the remainder of the molecule by a sulfur.

The term "acyl" refers to a radical derived from an organic acid by the removal of the hydroxyl group. Examples include acetyl, propionyl, oleoyl, and myristoyl.

The compounds described above can then be used in a variety of applications, e.g., as antigens, diagnostic reagents, or as therapeutics. Thus, the present invention also provides pharmaceutical compositions which can be used in treating a variety of conditions. The pharmaceutical compositions are comprised of oligosaccharides made according to the methods described above. Pharmaceutical compositions of the invention are suitable for use in a variety of drug delivery systems. Suitable formulations for use in the present invention are found in Remington's Pharmaceutical Sciences, Mack Publishing Company, Philadelphia, PA, 17th ed. (1985). For a brief review of methods for drug delivery, see Langer, Science 249:1527-1533 (1990).

The pharmaceutical compositions are intended for parenteral, intranasal, topical, oral or local administration, such as by aerosol or transdermally, for prophylactic and/or therapeutic treatment. Commonly, the pharmaceutical compositions are administered parenterally, e.g., intravenously. Thus, the invention provides compositions for parenteral administration which comprise the compound dissolved or suspended in an acceptable carrier, preferably an aqueous carrier, e.g., water, buffered water, saline, PBS and the like. The compositions may contain pharmaceutically acceptable auxiliary substances as required to approximate physiological conditions, such as pH adjusting and buffering agents, tonicity adjusting agents, wetting agents, detergents and the like.

These compositions may be sterilized by conventional sterilization techniques, or may be sterile filtered. The resulting aqueous solutions may be packaged for use as is, or lyophilized, the lyophilized preparation being combined with a sterile aqueous carrier prior to administration. The pH of the preparations typically will be between 3 and 11, more preferably from 5 to 9 and most preferably from 7 to 8.

In some embodiments the oligosaccharides of the invention can be incorporated into liposomes formed from standard vesicle-forming lipids. A variety of methods are available for preparing liposomes, as described in, e.g., Szoka et al., Ann. Rev. Biophys. Bioeng. 9:467 (1980), U.S. Pat. Nos. 4,235,871, 4,501,728 and 4,837,028. The targeting of liposomes using a variety of targeting agents (e.g., the sialyl galactosides of the invention) is well known in the art (see, e.g., U.S. Patent Nos. 4,957,773 and 4,603,044). Standard methods for coupling targeting agents to liposomes can be used. These methods generally involve incorporation into liposomes of lipid components, such as phosphatidylethanolamine, which can be activated for attachment of targeting agents, or derivatized lipophilic compounds, such as lipid derivatized oligosaccharides of the invention.

Targeting mechanisms generally require that the targeting agents be positioned on the surface of the liposome in such a manner that the target moieties are available for interaction with the target, for example, a cell surface receptor. The carbohydrates of the invention may be attached to a lipid molecule before the liposome is formed using methods known to those of skill in the art (e.g., alkylation or acylation of a hydroxyl group present on the carbohydrate with a long chain alkyl halide or with a fatty acid, respectively). Alternatively, the liposome may be fashioned in such a way that a connector portion is first incorporated into the membrane at the time of forming the membrane. The connector portion must have a lipophilic portion which is firmly embedded and anchored in the membrane. It must also have a reactive portion which is chemically available on the aqueous surface of the liposome. The reactive portion is selected so that it will be chemically suitable to form a stable chemical bond with the targeting agent or carbohydrate which is added later. In some cases it is possible to attach the target agent to the connector molecule directly, but in most instances it is more suitable to use a third molecule to act as a chemical bridge, thus linking the connector molecule which is in the membrane with the target agent or carbohydrate which is extended, three dimensionally, off of the vesicle surface.

The compositions containing the oligosaccharides can be administered for prophylactic and/or therapeutic treatments. In therapeutic applications, compositions are administered to a patient already suffering from a disease, as described above, in an amount sufficient to cure or at least partially arrest the symptoms of the disease and its complications. An amount adequate to accomplish this is defined as a "therapeutically effective dose." Amounts effective for this use will depend on the severity of the disease and the weight and general state of the patient, but generally range from about 0.5 mg to about 2,000 mg of oligosaccharide per day for a 70 kg patient, with dosages of from about 5 mg to about 200 mg of the compounds per day being more commonly used.

In prophylactic applications, compositions containing the oligosaccharides of the invention are administered to a patient susceptible to or otherwise at risk of a particular disease. Such an amount is defined to be a "prophylactically effective dose." In this use, the precise amounts again depend on the patient's state of health and weight, but generally range from about 0.5 mg to about 1,000 mg per 70 kilogram patient, more commonly from about 5 mg to about 200 mg per 70 kg of body weight. Single or multiple administrations of the compositions can be carried out with dose levels and pattern being selected by the treating physician. In any event, the pharmaceutical formulations should provide a quantity of the oligosaccharides of this invention sufficient to effectively treat the patient.

The oligosaccharides may also find use as diagnostic reagents. For example, labeled compounds can be used to locate areas of inflammation or tumor metastasis in a patient suspected of having an inflammation. For this use, the compounds can be labeled with appropriate radioisotopes, for example, ¹²⁵I, ¹⁴C, or tritium.

The oligosaccharide of the invention can be used as an immunogen for the production of monoclonal or polyclonal antibodies specifically reactive with the compounds of the invention. The multitude of techniques available to those skilled in the art for production and manipulation of various immunoglobulin molecules can be used in the present invention. Antibodies may be produced by a variety of means well known to those of skill in the art.

The production of non-human monoclonal antibodies, e.g., murine, lagomorpha, equine, etc., is well known and may be accomplished by, for example, immunizing the animal with a preparation containing the oligosaccharide of the invention. Antibody-producing cells obtained from the immunized animals are immortalized and screened, or screened first for the production of the desired antibody and then immortalized. For a discussion of general procedures of monoclonal antibody production, see Harlow and Lane, Antibodies, A Laboratory Manual, Cold Spring Harbor Publications, N.Y. (1988).

EXAMPLES

The following examples are offered to illustrate, but not to limit the present invention.

Example 1

Improved Recovery of Neisseria meningitidis Galactosyltransferases by Recombinant Expression in Protease Deficient E. coli

SUMMARY

The lgtB gene encoding a β-1,4-galactosyltransferase and the lgtC gene encoding an α-1,4-galactosyltransferase from the bacterial pathogen Neisseria meningitidis were cloned into an expression vector and overexpressed in E. coli. Both genes expressed very well, but problems with C-terminal proteolysis were encountered with both proteins. The lgtC protein was initially isolated from extracts of recombinant E. coli as a truncated species that retained enzymatic activity, and was subsequently shown by mass spectrometry to be 19 residues shorter than the expected protein. A specific set of engineered C-terminal deletions was constructed to investigate their effect on the expression of lgtC. As many as 28 residues could be deleted with no effect on activity, and with a concomitant improvement of the overall expression of up to fivefold over the full length protein. The overall expression level of the lgtC-19 mutant was 14.5% of the total cellular protein, and isolated yields were approximately 90 mg/L.

The lgtB protein was also proteolysed in extracts of normal E. coli strains into enzymatically inactive fragments lacking 28 or 41 C-terminal residues. This degradation could be prevented by expression in an ompT protease deficient strain of E. coli. The full length lgtB protein was not stable in soluble protein extracts from all recombinant strains; however, a stable enzyme preparation could be achieved by detergent extraction of the membrane fraction from cells of the ompT strain expressing lgtB. The detergent extract of the membrane fraction contained lgtB as 14.3% of the total protein. The membrane fraction contained 66 mg of lgtB from one L of culture and the combined amount of lgtB expressed was around 200 mg/L of culture. Specific deletions of lgtB were also constructed, and 15 residues could be removed without loss of enzyme activity and also with a concomitant improvement of the overall expression of up to twofold over the full length protein. Longer deletions produced protein, but activity could not be detected in these recombinant strains. Examination of the glycosyltransferase sequences from a wide range of bacteria showed that their C-terminal segments of ~50 amino acids frequently contained paired basic residues. Engineering of these segments may therefore be required as a general practice to produce these enzymes for use in the large-scale chemi-enzymatic synthesis of carbohydrates, including carbohydrate-based therapeutics.

Characterization of the two enzymes showed both could be used for synthesis of oligosaccharides using synthetic glycosyltransferase acceptors. Preparative synthesis was performed to demonstrate the utility of these enzymes. Kinetic characterization of these enzymes also shows they have properties desirable for the enzymatic synthesis of oligosaccharides. The expression levels of these enzymes are high enough that they could be used for larger scale synthesis, which would be of value in producing carbohydrate-based therapeutics and other products.

MATERIALS AND METHODS

Recombinant DNA and growth of recombinant bacterial cultures. Basic recombinant DNA methods like plasmid DNA isolation, restriction enzyme digestions, the purification of DNA fragments for cloning, ligations, transformations and DNA sequencing were performed as recommended by the enzyme supplier, or the manufacturer of the kit used for the particular procedure. PCR was performed with Pwo polymerase as described by the manufacturer (Boehringer Mannheim, Laval, P.Q.). Restriction and DNA modification enzymes were purchased from New England Biolabs LTD. (Mississauga Ontario). Qiaprep columns were from Qiagen Inc. (Chatsworth CA, USA). DNA sequencing was performed with an Applied Biosystems (ABI) model 370A automated DNA sequencer using the cycle sequencing kit from ABI, (Montreal PQ). Polyacrylamide gel electrophoresis of proteins was performed as recommended in the technical literature supplied by Bio-Rad Laboratories (Mississauga Ontario).

The following standard laboratory strains of Escherichia coli were used for the propagation of recombinant plasmids, and the production of recombinant gene products. CS1883: thr-1, leuB6, Δ(gpt-proA)66, hisG, argE3, thi-1, rfbD1, lacY1, ara-14, galK2, xyl-5, mtl-1, mgl-51, rpsL31, kdgK51, supE44, galE [Δ(galOPE:Cm]; LG90: Δ(lac-proAB); WA834 (an E. coli B strain): ompT504, malB, hsdlO, DE46, gal-151, met-100. The vector used in this work has been described previously (Wakarchuk et al. (1994) Protein Science 3: 467-475). All cultures were grown in 2YT as previously described (Wakarchuk et al. (1996), supra). The antibiotic ampicillin was added at 150 μg/mL to all cultures of plasmid-containing strains. The cultures were grown with shaking at 37°C for both protein and plasmid production.

Isolation of the lgtC and lgtB genes.

Neisseria meningitidis 129E L1, 406Y L3, and MC58 L3 genomic DNA was prepared from cells grown 18 h on Columbia blood agar plates at 37°C in 5% CO₂. The cells were washed off one plate with 6 ml of 100 mM Tris-HCl, 10 mM EDTA, pH 8.0. To this suspension, 0.6 ml of 10% SDS was added and the suspension was inverted several times until lysis had been achieved. This mixture was then extracted three times with 1:1 phenol:chloroform. The DNA was precipitated by adding 0.4 ml 3 M sodium acetate pH 5.2, and 12 ml ice cold ethanol. After inverting the mixtures several times, the precipitated DNA was removed with a sterile pipette tip to a clean tube and washed three times with 3 ml of ice cold 70% ethanol. This precipitate was dried with a stream of argon gas, and then redissolved in 2 ml 10 mM Tris-HCl, 1 mM EDTA pH 8.0. In preparation for use, 0.5 ml of the DNA was treated with RNAse A, 20 μg/ml at 37°C for 30 min.

The genes for the lgtC enzyme and the lgtB enzyme were isolated by PCR amplification from the 129E L1 and 406Y L3 or MC58 L genomic DNA, respectively, using the following primers (based on DNA sequences in GenBank U25839, and from M. Jennings, personal communication):

lgtC-5P, 5'-CTA GAA GGA TCC ATC GAT GCT TAG GAG GTC ATA TGG ACA TCG TAT TTG CGG CAG ACG AC-3'
lgtC-3PN, 5'-TAT CAT CGA TAA GCT TAG TCA TCA ATA AAT CTT GCG TAA GAA TCT GGC-3'

Deletions from the 3' end were also constructed by PCR using the primers:

lgtC-19, 5'-TATCATCG AAA GCT TAGTCATCA CTTTGT CGAAAA CATACG GTG
lgtC-21, 5'-TATCA TCG AAA GCT TAGTCATCAAAA CATACG GTG CGG GAC GGC
lgtC-24, 5'-TATCATCG AAA GCT TAGTCA TCA GTG CGG GAC GGC AAG TTT GCC
lgtC-27, 5'-TATCATCG AAA GCT TAG TCA TCA GGC AAG TTT GCC GCG CCA TTC
lgtB-5p, 5'-CTA GAA GGA TCCATC GAT GCT TAG GAG GTC ATA TGC AAA ACC ACGTTATCA GCT TAG CTT CC
lgtB-3pN, 5'-TAT CAT CGA TAA GCT TAG TCATTATTG GAA AGG CAC AAT GAA CTG TTC GCG
lgtB-15, 5'-TCATCG AAA GCT TAGTCA TCA CCTTCC CTG CTGATT TTG GTC
lgtB-25, 5'-TCA TCG AAA GCT TAG TCATCA GAT CAG GCG GCG TTT GAATGT G
lgtB-30, 5'-TCA TCG AAA GCT TAG TCA TCA GAATGT GTT GGC GGG GGA ATC
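As an illustration of how a 3' deletion primer of this kind can be inspected, the sketch below reverse-complements the lgtC-19 primer sequence as printed above and looks for a HindIII recognition site (AAGCTT) and for stop codons on the coding strand. This is not part of the described procedure. The helper name is hypothetical, the sequence is taken from the text with spaces removed, and the reading frame is an assumption: the code treats the primer's 3' end as lying on a codon boundary.

```python
# Hypothetical helper for inspecting a reverse (3') PCR primer.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
HINDIII_SITE = "AAGCTT"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# lgtC-19 primer as printed in the text, whitespace removed.
primer = "".join("TATCATCG AAA GCT TAGTCATCA CTTTGT CGAAAA CATACG GTG".split())

# The coding (sense) strand is the reverse complement of the 3' primer.
sense = reverse_complement(primer)

# Position of the HindIII site within the primer (-1 if absent).
hindiii_pos = primer.find(HINDIII_SITE)

# Assumption: frame 0 of `sense` is the gene's reading frame.
codons = [sense[i:i + 3] for i in range(0, len(sense) - 2, 3)]
first_stop = next(i for i, c in enumerate(codons) if c in STOP_CODONS)

print("HindIII site at primer index:", hindiii_pos)
print("codons before first stop:", codons[:first_stop])
print("stop codons that follow:", codons[first_stop:first_stop + 2])
```

Under that frame assumption, the coding strand reads seven codons followed by two consecutive TGA stop codons. That is what one would expect from a primer meant to truncate the coding sequence and add a cloning site downstream.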

Enzyme assays.

Glycosyltransferase reactions with lgtC were performed at 37°C in 20 μl and contained: HEPES-NaOH buffer, 50 mM, pH 7.5, 10 mM MnCl₂, 5 mM dithiothreitol, 0.05-5.0 mM acceptor, 0.001-1.0 mM UDP-Gal donor, and various amounts of enzyme, either from crude bacterial extracts, extracts of recombinant E. coli with the cloned gene or purified enzyme. Reactions for the lgtB enzyme were performed at 37°C in 20 μl and contained: HEPES-NaOH buffer 50 mM, 100 mM NaCl, pH 7.5, 10 mM MnCl₂, 0.1-5.0 mM labeled acceptor, 0.01-1.0 mM UDP-Gal donor and various amounts of enzyme, either from crude bacterial extracts, or extracts of recombinant E. coli with the cloned gene. The recombinant enzymes were assayed after dilution in buffer containing 1 mg/ml acetylated bovine serum albumin. The dilutions were chosen such that for assay times of 5-30 minutes approximately 10% conversion of the acceptor to product would be achieved. Enzyme assays from extracts from N. meningitidis were incubated 1-15 h. In order to examine if the UDP generated during the reactions was an inhibitor of the transferases, 1 U of calf intestinal alkaline phosphatase (New England Biolabs) was added to some reactions. The reactions were terminated either by the addition of an equal volume of 2% SDS and heating to 75°C for 3 minutes, or by diluting the reaction with 10 mM NaOH. These samples were then diluted appropriately in water prior to analysis by capillary electrophoresis.
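The relationship between percent conversion in such an assay and enzyme activity can be sketched as follows. This is an illustrative calculation only, with hypothetical function names. It assumes that acceptor and product give equal detector response per mole, so that conversion can be read from peak areas, and that the rate is constant over the assay time.

```python
def fraction_converted(product_area: float, acceptor_area: float) -> float:
    """Fraction of acceptor converted, from integrated peak areas.

    Assumes equal detector response per mole for acceptor and product.
    """
    total = product_area + acceptor_area
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return product_area / total


def assay_milliunits(acceptor_mM: float, volume_ul: float,
                     fraction: float, minutes: float) -> float:
    """Activity in the assay, in mU (1 mU = 1 nmol product per minute)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    acceptor_nmol = acceptor_mM * volume_ul  # mM x microlitres = nmol
    return acceptor_nmol * fraction / minutes


# Example: 0.5 mM acceptor in a 20 ul reaction, 10% converted in 10 min.
frac = fraction_converted(product_area=100.0, acceptor_area=900.0)
print(frac)
print(assay_milliunits(0.5, 20.0, frac, 10.0))
```

Keeping conversion near 10%, as the text describes, keeps the measurement close to initial-rate conditions, where the constant-rate assumption is most reasonable.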

After the reaction, the FCHASE-aminophenylglycosides could be isolated for further analysis. They were bound to a Sep-Pak C18 reverse phase cartridge (Waters), desalted by washing with water and then eluted in 50% acetonitrile. After drying under vacuum, the samples were dissolved in water and glycosidase assays were performed as described by the enzyme manufacturer (Oxford Glycosystems). These samples were then diluted with water and again analyzed by capillary electrophoresis.

Capillary Electrophoresis.

Capillary electrophoresis (CE) was performed with a Beckman P/ACE 5510 equipped with an argon-ion laser induced fluorescence detector, excitation λ = 488 nm, with the emission λ = 520 nm, or a UV detector set at 214 nm. The capillary was 50 μm x 57 cm bare silica, with the detector window at 47 cm. The capillary was conditioned before each run by washing with 0.2 M NaOH for 2 min., water for 2 min., then 20 mM sodium dodecyl sulphate, 10 mM sodium tetraborate, 60 mM sodium phosphate pH 9.4, for 2 min. For some runs the buffer was simply 25 mM sodium tetraborate pH 9.4 for 2 min. Samples were introduced by pressure injection for 2-5 seconds, and the separation was performed at 18 kV, 75 μA. Peak integration was performed with the Beckman System Gold (version 8) software. Samples were diluted so that both the acceptor peak and the product peak were less than 1000 relative fluorescence units.

Purification of the recombinant enzymes lgtC and lgtB.

A culture of lgtC-19 or lgtB was grown at 37°C to an A₆₀₀ (1 cm) of 0.3, then enzyme expression was induced by the addition of IPTG to 1 mM and the culture was grown for an additional 15 h. The cells were collected by centrifugation at 5000 x g for 20 min. The cells were resuspended in 50 mM MOPS buffer pH 7.0 to a concentration of 10% w/v. The cells were then broken by two passages through an Avestin C-5 cell disruptor (Ottawa, Ont.) at 20,000 psi. A protease inhibitor cocktail tablet (Complete™, Boehringer Mannheim) was added for each 50 ml of extract. Unbroken cells were removed by centrifugation at 5000 x g for 20 min at 10°C. The supernatant from this was then centrifuged at 20,000 x g for 30 min. to pellet additional cell debris, and the supernatant was then centrifuged at 100,000 x g for 60 min. to pellet a membrane fraction. The membrane fraction was saved in the case of lgtB, and discarded for lgtC preparations.

Initial purification of lgtC was performed by immobilized metal affinity chromatography (IMAC) on a Pharmacia Hi-Trap chelating Sepharose column charged with nickel chloride. The supernatant from the 100,000 x g centrifugation was loaded onto the IMAC column which had been equilibrated in 20 mM MOPS pH 7.0, 500 mM NaCl. The column was eluted with a gradient of 1 M imidazole in the starting buffer. The lgtC-19 protein was isolated from the supernatant of the 100,000 x g centrifugation by cation exchange chromatography on SOURCE-15S media (Pharmacia), or POROS HS II (Perseptive Bio-Systems). The buffer was 10 mM MES pH 5.7, and a gradient of 0-300 mM NaCl over 10 column volumes was used. The active peak from this column was concentrated by ultrafiltration in a stirred cell using a 10,000 MWCO filter (Millipore). The concentrated peak material was then chromatographed on a 2.5 cm x 90 cm column of Sephacryl S100-HR in 50 mM ammonium acetate pH 7.

The lgtB protein was isolated by detergent extraction of the pellet from the 100,000 x g centrifugation. The protein was solubilized by the addition of CHAPS to a final concentration of 0.75%. Glycerol and NaCl were added to 20% and 100 mM respectively to help stabilize the enzyme.

For analysis of the proteolysis products, the supernatant fraction from IgtB produced in CS 1883 was first chromatographed on EconoPakQ (BioRad) with 20 mM Tris-HCl pH 8.0 as buffer, and a gradient of 0-700 mM NaCl over 10 column volumes. A fraction with residual IgtB activity was then further purified on Superose 12 (Pharmacia) with 50 mM ammonium acetate pH 7 as eluant. Fractions were examined on SDS-PAGE, and then analyzed by electrospray mass spectrometry as described below.

N-terminal amino acid sequencing and molecular mass determination.

Automated gas-phase amino acid sequencing was performed on an Applied Biosystems (Foster City, CA, USA) 475A protein sequencing system incorporating a model 470A gas-phase sequencer equipped with an on-line model 120A PTH analyzer under the control of a model 900A control and data analysis module. Mass analysis was performed using a Fisons Instruments (Manchester, U.K.) VG electrospray Quattro triple quadrupole mass spectrometer with a mass range of 3500 amu/e. Solutions to be analyzed were prepared in 5% acetic acid at a concentration of 0.1-0.2 mg/mL and infused directly into the mass spectrometer.

Preparative Synthesis using IgtC.

To investigate the utility of the IgtC enzyme, we performed reactions on 10 mg of p-nitrophenyl-β-D-lactoside (p-NP-Lac). These reactions were performed in a 10 ml volume, which gave a concentration of 2 mM for p-NP-Lac. The buffer conditions were identical to those given above for IgtC. The reaction initially contained 200 mU of enzyme based on Units calculated with FCHASE-AP-Lac as substrate. After two hours, a second 200 mU of enzyme was added. The progress of the reaction was followed by CE using a UV detector at A₂₁₄ for detection of the reaction product.
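The stated acceptor concentration can be checked with a short calculation. The following Python sketch assumes a molar mass of 463.39 g/mol for p-NP-Lac (C18H25NO13); that value and the function name are not given in the text and are used here for illustration only.

```python
# Assumed molar mass of p-nitrophenyl-beta-D-lactoside (C18H25NO13), g/mol.
# This number is not stated in the text.
PNP_LAC_G_PER_MOL = 463.39


def millimolar(mass_mg: float, volume_ml: float, molar_mass: float) -> float:
    """Return the concentration in mM of mass_mg dissolved in volume_ml."""
    millimoles = mass_mg / molar_mass   # mg divided by g/mol gives mmol
    litres = volume_ml / 1000.0
    return millimoles / litres


# 10 mg of acceptor in a 10 ml reaction, as described above.
concentration = millimolar(10.0, 10.0, PNP_LAC_G_PER_MOL)
print(f"p-NP-Lac concentration: {concentration:.2f} mM")
```

Under this assumption the result is slightly above 2 mM, consistent with the rounded figure quoted for the reaction.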

RESULTS

Cloning and expression of IgtB and IgtC in E. coli.

Primers were made to amplify both of these genes from N. meningitidis genomic DNA such that they could be directly inserted in an expression vector. The constructs were made both with and without poly-histidine tags for purification by immobilized metal affinity chromatography. Transformants were screened directly for enzyme activity using fluorescein labeled aminophenylglycosides either by TLC of the reaction mixture or by CE-LIF. We observed that both genes expressed well in E. coli. After induction with IPTG, a major new protein could be seen from SDS-PAGE analysis of whole cell lysates (Fig. 1, Fig. 2). Initial attempts to purify the His₅ tagged IgtC by IMAC showed the enzyme activity was not all retained on an IMAC column. The protein was also purified by cation exchange chromatography and examination of the proteins eluted from both columns by SDS-PAGE showed they were isolated in two forms, with the enzyme from the cation exchange column being shorter than expected. This shorter protein could be isolated by IMAC, although not as pure as the material from the cation exchange chromatography. The majority of the enzyme present in the starting material was the shorter form, and only small amounts of the full length material could be isolated from the IMAC column. Analysis of the protein by electrospray mass spectral analysis showed an observed mass of 33176.78 ± 1.31 Da (Figure 3, top panel), while the expected mass of the full length protein was 35778 Da. This truncated product comprised residues 1-291, i.e., 19 residues were missing from the C-terminal end, as evidenced by amino acid sequencing. The point of cleavage mapped to a double basic residue pair. Subsequent engineering of deletions of the gene showed that the last 27 residues of IgtC are not essential for expression of the protein (Figure 2), as all of these truncated forms were still enzymatically active.
When we attempted to express IgtC with a 32 residue C-terminal deletion, no protein was visible on SDS-PAGE after induction and no enzymatic activity could be detected. The lgtC-19 product was not susceptible to further proteolysis in E. coli extracts and was used for all further experiments.
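The assignment of a 19 residue C-terminal truncation can be sanity-checked against the two masses quoted above. The sketch below is only a plausibility check: it divides the mass deficit by the number of missing residues and compares the result with the range of amino acid residue masses (roughly 57 Da for glycine to 186 Da for tryptophan, values assumed here and not taken from the text). It does not identify the cleavage site, which requires the sequence.

```python
def mean_mass_per_missing_residue(expected_da: float,
                                  observed_da: float,
                                  residues_missing: int) -> float:
    """Average mass (Da) of each residue absent from the truncated product."""
    return (expected_da - observed_da) / residues_missing


# Values quoted in the text for IgtC.
EXPECTED_FULL_LENGTH_DA = 35778.0
OBSERVED_DA = 33176.78
RESIDUES_MISSING = 19

average = mean_mass_per_missing_residue(
    EXPECTED_FULL_LENGTH_DA, OBSERVED_DA, RESIDUES_MISSING)

# Assumed bounds: lightest (Gly) and heaviest (Trp) residue masses in Da.
plausible = 57.0 <= average <= 186.5

print(f"mass deficit: {EXPECTED_FULL_LENGTH_DA - OBSERVED_DA:.2f} Da")
print(f"mean mass per missing residue: {average:.1f} Da (plausible: {plausible})")
```

A mean of roughly 137 Da per residue lies between the residue masses of lysine and arginine, which is at least compatible with a tail rich in basic residues.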

Two forms of the IgtB protein were cloned and expressed, both from N. meningitidis, but from different L3 immunotype strains (MC58 and 406Y). These proteins have nearly identical sequences, but we wanted to investigate if one or the other gene expressed better in E. coli. The expression level of either version of the IgtB protein appears to be as high as that of IgtC (Figure 2) but, as soon as cell-free extracts were made, the 406Y derived protein was rapidly proteolysed into a major product, while the protein from MC58 was cleaved into two major cleavage products (Figure 4). This proteolysis resulted in an almost complete loss of enzyme activity in the extract containing either protein. After partial purification of two of these inactive proteolysed products from IgtB (MC58), N-terminal sequence analysis and mass spectrometry showed truncated products of 26284.49 ± 3.45 Da (residues 1-239) and 27980 ± 3.36 Da (residues 1-252) (Figure 3, bottom panel), while the expected mass of the full length protein was 31578 Da. These C-terminal deletions correspond to truncations of 28 and 42 residues, with both cleavage sites being at double basic residues. The IgtB gene product from N. meningitidis 406Y was also degraded, but only one major proteolysis product could be seen from SDS-PAGE analysis (Figure 4), and N-terminal sequencing showed that this was also a C-terminal truncation. The size of this truncated fragment was estimated from SDS-PAGE gels to be again shortened by about 28 residues, and inspection of the C-terminal sequence reveals a double basic residue site at this position in the 406Y sequence at a site similar to that seen in the protein from the MC58 strain. This proteolysis product was also not enzymatically active. When engineered deletions were also constructed with IgtB (MC58), enzymatic activity was observed after a 15 residue deletion, but deletion of 25 residues caused loss of activity.
In contrast to what we observed with IgtC we were unable to demonstrate enzyme activity from a construct of IgtB with a poly-histidine tail.

The IgtB protein, either in the full length form or with the engineered deletion, was not stable and was susceptible to proteolysis in E. coli extracts. An examination of the C-terminal sequence of both IgtC and IgtB revealed a clustering of basic amino acid residues, pairs of which give rise to ompT protease cleavage sites (Murby et al. (1996) Protein Exp. Purific. 7: 129-136). Identification of the sites of proteolytic cleavage in both enzymes suggested that ompT was responsible for the observed degradation. We then demonstrated that proteolysis could be prevented by preparing protein from an E. coli ompT protease deficient strain, WA834. It is not clear why the MC58 derived protein is cleaved into two major products while the protein from 406Y is only cleaved into one major product. The cleavage site at residue 28 in the MC58 protein has the sequence KHR, which contains a histidine instead of lysine or arginine and thus may be a suboptimal ompT cleavage site, and therefore cleaved more slowly than the KRR sequence found in 406Y. If this is a suboptimal cleavage site then the alternative cleavage site is residue 42. In the 406Y protein it appears that once the cleavage occurs at residue 28, cleavage at residue 42 does not occur.
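The kind of sequence inspection described above, locating pairs of adjacent basic residues near the C-terminus, can be sketched as follows. This is a minimal illustration and not the procedure used in this work: the function name, the 50 residue window default, and the short test peptides are hypothetical, and only lysine (K) and arginine (R) are treated as basic, so a KHR motif is not reported as a strict dibasic pair.

```python
from typing import List, Tuple

BASIC_RESIDUES = "KR"  # lysine and arginine only; histidine is excluded


def dibasic_sites(sequence: str, tail: int = 50) -> List[Tuple[int, str]]:
    """Return (1-based position, dipeptide) for each pair of adjacent
    K/R residues that starts within the C-terminal `tail` residues."""
    start = max(0, len(sequence) - tail)
    hits = []
    for i in range(start, len(sequence) - 1):
        pair = sequence[i:i + 2]
        if pair[0] in BASIC_RESIDUES and pair[1] in BASIC_RESIDUES:
            hits.append((i + 1, pair))
    return hits


# Hypothetical peptides, not the IgtB or IgtC sequences.
print(dibasic_sites("AAKRRA"))      # a KRR motif gives two overlapping pairs
print(dibasic_sites("AAKRAAKHRA"))  # the KHR motif is not reported
```

Treating histidine as non-basic here mirrors the suggestion above that KHR may be a poorer ompT site than KRR; a looser definition would simply add "H" to the residue set.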

The full length protein could be obtained in high yield from CHAPS detergent extracts from membrane preparations of IgtB/WA834 (Figure 5). This material remained active when glycerol and NaCl were added to 20% and 100 mM respectively. There were a total of 29.4 Units of activity in the membrane preparation obtained by 100,000 x g centrifugation of the lysate from 20 grams of cells. The amount of IgtB present was estimated from densitometry of SDS-PAGE gels of the membrane preparation and the CHAPS detergent extract (Tables 1, 2). From these data it can be estimated that 66 mg of IgtB protein are present in the membranes from 1 L of culture. It should be mentioned that there is still a significant amount of soluble IgtB protein in the supernatant after the 100,000 x g centrifugation (Tables 1, 2).

Table 1. Expression levels of IgtC and IgtB as measured by densitometry of SDS-PAGE of various preparations.

Glycosyltransferase    % of stainable protein    Fold Increase in
Construct              (densitometer scan)       Expression
                                                 na
lgtC-19                18                        3
lgtC-22                28                        4.6
lgtC-25                35                        5.8
lgtC-28                17                        2.8
lgtB-total             7                         na
lgtB-soluble           6                         na
lgtB-membrane          9.6                       na
lgtB-CHAPS(Sol)        14.3                      na
lgtB-CHAPS(Pel)        8                         na
lgtB-15-total          13                        1.8
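The relationship between the two numeric columns of Table 1 can be examined with a short calculation. The sketch below assumes, without confirmation from the text, that "Fold Increase in Expression" is the ratio of a construct's percentage of stainable protein to that of the corresponding full length construct. Under that assumption the IgtB ratio can be computed directly, and the IgtC rows can be used to back-calculate the reference percentage they imply.

```python
# Values transcribed from Table 1: (% of stainable protein, fold increase).
LGTC_ROWS = {
    "lgtC-19": (18.0, 3.0),
    "lgtC-22": (28.0, 4.6),
    "lgtC-25": (35.0, 5.8),
    "lgtC-28": (17.0, 2.8),
}
LGTB_TOTAL_PERCENT = 7.0
LGTB_15_TOTAL_PERCENT = 13.0


def fold_increase(percent: float, reference_percent: float) -> float:
    """Ratio of a construct's expression level to the reference level."""
    return percent / reference_percent


# Reference percentage implied by each IgtC row (an inference, not a table entry).
implied_reference = {name: pct / fold for name, (pct, fold) in LGTC_ROWS.items()}
for name, value in implied_reference.items():
    print(f"{name}: implied reference level {value:.2f}%")

lgtb_ratio = fold_increase(LGTB_15_TOTAL_PERCENT, LGTB_TOTAL_PERCENT)
print(f"lgtB-15-total relative to lgtB-total: {lgtb_ratio:.2f}")
```

The IgtB ratio comes out near the tabulated 1.8, and the four IgtC rows imply a consistent reference level, which supports the assumed definition without proving it.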

The amount of enzyme isolated from lgtC-19 was 46.5 mg (106 Units) from 10 g of cell paste. Therefore, the isolated yield from 1 L of culture (fermentor grown cells) would be approximately 90 mg. The overall yield of IgtB would be greater than 200 mg/L when both the soluble protein and the membrane material are considered.
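The yield figures for lgtC-19 can be related to one another with simple arithmetic. In the sketch below the specific activity and the yield per gram of cell paste follow directly from the quoted numbers; the cell paste per litre of culture is not stated in the text and is only back-calculated from the approximately 90 mg/L estimate.

```python
# Figures quoted in the text for the lgtC-19 preparation.
ISOLATED_MG = 46.5
ISOLATED_UNITS = 106.0
CELL_PASTE_G = 10.0
STATED_YIELD_MG_PER_L = 90.0


def specific_activity(units: float, protein_mg: float) -> float:
    """Enzyme Units per mg of isolated protein."""
    return units / protein_mg


def yield_per_gram(protein_mg: float, paste_g: float) -> float:
    """mg of isolated protein per gram of cell paste."""
    return protein_mg / paste_g


activity = specific_activity(ISOLATED_UNITS, ISOLATED_MG)
per_gram = yield_per_gram(ISOLATED_MG, CELL_PASTE_G)
# Inferred, not stated: cell paste per litre implied by the 90 mg/L estimate.
implied_paste_g_per_l = STATED_YIELD_MG_PER_L / per_gram

print(f"specific activity: {activity:.2f} U/mg")
print(f"yield: {per_gram:.2f} mg per g cell paste")
print(f"implied cell paste: {implied_paste_g_per_l:.1f} g per L culture")
```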

Table 2. Distribution of Galactosyltransferase Activity in N. meningitidis and recombinant Escherichia coli.

Enzyme Source                      % Activity Soluble    % Activity in Pellet
                                   (100,000 x g)         (100,000 x g)
Neisseria meningitidis L1, IgtC    70                    30
Escherichia coli, IgtC             ND                    ND
Escherichia coli, lgtC-19          88                    12
Neisseria meningitidis L3, IgtB    79                    21
Escherichia coli, IgtB             81 ^a                 19 ^a
Escherichia coli, lgtB-15          83.5 ^a               16.5 ^a

ND. Not determined.
a. Determined by densitometry of SDS-PAGE gels.

Measurement of enzyme activity.

This assay involved using either monosaccharide or disaccharide aminophenyl derivatives which were labeled with fluorescein as described previously (Fig. 6, Wakarchuk et al. (1996) supra.). We were able to assay activity from N. meningitidis extracts (Table 1), so we reasoned the recombinant enzymes would work on the same acceptor molecules. The acceptor for the IgtC enzyme was either FCHASE-AP-Lac or FCHASE-AP-Gal, whereas for the IgtB enzyme we used FCHASE-AP-GlcNAc (Fig. 6). Even though the substrate for IgtB is normally a trisaccharide, we found it could use simple monosaccharide acceptors. With the synthetic acceptors in our CE based assay we were able to detect activity in permeabilized cells, from as little as a single colony of either the native N. meningitidis or recombinant E. coli. A minor problem in detecting activity of the IgtB or IgtC enzymes was that strains containing E. coli β-galactosidase degraded the acceptors or the product so the activity sometimes appeared very low.

With the purified IgtC protein we could also use p-nitrophenyl-lactoside as the acceptor for 10 mg scale reactions. We achieved 70% conversion without optimizing the reaction conditions. The enzyme shows an approximately 5-fold higher reaction rate with FCHASE-AP-Lac versus FCHASE-AP-Gal, but no significant difference in K_m(app) (Table 3). The terminal galactose after an IgtC reaction was sensitive to treatment with α-galactosidase, and was shown to be linked at the 4 position of the preceding galactose by methylation analysis (data not shown). Preparative reactions with IgtB were also performed on approximately 0.5 mg scale with FCHASE-AP-GlcNAc; however, we did not have di- or trisaccharide acceptors for this enzyme with which to compare reaction rates. The reaction product was sensitive to β-galactosidase, and had identical migration on TLC and CE analysis to FCHASE-AP-LacNAc prepared with commercial bovine β-galactosyltransferase. We saw no difference in the level of activity upon the addition of alkaline phosphatase to the reactions, which suggested UDP generated from UDP-Gal was not a potent inhibitor of either IgtB or IgtC.

Preliminary kinetic analysis of IgtC and IgtB.

Using the CE based enzyme assay we were able to measure K_m(app) and k_cat(app) for IgtC and K_m(app) data for IgtB. We used a stopped assay for these reactions and were careful to keep conversion of acceptor or donor to product to 10% or less. The data were analyzed using the computer program Grafit. The K_m(app) values for the FCHASE acceptors were in the mM range (Table 3), with the errors shown as calculated by Grafit. The K_m(app) values for UDP-Gal were in the μM range for both enzymes.

Table 3. Kinetic Data for the IgtC and IgtB Galactosyltransferases

Enzyme     K_m(app)         Acceptor    K_m(app)           k_cat         k_cat/K_m(app)
           UDP-Gal                      Acceptor           (min^-1)
lgtC-19    ND               Gal         1.46 ± 0.23 mM     L5~6          9~3
           4.4 ± 0.3 μM     Lac         1.39 ± 0.14 mM     150           108
IgtB       25 ± 6 μM        GlcNAc      0.6 ± 0.1 mM       NA            NA
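The last column of Table 3 can be reproduced from the two preceding columns for the lactose acceptor. The sketch below assumes simple Michaelis-Menten behaviour, which is consistent with the use of apparent K_m and k_cat values but is not spelled out in the text; the function names are illustrative.

```python
def michaelis_menten_rate(vmax: float, km: float, substrate: float) -> float:
    """Initial rate v = Vmax * [S] / (Km + [S]); km and substrate share units."""
    return vmax * substrate / (km + substrate)


def catalytic_efficiency(kcat_per_min: float, km_mm: float) -> float:
    """k_cat / K_m in min^-1 mM^-1."""
    return kcat_per_min / km_mm


# lgtC-19 with the FCHASE-AP-Lac acceptor (Table 3).
KCAT_LAC_PER_MIN = 150.0
KM_LAC_MM = 1.39

efficiency = catalytic_efficiency(KCAT_LAC_PER_MIN, KM_LAC_MM)
print(f"k_cat/K_m(app) for Lac: {efficiency:.0f} min^-1 mM^-1")

# At [S] equal to K_m the rate is half of Vmax by definition.
half_rate = michaelis_menten_rate(1.0, KM_LAC_MM, KM_LAC_MM)
print(f"fraction of Vmax at [S] = K_m: {half_rate:.2f}")
```

The computed ratio rounds to the tabulated value of 108 for the lactose acceptor.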

Requirement for free Thiols for Activity of IgtC.

We found that with extracts of N. meningitidis 129E L1 and with purified IgtC enzyme, a reducing agent was required for enzymatic activity. We also observed that enzyme reaction rates were not linear, and had a definite lag period before a reasonably linear rate was achieved (Figure 7). This lag period could be eliminated by pre-treatment of the IgtC protein with 5 mM DTT. In addition, the overall amount of activity was significantly higher during the early part of the reaction, 2.38 fold at 3 minutes, decreasing to 1.3 fold at the 15 minute time point. This pre-treatment could be replaced by storing the enzyme in the presence of 5 mM DTT. No thiol requirement was observed for the IgtB protein.

DISCUSSION

Large scale enzymatic synthesis of oligosaccharides depends on the availability of sufficient quantities of the required glycosyltransferases. Expression of many mammalian glycosyltransferases has been achieved in eukaryotic hosts, which can involve expensive tissue culture media and only moderate yields of protein (Kleene et al. (1994) Biochem. Biophys. Res. Commun. 201: 160-167; Williams et al. (1995) Glycoconjugate J. 12: 755-761). Expression in E. coli has been achieved for mammalian glycosyltransferases but these attempts have produced mainly insoluble forms of the enzyme from which it has been difficult to recover active enzyme in large amounts (Aoki et al., supra.; Nishiu et al. (1995) Biosci. Biotech. Biochem. 59 (9): 1750-1752). Since bacteria have been shown to synthesize oligosaccharide structures identical to those in mammals, we reasoned that bacterial glycosyltransferases would be better prospects for recombinant protein expression in E. coli. We have now demonstrated that the expression level of two of these genes is very good in E. coli relative to the levels reported for some mammalian genes (Aoki et al., supra.; Kleene et al., supra.; Nishiu et al., 1995; Seto et al. (1995), supra.; Williams et al., supra.).

The natural acceptor for some of these enzymes is a short chain LOS. However, whole LOS is problematic to use as an acceptor in a quantitative assay as it is heterogeneous and requires detergent for solubility, so synthetic substrates were used in a capillary electrophoresis assay. From the Igt locus of N. meningitidis, we have previously described the measurement of enzyme activity from a β-1,4-galactosyltransferase, and a β-N-acetylglucosaminyltransferase activity from the IgtB and IgtA gene products respectively (Wakarchuk et al. (1996), supra). We have now demonstrated α-1,4-galactosyltransferase activity from the IgtC gene product which is used in the biosynthesis of the L1 immunotype LOS. This IgtC protein is not involved in lacto-N-neotetraose biosynthesis, but is involved in the elaboration of the P^k blood group antigen like trisaccharide. This trisaccharide has been shown to have potential for use in therapeutic intervention for the treatment of enterohemorrhagic E. coli infections (Armstrong et al. (1995) J. Infect. Dis. 171: 1042-1045), and the high yield of the IgtC enzyme would facilitate large scale enzymatic synthesis of the P^k blood group antigen trisaccharide.

The reaction conditions for these two enzymes are similar in that they are Mn²⁺ requiring enzymes with similar pH optima, but they have a major difference in that IgtC requires free thiol groups for activity whereas the addition of reducing agents had no effect on the activity of IgtB. These enzymes also show a marked difference in stability in extracts, which may indicate that they occupy different environments in the cell, IgtB being more stable in the membrane, and IgtC being quite stable in the cytoplasm as well as the membrane fraction. The yield of enzyme activity is apparently higher with IgtC compared to IgtB because its activity was measured with a lactoside, which is a better acceptor than the simple galactoside, whereas the activity of IgtB was measured with a monosaccharide acceptor when the natural acceptor is a trisaccharide.

The kinetic data for these enzymes showed there was no evidence of substrate inhibition with either high concentrations of acceptor or donor. We also observed that, unlike many mammalian enzymes, UDP is not a potent inhibitor (Palcic (1994) Methods in Enzymology 230: 300-316) of either enzyme as we were able to perform preparative reactions without the addition of alkaline phosphatase, and in fact we saw no difference with the addition of alkaline phosphatase. These properties suggest these enzymes should be suitable for large scale synthesis. Our early attempts to purify the IgtB and IgtC proteins revealed that they were susceptible to proteolysis in E. coli extracts. An examination of the purified cleaved proteins by electrospray mass spectrometry showed the cleavage sites were within double or triple basic amino acid sequences which are ompT protease cleavage sites. In the case of IgtC this cleavage had virtually no effect on the enzymatic activity. With IgtB expressing strains, active enzyme was rapidly degraded in extracts and active enzyme could only be recovered from extracts of ompT mutant strains. The ompT cleavage sites suggested modifications we could make to the genes to eliminate production of heterogeneous enzyme species. The expression of these modified proteins was then better than that of the full length protein for both IgtB and IgtC by a factor of 2 to 5. An examination of the Igt locus from both N. meningitidis and N. gonorrhoeae shows that all of the genes IgtABCDE have clusters of these ompT cleavage sites at their C-termini (Figure 8). In the examination of these sequences it can be seen that ompT cleavage sites are clustered in the last 50 amino acids, but do not occur in homologous positions.
Furthermore, examination of the sequences of ten bacterial glycosyltransferases from five other species (Haemophilus influenzae, Pasteurella haemolytica, Salmonella typhimurium, Helicobacter pylori, and Streptococcus pneumoniae) reveals the sequence feature we observed in IgtB and IgtC, i.e., basic residues clustered in the C-terminal 50 amino acids, and hence the presence of ompT cleavage sites. This clustering of basic residues may be a structural characteristic of glycosyltransferases, but may also be seen in many other proteins. The enzymes with this sequence feature are from a broad spectrum of activities: galactosyltransferases (IgtC and IgtB, rfaI, lgtC_Hi, lpsA_Hi, cps14J_Sp, and lpsA_Ph); N-acetylglucosaminyl- and N-acetylgalactosaminyltransferases (IgtA, IgtD, cps14I_Sp), a glucosyltransferase (rfaJ) and a fucosyltransferase (FucT_Hp). This type of C-terminal region appears to be a general feature of the bacterial glycosyltransferases involved in LOS or LPS outer core biosynthesis, and can also be found in gram positive capsular glycosyltransferases. It would therefore be expected that the protein expressed from these other glycosyltransferase genes would be susceptible to the ompT cleavage that we observed in IgtB and IgtC.

While there are internal ompT sites in IgtB and IgtC we did not observe cleavage of these, presumably because the protein is folded such that they are not exposed to the protease. This result suggests then that the C-terminal region may be a less structured tail segment which makes it susceptible to proteolysis by ompT. The known cleavage specificity of the ompT protease indicates a preference for denatured or unstructured regions of the protein substrate (White et al. (1995) J. Biol. Chem. 270: 12990-12994). The role of this tail segment may be to anchor the protein to the membrane in N. meningitidis through a weak ionic interaction with negatively charged phospholipids. We do not know at this time if the proteins are proteolysed in N. meningitidis, but both IgtB and IgtC are found as soluble and membrane associated enzymes in the organism.

This work is the first detailed examination of bacterial glycosyltransferases involved in LOS outer core biosynthesis. We have shown that both IgtB and IgtC can be expressed in an active form at very high levels in ompT-deficient strains of E. coli. The common sequence feature of clustered basic amino acid residues and the presence of ompT cleavage sites in five different types of glycosyltransferases (12 in total) from three species of bacteria suggests that recovery of active glycosyltransferases from a variety of bacteria may require expression in ompT-deficient strains of E. coli. These recombinant enzymes are thus a good source of glycosyltransferases which can be used for the in vitro synthesis of oligosaccharides.

It is understood that the examples and embodiments described herein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference for all purposes.

Claims

WHAT IS CLAIMED IS:

1. A method of expressing a glycosyltransferase in a host cell, the method comprising introducing into the host cell a nucleic acid encoding the glycosyltransferase and incubating the host cell under conditions appropriate for expression of the glycosyltransferase, wherein the host cell substantially lacks a protease that cleaves polypeptides between two consecutive positively charged amino acid residues.

2. The method according to claim 1, wherein the host cell is a prokaryotic cell.

3. The method according to claim 2, wherein the host cell is an E. coli cell.

4. The method according to claim 2, wherein the protease is an ompT protease.

5. The method according to claim 1, wherein the host cell is a yeast cell.

6. The method according to claim 5, wherein the host cell is a Saccharomyces cell.

7. The method according to claim 6, wherein the protease is a Kex2 protease.

8. The method according to claim 1, wherein the host cell is a filamentous fungal cell.

9. The method according to claim 1, wherein the glycosyltransferase is a prokaryotic glycosyltransferase.

10. The method according to claim 9, wherein the prokaryotic glycosyltransferase is a Neisseria glycosyltransferase.

11. The method according to claim 10, wherein the Neisseria is selected from the group consisting of Neisseria meningitidis and Neisseria gonorrhoeae.

12. The method according to claim 11, wherein the glycosyltransferase is a galactosyltransferase.

13. The method according to claim 12, wherein the galactosyltransferase is encoded by a gene selected from the group consisting of IgtB and IgtC.

14. The method according to claim 1, wherein the method further comprises the step of recovering the expressed glycosyltransferase.

15. A composition comprising a glycosyltransferase polypeptide produced by the method of claim 1.

16. The glycosyltransferase according to claim 15, wherein the glycosyltransferase polypeptide includes amino acid residues that are not present in a native glycosyltransferase polypeptide obtained from a host cell that includes the protease.

17. The glycosyltransferase according to claim 15, wherein the glycosyltransferase is selected from the group consisting of a galactosyltransferase, a fucosyltransferase, a glucosyltransferase, an N-acetylgalactosaminyltransferase, an N-acetylglucosaminyltransferase, a glucuronyltransferase, a sialyltransferase, a mannosyltransferase, and an oligosaccharyltransferase.

18. The glycosyltransferase according to claim 15, wherein the glycosyltransferase is encoded by a gene selected from the group consisting of IgtA, IgtB, IgtC, IgtD, and IgtE.

19. A method of expressing a glycosyltransferase in a host cell, the method comprising introducing into the host cell a nucleic acid comprising a nucleotide sequence encoding a glycosyltransferase polypeptide, wherein the nucleotide sequence lacks one or more occurrences of two adjacent codons for positively charged amino acid residues that are present in a naturally occurring glycosyltransferase polypeptide.

20. The method according to claim 19, wherein at least one nucleotide that comprises one of two adjacent codons for positively charged amino acid residues is replaced with a different nucleotide such that the codon no longer specifies a positively charged amino acid residue.

21. A recombinant nucleic acid comprising a nucleotide sequence encoding a polypeptide having glycosyltransferase activity, wherein the nucleotide sequence lacks one or more occurrences of two adjacent codons for positively charged amino acid residues that are present in a naturally occurring glycosyltransferase polypeptide.

22. The recombinant nucleic acid according to claim 21, wherein at least one nucleotide that comprises one of the two adjacent codons for positively charged amino acid residues is replaced with a different nucleotide such that the codon no longer specifies a positively charged amino acid residue.

23. The recombinant nucleic acid according to claim 21, wherein the two adjacent codons encode positively charged amino acid residues that comprise a cleavage site for a protease that is present in a desired host cell.

24. The recombinant nucleic acid according to claim 21, wherein the nucleotide sequence lacks the second of two adjacent codons for positively charged amino acid residues and codons downstream of the second codon.

25. The recombinant nucleic acid according to claim 24, wherein the glycosyltransferase is Neisseria meningitidis IgtC.

26. The recombinant nucleic acid according to claim 25, wherein the nucleotide sequence lacks codons for the carboxyl terminal 19 amino acids of a polypeptide encoded by a naturally occurring IgtC gene.

27. A polypeptide encoded by the recombinant nucleic acid of claim 24.

28. An expression cassette comprising a promoter that is functional in the host cell operably linked to the recombinant nucleic acid of claim 21.

29. The expression cassette according to claim 28, wherein the expression cassette comprises an expression vector.

30. A host cell that comprises the recombinant nucleic acid of claim 21.

31. The host cell according to claim 30, wherein the host cell is a prokaryotic cell.

32. A method for the transfer of a monosaccharide from a donor substrate to an acceptor substrate, comprising: (a) providing a reaction medium comprising at least one glycosyltransferase, a donor substrate, an acceptor substrate and a soluble divalent metal cation; and (b) incubating the reaction medium for a period of time sufficient to complete said transfer; wherein the glycosyltransferase is prepared by the method according to claim 1.

33. The method according to claim 32, wherein the glycosyltransferase is a Neisseria galactosyltransferase encoded by a gene selected from the group consisting of IgtB and IgtC.

34. The method according to claim 32, wherein the acceptor substrate is GlcNAcβ(1-3)-Galβ(1-4)-Glc and the monosaccharide is galactose.