CA2496603A1

CA2496603A1 - Glycosyl hydrolase genes and their use for producing enzymes for the biodegradation of carrageenans

Info

Publication number: CA2496603A1
Application number: CA002496603A
Authority: CA
Inventors: Tristan Barbeyron; Philippe Potin; Christophe Richard; Bernard Henrissat; Jean-Claude Yvin; Bernard Kloareg
Original assignee: Individual
Current assignee: Laboratoires Goemar SA
Priority date: 1996-10-07
Filing date: 1997-10-06
Publication date: 1998-04-16

Abstract

The invention concerns genes coding for glycosyl hydrolases having an HCA score with the iota-carrageenase of Alteromonas fortis not less than 65%, on the domain extending between the amino acids 164 and 311 of the proteinic sequence SEQ ID NO. 2 of said iota-carrageenase, as well as genes coding for glycosyl hydrolases having an HCA score with the kappa-carrageenase of Alteromonas carrageenovora not less than 75%, on the domain extending between the amino acids 117 and 262 of the proteinic sequence SEQ ID NO. 6 of said kappa-carrageenase.

Description

Glycosyl hydrolase genes and their use for producing enzymes for the biodegradation of carrageenans

The present invention relates to glycosyl hydrolase genes for the biotechnological production of oligosaccharides, especially sulfated oligocarrageenans and more particularly oligo-iota-carrageenans and oligo-kappa-carrageenans, by the biodegradation of carrageenans.
The sulfated galactans of Rhodophyceae, such as agars and carrageenans, represent the major polysaccharides of Rhodophyceae and are very widely used as gelling agents or thickeners in various branches of activity, especially agri-foodstuffs. About 6000 tonnes of agars and 22,000 tonnes of carrageenans are extracted annually from red seaweeds for this purpose. Agars are commercially produced by red seaweeds of the genera Gelidium and Gracilaria. Carrageenans, on the other hand, are widely extracted from the genera Chondrus, Gigartina and Eucheuma.
Carrageenans consist of repeat D-galactose units alternately bonded by β 1→4 and α 1→3 linkages. Depending on the number and position of sulfate ester groups on the repeat disaccharide of the molecule, carrageenans are thus divided into several different types, namely: kappa-carrageenans, which possess one sulfate ester group, iota-carrageenans, which possess two sulfate ester groups, and lambda-carrageenans, which possess three sulfate ester groups.
The physicochemical properties and the uses of these polysaccharides as gelling agents are based on their capacity to undergo coil-helix conformational transitions as a function of the thermal and ionic environment [Kloareg et al., Oceanography and Marine Biology - An annual review 26 : 259-315 (1988)].
Furthermore, carrageenans are structural analogs of the sulfated polysaccharides of the animal extracellular matrix (heparin, chondroitin, keratan, dermatan) and they exhibit biological activities which are related to certain functions of these glycosaminoglycans.
In particular, carrageenans are known:
(i) - for their action on the immune system, causing the secretion of interleukin or prostaglandins,
(ii) - for their antiviral action on the AIDS virus HIV-1, the herpes virus HSV-1 and the hepatitis A virus,
(iii) - as antagonists of the fixation of the growth factors of human cells,
(iv) - and also for their action on the proliferation of keratinocytes and their action on the contractility of fibroblasts.
Furthermore, oligocarrageenans act on the adherence, the division and the protein synthesis of human cell cultures, doubtless as structural analogs of the glycosylated part of the proteins of the extracellular matrix. In plants, oligocarrageenans very significantly elicit enzymatic activities which are markers of growth (amylase) or of the phenolic defense metabolism (laminarinase, phenylalanine ammonia-lyase).
Carrageenans are extracted from red seaweeds by conventional processes such as hot aqueous extraction, and oligocarrageenans are obtained from carrageenans by chemical hydrolysis or, preferably, by enzymatic hydrolysis.
The production of oligocarrageenans by enzymatic hydrolysis generally comprises the following steps:
1) production of a glycosyl hydrolase by the culture of a marine bacterium;
2) enzymatic hydrolysis of the carrageenan with the glycosyl hydrolase thus obtained; and
3) fractionation and purification of the oligocarrageenans obtained.
Microorganisms which produce enzymes capable of hydrolyzing iota- and kappa-carrageenans were isolated by Bellion et al. in 1982 [Can. J. Microbiol. 874-80 (1982)]. Some are specific for κ- or ι-carrageenan and others are capable of hydrolyzing both substrates. Another group of bacteria capable of degrading carrageenans was characterized by Sarwar et al. in 1983 [J. Gen. Appl. Microbiol. 29 : 145-55 (1983)]. These yellow-orange bacteria are assigned to the Cytophaga group of bacteria and some of these bacteria have the property of hydrolyzing both agar and carrageenans.
Purification and characterisation of several ι-carrageenases and κ-carrageenases, such as the ι-carrageenase and κ-carrageenase of Cytophaga drobachiensis, the ι-carrageenase of Alteromonas fortis and the κ-carrageenase of Alteromonas carrageenovora, were described in the thesis of P. Potin ["Recherche, production, purification et caracterisation de galactane-hydrolases pour la preparation des parois d'algues rouges", (February 1992)]. A detailed study of the κ-carrageenase of Alteromonas carrageenovora was described by Potin et al. [Eur. J. Biochem. 228, 971-975 (1995)].
The availability of specific enzymes and tools for obtaining oligocarrageenans by genetic engineering could markedly improve their production.
The Applicant has now found novel glycosyl hydrolase genes which make it possible specifically to obtain either oligo-iota-carrageenans or oligo-kappa-carrageenans.
Thus the present invention relates to novel genes which code for glycosyl hydrolases having an HCA score with the iota-carrageenase of Alteromonas fortis which is greater than or equal to 65%, preferably greater than or equal to 70% and advantageously greater than or equal to 75% over the domain extending between amino acids 164 and 311 of the sequence [SEQ ID No. 2] of the iota-carrageenase of Alteromonas fortis.
The present invention relates more particularly to the nucleic acid sequence [SEQ ID No. 1] which codes for an iota-carrageenase as defined above, the amino acid sequence of which is the sequence [SEQ ID No. 2].
The present invention further relates to the genes which code for glycosyl hydrolases having an HCA score with the kappa-carrageenase of Alteromonas carrageenovora which is greater than or equal to 75%, preferably greater than 80% and advantageously greater than 85% over the domain extending between amino acids 117 and 262 of the sequence [SEQ ID No. 6] of the kappa-carrageenase of Alteromonas carrageenovora.
More specifically, the present invention relates to an isolated nucleic acid molecule comprising a nucleic acid sequence encoding a protein having glycosyl-hydrolase activity having a hydrophobic cluster analysis score with the kappa-carrageenase of Alteromonas carrageenovora which is greater than or equal to 75% over the domain extending between amino acids 117 and 262 of the protein sequence represented by SEQ ID No. 6 of said kappa-carrageenase.
In particular, the invention relates to the nucleic acid sequence [SEQ ID No. 7] which codes for a kappa-carrageenase having a score as defined above, the amino acid sequence of which is the sequence [SEQ ID No. 8].
The glycosyl hydrolase genes of the invention are obtained by a process which consists in selecting proteins having an HCA score with the iota-carrageenase of Alteromonas fortis which is greater than or equal to 65%, preferably greater than or equal to 70% and advantageously greater than or equal to 75% over the domain extending between amino acids 164 and 311 of the sequence [SEQ ID No. 2] of the iota-carrageenase of Alteromonas fortis, and in sequencing the resulting genes by the conventional techniques well known to those skilled in the art.

The glycosyl hydrolase genes of the invention can also be obtained by a process which consists in selecting proteins having an HCA score with the kappa-carrageenase of Alteromonas carrageenovora which is greater than or equal to 75%, preferably greater than 80% and advantageously greater than 85% over the domain extending between amino acids 117 and 262 of the sequence [SEQ ID No. 6] of the kappa-carrageenase of Alteromonas carrageenovora, and in sequencing the resulting genes by the conventional techniques well known to those skilled in the art.
Finally, the present invention relates to the use of the above glycosyl hydrolase genes for obtaining, by genetic engineering, glycosyl hydrolases which are useful for the biotechnological production of oligocarrageenans.
The glycosyl hydrolases according to the invention are therefore characterized by the HCA score which they possess with a particular domain of the amino acid sequence of the iota-carrageenase of Alteromonas fortis or the kappa-carrageenase of Alteromonas carrageenovora.
The HCA or "Hydrophobic Cluster Analysis" method is a method of analyzing the sequences of proteins represented as a two-dimensional structure, which has been described by Gaboriaud et al. [FEBS Letters 224, 149-155 (1987)].
It is known that the three-dimensional structure of a protein governs its biological properties, the production of an active protein demanding correct folding.
It is also known that the primary structure of proteins varies much more substantially than the higher-order structures and that proteins can be grouped into families which show similar secondary and tertiary structures but sometimes have such divergent primary sequences that the mutual relationship between such proteins is not obvious. The code which relates primary structure and secondary structure therefore appears to be highly degenerate since very different primary structures can ultimately lead to similar secondary and tertiary structures [Structure 3, 853-859 (1995) and Proc. Natl. Acad. Sci. USA 92 (1995)].
The use of the HCA method has shown that the distribution, size and shape of these hydrophobic clusters along the amino acid sequences are representative of the 3D folding of the proteins studied.
Also, Woodcock et al. [Protein Eng. 5, 629-635 (1992)] have shown that the hydrophobic clusters defined by the α-helical 2D diagram are statistically centered on the regular secondary structures (α-helices, β-strands), that the 2D diagram based on the α-helix carries the greatest amount of structural information and that the correspondence between hydrophobic clusters and elements of secondary structure is of the same quality for any type of folding (all α, all β, α/β and α + β), thus demonstrating that the HCA method can be used irrespective of the type of protein.
L. Lemesle-Varloot et al. [Biochimie 72, 555-574 (1990)] have shown that when two proteins have a similar distribution of hydrophobic clusters over a domain of at least 50 residues, their three-dimensional structures in this domain are considered to be superimposable and their functions to be analogous.

Thus, for example, Barbeyron et al. [Gene 139, 105-109 (1994)] used this HCA method for the comparison of the similarities in the shape, distribution and size of several hydrophobic clusters of the κ-carrageenase of Alteromonas carrageenovora with respect to enzymes from family 16 of glycosyl hydrolases.
The two-dimensional representation used in the HCA method is an α-helix in which the amino acids are arranged by computer processing to give 3.6 residues per turn. To obtain an easily readable plane image, the helix is cut in the longitudinal direction. Finally, to obtain the whole of the hydrophobic clusters situated at the edges of the image, the diagram is duplicated. The method uses a code which recognizes only two states: the hydrophobic state and the hydrophilic state.
The amino acids recognized as being hydrophobic are identified and grouped into characteristic geometric figures. Using these two states makes it possible to become independent of the tolerance shown by the two- and three-dimensional structures towards the variability of the primary sequences.
Furthermore, this representation affords rapid observation of interactions over a short or medium distance since the first and second neighbors of a given residue on the diagram are all located within a segment of 17 amino acids. Finally, in contrast to the analytical methods based on the primary or secondary structures of proteins, no "window" of predefined length is used.
The fundamental characteristic of the α-helix representation is that, for a given globular protein or only a domain of this protein, the distribution of the hydrophobic residues on the diagram is not random. The hydrophobic residues (VILFWMY) form clusters of varying geometry and size. On the diagram, the hydrophilic and hydrophobic faces of the amphiphilic helices are very recognizable. Thus a horizontal diamond cluster corresponds to the hydrophobic face of an α-helix, the internal helices appear as large horizontal hydrophobic clusters and the β-strands appear as rather short, vertical hydrophobic clusters. The method makes it possible to identify the hydrophobic residues forming the core of the globular proteins and to locate the elements of secondary structure, namely the α-helices and the β-strands, independently of any knowledge of the secondary structure of the protein studied.
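The two-state code described above can be illustrated with a short Python sketch. It is not the HCA software itself: the real method draws clusters on a duplicated two-dimensional helical net, whereas this sketch uses a one-dimensional simplification. The function names, the `max_gap` parameter and the rule that hydrophobic residues separated by at most three other residues share a cluster are assumptions made for illustration; only the hydrophobic alphabet (VILFWMY) and the cluster-breaking role of proline come from the text.

```python
# Hydrophobic alphabet as listed in the text.
HYDROPHOBIC = set("VILFWMY")


def binary_code(seq):
    """Two-state code: 1 for a hydrophobic residue, 0 for any other residue."""
    return [1 if aa in HYDROPHOBIC else 0 for aa in seq.upper()]


def linear_clusters(seq, max_gap=3):
    """Group hydrophobic positions (0-based) into clusters along the sequence.

    Assumptions (not from the source): two hydrophobic residues stay in the
    same cluster when at most `max_gap` other residues separate them.
    Proline always closes the current cluster, as the text describes.
    """
    clusters, current, gap = [], [], 0
    for i, aa in enumerate(seq.upper()):
        if aa == "P":
            if current:
                clusters.append(current)
            current, gap = [], 0
        elif aa in HYDROPHOBIC:
            if current and gap > max_gap:
                clusters.append(current)
                current = []
            current.append(i)
            gap = 0
        else:
            gap += 1
    if current:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    demo = "AVLKDEEEEEIFPWY"  # made-up peptide, not a sequence from the patent
    print(binary_code(demo))
    print(linear_clusters(demo))
```

The made-up peptide yields three clusters: the long hydrophilic stretch separates the first two, and the proline separates the last two.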
The HCA score between two proteins is calculated as follows:
For each cluster:
$$\text{HCA score} = \frac{2\,CR}{RC_1 + RC_2} \times 100\%$$
where
- $RC_1$ and $RC_2$ are the number of hydrophobic residues in the cluster of protein 1 (cluster 1) and the cluster of protein 2 (cluster 2), respectively.
- $CR$ is the number of hydrophobic residues in the cluster 1 which correspond to the hydrophobic residues in the cluster 2.
The mean value obtained for all the clusters along the protein sequences compared gives the final HCA score.
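The per-cluster formula and the final averaging step can be sketched in Python as follows. The representation of a cluster as a set of alignment column indices, and the idea that the caller has already paired corresponding clusters, are assumptions made for this illustration; the source does not specify how clusters are paired or how correspondence is decided in practice.

```python
def cluster_score(cluster1, cluster2):
    """Score for one pair of clusters: 2*CR / (RC1 + RC2) * 100.

    Each cluster is a non-empty collection of alignment column indices that
    hold a hydrophobic residue (an assumed representation). CR is taken as
    the number of columns shared by both clusters.
    """
    rc1, rc2 = len(cluster1), len(cluster2)
    cr = len(set(cluster1) & set(cluster2))
    return 2 * cr / (rc1 + rc2) * 100.0


def hca_score(cluster_pairs):
    """Mean of the per-cluster scores over all paired clusters."""
    scores = [cluster_score(c1, c2) for c1, c2 in cluster_pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Two hypothetical cluster pairs: one identical, one partly overlapping.
    pairs = [({1, 2}, {1, 2}), ({5, 6, 9}, {5})]
    print(hca_score(pairs))  # (100 + 50) / 2
```

With this definition, two identical clusters score 100% and two clusters with no corresponding hydrophobic residue score 0%.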
On the HCA profiles, the amino acids are represented by their standard code of a single letter, with the exception of proline (P), glycine (G), serine (S) and threonine (T).
In fact, because of their particular properties, these residues are represented by the special symbols indicated below so as to facilitate their visual identification on the HCA diagrams (cf. list of abbreviations).
Proline introduces high constraints into the polypeptide chain and is considered systematically as an interruption in the clusters. In fact, proline residues stop or deform the helices and the sheets. Glycine possesses a very substantial conformational flexibility because of the absence of a side chain in this amino acid.
Serine and threonine are normally hydrophilic, but they can also be found in hydrophobic environments, such as α-helices, in which their hydroxyl group loses its hydrophilic character because of the hydrogen bond formed with the carbonyl group of the main chain. Within the hydrophobic β-sheets, threonine is sometimes capable of replacing hydrophobic residues by virtue of the methyl group on its side chain.
Amino acids can be divided into four groups according to their hydrophobicity:
(i) - strongly hydrophobic residues: V, I, L and F;
(ii) - moderately hydrophobic residues: W, M and Y
→ W appears at surface sites more frequently than F,
→ M is encountered at various sites, internal or otherwise,
→ Y can adapt to internal hydrophobic environments and is frequently found in loops;
(iii) - weakly hydrophobic residues: A and C are virtually insensitive to the hydrophobic character of their environment; and
(iv) - hydrophilic residues: D, E, N, Q, H, K and R.
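As a small illustration, the four groups listed above can be written as a lookup table. The dictionary and function names are hypothetical; the residue assignments are taken from the list, and P, G, S and T are kept apart because the text treats them with special symbols rather than placing them in one of the four groups.

```python
# Residue groups exactly as enumerated in the text.
GROUPS = {
    "strongly hydrophobic": "VILF",
    "moderately hydrophobic": "WMY",
    "weakly hydrophobic": "AC",
    "hydrophilic": "DENQHKR",
}
# Residues drawn with special symbols on HCA diagrams.
SPECIAL = "PGST"


def hydrophobicity_group(aa):
    """Return the group name for a one-letter amino acid code."""
    aa = aa.upper()
    if aa in SPECIAL:
        return "special symbol"
    for name, members in GROUPS.items():
        if aa in members:
            return name
    raise ValueError(f"unknown amino acid code: {aa!r}")


if __name__ == "__main__":
    for residue in "VWAKP":
        print(residue, hydrophobicity_group(residue))
```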
Using this HCA method, the Applicant has found that proteins having an HCA score with the iota-carrageenase of Alteromonas fortis which is greater than or equal to 65% over the domain extending between amino acids 164 and 311 of said iota-carrageenase are enzymes of the glycosyl hydrolase type and more particularly iota-carrageenases appropriate for the production of oligo-iota-carrageenans from carrageenans.
The proteins having an HCA score which is greater than or equal to 70%, preferably greater than or equal to 75%, with the above domain 164-311 are particularly preferred for the purposes of the invention.
One particular example of glycosyl hydrolase obtained with a gene according to the invention is the protein having the amino acid sequence [SEQ ID No. 2], extracted from Alteromonas fortis.
Another particular example of glycosyl hydrolase obtained with a gene according to the invention is the protein having the amino acid sequence [SEQ ID No. 4], extracted from Cytophaga drobachiensis.
Likewise, the Applicant has found that proteins having an HCA score with the kappa-carrageenase of Alteromonas carrageenovora which is greater than or equal to 75% over the domain extending between amino acids 117 and 262 of said kappa-carrageenase are enzymes of the glycosyl hydrolase type and more particularly kappa-carrageenases appropriate for the production of oligo-kappa-carrageenans from carrageenans.
The proteins having an HCA score which is greater than or equal to 80%, preferably greater than or equal to 85%, with the above domain 117-262 are particularly preferred for the purposes of the invention.
The above proteins are advantageously extracted from marine bacteria.
One particular example of glycosyl hydrolase obtained with a gene according to the invention is the protein having the amino acid sequence [SEQ ID No. 6], extracted from Alteromonas carrageenovora.
Another particular example of glycosyl hydrolase obtained with a gene according to the invention is the protein having the amino acid sequence [SEQ ID No. 8], extracted from Cytophaga drobachiensis.
As indicated previously, the genes according to the invention, coding for glycosyl hydrolases, can be obtained by sequencing the genome of bacteria which produce glycosyl hydrolases, as defined above, by the conventional methods well known to those skilled in the art.
The invention further relates to the expression vectors which carry the nucleic acid sequences according to the invention, with the means for their expression.
These expression vectors can be used to transform prokaryotic microorganisms, particularly Escherichia coli, or eukaryotic cells such as yeasts or fungi.
The invention will now be described in greater detail by means of the illustrative and non-limiting Examples below.
The methods used in these Examples are methods well known to those skilled in the art, which are described in detail in the work by Sambrook, Fritsch and Maniatis entitled "Molecular cloning: a laboratory manual", published by Cold Spring Harbor Press, New York (2nd edition).
The following description will be understood more clearly with the aid of Figures 1 to 4, which respectively show the following:
Fig. 1: The maximum similarity alignment, according to the method of Needleman and Wunsch [J. Mol. Biol. 48, 443-453 (1970)], of the amino acid sequence of the iota-carrageenase of Alteromonas fortis (top part) and the iota-carrageenase of C. drobachiensis (bottom part).
Fig. 2: The HCA profiles of the amino acid sequences of the iota-carrageenases of Cytophaga drobachiensis and Alteromonas fortis.
Fig. 3: The maximum similarity alignment, according to the method of Needleman and Wunsch [J. Mol. Biol. 48, 443-453 (1970)], of the amino acid sequence of the kappa-carrageenase of Alteromonas carrageenovora (top part) and Cytophaga drobachiensis (bottom part).
Fig. 4: The HCA profiles of the amino acid sequences of the kappa-carrageenases of Cytophaga drobachiensis and Alteromonas fortis.
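For readers unfamiliar with the alignment method cited for Figs. 1 and 3, the following Python sketch shows global alignment by dynamic programming in the spirit of Needleman and Wunsch. The scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen for simplicity; the source does not state which scoring scheme or software was used to produce the figures.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative
    assumptions, not parameters taken from the patent.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the dynamic programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Protein alignments normally use a substitution matrix rather than a flat match/mismatch score; the flat score is used here only to keep the sketch short.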

The abbreviations or special symbols used for the amino acids in the Examples below are as follows:
Glycine: ◆
Proline: ★
Threonine: □
Serine: ⊡
Alanine: A
Valine: V
Leucine: L
Isoleucine: I
Methionine: M
Phenylalanine: F
Tryptophan: W
Cysteine: C
Asparagine: N
Glutamine: Q
Tyrosine: Y
Aspartate: D
Glutamate: E
Lysine: K
Arginine: R
Histidine: H

EXAMPLE I:
The iota-carrageenases of Cytophaga drobachiensis and Alteromonas fortis
SECTION 1: Cloning of the genes of the iota-carrageenases of Cytophaga drobachiensis and Alteromonas fortis
Cytophaga drobachiensis was isolated by the Applicant from the red seaweed Delesseria sanguinea [Eur. J. Biochem. 201 : 241-247 (1991)].
Alteromonas fortis (ATCC 43554) was obtained from the American Type Culture Collection. The strains were cultivated on a Zobell medium at 25°C.
Genome libraries of the DNAs of C. drobachiensis and A. fortis were constructed.
The strain used to construct these libraries, namely Escherichia coli DH5α (recA, endA1, gyrA96, thi-1, hsdR17 [rk- mk+], supE44, relA1, lacZΔM15), was cultivated on Luria-Bertani medium (LB medium) at 37°C or on a so-called Zd medium (bactotryptone 5 g/l, yeast extract 1 g/l, NaCl 10 g/l; pH = 7.2) at 22°C, to which 2% of κ-carrageenan were added.
Ampicillin (50 µg/ml) or tetracycline (15 µg/ml) was added to the agar or non-agar culture media from stock solutions prepared in 50% ethanol (to avoid solidification at the storage temperature, -20°C), except in the case of the non-recombinant strain DH5α.
The expression vector used is plasmid pAT153 described in Nature 283, 216 (1980). This plasmid contains two antibiotic resistance genes: a tetracycline resistance gene and a gene which codes for a β-lactamase, an enzyme of the cytoplasmic membrane which degrades ampicillin.
The total DNA of C. drobachiensis and the total DNA of A. fortis were prepared by the method described by Barbeyron et al. [J. Bacteriol. 160, 586- (1984)].
The genomic DNAs of C. drobachiensis and A. fortis were cleaved with the restriction endonucleases NdeII and Sau3AI respectively. In fact, in the case of C. drobachiensis, the restriction endonuclease NdeII was used preferentially because the DNA of this bacterium is methylated on the C residue of the GATC sequence.
The purified DNA fragments of 5000 to 10,000 bp were cloned at the BamHI site of plasmid pAT153, which lies within the tetracycline resistance gene.
6000 clones were obtained in each of the genome libraries.
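The digestion and size-selection step can be pictured with a short Python sketch. It rests on several assumptions that the source does not state: that both enzymes cut immediately 5' of the G in the GATC site, that digestion is complete (library construction more commonly relies on partial digestion), and that methylation is ignored. The function names are hypothetical.

```python
def cut_positions(seq, site="GATC"):
    """Return 0-based indices of every occurrence of `site` in `seq`.

    Assumption: the enzyme cuts immediately before the first base of the
    site, so each index is also a cut position.
    """
    seq = seq.upper()
    positions, start = [], 0
    while True:
        idx = seq.find(site, start)
        if idx == -1:
            break
        positions.append(idx)
        start = idx + 1
    return positions


def digest(seq, site="GATC"):
    """Fragments from a complete digest of a linear molecule (simplified)."""
    cuts = [c for c in cut_positions(seq, site) if c > 0]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def size_select(fragments, low=5000, high=10000):
    """Keep fragments whose length lies in the 5000 to 10,000 bp window."""
    return [f for f in fragments if low <= len(f) <= high]


if __name__ == "__main__":
    toy = "AAGATCTTGATCGG"  # made-up sequence, not genomic data
    print(digest(toy))
```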

The five positive C. drobachiensis clones and the two positive A. fortis clones, which hollowed out a hole in the ι-carrageenan after one week of culture at 22°C, are referred to respectively as pIC1 to pIC5 and pIP1 to pIP2.
1. Cloning from C. drobachiensis
The cloning of this gene is described in detail by T. Barbeyron in the doctoral thesis examined on 28 October 1993 at the Universite Pierre et Marie Curie, Roscoff.
The plasmid DNA was isolated from the above five clones by the alkaline lysis method [Nucleic Acid Res. 7 : 1513 (1979)].
The sizes and mapping of the inserts showing an ι-carrageenase activity were determined by agarose gel electrophoresis after single and double digestion of their plasmids with various restriction enzymes.
The DNA fragments were extracted from the agarose by the glass wool method.
All the plasmids obtained contain an identical PvuII fragment of 3.3 kb.
This fragment was subcloned in phagemid pBluescript KSII (Stratagene) (pICP07 and pICP16).
Likewise, the internal NdeI fragment and a HindIII fragment partially comprising the PvuII fragment were subcloned to give the pICN22 and pICH42 subclones, respectively.
To locate the ι-carrageenase gene, libraries were constructed from the pICP07 and pICP16 subclones in phagemid pBluescript with the aid of the exonuclease III of E. coli, using the "ExoIII" kit from Pharmacia.
The subclones and the ExoIII clones obtained were plated onto Zd medium solidified with ι-carrageenan.
Only the pICP16 and pICP07 clones and the ExoIII pICP074 and pICP0712 clones (obtained by degradation with ExoIII for 4 minutes and 12 minutes, respectively, from the pICP07 clone) are ι-carrageenase-positive.
2. Cloning from Alteromonas fortis
The DNA of the pIP1 and pIP2 clones showed inserts of 10.45 kb and 4.125 kb respectively, having a common fragment of 3 kb. These clones showed a positive ι-carrageenase activity. Different fragments were subcloned and plated as described above. However, none of the subclones obtained proved to be ι-carrageenase-positive.

SECTION 2: Determination of the nucleotide sequences of the genes coding for the ι-carrageenases of Cytophaga drobachiensis and Alteromonas fortis
1. Sequence of the Cytophaga drobachiensis gene
Plasmid pICP0712 was used to determine the nucleotide sequence of the gene responsible for the ι-carrageenase activity of C. drobachiensis [SEQ ID No. 3].
This nucleotide sequence is composed of 1837 bp. Translation of the six reading frames revealed only one open frame, called cgiA. The potential initiation codon is situated 333 bp beyond the 5'P end of the sequence.
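The six-frame search mentioned above can be sketched in Python. This is a generic illustration, not the software used by the Applicant: it assumes the standard stop codons (TAA, TAG, TGA), treats ATG as the only start codon, and reports coordinates on whichever strand is being scanned. All function names are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of a DNA string (A, C, G, T only are swapped)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def orfs_in_frame(dna, frame, min_codons=1):
    """List (start, end) of ATG...stop stretches in one reading frame.

    Coordinates are 0-based on the scanned strand; `end` is exclusive and
    includes the stop codon. `min_codons` counts codons before the stop.
    """
    dna = dna.upper()
    found, start = [], None
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            if (i - start) // 3 >= min_codons:
                found.append((start, i + 3))
            start = None
    return found


def six_frame_orfs(dna, min_codons=1):
    """Scan the three frames of each strand; keys are (strand, frame)."""
    result = {}
    for strand, s in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for frame in range(3):
            result[(strand, frame)] = orfs_in_frame(s, frame, min_codons)
    return result


if __name__ == "__main__":
    toy = "CCATGAAATGGTAACC"  # made-up sequence with one short ORF
    for key, orfs in six_frame_orfs(toy).items():
        print(key, orfs)
```

In practice a minimum length threshold (the `min_codons` argument here) is what separates a plausible gene from the many short open stretches that occur by chance.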
The protein sequence [SEQ ID No. 4] deduced from the sequence of cgiA is composed of 391 amino acids, corresponding to a theoretical molecular weight of 53.4 kDa. The hydropathic profile of this protein shows a hydrophobic region covering the first 24 amino acids. The presence of a positively charged amino acid (Lys) followed by a hydrophobic block and then by a polar segment of six amino acids suggests that this domain could be a signal peptide. According to the analyses performed by the method of Von Heijne [J. Mol. Biol. 184 : 99-105 (1985)], the signal peptidase would cleave between valine (Val24) and threonine (Thr25). The mature protein devoid of its signal peptide would have a theoretical molecular weight of 50.7 kDa. The identity of the cgiA gene was confirmed by determination of the amino acids at the NH2 end of the partially purified protein.
The sequence obtained matches the one deduced from the nucleotide sequence.
The first amino acid is situated 14 residues from the NH2 end generated by the signal peptidase. As the presence of the two prolines following the amino acids determined by microsequencing had slightly disturbed the order of appearance of the N-terminal residues, the sequence of an internal oligopeptide, purified by HPLC after cleavage with trypsin, was established. The sequence NH2-ATYK-COOH obtained is situated near the C-terminal end of the iotase (residues 396 to 399).
2. Sequence of the Alteromonas fortis gene
Plasmids pIHP15 and pIHPX17, subcloned from pIP1 and pIP2, were used to determine the nucleotide sequence of the gene responsible for the ι-carrageenase activity of Alteromonas fortis, SEQ ID No. 1. The 2085 bp fragment contains a single open reading frame of 1473 bp, called cgiA. The sequence situated upstream of the initiation codon (ATG) is not a coding sequence.

The protein sequence deduced from the sequence of the A. fortis ι-carrageenase gene [SEQ ID No. 2] consists of 491 amino acids, corresponding to a theoretical molecular weight of 54.802 kDa. In the present case, again, the N-terminal part of the protein exhibits a high hydrophobicity, suggesting that this domain could be a signal peptide; the hypothetical cleavage site would be situated between glycine (Gly26) and alanine (Ala27). The mature protein devoid of its signal peptide would have a theoretical molecular weight of 51.95 kDa, corresponding to a value similar to the molecular weight obtained with the protein purified by SDS-PAGE, namely 57 kDa.
SECTION 3: Comparison of the protein sequences of the ι-carrageenases of Cytophaga drobachiensis and Alteromonas fortis
After removal of the signal peptide from each sequence, it could be seen that the sequence of the ι-carrageenase of C. drobachiensis has similarities to that of the ι-carrageenase of A. fortis.
In fact, the two sequences of iota-carrageenase have a similarity of 43.2% over the whole of the linear sequence alignment. This similarity is particularly high (57.8%) between amino acids 164 and 311 (numbering of the iota-carrageenase of Alteromonas fortis (Fig. 1)).
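A percentage computed over a sub-domain of an alignment, numbered according to one of the two sequences, can be sketched as follows. The sketch counts strict identity, which may differ from the similarity measure behind the 43.2% and 57.8% figures (the source does not define it), and it skips alignment columns where the reference sequence has a gap. Both choices, and the function name, are assumptions made for illustration.

```python
def percent_identity(ref_aln, other_aln, start=None, end=None):
    """Percent identical residues between two aligned, gapped strings.

    `start` and `end` are 1-based, inclusive residue numbers counted on the
    ungapped reference sequence (first argument). Columns where the
    reference has a gap are skipped; a gap in the other sequence counts as
    a non-identical position.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have the same length")
    pos = compared = identical = 0
    for r, o in zip(ref_aln, other_aln):
        if r == "-":
            continue
        pos += 1  # residue number in the reference sequence
        if start is not None and pos < start:
            continue
        if end is not None and pos > end:
            continue
        compared += 1
        if r == o:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Toy alignment; the real comparison would pass start=164, end=311.
    print(percent_identity("ACD-EFG", "ACDQEYG"))
    print(percent_identity("ACD-EFG", "ACDQEYG", start=4, end=6))
```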
At the same time, an HCA analysis showed that the HCA score between the two proteins is 82% over a domain of 293 amino acids and reaches 90.5% in the case of said domain 164-311 (Fig. 2).
No significant similarity to other polysaccharidases known hitherto could be demonstrated.
These two enzymes therefore constitute a novel family of glycosyl hydrolases.
EXAMPLE II:
The kappa-carrageenases of Alteromonas carrageenovora and Cytophaga drobachiensis
SECTION 1: Cloning of the kappa-carrageenase genes
Alteromonas carrageenovora ATCC 43555 was obtained from the American Type Culture Collection. The strains A. carrageenovora and C. drobachiensis were cultivated under conditions identical to those mentioned in section 1 of Example I.
Likewise, genome libraries were constructed using the strain Escherichia coli DH5α and plasmid vector pAT153.

1. Cloning from Alteromonas carrageenovora
The preparation of this gene is described in detail by T. Barbeyron in the thesis cited above (cf. Example I) and in Gene 139, 105-109 (1994).
From the genome library of Alteromonas carrageenovora, 4 E. coli clones, called K1 to K4, were capable of hydrolyzing kappa-carrageenan.
Plasmids pKA1 to pKA4 were purified from the four independent clones and mapped with the aid of the restriction endonucleases BamHI, DraI, EcoRI, HindIII, MluI, PstI, PvuII, SalI, SspI, XbaI and XhoI.
The presence of a 2.2 kb DraI-HindIII fragment was noted in each plasmid.
This common fragment, which is the whole insert of plasmid pKA3, was sequenced in its entirety from plasmid pKA3.
2. Cloning from Cytophaga drobachiensis
From the genome library of C. drobachiensis, five E. coli clones, called pKC1 to pKC5, were capable of hollowing out a hole in the substrate.
The plasmids isolated and purified from said clones were mapped with restriction endonucleases.
Internal fragments of 1100 bp and 600 bp respectively were subcloned from pKC1 in phagemid pBluescript and were called pKCE11 and pKCN6.
Plasmids pKC1, pKCE11 and pKCN6 were used to determine the nucleotide sequence of the kappa-carrageenase gene.
SECTION 2: Determination of the sequences of the genes coding for the kappa-carrageenases of Alteromonas carrageenovora and Cytophaga drobachiensis
1. Sequence of the Alteromonas carrageenovora gene
The number of nucleotides in the pKA3 insert is 2180 bp. Translation in the six reading frames reveals the presence of three open frames, only one of which is complete; this one separates the other two, which are only partial.
All three of them are located on the same DNA strand. The second open frame, called cgkA, read in the third reading frame, contains 1191 bp [SEQ ID No. 5].
The translation product of the cgkA gene corresponds to a protein of 397 amino acids with a theoretical molecular weight of 44,212 Da (SEQ ID No. 6).
The hydropathic profile of this protein shows a highly hydrophobic domain, extending over 25 amino acids, at the N-terminal end. This domain comprises a positively charged amino acid (Lys) followed by a segment rich in hydrophobic amino acids and then by three polar amino acids. These results suggest that a signal peptide is involved. The N-terminal sequence of the protein purified from the culture supernatant was determined, thereby confirming the identity of the gene. These results indicate that the signal peptidase cleaves the protein between residues 25 and 26, which is consistent with Von Heijne's rule (-3, -1). The mature protein therefore has a theoretical molecular weight of 41.6 kDa.
2. Sequence of the Cytophaga drobachiensis gene
The pKC1 insert of 4425 bp contains a single open reading frame of 1635 bp, called cgkA (SEQ ID No. 7).
The protein translated from the kappa-carrageenase gene is a protein comprising 545 amino acids with a molecular weight of 61.466 kDa [SEQ ID No. 8].
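The reading-frame lengths and residue counts quoted in the Examples can be cross-checked with simple arithmetic, as in the Python sketch below. The three (bp, residue) pairs are taken from the text; the figure of roughly 110 Da per residue is a common rule of thumb introduced here as an assumption, and it gives only a rough estimate next to the theoretical molecular weights stated in the text. The function names are hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue (assumption).
AVG_RESIDUE_MASS_DA = 110.0


def residues_from_orf(orf_bp):
    """Number of codons in a reading frame of `orf_bp` base pairs."""
    if orf_bp % 3:
        raise ValueError("reading frame length is not a multiple of 3")
    return orf_bp // 3


def approx_mass_kda(n_residues):
    """Rough molecular weight estimate in kDa from the residue count."""
    return n_residues * AVG_RESIDUE_MASS_DA / 1000.0


if __name__ == "__main__":
    # (gene, reading frame length in bp, residues stated in the text)
    stated = [
        ("A. fortis cgiA", 1473, 491),
        ("A. carrageenovora cgkA", 1191, 397),
        ("C. drobachiensis cgkA", 1635, 545),
    ]
    for name, bp, residues in stated:
        n = residues_from_orf(bp)
        print(name, n, n == residues, round(approx_mass_kda(n), 1))
```

In all three cases the stated residue count equals the reading-frame length divided by three.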

The hydropathic profile of this protein shows a highly hydrophobic domain at the N-terminal end, suggesting that a signal peptide is involved.
According to von Heijne's rule (-3, -1), the cleavage site of the signal peptidase should be situated between threonine and serine in positions 35 and 36 respectively, with the ATG codon at position 875 as the initiation codon.
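The (-3, -1) rule invoked above can be sketched in Python as follows. The residue sets are a common textbook statement of the rule (a small residue at -1; no aromatic, charged or large polar residue at -3) and are an assumption here, not taken from this document; the function name is illustrative. The example sequence is the first 40 residues of SEQ ID No. 8 written in one-letter code.

```python
# Illustrative sketch of the (-3, -1) rule; residue sets are an assumption.
SMALL_AT_MINUS_1 = set("ASGCTQ")
# aromatic (FHYW), charged (DEKR) and large polar (NQ) residues
FORBIDDEN_AT_MINUS_3 = set("FHYW") | set("DEKR") | set("NQ")


def obeys_minus3_minus1(protein: str, last_signal_residue: int) -> bool:
    """Check a proposed cleavage site against the (-3, -1) rule.

    last_signal_residue is the 1-based position of residue -1, i.e. the
    last residue of the signal peptide; cleavage follows it.
    """
    if not 3 <= last_signal_residue < len(protein):
        raise ValueError("cleavage position outside the usable range")
    minus_1 = protein[last_signal_residue - 1]
    minus_3 = protein[last_signal_residue - 3]
    return minus_1 in SMALL_AT_MINUS_1 and minus_3 not in FORBIDDEN_AT_MINUS_3


if __name__ == "__main__":
    # First 40 residues of SEQ ID No. 8 (C. drobachiensis kappa-carrageenase)
    n_term = "MKKPNFYGKMGRTALSSLFYLFFLGLVYGQQPTKTSNPND"
    print(obeys_minus3_minus1(n_term, 35))  # cleavage after Thr35
```

The check only tests whether a given site is compatible with the rule; it does not rank candidate sites or model the hydrophobic core of the signal peptide.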
The molecular weight of the protein, calculated after removal of the signal peptide, is 57.4 kDa, which is greater than the molecular weight determined for the purified extracellular kappa-carrageenase, namely 40.0 kDa.
SECTION 3: Comparison of the protein sequences of the kappa-carrageenases of Alteromonas carrageenovora and Cytophaga drobachiensis
The kappa-carrageenase of C. drobachiensis has a similarity of 36.1% with the kappa-carrageenase of Alteromonas carrageenovora over the whole of the linear sequence alignment.
This similarity is particularly high between amino acids 117 and 262 (51.8%) (numbering of the kappa-carrageenase of Alteromonas carrageenovora) (Fig. 3).
As previously, this similarity is substantiated by HCA analysis, which shows an HCA score between the two proteins of 75.4% over said domain of 145 amino acids (Fig. 4).
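The per-domain comparison above can be illustrated with a minimal Python sketch that computes plain percent identity over a window of a pairwise alignment, numbered along the reference sequence. This is neither the similarity measure nor the HCA score used in this document; the function name and the toy alignment are hypothetical.

```python
# Illustrative sketch only: percent identity over a reference-numbered window.

def window_identity(ref_aln: str, other_aln: str, first: int, last: int) -> float:
    """Percent identity over reference residues first..last (1-based).

    ref_aln and other_aln are two rows of one pairwise alignment, with
    '-' for gaps. Columns where the reference has a gap are skipped.
    """
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned rows must have equal length")
    position = 0  # residue number along the ungapped reference
    compared = 0
    identical = 0
    for ref_res, other_res in zip(ref_aln, other_aln):
        if ref_res == "-":
            continue  # insertion relative to the reference
        position += 1
        if first <= position <= last:
            compared += 1
            if ref_res == other_res:
                identical += 1
    if compared == 0:
        raise ValueError("window contains no reference residues")
    return 100.0 * identical / compared


if __name__ == "__main__":
    ref = "MK-TAYW"    # made-up aligned rows, not from the patent
    other = "MRQTA-W"
    print(window_identity(ref, other, 1, 6))
    print(window_identity(ref, other, 3, 4))
```

For the domain discussed here, the window would be residues 117 to 262 in the numbering of the Alteromonas carrageenovora enzyme, applied to the alignment of Fig. 3.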

HCA analysis also shows that these two proteins belong to family 16 of glycosyl hydrolases, which includes endoxyloglucan transferases (XET), laminarinases, lichenases and agarases. In fact, the HCA score of the two kappa-carrageenases is 67.5% with XET, 67.6% with laminarinases, 73.7% with lichenases and 71.5% with agarases.

SEQUENCE LISTING
(1) GENERAL INFORMATION:
(i) APPLICANT:
(A) NAME: LABORATOIRES GOEMAR S.A.
(B) STREET: La Madeleine B.P. 55
(C) CITY: Saint-Malo
(E) COUNTRY: France
(F) POSTAL CODE (ZIP): 35413 Cedex
(G) TELEPHONE: 99 21 53 70
(H) TELEFAX: 99 82 56 17
(ii) TITLE OF INVENTION: Glycosyl hydrolase genes and their use for producing enzymes for the biodegradation of carrageenans
(iii) NUMBER OF SEQUENCES: 8
(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)
(2) INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2085 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: join(211..1683, 1880..2083)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

Met Arg Leu Tyr Phe Arg Lys Leu Trp Leu Thr Asn Leu Phe Leu Gly Gly Ala Leu Ala Ser Ser Ala Ala Ile Gly Ala Val Ser Pro Lys Thr Tyr Lys Asp Ala Asp Phe Tyr Val Ala Pro Thr Gln Gln Asp Val Asn Tyr Asp Leu Val Asp Asp Phe Gly Ala Asn Gly Asn Asp Thr Ser Asp Asp Ser Asn Ala Leu Gln Arg Ala Ile Asn Ala Ile Ser Arg Lys Pro Asn Gly Gly Thr Leu Leu Ile Pro Asn Gly Thr Tyr His Phe Leu Gly Ile Gln Met Lys Ser Asn Val His Ile Arg Val Glu Ser Asp Val Ile Ile Lys Pro Thr Trp Asn Gly Asp Gly Lys Asn His Arg Leu Phe Glu Val Gly Val Asn Asn Ile Val Arg Asn Phe Ser Phe Gln Gly Leu Gly Asn Gly Phe Leu Val Asp Phe Lys Asp Ser Arg Asp Lys Asn Leu Ala Val Phe Lys Leu Gly Asp Val Arg Asn Tyr Lys Ile Ser Asn Phe Thr Ile Asp Asp Asn Lys Thr Ile Phe Ala Ser Ile Leu Val Asp Val Thr Glu Arg Asn Gly Arg Leu His Trp
TCG CGT AAT GGA ATT ATC GAA AGA ATA AAA CAA AAT AAC GCT TTG TTC 858
Ser Arg Asn Gly Ile Ile Glu Arg Ile Lys Gln Asn Asn Ala Leu Phe Gly Tyr Gly Leu Ile Gln Thr Tyr Gly Ala Asp Asn Ile Leu Phe Arg Asn Leu His Ser Glu Gly Gly Ile Ala Leu Arg Met Glu Thr Asp Asn Leu Leu Met Lys Asn Tyr Lys Gln Gly Gly Ile Arg Asn Ile Phe Ala Asp Asn Ile Arg Cys Ser Lys Gly Leu Ala Ala Val Met Phe Gly Pro His Phe Met Lys Asn Gly Asp Val Gln Val Thr Asn Val Ser Ser Val Ser Cys Gly Ser Ala Val Arg Ser Asp Ser Gly Phe Val Glu Leu Phe Ser Pro Thr Asp Glu Val His Thr Arg Gln Ser Trp Lys Gln Ala Val Glu Ser Lys Leu Gly Arg Gly Cys Ala Gln Thr Pro Tyr Ala Arg Gly Asn Gly Gly Thr Arg Trp Ala Ala Arg Val Thr Gln Lys Asp Ala Cys Leu Asp Lys Ala Lys Leu Glu Tyr Gly Ile Glu Pro Gly Ser Phe Gly Thr Val Lys Val Phe Asp Val Thr Ala Arg Phe Gly Tyr Asn Ala Asp
CTT AAA CAG GAC CAG CTA GAC TAC TTT TCT ACA TCC AAC CCT ATG TGC 1434
Leu Lys Gln Asp Gln Leu Asp Tyr Phe Ser Thr Ser Asn Pro Met Cys Lys Arg Val Cys Leu Pro Thr Lys Glu Gln Trp Ser Lys Gln Gly Gln Ile Tyr Ile Gly Pro Ser Leu Ala Ala Val Ile Asp Thr Thr Pro Glu Thr Ser Lys Tyr Asp Tyr Asp Val Lys Thr Phe Asn Val Lys Arg Ile Asn Phe Pro Val Asn Ser His Lys Thr Ile Asp Thr Asn Thr Glu Ser Ser Arg Val Cys Asn Tyr Tyr Gly Met Ser Glu Cys Ser Ser Ser Arg Trp Glu Arg Met Lys Gly Val Ser Thr Lys Asn Ala Leu Leu Phe Ala Gly Phe Ser Leu Ser Leu Val Ala Gln Ser Val Ser Ala Gln Glu Ala Lys Gln Pro Glu Lys Glu Glu Lys Asp Val Glu Val Ile Leu Val Ser Ala Gln Lys Arg Glu Gln Ala Leu Lys Glu Val Pro Val Ser Ile Glu Val Ile Gln Gly Asp Leu Leu
(2) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 559 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
Met Arg Leu Tyr Phe Arg Lys Leu Trp Leu Thr Asn Leu Phe Leu Gly Gly Ala Leu Ala Ser Ser Ala Ala Ile Gly Ala Val Ser Pro Lys Thr Tyr Lys Asp Ala Asp Phe Tyr Val Ala Pro Thr Gln Gln Asp Val Asn Tyr Asp Leu Val Asp Asp Phe Gly Ala Asn Gly Asn Asp Thr Ser Asp Asp Ser Asn Ala Leu Gln Arg Ala Ile Asn Ala Ile Ser Arg Lys Pro Asn Gly Gly Thr Leu Leu Ile Pro Asn Gly Thr Tyr His Phe Leu Gly Ile Gln Met Lys Ser Asn Val His Ile Arg Val Glu Ser Asp Val Ile Ile Lys Pro Thr Trp Asn Gly Asp Gly Lys Asn His Arg Leu Phe Glu Val Gly Val Asn Asn Ile Val Arg Asn Phe Ser Phe Gln Gly Leu Gly Asn Gly Phe Leu Val Asp Phe Lys Asp Ser Arg Asp Lys Asn Leu Ala Val Phe Lys Leu Gly Asp Val Arg Asn Tyr Lys Ile Ser Asn Phe Thr Ile Asp Asp Asn Lys Thr Ile Phe Ala Ser Ile Leu Val Asp Val Thr Glu Arg Asn Gly Arg Leu His Trp Ser Arg Asn Gly Ile Ile Glu Arg Ile Lys Gln Asn Asn Ala Leu Phe Gly Tyr Gly Leu Ile Gln Thr Tyr Gly Ala Asp Asn Ile Leu Phe Arg Asn Leu His Ser Glu Gly Gly Ile Ala Leu Arg Met Glu Thr Asp Asn Leu Leu Met Lys Asn Tyr Lys Gln Gly Gly Ile Arg Asn Ile Phe Ala Asp Asn Ile Arg Cys Ser Lys Gly Leu Ala Ala Val Met Phe Gly Pro His Phe Met Lys Asn Gly Asp Val Gln Val Thr Asn Val Ser Ser Val Ser Cys Gly Ser Ala Val Arg Ser Asp Ser Gly Phe Val Glu Leu Phe Ser Pro Thr Asp Glu Val His Thr Arg Gln Ser Trp Lys Gln Ala Val Glu Ser Lys Leu Gly Arg Gly Cys Ala Gln Thr Pro Tyr Ala Arg Gly Asn Gly Gly Thr Arg Trp Ala Ala Arg Val Thr Gln Lys Asp Ala Cys Leu Asp Lys Ala Lys Leu Glu Tyr Gly Ile Glu Pro Gly Ser Phe Gly Thr Val Lys Val Phe Asp Val Thr Ala Arg Phe Gly Tyr Asn Ala Asp Leu Lys Gln Asp Gln Leu Asp Tyr Phe Ser Thr Ser Asn Pro Met Cys Lys Arg Val Cys Leu Pro Thr Lys Glu Gln Trp Ser Lys Gln Gly Gln Ile Tyr Ile Gly Pro Ser Leu Ala Ala Val Ile Asp Thr Thr Pro Glu Thr Ser Lys Tyr Asp Tyr Asp Val Lys Thr Phe Asn Val Lys Arg Ile Asn Phe Pro Val Asn Ser His Lys Thr Ile Asp Thr Asn Thr Glu Ser Ser Arg Val Cys Asn Tyr Tyr Gly Met Ser Glu Cys Ser Ser Ser Arg Trp Glu Arg Met Lys Gly Val Ser Thr Lys Asn Ala Leu Leu Phe Ala Gly Phe Ser Leu Ser Leu Val Ala Gln Ser Val Ser Ala Gln Glu Ala Lys Gln Pro Glu Lys Glu Glu Lys Asp Val Glu Val Ile Leu Val Ser Ala Gln Lys Arg Glu Gln Ala Leu Lys Glu Val Pro Val Ser Ile Glu Val Ile Gln Gly Asp Leu Leu
(2) INFORMATION FOR SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1997 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: join(333..1805, 1866..1997)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

CCT

Met Lys Leu Gln Phe Lys Pro Val Tyr Leu Ala Ser Ile Ala Ile Met Ala Ile Gly Cys Thr Lys Glu Val Thr Glu Asn Asp Thr Ser Glu Ile Ser Glu Val Pro Thr Glu Leu Arg Ala Ala Ala Ser Ser Phe Tyr Thr Pro Pro Gly Gln Asn Val Arg Ala Asn Lys Lys Asn Leu Val Thr Asp Tyr Gly Val Asn His Asn Asp Gln Asn Asp Asp Ser Ser Lys Leu Asn Leu Ala Ile Lys Asp Leu Ser Asp Thr Gly Gly Ile Leu Thr Leu Pro Lys Gly Lys Tyr Tyr Leu Thr Lys Ile Arg Met Arg Ser Asn Val His Leu Glu Ile Glu Lys Gly Thr
GTA ATC TAT CCG ACC AAG GGG TTG ACT CCT GCG AAG AAT CAC AGA ATT 737
Val Ile Tyr Pro Thr Lys Gly Leu Thr Pro Ala Lys Asn His Arg Ile Phe Asp Phe Ala Ser Lys Thr Glu Glu Lys Ile Glu Asn Ala Ser Ile Val Gly Lys Gly Gly Lys Phe Ile Val Asp Leu Arg Gly Asn Ser Ser Lys Asn Gln Ile Val Ala Asp Val Gly Asn Val Thr Asn Phe Lys Ile Ser Asn Phe Thr Ile Lys Asp Glu Lys Thr Ile Phe Ala Ser Ile Leu Val Ser Phe Thr Asp Lys Ala Gly Asn Ala Trp Pro His Lys Gly Ile Ile Glu Asn Ile Asp Gln Ala Asn Ala His Thr Gly Tyr Gly Leu Ile Gln Ala Tyr Ala Ala Asp Asn Ile Leu Phe Asn Asn Leu Ser Cys Thr
GGC GGG GTA ACC TTG CGT TTA GAA ACC GAC AAC CTC GCT ATG AAA ACC 1121
Gly Gly Val Thr Leu Arg Leu Glu Thr Asp Asn Leu Ala Met Lys Thr Ala Lys Lys Gly Gly Val Arg Asp Ile Phe Ala Thr Lys Ile Lys Asn Thr Asn Gly Leu Thr Pro Val Met Phe Ser Pro His Phe Met Glu Asn Gly Lys Val Thr Ile Asp Asp Val Thr Ala Ile Gly Cys Ala Tyr Ala Val Arg Val Glu His Gly Phe Ile Glu Ile Phe Asp Lys Gly Asn Arg Ala Ser Ala Asp Ala Phe Lys Asn Tyr Ile Glu Gly Ile Leu Gly Ala Gly Ser Val Glu Val Val Tyr Lys Arg Asn Asn Gly Arg Thr Trp Ala Ala Arg Ile Ala Asn Asp Phe Asn Glu Ala Ala Tyr Asn His Ser Asn Pro Ala Val Ser Gly Ile Lys Pro Gly Lys Phe Ala Thr Ser Lys Val Thr Asn Val Lys Ala Thr Tyr Lys Gly Thr Gly Ala Lys Leu Lys Gln Ala Phe Leu Ser Tyr Leu Pro Cys Ser Glu Arg Ser Lys Val Cys Arg Pro Gly Pro Asp Gly Phe Glu Tyr Asn Gly Pro Ser Leu Gly Val Thr Ile Asp Asn Thr Lys Arg Asp Asn Ser Leu Gly Asn Tyr Asn Val Asn Val Ser Thr Ser Ser Val Gln Gly Phe Pro Asn Asn Tyr Val Leu Asn Val Lys Tyr Asn Thr Pro Lys Val Cys Asn Gln Asn Leu Gly Ser Ile Thr Ser Cys Asn Met Ser Leu Ser His Val Val Ile Tyr Trp Arg Leu Leu Ile Lys Ala Trp Ile Ser Ser Gly Val Asn Ile Gly Leu Ala Pro Ser Leu Pro Ala Thr Ile Ala Leu Cys Ser Tyr Ala Gln Ala Lys Ser
(2) INFORMATION FOR SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 535 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
Met Lys Leu Gln Phe Lys Pro Val Tyr Leu Ala Ser Ile Ala Ile Met Ala Ile Gly Cys Thr Lys Glu Val Thr Glu Asn Asp Thr Ser Glu Ile Ser Glu Val Pro Thr Glu Leu Arg Ala Ala Ala Ser Ser Phe Tyr Thr Pro Pro Gly Gln Asn Val Arg Ala Asn Lys Lys Asn Leu Val Thr Asp Tyr Gly Val Asn His Asn Asp Gln Asn Asp Asp Ser Ser Lys Leu Asn Leu Ala Ile Lys Asp Leu Ser Asp Thr Gly Gly Ile Leu Thr Leu Pro Lys Gly Lys Tyr Tyr Leu Thr Lys Ile Arg Met Arg Ser Asn Val His Leu Glu Ile Glu Lys Gly Thr Val Ile Tyr Pro Thr Lys Gly Leu Thr Pro Ala Lys Asn His Arg Ile Phe Asp Phe Ala Ser Lys Thr Glu Glu Lys Ile Glu Asn Ala Ser Ile Val Gly Lys Gly Gly Lys Phe Ile Val Asp Leu Arg Gly Asn Ser Ser Lys Asn Gln Ile Val Ala Asp Val Gly Asn Val Thr Asn Phe Lys Ile Ser Asn Phe Thr Ile Lys Asp Glu Lys Thr Ile Phe Ala Ser Ile Leu Val Ser Phe Thr Asp Lys Ala Gly Asn Ala Trp Pro His Lys Gly Ile Ile Glu Asn Ile Asp Gln Ala Asn Ala His Thr Gly Tyr Gly Leu Ile Gln Ala Tyr Ala Ala Asp Asn Ile Leu Phe Asn Asn Leu Ser Cys Thr Gly Gly Val Thr Leu Arg Leu Glu Thr Asp Asn Leu Ala Met Lys Thr Ala Lys Lys Gly Gly Val Arg Asp Ile Phe Ala Thr Lys Ile Lys Asn Thr Asn Gly Leu Thr Pro Val Met Phe Ser Pro His Phe Met Glu Asn Gly Lys Val Thr Ile Asp Asp Val Thr Ala Ile Gly Cys Ala Tyr Ala Val Arg Val Glu His Gly Phe Ile Glu Ile Phe Asp Lys Gly Asn Arg Ala Ser Ala Asp Ala Phe Lys Asn Tyr Ile Glu Gly Ile Leu Gly Ala Gly Ser Val Glu Val Val Tyr Lys Arg Asn Asn Gly Arg Thr Trp Ala Ala Arg Ile Ala Asn Asp Phe Asn Glu Ala Ala Tyr Asn His Ser Asn Pro Ala Val Ser Gly Ile Lys Pro Gly Lys Phe Ala Thr Ser Lys Val Thr Asn Val Lys Ala Thr Tyr Lys Gly Thr Gly Ala Lys Leu Lys Gln Ala Phe Leu Ser Tyr Leu Pro Cys Ser Glu Arg Ser Lys Val Cys Arg Pro Gly Pro Asp Gly Phe Glu Tyr Asn Gly Pro Ser Leu Gly Val Thr Ile Asp Asn Thr Lys Arg Asp Asn Ser Leu Gly Asn Tyr Asn Val Asn Val Ser Thr Ser Ser Val Gln Gly Phe Pro Asn Asn Tyr Val Leu Asn Val Lys Tyr Asn Thr Pro Lys Val Cys Asn Gln Asn Leu Gly Ser Ile Thr Ser Cys Asn Met Ser Leu Ser His Val Val Ile Tyr Trp Arg Leu Leu Ile Lys Ala Trp Ile Ser Ser Gly Val Asn Ile Gly Leu Ala Pro Ser Leu Pro Ala Thr Ile Ala Leu Cys Ser Tyr Ala Gln Ala Lys Ser
(2) INFORMATION FOR SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2180 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: join(1..498, 741..1931, 2009..2179)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Asp His Ile Ile Pro Leu Gln Ile Lys Asn Ser Gln Asp Ser Gln Ile Ile Ser Phe Phe Lys Ala Asp Lys Gly Ser Val Ser Arg Gln Val His Pro Pro Trp Pro Val Pro Cys Lys Ser Lys Leu Gln Glu Gln Asp Ser Ser Glu Ser Lys Glu Ser Lys Ala Glu Gln Val Lys Ile Asn Asn Cys Val Val Gln Asn Ala Met Leu Tyr Ile Glu Asn Asn Tyr Phe Asn Asp Ile Asn Ile Asp Thr Val Ala Phe Ser Val Gly Val Ser Arg Ser Tyr Leu Val Lys Gln Phe Lys Leu Ala Thr Asn Lys Thr Ile Asn Asn Arg Ile Ile Glu Val Arg Ile Glu Gln Ala Lys Lys Val Leu Leu Lys Lys Ser Val Thr Glu Thr Ala Tyr Glu Val Gly Phe Asn Asn Ser Asn Tyr Phe Ala Thr Val Phe Lys Lys Arg Thr Asn Tyr Thr Pro Lys Gln Phe Lys Arg Thr Phe Ser Ser Met Lys Pro Ile Ser Ile Val Ala Phe Pro Ile Pro Ala Ile Ser Met Leu Leu Leu Ser Ala Val Ser Gln Ala Ala Ser Met Gln Pro Pro Ile Ala Lys Pro Gly Glu Thr Trp Ile Leu Gln Ala Lys Arg Ser Asp Glu Phe Asn Val Lys Asp Ala Thr Lys Trp Asn Phe Gln Thr Glu Asn Tyr Gly Val Trp Ser Trp Lys Asn Glu Asn Ala Thr Val Ser Asn Gly Lys Leu Lys Leu Thr Thr Lys Arg Glu Ser His Gln Arg Thr Phe Trp Asp Gly Cys Asn Gln Gln Gln Val Ala Asn Tyr Pro Leu Tyr Tyr Thr Ser Gly Val Ala Lys Ser Arg Ala Thr Gly Asn Tyr Gly Tyr Tyr Glu Ala Arg Ile Lys Gly Ala Ser Thr Phe Pro Gly Val Ser Pro Ala Phe Trp Met Tyr Ser Thr Ile Asp Arg Ser Leu Thr Lys Glu Gly Asp Val Gln Tyr Ser Glu Ile Asp Val Val Glu Leu Thr Gln Lys Ser Ala Val Arg Glu Ser Asp His Asp Leu His Asn Ile Val Val Lys Asn Gly Lys Pro Thr Trp Met Arg Pro Gly Ser Phe Pro Gln Thr Asn His Asn Gly Tyr His Leu Pro Phe Asp Pro Arg Asn Asp Phe His Thr Tyr Gly Val Asn Val Thr Lys Asp Lys Ile Thr Trp Tyr Val Asp Gly Glu Ile Val Gly Glu Lys Asp Asn Leu Tyr Trp His Arg Gln Met Asn Leu Thr Leu Ser Gln Gly Leu Arg Ala Pro His Thr Gln Trp Lys Cys Asn Gln Phe Tyr Pro Ser Ala Asn Lys Ser Ala Glu Gly Phe Pro Thr Ser Met Glu Val Asp Tyr Val Arg Thr Trp Val Lys Val Gly Asn Asn Asn Ser Ala Pro Gly Glu Gly Gln Ser Cys Pro Asn Thr Phe Val Ala Val Asn Ser Val Gln Leu Ser Ala Ala Lys Gln Thr Leu Arg Lys Gly Gln Ser Thr Thr Leu Glu Ser Thr Val Leu Pro Asn Cys Ala Thr Asn Lys Lys Val Ile Tyr Ser Ser Ser Asn Lys Asn Val Ala Thr Val Asn Ser Ala Gly Val Val Lys Ala Lys Asn Lys Gly Thr Ala Thr Ile Thr Val Lys Thr Lys Asn Lys Gly Lys Ile Asp Lys Leu Thr Ile Ala Val Asn Met Lys Lys Val Asn Leu Ser Ser Lys Trp Ile Ile Ser Ile Ser Leu Leu Ile Ile Cys Asp Tyr Val Tyr Leu Ile Arg Thr Asn Val Asn Glu Gln Ala Asn Ala Glu Ala Thr Ala His Met His Tyr Lys Ile Asn Asn Thr Lys His Ser Lys Gly Lys Leu Asp
(2) INFORMATION FOR SEQ ID NO: 6:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 620 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
Asp His Ile Ile Pro Leu Gln Ile Lys Asn Ser Gln Asp Ser Gln Ile Ile Ser Phe Phe Lys Ala Asp Lys Gly Ser Val Ser Arg Gln Val His Pro Pro Trp Pro Val Pro Cys Lys Ser Lys Leu Gln Glu Gln Asp Ser Ser Glu Ser Lys Glu Ser Lys Ala Glu Gln Val Lys Ile Asn Asn Cys Val Val Gln Asn Ala Met Leu Tyr Ile Glu Asn Asn Tyr Phe Asn Asp Ile Asn Ile Asp Thr Val Ala Phe Ser Val Gly Val Ser Arg Ser Tyr Leu Val Lys Gln Phe Lys Leu Ala Thr Asn Lys Thr Ile Asn Asn Arg Ile Ile Glu Val Arg Ile Glu Gln Ala Lys Lys Val Leu Leu Lys Lys Ser Val Thr Glu Thr Ala Tyr Glu Val Gly Phe Asn Asn Ser Asn Tyr Phe Ala Thr Val Phe Lys Lys Arg Thr Asn Tyr Thr Pro Lys Gln Phe Lys Arg Thr Phe Ser Ser Met Lys Pro Ile Ser Ile Val Ala Phe Pro Ile Pro Ala Ile Ser Met Leu Leu Leu Ser Ala Val Ser Gln Ala Ala Ser Met Gln Pro Pro Ile Ala Lys Pro Gly Glu Thr Trp Ile Leu Gln Ala Lys Arg Ser Asp Glu Phe Asn Val Lys Asp Ala Thr Lys Trp Asn Phe Gln Thr Glu Asn Tyr Gly Val Trp Ser Trp Lys Asn Glu Asn Ala Thr Val Ser Asn Gly Lys Leu Lys Leu Thr Thr Lys Arg Glu Ser His Gln Arg Thr Phe Trp Asp Gly Cys Asn Gln Gln Gln Val Ala Asn Tyr Pro Leu Tyr Tyr Thr Ser Gly Val Ala Lys Ser Arg Ala Thr Gly Asn Tyr Gly Tyr Tyr Glu Ala Arg Ile Lys Gly Ala Ser Thr Phe Pro Gly Val Ser Pro Ala Phe Trp Met Tyr Ser Thr Ile Asp Arg Ser Leu Thr Lys Glu Gly Asp Val Gln Tyr Ser Glu Ile Asp Val Val Glu Leu Thr Gln Lys Ser Ala Val Arg Glu Ser Asp His Asp Leu His Asn Ile Val Val Lys Asn Gly Lys Pro Thr Trp Met Arg Pro Gly Ser Phe Pro Gln Thr Asn His Asn Gly Tyr His Leu Pro Phe Asp Pro Arg Asn Asp Phe His Thr Tyr Gly Val Asn Val Thr Lys Asp Lys Ile Thr Trp Tyr Val Asp Gly Glu Ile Val Gly Glu Lys Asp Asn Leu Tyr Trp His Arg Gln Met Asn Leu Thr Leu Ser Gln Gly Leu Arg Ala Pro His Thr Gln Trp Lys Cys Asn Gln Phe Tyr Pro Ser Ala Asn Lys Ser Ala Glu Gly Phe Pro Thr Ser Met Glu Val Asp Tyr Val Arg Thr Trp Val Lys Val Gly Asn Asn Asn Ser Ala Pro Gly Glu Gly Gln Ser Cys Pro Asn Thr Phe Val Ala Val Asn Ser Val Gln Leu Ser Ala Ala Lys Gln Thr Leu Arg Lys Gly Gln Ser Thr Thr Leu Glu Ser Thr Val Leu Pro Asn Cys Ala Thr Asn Lys Lys Val Ile Tyr Ser Ser Ser Asn Lys Asn Val Ala Thr Val Asn Ser Ala Gly Val Val Lys Ala Lys Asn Lys Gly Thr Ala Thr Ile Thr Val Lys Thr Lys Asn Lys Gly Lys Ile Asp Lys Leu Thr Ile Ala Val Asn Met Lys Lys Val Asn Leu Ser Ser Lys Trp Ile Ile Ser Ile Ser Leu Leu Ile Ile Cys Asp Tyr Val Tyr Leu Ile Arg Thr Asn Val Asn Glu Gln Ala Asn Ala Glu Ala Thr Ala His Met His Tyr Lys Ile Asn Asn Thr Lys His Ser Lys Gly Lys Leu Asp
(2) INFORMATION FOR SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2600 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 875..2509
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
AAGATCATGG CTATAATTAG TTGAAAAAAC AGGGCTTACC ATGACATGGA GCTTTATTGA 660
AAA AAA CCA AAT TTT
Met Lys Lys Pro Asn Phe
ACT GCA AGT TTT
Tyr Gly Lys Met Gly Arg Thr Ala Leu Ser Ser Leu Phe Tyr Leu Phe
GGG CAA ACG CCG
Phe Leu Gly Leu Val Tyr Gly Gln Gln Pro Thr Lys Thr Ser Asn Pro
AAA TGG TCG AAA
Asn Asp Gln Trp Thr Ile Lys Trp Ser Ala Ser Asp Glu Phe Asn Lys
AAA TGG ACA AAT
Asn Asp Pro Asp Trp Ala Lys Trp Ile Lys Thr Gly Asn Leu Pro Asn
AAC AAT AAC AAC
Thr Ser Ala Trp Lys Trp Asn Asn Gln Lys Asn Val Lys Ile Ser Asn
Gly Ile Ala Glu Leu Thr Met Arg His Asn Ala Asn Asn Thr Pro Pro Asp Gly Gly Thr Tyr Phe Thr Ser Gly Ile Phe Lys Ser Tyr Gln Lys Phe Thr Tyr Gly Tyr Phe Glu Ala Lys Ile Gln Gly Ala Asp Ile Gly Glu Gly Val Cys Pro Ser Phe Trp Leu Tyr Ser Asp Phe Asp Tyr Ser Val Ala Asn Gly Glu Thr Val Tyr Ser Glu Ile Asp Val Val Glu Leu
CAA CAA TTC GAT TGG TAT GAA GGC CAT CAG GAC GAC ATT TAC GAC ATG 1420
Gln Gln Phe Asp Trp Tyr Glu Gly His Gln Asp Asp Ile Tyr Asp Met Asp Leu Asn Leu His Ala Val Val Lys Glu Asn Gly Gln Gly Val Trp Lys Arg Pro Lys Met Tyr Pro Gln Glu Gln Leu Asn Lys Trp Arg Ala
ATG GAC CCG AGT AAA GAC TTT CAT ATC TAT GGT TGT GAA GTG AAC CAG 1564
Met Asp Pro Ser Lys Asp Phe His Ile Tyr Gly Cys Glu Val Asn Gln Asn Glu Ile Ile Trp Tyr Val Asp Gly Val Glu Val Ala Arg Lys Pro Asn Lys Tyr Trp His Arg Pro Met Asn Val Thr Leu Ser Leu Gly Leu Arg Lys Pro Phe Val Lys Phe Phe Asp Asn Lys Asn Asn Ala Ile Asn Pro Glu Thr Asp Ala Lys Ala Arg Glu Lys Leu Ser Asp Ile Pro Thr Ser Met Tyr Val Asp Tyr Val Arg Val Trp Glu Lys Ser Ala Gly Asn Thr Thr Asn Pro Pro Thr Ser Glu Val Gly Thr Leu Lys Thr Lys Gly Ser Lys Leu Val Ile Asp His Trp Asp Ala Ser Thr Gly Thr Ile Ser Ala Val Ser Asn Asn Thr Lys Thr Gly Gln Tyr Ala Gly Ser Val Asn Asn Ala Ser Ile Ala Gln Ile Val Thr Leu Lys Ala Asn Thr Ser Tyr Lys Val Ser Ala Phe Gly Lys Ala Ser Ser Pro Gly Thr Ser Ala Tyr Leu Gly Ile Ser Lys Ala Ser Asn Asn Glu Leu Ile Ser Asn Phe Glu
TTC AAA ACA ACC TCA TAC TCC AAA GGC GAG ATT GAG ATA AGA ACT GGA 2140
Phe Lys Thr Thr Ser Tyr Ser Lys Gly Glu Ile Glu Ile Arg Thr Gly Asn Val Gln Glu Ser Tyr Arg Ile Trp Tyr Trp Ser Ser Gly Gln Ala Tyr Cys Asp Asp Phe Asn Leu Val Glu Ile Asn Ser Gly Ala Ser Gln Leu Asn Glu Asn Glu Thr Glu Thr Ala Leu Glu Lys Gly Ile His Ile Tyr Pro Asn Pro Tyr Lys Asn Gly Pro Leu Thr Ile Asp Phe Gly Lys Pro Phe Ser Gly Glu Val Gln Ile Thr Gly Leu Asn Gly Arg Thr Phe
TTA AGA AGA AAT GTT GTC GAT CAA ACT TCG GTT CAG CTC CTA GAA TCC 2428
Leu Arg Arg Asn Val Val Asp Gln Thr Ser Val Gln Leu Leu Glu Ser Lys Ser Lys Phe Lys Ser Gly Leu Tyr Ile Val Lys Ile Ser Gly Pro Asp Gly Glu Val Ser Lys Lys Ile Leu Val Glu
(2) INFORMATION FOR SEQ ID NO: 8:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 545 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
Met Lys Lys Pro Asn Phe Tyr Gly Lys Met Gly Arg Thr Ala Leu Ser Ser Leu Phe Tyr Leu Phe Phe Leu Gly Leu Val Tyr Gly Gln Gln Pro Thr Lys Thr Ser Asn Pro Asn Asp Gln Trp Thr Ile Lys Trp Ser Ala Ser Asp Glu Phe Asn Lys Asn Asp Pro Asp Trp Ala Lys Trp Ile Lys Thr Gly Asn Leu Pro Asn Thr Ser Ala Trp Lys Trp Asn Asn Gln Lys Asn Val Lys Ile Ser Asn Gly Ile Ala Glu Leu Thr Met Arg His Asn Ala Asn Asn Thr Pro Pro Asp Gly Gly Thr Tyr Phe Thr Ser Gly Ile Phe Lys Ser Tyr Gln Lys Phe Thr Tyr Gly Tyr Phe Glu Ala Lys Ile Gln Gly Ala Asp Ile Gly Glu Gly Val Cys Pro Ser Phe Trp Leu Tyr Ser Asp Phe Asp Tyr Ser Val Ala Asn Gly Glu Thr Val Tyr Ser Glu Ile Asp Val Val Glu Leu Gln Gln Phe Asp Trp Tyr Glu Gly His Gln Asp Asp Ile Tyr Asp Met Asp Leu Asn Leu His Ala Val Val Lys Glu Asn Gly Gln Gly Val Trp Lys Arg Pro Lys Met Tyr Pro Gln Glu Gln Leu Asn Lys Trp Arg Ala Met Asp Pro Ser Lys Asp Phe His Ile Tyr Gly Cys Glu Val Asn Gln Asn Glu Ile Ile Trp Tyr Val Asp Gly Val Glu Val Ala Arg Lys Pro Asn Lys Tyr Trp His Arg Pro Met Asn Val Thr Leu Ser Leu Gly Leu Arg Lys Pro Phe Val Lys Phe Phe Asp Asn Lys Asn Asn Ala Ile Asn Pro Glu Thr Asp Ala Lys Ala Arg Glu Lys Leu Ser Asp Ile Pro Thr Ser Met Tyr Val Asp Tyr Val Arg Val Trp Glu Lys Ser Ala Gly Asn Thr Thr Asn Pro Pro Thr Ser Glu Val Gly Thr Leu Lys Thr Lys Gly Ser Lys Leu Val Ile Asp His Trp Asp Ala Ser Thr Gly Thr Ile Ser Ala Val Ser Asn Asn Thr Lys Thr Gly Gln Tyr Ala Gly Ser Val Asn Asn Ala Ser Ile Ala Gln Ile Val Thr Leu Lys Ala Asn Thr Ser Tyr Lys Val Ser Ala Phe Gly Lys Ala Ser Ser Pro Gly Thr Ser Ala Tyr Leu Gly Ile Ser Lys Ala Ser Asn Asn Glu Leu Ile Ser Asn Phe Glu Phe Lys Thr Thr Ser Tyr Ser Lys Gly Glu Ile Glu Ile Arg Thr Gly Asn Val Gln Glu Ser Tyr Arg Ile Trp Tyr Trp Ser Ser Gly Gln Ala Tyr Cys Asp Asp Phe Asn Leu Val Glu Ile Asn Ser Gly Ala Ser Gln Leu Asn Glu Asn Glu Thr Glu Thr Ala Leu Glu Lys Gly Ile His Ile Tyr Pro Asn Pro Tyr Lys Asn Gly Pro Leu Thr Ile Asp Phe Gly Lys Pro Phe Ser Gly Glu Val Gln Ile Thr Gly Leu Asn Gly Arg Thr Phe Leu Arg Arg Asn Val Val Asp Gln Thr Ser Val Gln Leu Leu Glu Ser Lys Ser Lys Phe Lys Ser Gly Leu Tyr Ile Val Lys Ile Ser Gly Pro Asp Gly Glu Val Ser Lys Lys Ile Leu Val Glu

Claims

1. An isolated nucleic acid molecule comprising a nucleic acid sequence encoding a protein having glycosyl-hydrolase activity having a hydrophobic cluster analysis score with the kappa-carrageenase of Alteromonas carrageenovora which is greater than or equal to 75% over the domain extending between amino acids 117 and 262 of the protein sequence represented by SEQ ID No. 6 of said kappa-carrageenase.

2. The nucleic acid molecule according to claim 1 wherein the hydrophobic cluster analysis score is greater than or equal to 80%.

3. The nucleic acid molecule according to claim 1 wherein the hydrophobic cluster analysis score is greater than or equal to 85%.

4. The nucleic acid molecule according to claim 1 which codes for the kappa-carrageenase of Cytophaga drobachiensis and comprises the nucleic acid sequence represented by SEQ ID No. 7.

5. Use of the nucleic acid molecule according to any one of claims 1 to 4 for obtaining glycosyl hydrolases by genetic engineering.

6. Use of the nucleic acid molecule according to claim 4 for obtaining the kappa-carrageenase of Cytophaga drobachiensis by genetic engineering.

7. A vector comprising a nucleic acid molecule according to claim 1.

8. A host cell genetically modified with a nucleic acid molecule according to claim 1 or with a vector comprising said nucleic acid molecule.

9. A method of producing a protein having glycosyl hydrolase activity, the method comprising:
(a) obtaining the host cell of claim 8; and
(b) growing the host cell under conditions and for a time sufficient to produce the protein.