FR2748479A1

FR2748479A1 - Cellulase proteins with type II cohesin or dockerin domains

Info

Publication number: FR2748479A1
Application number: FR9605854A
Authority: FR
Inventors: Emmanuelle Leibovitz; Pierre Beguin
Original assignee: Institut Pasteur de Lille
Current assignee: Institut Pasteur de Lille
Priority date: 1996-05-10
Filing date: 1996-05-10
Publication date: 1997-11-14

Abstract

The following are claimed as new: (1) a compound comprising a type II dockerin domain; and (2) a compound to which at least one type II cohesin domain can bind, covalently or non-covalently.

Description

POLYPEPTIDE COMPRISING A NEW TYPE II COHESIN DOMAIN, ENZYMATIC COMPOSITION
CONTAINING IT, AND DNA FRAGMENTS ENCODING THESE POLYPEPTIDES
The present invention relates to protein domains that can interact non-covalently and make it possible to assemble polypeptides of biochemical or biological interest into defined multiprotein complexes, so that they act together, simultaneously or sequentially, and their synergy is enhanced. It also relates to the DNA fragments encoding said protein fragments. Finally, the present invention relates to enzymatic compositions that associate several enzymes so that they act together, simultaneously or sequentially, to enhance their synergy. For example, in the case of sequential action, several enzymes with different activities can act in succession on the same mixture of substrates.

The cellulases of several cellulolytic bacteria are organized into an enzymatic complex in which catalytic subunits interact with a non-catalytic polypeptide called the "scaffolding protein". This interaction is mediated by domains of the catalytic subunits, called "dockerin domains", and by repeated domains of the scaffolding protein, called "cohesin domains", which are larger than the dockerin domains of the catalytic subunits.

To date, only the cohesin domains of scaffolding proteins have been identified; in the present description these domains are called type I cohesin domains.

In particular, Clostridium thermocellum, a Gram-positive, thermophilic, anaerobic bacterium, produces a high-molecular-weight cellulolytic complex called the cellulosome (15, 16, 21). This complex is initially attached to the cell surface and is later released into the medium.

The cellulosome comprises at least 15 different polypeptides, including numerous β-1,4-endoglucanases, at least one cellobiohydrolase (23) and several hemicellulases (β-1,4-xylanases, lichenases) (22). The catalytic components are bound non-covalently to a non-catalytic scaffolding subunit called CipA (for Cellulosome Integrating Protein) (37).

CipA and similar components identified in the cellulolytic complexes of other cellulolytic clostridia are scaffolding proteins, or "scaffoldins" (2).

The mode of attachment of the catalytic subunits to CipA has been elucidated (refs. 8, 33). Each catalytic subunit contains a conserved, duplicated segment of 23 residues that constitutes a dockerin domain (2). Dockerin domains interact with a set of complementary binding domains, the cohesin domains (2).

These domains, nine copies of which are present in the CipA sequence, are very similar to one another, particularly domains 4 to 8, which share more than 95% identical residues (11).

It has been shown that a dockerin domain can be grafted onto a protein that is not part of the cellulosome, for example the endoglucanase CelC from C. thermocellum, which thereby acquires the ability to bind to CipA (32).

This observation suggested that the affinity between cohesin and dockerin domains could be used to build artificial complexes incorporating various proteins fused to suitable dockerin domains, which would interact with the cohesin domains of the scaffolding protein (2, 32). Such complexes could have various biotechnological applications. By modifying the composition of natural cellulosomes in a controlled way, it might be possible to optimize their activity toward defined cellulosic substrates. One could also improve the degradation of other complex, insoluble substrates by enzymes of complementary specificity, whose synergistic action would be enhanced by association into multienzyme complexes. Similarly, physically associating enzymes that catalyze sequential reactions can accelerate those reactions when diffusion of the first reaction's product to the second reaction site is rate-limiting (L. Bülow and K. Mosbach, Multienzyme systems obtained by gene fusion, Trends Biotechnol. 9, 226-231). Moreover, the usefulness of multiprotein complexes is not limited to combining enzymes: the construction of multifunctional protein complexes could give rise to a wide variety of applications, discussed in reference (2).

However, building complexes of defined stoichiometry and topology faces a major difficulty. All the cohesin domains known so far are very similar in sequence and in binding specificity. For example, CelS, one of the catalytic subunits of the cellulosome, has been shown to bind equally well to cohesin domains 1, 2 and 9 of CipA (18b), and presumably to all its other cohesin domains. Consequently, it is not possible to program the binding of a fusion protein carrying a dockerin domain to a specific cohesin domain of the scaffolding protein.

The cohesin domains known to date, which share strong similarity in sequence and binding specificity, have been grouped under the name type I cohesin domains. Likewise, the dockerin domains carried by the catalytic subunits of the cellulosome, which bind to type I cohesin domains, are called type I dockerin domains.

There is, however, a domain at the COOH terminus of CipA that shows distant sequence similarity to type I dockerin domains but cannot bind to type I cohesin domains. It allows the proteins that carry it to bind to three extracellular polypeptides of C. thermocellum, whose structure and function are unknown (29).

The invention is based on the characterization of a gene, sdbA ("scaffolding dockerin binding protein"), which has been cloned and sequenced. Its product, SdbA, binds specifically to the COOH-terminal domain of CipA, but not to the type I dockerin domains carried by the catalytic subunits of the cellulosome. Characterization of the SdbA polypeptide shows that it contains a specific region responsible for binding to the COOH-terminal domain of CipA, whose sequence is very different from that of type I cohesin domains. This region, together with polypeptide segments of similar sequence and binding specificity, is new and is called a type II cohesin domain. Likewise, the COOH-terminal region of CipA is called a type II dockerin domain.

Using type II cohesin and dockerin domains, optionally in combination with cohesin and dockerin domains of a different type (for example type I), makes it possible to construct better-defined protein complexes.

The advantage of the type II cohesin domains of the present invention is that their recognition specificity differs from that of previously known scaffolding-protein cohesin domains, in particular those of CipA.

More particularly, the present invention relates to type II cohesin domains and to type II dockerin domains.

In particular, the present invention relates to compounds to which at least one type II cohesin domain or one type II dockerin domain can be attached, covalently or non-covalently.

More particularly, these compounds are peptides, polypeptides or proteins, but they may also be lipids, glycosides, or mixed molecules such as proteoglycans or lipopolysaccharides. Other types of molecules may also be used, in particular labels, or chemical molecules, whether therapeutic or not.

A type II cohesin domain is a protein domain that binds specifically to the dockerin domain of CipA corresponding to the type II dockerin domain, as defined below. Preferably, the affinity of the complex thus formed is at least 10^5 M^-1, as measured by the method described by Salamitou et al. (ref. 28).
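A minimal numerical sketch of the preferred affinity criterion, assuming the threshold is read as an association constant K_a of at least 10^5 M^-1 (the exponent appears to have been flattened in extraction). Under that reading, the equivalent dissociation constant is K_d = 1/K_a, at most 10^-5 M (10 µM). The helper names are hypothetical and the example K_a values are arbitrary inputs, not measurements; the code only applies the unit conversion and does not model the measurement method of Salamitou et al.

```python
# Hypothetical helpers: the 1e5 M^-1 threshold is our reading of the text;
# the names and interface are illustrative, not part of the patent.
KA_THRESHOLD_PER_M = 1e5  # preferred minimum association constant, in M^-1


def dissociation_constant(ka_per_m: float) -> float:
    """Return K_d in M for an association constant K_a given in M^-1."""
    if ka_per_m <= 0:
        raise ValueError("K_a must be positive")
    return 1.0 / ka_per_m


def meets_affinity_criterion(ka_per_m: float) -> bool:
    """True if a measured K_a reaches the preferred minimum affinity."""
    return ka_per_m >= KA_THRESHOLD_PER_M


if __name__ == "__main__":
    # K_d ceiling implied by the threshold: 1e-05 M, i.e. 10 micromolar.
    print(f"K_d ceiling: {dissociation_constant(KA_THRESHOLD_PER_M):.0e} M")
    print(meets_affinity_criterion(3e6))  # arbitrary example K_a, 3e6 M^-1
    print(meets_affinity_criterion(2e4))  # arbitrary example K_a, 2e4 M^-1
```

Expressing the ceiling as a K_d can be convenient because many binding assays report dissociation constants rather than association constants.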

The sequence of the CipA dockerin domain is that of SEQ ID NO: 4.

A type II cohesin domain may correspond to a natural sequence; in particular, it may be a domain from a cellulolytic bacterium, notably a Clostridium, as described below for SdbA.

Such domains are also present in the proteins OlpB and ORF2p.

The notion of a type II cohesin domain also covers non-natural protein sequences, provided that they can bind to the type II dockerin domain of CipA.

These may in particular be domains homologous to natural domains, or fragments of such domains, but entirely synthetic domains may also be provided, obtained for example by using certain non-natural amino acids or elements that improve affinity.

According to the present invention, a "homologous protein" or "homologous sequence" is any protein, polypeptide or peptide having at least 25% sequence homology with a type II cohesin domain, in particular that of SdbA, and retaining specific binding to the relevant dockerin domain, in particular the dockerin domain of CipA.

A "protein fragment" or "sequence fragment" is a fragment of at least 50 amino acids that retains specific binding to the relevant dockerin domain, in particular the dockerin domain of CipA.

It should be recalled that a type II cohesin domain must have good affinity for the corresponding dockerin domain but little or no affinity for dockerin domains of a different type, in particular type I.

The present invention also relates to compounds comprising a type II dockerin domain, that is, a protein domain that binds specifically to a type II cohesin domain with an affinity of at least 10^5 M^-1, measured as above.

This definition is not redundant. Starting from the type II dockerin domain of CipA, a number of type II cohesin domains can be defined, and these in turn make it possible to define new type II dockerin domains. As above, such dockerin domains may be of natural origin, may consist of fragments of domains or of homologous sequences, or, as indicated above, may comprise entirely synthetic sequences, optionally with non-natural amino acids.

For simplicity, the binding between a type II cohesin domain and a type II dockerin domain is referred to below as a type II C/D interaction. The resulting complex is called a type II C/D complex when it involves a single type II C/D interaction, or a multimeric complex when it involves at least one C/D interaction other than type II (for example a type I C/D interaction) and/or other kinds of interaction (for example avidin/biotin or antigen/antibody). Preferably, the multimeric complexes of the invention essentially involve C/D-type interactions.

Thus, through these different interactions, the present invention makes it possible to target different enzymes into a complex and to provide an artificial complex using, in particular, a scaffolding protein that carries dockerin domains with different binding specificities, so that various proteins carrying the corresponding cohesin domains are arranged in a specific way.
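A toy model can make the targeting idea concrete. In this sketch (all names are hypothetical; affinities and stoichiometry are ignored), each module is a cohesin or a dockerin of type I or II, and a cohesin pairs with a dockerin only when their types match, which is the selectivity the text attributes to type II versus type I domains. With one type I and one type II dockerin on the scaffold, each partner has exactly one compatible site; with two type I sites, a type I partner can go to either, the same ambiguity described for the type I cohesins of CipA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    name: str
    kind: str     # "cohesin" or "dockerin"
    cd_type: str  # "I" or "II"


def can_bind(a: Module, b: Module) -> bool:
    """A cohesin pairs with a dockerin only if both share the same type."""
    return {a.kind, b.kind} == {"cohesin", "dockerin"} and a.cd_type == b.cd_type


def compatible_sites(partner: Module, sites: list[Module]) -> list[str]:
    """Names of scaffold sites the partner could occupy."""
    return [site.name for site in sites if can_bind(partner, site)]


# Hypothetical scaffold carrying dockerin domains of two different types.
mixed_scaffold = [Module("site1", "dockerin", "I"), Module("site2", "dockerin", "II")]
# Hypothetical scaffold carrying only type I dockerin domains.
type_i_scaffold = [Module("site1", "dockerin", "I"), Module("site2", "dockerin", "I")]

enzyme_a = Module("enzyme_A", "cohesin", "I")
enzyme_b = Module("enzyme_B", "cohesin", "II")

if __name__ == "__main__":
    print(compatible_sites(enzyme_a, mixed_scaffold))   # ['site1']
    print(compatible_sites(enzyme_b, mixed_scaffold))   # ['site2']
    print(compatible_sites(enzyme_a, type_i_scaffold))  # ['site1', 'site2']
```

The model only captures type matching; real assembly also depends on binding strength, linker geometry and protein folding, none of which is represented here.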

More particularly, the present invention provides a polypeptide having a type II cohesin domain, characterized in that it can bind to the COOH-terminal dockerin domain of the scaffolding protein CipA of the cellulolytic complex of Clostridium thermocellum. Any protein or peptide having or comprising a sequence with more than 25% of its amino acid residues identical to a type II cohesin domain of SdbA falls within the definition of the invention. In particular, it is a protein of Clostridium thermocellum or a fragment thereof.

In a more particular embodiment, the present invention relates to a protein SdbA ("scaffolding dockerin binding protein") of Clostridium thermocellum with an apparent molecular mass of 68 kDa (±10%), comprising a cohesin domain that can bind to a type II dockerin domain, in particular that of the CipA protein of Clostridium thermocellum.

The SdbA polypeptide of the cellulolytic complex of Clostridium thermocellum has a sequence of 631 amino acids substantially as shown in SEQ ID NO: 1.

The present invention has made it possible to identify the domain of SdbA that binds to the dockerin domain of CipA. In particular, the cohesin domain comprises a 184-amino-acid sequence from the N-terminal region of the protein, substantially as shown in SEQ ID NO: 1 from amino acid 27 to amino acid 210, or a homologous sequence, or a fragment of this sequence or of a homologous sequence that can bind to a dockerin domain of CipA, for example a fragment of at least 50 amino acids that can bind to a dockerin domain of CipA.

The present invention has also made it possible to identify type II cohesin domains in other Clostridium thermocellum proteins, in particular OlpB and ORF2p (9, 17). SdbA shows sequence homology with the N-terminal repeats of OlpB and ORF2p.

The polypeptide segment comprising residues 26-199 of OlpB, which shows strong sequence similarity to residues 27-191 of SdbA, can also bind the C-terminal domain of CipA.
Thus, in addition to fragments of SdbA or of a homologous protein that interact with a dockerin domain of a scaffolding protein according to the invention, the present invention also relates to fragments of OlpB and ORF2p whose sequences are similar to the type II cohesin domain of SdbA.

The present invention therefore relates to any polypeptide comprising, as cohesin domain, a sequence substantially corresponding to one of the following sequences of OlpB in SEQ ID NO: 2: amino acids 28 to 192, 207 to 363, 409 to 565, or 607 to 763; or a sequence homologous to one of these; or a fragment of at least 50 amino acids of these sequences that can bind to a dockerin domain of CipA.

The present invention also relates to any polypeptide comprising a cohesin domain whose amino acid sequence is substantially a sequence of ORF2p in SEQ ID NO: 3 chosen from amino acids 38 to 195 and amino acids 209 to 365, or a sequence homologous to these, or a fragment of at least 50 amino acids of these sequences that can bind to a dockerin domain of CipA.
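The residue ranges given above for SdbA, OlpB and ORF2p are 1-based and inclusive, so a range from residue s to residue e contains e - s + 1 residues; this reproduces the 184 residues stated for SdbA 27-210 and shows that every listed domain is well above the 50-residue fragment minimum. The sketch below tabulates the ranges quoted in the text; the function names are hypothetical, and no actual SEQ ID sequences are included.

```python
# Cohesin-domain boundaries quoted in the text (1-based, inclusive).
COHESIN_RANGES = {
    "SdbA (SEQ ID NO: 1)": [(27, 210)],
    "OlpB (SEQ ID NO: 2)": [(28, 192), (207, 363), (409, 565), (607, 763)],
    "ORF2p (SEQ ID NO: 3)": [(38, 195), (209, 365)],
}


def range_length(start: int, end: int) -> int:
    """Number of residues in a 1-based inclusive range."""
    if start < 1 or end < start:
        raise ValueError("invalid residue range")
    return end - start + 1


def extract_domain(sequence: str, start: int, end: int) -> str:
    """Cut a 1-based inclusive residue range out of a protein sequence."""
    range_length(start, end)  # validates start and end
    if end > len(sequence):
        raise ValueError("range extends past the end of the sequence")
    return sequence[start - 1:end]  # convert to Python's 0-based, end-exclusive slice


if __name__ == "__main__":
    for protein, ranges in COHESIN_RANGES.items():
        print(protein, [range_length(s, e) for s, e in ranges])
    # SdbA [184]; OlpB [165, 157, 157, 157]; ORF2p [158, 157]
```

Given the real SEQ ID NO: 1 sequence (not reproduced here) in a hypothetical variable `sdba_seq`, `extract_domain(sdba_seq, 27, 210)` would return the 184-residue N-terminal segment.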

The present invention also relates to any polypeptide comprising a segment of more than 50 amino acids that has more than 25% identical residues with one of the segments of SEQ ID NO: 1, SEQ ID NO: 2 or SEQ ID NO: 3 described above, and that can bind the dockerin domain of CipA.
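The sequence part of this criterion can be sketched in Python, assuming the two segments have already been aligned position by position without gaps. In practice an alignment tool (for example BLAST or a Needleman-Wunsch global alignment) would be run first, and the choice of denominator changes the percentage. The thresholds (more than 50 residues, more than 25% identical) come from the text; the function names and the toy sequences are invented for illustration and are not SdbA, OlpB or ORF2p sequence.

```python
def percent_identity(seg_a: str, seg_b: str) -> float:
    """Percentage of identical residues between two equal-length segments
    that are already aligned position by position (no gaps)."""
    if not seg_a or len(seg_a) != len(seg_b):
        raise ValueError("segments must be non-empty and of equal length")
    same = sum(1 for x, y in zip(seg_a.upper(), seg_b.upper()) if x == y)
    return 100.0 * same / len(seg_a)


def meets_segment_criterion(seg_a: str, seg_b: str,
                            min_len: int = 50,
                            min_identity: float = 25.0) -> bool:
    """Sequence part of the criterion: more than min_len residues and more
    than min_identity percent identical residues. Binding to the CipA
    dockerin domain must still be demonstrated experimentally."""
    return len(seg_a) > min_len and percent_identity(seg_a, seg_b) > min_identity


# Invented toy segments: 60 residues, every third position kept identical,
# all other positions changed to W.
toy_a = "ACDEFGHIKL" * 6
toy_b = "".join(c if i % 3 == 0 else "W" for i, c in enumerate(toy_a))

if __name__ == "__main__":
    print(round(percent_identity(toy_a, toy_b), 1))          # 33.3
    print(meets_segment_criterion(toy_a, toy_b))              # True
    print(meets_segment_criterion(toy_a[:50], toy_b[:50]))    # False: 50 is not > 50
```

The strict comparisons mirror the wording "more than 50" and "more than 25%"; the looser "at least" wording used in the earlier definitions would correspond to `>=`.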

Compounds according to the present invention that comprise a type II cohesin or type II dockerin domain include, for example, enzymes, receptors, antigens, antibodies, or fragments thereof comprising between 20 and 100 amino acids.

Where the protein is an enzyme, it may for example be a cellulase enabling better hydrolysis of a cellulosic substrate, or any other type of hydrolytic enzyme.

Where the compound according to the invention is essentially a protein, the parts of the protein may be fused to said cohesin or dockerin domain via a polypeptide linker. The bond may also be non-covalent, for example a conformational bond.

The present invention also relates to a DNA fragment coding for a compound according to the invention when the latter is a polypeptide, or coding for the SdbA protein or a fragment thereof. When the compound according to the invention comprises elements other than the polypeptide or protein, the invention also relates to the DNA fragment coding for that polypeptide or protein.

The present invention is based in part on the molecular cloning and sequencing of the gene named sdbA, whose product binds specifically to the dockerin domain carried by CipA. Segments of the gene were subcloned and expressed separately in order to identify the region of the polypeptide responsible for binding the dockerin domain of CipA. The DNA fragment concerned comprises an open reading frame of 1,893 nucleotides encoding the 631-amino-acid polypeptide named SdbA, with a calculated molecular mass of 68,577 Da.
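The figures quoted for SdbA follow from simple arithmetic: 1,893 nt / 3 = 631 codons, which matches the 631-residue polypeptide if the stated ORF length does not include the stop codon (the text does not say). A calculated molecular mass is normally the sum of the average residue masses plus one water molecule. The Python sketch below shows that calculation using standard average residue masses; the SdbA sequence (SEQ ID NO: 1) is not reproduced here, so the sketch cannot recompute the 68,577 Da value itself, and the function names are illustrative.

```python
# Standard average masses (Da) of amino acid residues within a chain
# (free amino acid minus H2O).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594,
    "N": 114.1038, "D": 115.0886, "Q": 128.1307, "K": 128.1741,
    "E": 129.1155, "M": 131.1926, "H": 137.1411, "F": 147.1766,
    "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def codons_in_orf(orf_length_nt: int) -> int:
    """Number of codons in an open reading frame of the given length."""
    if orf_length_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_length_nt // 3


def average_mass_da(protein: str) -> float:
    """Average molecular mass of an unmodified polypeptide, in daltons."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in protein.upper()) + WATER_MASS


print(codons_in_orf(1893))              # 631
print(round(average_mass_da("G"), 2))   # 75.07, i.e. free glycine
```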

The present invention therefore also relates to a DNA fragment represented substantially by nucleotides 1 to 1893 of SEQ ID NO: 1, coding for the SdbA protein, as well as to an E. coli strain deposited at the CNCM of the Institut Pasteur under No. I-1684, transformed with plasmid pCT1830 comprising a DNA fragment corresponding to this sequence coding for the SdbA protein.

The present invention further relates to a DNA fragment whose sequence is essentially nucleotides 82 to 573 of SEQ ID NO: 1, coding for the cohesin domain of the SdbA protein, as well as to an E. coli strain deposited at the CNCM of the Institut Pasteur under No. I-1683, transformed with plasmid pCT1801 comprising a DNA fragment corresponding to this 1893 bp sequence.

Likewise, the present invention also relates to a DNA fragment characterized in that its sequence is substantially one of the sequences coding for a cohesin domain of the OlpB protein, chosen from nucleotides 85 to 570, nucleotides 619 to 1095, nucleotides 1225 to 1689 and nucleotides 1819 to 2189 of SEQ ID NO: 2, as well as to a DNA fragment characterized in that its sequence is substantially one of the sequences coding for a cohesin domain of ORF2, chosen from nucleotides 109 to 582 and nucleotides 625 to 1092 of SEQ ID NO: 3.
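The nucleotide coordinates given for the cohesin-coding segments can be converted into residue numbers, which helps when comparing them with the amino acid ranges used elsewhere. This Python sketch assumes that coordinates are 1-based and inclusive and that nucleotide 1 of the relevant SEQ ID is the first base of the initiation codon; if a sequence listing starts upstream of the start codon, that offset must be supplied. The function name is illustrative.

```python
def nt_range_to_residue_range(start_nt: int, end_nt: int, offset: int = 0):
    """Return the (first, last) residues overlapped by a nucleotide range.

    Coordinates are 1-based and inclusive. `offset` is the number of
    nucleotides preceding the initiation codon in the numbering used
    (0 when nucleotide 1 is the first base of the start codon, an assumption).
    """
    start = start_nt - offset
    end = end_nt - offset
    if start < 1 or end < start:
        raise ValueError("invalid nucleotide range")
    # Residue k spans nucleotides 3k-2 .. 3k, so residue = ceil(nt / 3).
    return (start + 2) // 3, (end + 2) // 3


print(nt_range_to_residue_range(82, 573))    # (28, 191): SdbA cohesin, SEQ ID NO: 1
print(nt_range_to_residue_range(619, 1095))  # (207, 365): second OlpB cohesin, SEQ ID NO: 2
```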

The present invention also relates to DNA fragments whose sequence is complementary or homologous to that of one of the DNA fragments defined above, or complementary to such a homolog.

The term "homologous DNA fragment" means a fragment coding for a homologous polypeptide as described above.

The present invention also relates to DNA fragments capable of hybridizing under low-stringency conditions (19) with a DNA fragment according to the invention as defined above.

The present invention also relates to complexes comprising at least one compound as described above linked by a type II C/D interaction to a compound comprising at least one type II dockerin domain, each compound constituting an element of the complex.

This is in particular a multimeric complex in which at least two of the "elements" are linked by a type II C/D interaction. Preferably, the complex comprises at least three "elements", two of which are linked by an interaction other than a type II C/D interaction, for example a type I C/D interaction.

The term "element" denotes either a compound according to the invention, which may optionally comprise another binding domain (for example a type I C/D interaction domain), or a compound comprising a single binding domain that differs from the type II C/D interaction but is capable of binding to a compound according to the invention.

By judiciously combining the various types of interaction, complexes with various structures can be obtained. The structure of the multimeric complex or of the type II complex according to the invention may thus be linear, grafted or mixed.

A linear multimeric complex comprises a chain of compounds according to the invention, each comprising only two binding domains. Such a complex is shown in Figure 1B.

By contrast, a grafted structure generally comprises a scaffold molecule carrying a number of binding domains, to which grafts (proteins, for example) each comprising a single binding domain are attached. This type of structure is shown schematically in Figures 1A and 1C.

Structures combining these two basic architectures can of course be designed, including cyclic structures.
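The distinction between linear and grafted architectures can be expressed as a small data model in which each element lists its binding domains and two elements can be joined only through a cohesin/dockerin pair. The Python sketch below checks whether an ordered list of elements forms a valid linear complex: every element carries exactly two binding domains, and each adjacent pair is joined by a compatible cohesin/dockerin pair. It assumes that a cohesin binds only a dockerin of the same type (type I with type I, type II with type II); the class names and the example elements are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Domain:
    kind: str   # "cohesin" or "dockerin"
    dtype: str  # "I" or "II"


@dataclass(frozen=True)
class Element:
    name: str
    domains: Tuple[Domain, ...]  # binding domains, N- to C-terminal


def binds(a: Domain, b: Domain) -> bool:
    """Assumed rule: one cohesin and one dockerin of the same type."""
    return {a.kind, b.kind} == {"cohesin", "dockerin"} and a.dtype == b.dtype


def is_linear_complex(chain: Sequence[Element]) -> bool:
    """Each element has two binding domains; neighbours are joined C/D-wise."""
    if any(len(el.domains) != 2 for el in chain):
        return False
    return all(
        binds(left.domains[1], right.domains[0])
        for left, right in zip(chain, chain[1:])
    )


def coh(dtype: str) -> Domain:
    return Domain("cohesin", dtype)


def doc(dtype: str) -> Domain:
    return Domain("dockerin", dtype)


a = Element("A", (coh("I"), coh("II")))
b = Element("B", (doc("II"), doc("I")))
c = Element("C", (coh("I"), coh("II")))

print(is_linear_complex([a, b, c]))  # True: II/II then I/I junctions
print(is_linear_complex([a, c]))     # False: cohesin II cannot bind cohesin I
```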

By binding the elements sequentially, a complex with a well-defined structure can thus be obtained, which is particularly advantageous for producing enzymatic complexes.

When the compounds according to the present invention are proteins, they can be obtained by genetic engineering. When they contain non-protein elements, these can be grafted by known means, in particular by chemical reactions for covalent bonds, or by non-covalent bonds.

A first way of implementing the invention consists in fusing, by genetic engineering, cohesin (or, respectively, dockerin) domains of different types so as to construct scaffold proteins carrying these domains in a defined number and order. In parallel, the matching dockerin (or, respectively, cohesin) domains are grafted onto foreign proteins, for example enzymes, that are to be associated in a chosen order along the scaffold protein. This yields a grafted structure corresponding to Figure 1A or 1C. This type of implementation leads to complexes resembling the natural cellulosome.
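In this first implementation, binding specificity fixes the position of each graft along the scaffold. The following Python sketch, again assuming that each cohesin type recognizes only the dockerin of the same type, places grafts on a scaffold and refuses any position for which the choice of graft is not unique. The enzyme names are hypothetical placeholders.

```python
from typing import Dict, List


def assemble_on_scaffold(scaffold_cohesins: List[str],
                         grafts: Dict[str, str]) -> List[str]:
    """Return the graft bound at each cohesin position of the scaffold.

    scaffold_cohesins: cohesin types in order along the scaffold, e.g. ["I", "II"].
    grafts: graft name -> type of the dockerin domain grafted onto it.
    A graft may occupy several positions (several copies of the same protein).
    Raises ValueError when a position has no partner or several candidates,
    i.e. when the order of the complex would not be defined.
    """
    placed = []
    for position, cohesin_type in enumerate(scaffold_cohesins, start=1):
        candidates = sorted(name for name, dockerin_type in grafts.items()
                            if dockerin_type == cohesin_type)
        if len(candidates) != 1:
            raise ValueError(
                f"position {position} (type {cohesin_type}): "
                f"{len(candidates)} candidate grafts")
        placed.append(candidates[0])
    return placed


print(assemble_on_scaffold(["I", "II"],
                           {"cellulase_X": "I", "xylanase_Y": "II"}))
# ['cellulase_X', 'xylanase_Y']
```

Under this model, each distinct cohesin/dockerin specificity allows one distinct graft to be positioned unambiguously.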

These complexes according to the invention may preferably include, for each compound, peptide linker segments of appropriate length and sequence. For example, the constructs may reuse the natural proline- and/or hydroxy amino acid-rich linker segments present in the natural polypeptides.

The proteins to be associated are incorporated via a cohesin or dockerin domain grafted onto them, for example by genetic engineering.

In the complex according to the present invention, the multimer comprises between 1 and 50 associated elements, preferably between 1 and 20.

In one embodiment, each element of the complex comprises either cohesin domains or dockerin domains.

However, elements comprising both cohesin and dockerin domains may also be provided.

The present invention also relates to a DNA fragment coding for an element of the complex according to the invention.

More generally, the present invention also relates to expression vectors comprising a DNA fragment according to the invention placed under the control of elements ensuring its expression in a eukaryotic host cell or in a bacterial host; to a bacterial host, such as an E. coli strain, transformed with an expression vector according to the invention; and to a process for preparing a polypeptide or protein according to the invention, characterized in that host cells transformed with an expression vector according to the invention, or an E. coli strain according to the invention, are cultured.

Finally, the present invention provides an enzymatic composition comprising several enzymes brought together so that they act together and, where appropriate, their synergy is potentiated, by means of a multimeric complex to each element of which a different enzyme is coupled.

The present invention relates to compositions comprising at least one multimeric complex having at least one type II C/D interaction domain.

In particular, an enzymatic composition according to the invention may comprise two enzymes brought together so that they act together and, where appropriate, their synergy is potentiated, by means of a complex according to the invention comprising a first enzyme and a second enzyme linked by a type II C/D interaction.

In an advantageous embodiment, said complex comprises a polypeptide comprising a cohesin domain according to the invention, coupled to a dockerin domain of the CipA protein which is itself coupled to a first enzyme, and a second element comprising a dockerin domain of a catalytic subunit of the cellulolytic complex of Clostridium thermocellum coupled to a second enzyme, which will bind to the cohesin domain.

The multimeric complexes according to the invention are particularly useful when they potentiate the synergy of their elements, especially in enzymatic compositions.

The present invention also relates to a method for detecting an antigen or an antibody, comprising bringing a multimeric complex according to the invention into contact with a solution containing an antibody or antigen of interest, and detecting the reaction between the multimeric complex and the antigen or antibody.

Detection can be performed by radioactive labeling of the antibody or antigen complex, or by visualization using non-isotopic labels, for example of the avidin-biotin type, or any other equivalent label.

Other features and advantages of the present invention will become apparent from the detailed description below, which refers to Figures 1 to 6.

Figure 1 schematically shows the structure of multimeric complexes according to the invention.

Figure 2 shows a restriction map of the region comprising the sdbA gene, and the construction of pCT1830, pCT1831 and pCT1832, encoding SdbA-N, SdbA-C and SdbA, respectively. E: EcoRI; K: KpnI; P: PstI; Sa: SalI; Sc: SacI; Sp: SphI; SCM: multiple cloning site. The positions of the segments encoding the various regions identified in SdbA are indicated by differently patterned boxes. Numbers refer to the nucleotide sequence (Figure 3). Nucleotides changed in the PCR-amplified sequence are shown in bold. DNA of the vector pQE30 is indicated by a thin line.

Figure 3 shows the nucleotide sequence of the region encoding the sdbA gene. The putative ribosome-binding site is underlined. The various regions identified in SdbA are indicated by boxes with the same patterns as in Figure 2. SLR: ribosome-binding site.

Figure 4 shows the alignment of the cohesin domain of SdbA with the cohesin domains of OlpB and ORF2p (9). Residues identical or similar to those of the majority of the sequences shown are on a shaded background. Residue numbering starts at the putative initiation codons. Similar amino acids are: F, I, V, L and M; R and K; S and T; D and E; N and Q; and F, Y and W.
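The similarity groups listed in the Figure 4 legend (also used for Figures 5 and 6) can be encoded directly. The Python sketch below tests whether two residues are identical or similar, and marks, for one alignment column, the residues that are identical or similar to those of more than half of the sequences. This reading of the shading rule, the gap handling and the function names are assumptions. Because F belongs to two groups, similarity is not transitive: F is similar to both L and Y, but L and Y are not similar to each other.

```python
from typing import List

SIMILARITY_GROUPS = [set("FIVLM"), set("RK"), set("ST"),
                     set("DE"), set("NQ"), set("FYW")]


def identical_or_similar(a: str, b: str) -> bool:
    """True if two residues are identical or fall in a common similarity group."""
    if a == b:
        return True
    return any(a in group and b in group for group in SIMILARITY_GROUPS)


def shaded(column: List[str]) -> List[bool]:
    """Flag residues of one alignment column supported by a majority of sequences.

    A residue is flagged when more than half of all sequences in the column
    (itself included) carry an identical or similar residue. Gaps ('-') are
    never flagged and never count as support.
    """
    flags = []
    for residue in column:
        if residue == "-":
            flags.append(False)
            continue
        support = sum(1 for other in column
                      if other != "-" and identical_or_similar(residue, other))
        flags.append(support > len(column) / 2)
    return flags


print(shaded(["L", "I", "K"]))  # [True, True, False]
```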

Figure 5 shows the similarity between residues 264 to 275 of SdbA and a motif present in the M proteins of Streptococcus pyogenes: M1 (GenBank accession number X72752), M9 (24), PAM (3), M12 (26). For each protein, numbering starts at the putative initiation codon. Residues identical or similar in the majority of the sequences shown are on a shaded background. The similarity criteria are the same as for Figure 4.

Figure 6 shows the alignment of the COOH-terminal repeated segments of SdbA with similar sequences from other cell-surface proteins. OlpA: outer layer protein A of C. thermocellum (9); OlpB: outer layer protein B of C. thermocellum (9); Pul: pullulanase of T. thermosulfurigenes EM1 (20); Bsph: S-layer protein of B. sphaericus (4). For each protein, numbering starts at the putative initiation codon. Residues that are identical or similar in at least eight segments are on a shaded background. The similarity criteria are the same as for Figure 4.

1. MATERIALS AND METHODS
1. Bacterial strains, plasmids and culture conditions
The bacterial strains and plasmids used in this study are listed in Table 1. Escherichia coli TG1 was used for cloning and sequencing. Proteins were produced in E. coli M15(pREP4).

C. thermocellum was grown anaerobically at 60°C in CM3-3 medium supplemented with 5 g of cellobiose per liter (31).

E. coli was grown at 37°C in Luria-Bertani medium (19).

Antibiotics were added according to the plasmids present in the host: ticarcillin at 100 µg/ml, chloramphenicol at 30 µg/ml, kanamycin at 25 µg/ml.
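Working concentrations such as these are usually reached by diluting concentrated stock solutions. The Python sketch below applies C1V1 = C2V2 to compute the volume of stock to add. The stock concentrations and the 200 ml culture volume in the example are assumptions chosen for illustration, not values given in the text.

```python
def stock_volume_ul(final_ug_per_ml: float, stock_mg_per_ml: float,
                    medium_ml: float) -> float:
    """Volume of antibiotic stock (µl) needed to reach a final concentration.

    C1 * V1 = C2 * V2, with 1 mg/ml = 1 µg/µl, so the stock volume in µl is
    (final µg/ml * medium ml) / (stock mg/ml).
    """
    if stock_mg_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    return final_ug_per_ml * medium_ml / stock_mg_per_ml


# Working concentrations from the text; stock concentrations are hypothetical.
working = {"ticarcillin": 100, "chloramphenicol": 30, "kanamycin": 25}
stocks_mg_per_ml = {"ticarcillin": 100, "chloramphenicol": 30, "kanamycin": 50}
for drug, final in working.items():
    print(drug, stock_volume_ul(final, stocks_mg_per_ml[drug], medium_ml=200))
```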

2. DNA manipulations
Genomic DNA of C. thermocellum was purified by the Marmur method as modified by Quiviger et al. (25). Other DNA manipulations were performed according to Ausubel et al. (1). Restriction enzymes were used according to the suppliers' recommendations.

Oligonucleotide primers were synthesized by Eurogentec SA (Seraing, Belgium) or Genset SA (Paris, France). PCR amplification was performed according to Saiki et al. (27), using 100 pmol of each oligonucleotide primer in a 100 µl reaction mixture.

MgCl2 was added to a final concentration of 2 mM. Thirty-five amplification cycles were performed with the following parameters: annealing, 1 minute at 65°C; extension, 1 minute at 72°C; denaturation, 1 minute at 94°C.
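Two quantities implied by these PCR conditions are easy to make explicit: 100 pmol of each primer in 100 µl corresponds to a final primer concentration of 1 µM (1 pmol/µl = 1 µM), and 35 cycles of three 1-minute steps amount to 105 minutes at the programmed temperatures, excluding temperature ramps and any initial or final holds, which the text does not mention. A minimal Python sketch of both calculations, with illustrative function names:

```python
from typing import List


def primer_final_uM(primer_pmol: float, reaction_ul: float) -> float:
    """Final primer concentration in µM (1 pmol/µl equals 1 µM)."""
    return primer_pmol / reaction_ul


def cycling_hold_minutes(cycles: int, step_minutes: List[float]) -> float:
    """Total time spent at programmed temperatures, ramps excluded."""
    return cycles * sum(step_minutes)


# Steps as listed in the text: 65 °C annealing, 72 °C extension, 94 °C denaturation.
print(primer_final_uM(100, 100))             # 1.0
print(cycling_hold_minutes(35, [1, 1, 1]))   # 105
```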

The sequence of cloned fragments obtained by PCR was always verified.

3. Construction of the C. thermocellum genomic library
C. thermocellum DNA was partially digested with Sau3AI, and the fragments were separated on a sucrose gradient. Fragments larger than 12 kb were inserted into plasmid pUC18 cut with BamHI and treated with bacterial alkaline phosphatase (Ready-to-go, Pharmacia). E. coli TG1 cells were transformed by electroporation and plated in the presence of 0.8 mg of 5-bromo-4-chloro-3-indolyl-β-D-galactoside per plate and 0.2 mg of isopropyl-β-D-thiogalactoside (IPTG) per plate.

4. Colony screening and detection of proteins transferred to membranes
Recombinant clones were screened as described (8) for binding of 125I-labeled CelC-DsCipA.
To identify polypeptides carrying type II cohesin domains, proteins were analyzed by SDS-PAGE (14) and transferred to a nylon membrane (Hybond-N+, Amersham) (1). The membrane was incubated with 125I-labeled CelC-DsCelD and CelC-DsCipA, washed and autoradiographed as previously described (29, 32).

5. DNA sequencing and sequence analysis
Appropriate restriction fragments of pCT1801 were subcloned into plasmid pBCSK, and nested deletions were generated with exonuclease III and S1 nuclease (Erase-a-Base kit, Promega) as recommended by the supplier. Single-stranded templates were sequenced by the dideoxy chain-termination method of Sanger et al. (30), using the Sequenase and Taquence kits (USB-Amersham). The sequence was determined at least once on each strand. Sequence data were analyzed with the Sequence Analysis Software Package of the Genetics Computer Group, version 7 (University of Wisconsin) (6).

6. Construction of expression clones and protein purification
Clones overproducing intact or truncated forms of SdbA were constructed using the pQE-30 vector. In each case, the sequence encoding the desired polypeptide was fused to a segment encoding six His residues to facilitate purification (13). To clone the fragment encoding the NH2-terminal domain of SdbA, a 670 bp fragment flanked by BamHI and PstI sites was synthesized by PCR (Figure 2). The forward primer was
5'-CTG CCG GCG GGA TCC GCA AGG GCA GAT-3' and the reverse primer was
5'-ACT TIT GCA GAA TIT TCT GCA GGC G-3'.

The fragment was inserted between the BamHI and PstI sites of pQE-30 to give pCT1830. The polypeptide encoded by pCT1830 was designated SdbA-N.
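As a quick check on the primer design, the sketch below scans each primer for the BamHI (GGATCC) and PstI (CTGCAG) recognition sequences. It assumes the reverse primer reads 5'-ACT TTT GCA GAA TTT TCT GCA GGC G-3', and the helper name `find_sites` is hypothetical. Because both recognition sites are palindromic, scanning the strand as written is sufficient.

```python
# Recognition sequences (5'->3') of the two enzymes used to flank the PCR fragment.
SITES = {"BamHI": "GGATCC", "PstI": "CTGCAG"}

def find_sites(primer: str) -> dict[str, list[int]]:
    """Return the 0-based start positions of each recognition site in a primer.

    Spaces are ignored. Both sites are palindromes, so one strand is enough."""
    seq = primer.replace(" ", "").upper()
    hits: dict[str, list[int]] = {}
    for enzyme, site in SITES.items():
        hits[enzyme] = [
            i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site
        ]
    return hits

forward = "CTG CCG GCG GGA TCC GCA AGG GCA GAT"
reverse = "ACT TTT GCA GAA TTT TCT GCA GGC G"  # assumed reading of the reverse primer
print("forward:", find_sites(forward))
print("reverse:", find_sites(reverse))
```

The forward primer carries a BamHI site starting at position 9 and the reverse primer a PstI site starting at position 16 (0-based), consistent with a PCR product flanked by BamHI and PstI.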

To clone the region encoding the COOH-terminal domains of SdbA, plasmid pCT1801 was digested with BamHI. The ends were filled in with the Klenow fragment of DNA polymerase to make them blunt. After further cleavage with PstI, the 1.4-kb fragment encoding the COOH-terminal domains was purified and inserted into pQE-30 that had been digested with HindIII, treated with the Klenow fragment of DNA polymerase and digested again with PstI. The resulting plasmid was named pCT1831, and the encoded polypeptide was named SdbA-C.
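The blunting step joins a BamHI end to a HindIII end, which are not cohesive with each other. Below is a minimal sketch of what Klenow fill-in does to such 4-nt 5' overhangs, assuming the standard cleavage positions G^GATCC (BamHI) and A^AGCTT (HindIII). The function names are hypothetical, and the sketch does not model which end of pCT1801 or pQE-30 ends up at the junction.

```python
# Recognition site (5'->3') and top-strand cut offset; both enzymes leave 4-nt 5' overhangs.
ENZYMES = {"BamHI": ("GGATCC", 1), "HindIII": ("AAGCTT", 1)}

def filled_in_ends(enzyme: str) -> tuple[str, str]:
    """Top-strand sequence at the cut after Klenow fill-in.

    Returns (end of the upstream fragment, start of the downstream fragment).
    Klenow extends the recessed 3' ends, so each fragment keeps its whole
    4-nt overhang as blunt, double-stranded DNA."""
    site, cut = ENZYMES[enzyme]
    overhang = len(site) - 2 * cut          # 4 nt for both palindromic sites
    upstream_end = site[: cut + overhang]   # top strand extended across the overhang
    downstream_start = site[cut:]           # top strand already carries the overhang
    return upstream_end, downstream_start

def blunt_junction(upstream_end: str, downstream_start: str) -> str:
    """Blunt-end ligation simply juxtaposes the two filled-in ends."""
    return upstream_end + downstream_start

for name in ENZYMES:
    print(name, filled_in_ends(name))
```

Each fill-in retains the 4 overhang nucleotides, so a blunt BamHI/HindIII junction carries both overhang sequences. For example, a filled HindIII upstream end joined to a filled BamHI downstream start gives AAGCT|GATCC, which re-creates neither recognition site.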

Plasmid pCT1832, which expresses the complete SdbA sequence, was constructed by inserting the 670-bp BamHI-PstI fragment described above into pCT1831 digested with BamHI and PstI.

Protein production and purification were performed with the QIAexpress system (QIAGEN Inc.). One-liter cultures were incubated at 37 °C to an OD600 of 0.7. IPTG was then added to a final concentration of 0.3 mM, and the cultures were incubated for a further 5 hours at 37 °C. The cells were resuspended in 80 ml of 50 mM Tris-HCl, pH 7.5 (buffer A) and lysed in an Aminco French press at a pressure of 100 MPa. The extract was centrifuged at 9,000 g for 20 minutes to remove cell debris. The supernatant was loaded onto an 8-ml column of Ni-NTA resin equilibrated with buffer A; the column was washed with buffer A and eluted with the same buffer containing 250 mM imidazole. The eluted fractions were dialyzed overnight at 4 °C against 1 liter of buffer A. The purified proteins were stored at -80 °C.
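The IPTG induction is a simple dilution. The sketch below computes how much stock to add to reach 0.3 mM in a 1-liter culture; the 1 M stock concentration is an assumption (the text does not state it), and `stock_volume_ml` is a hypothetical helper.

```python
def stock_volume_ml(final_mM: float, culture_ml: float, stock_mM: float) -> float:
    """Volume of stock (ml) to add so the culture ends at final_mM.

    Solves stock_mM * v = final_mM * (culture_ml + v), which accounts for
    the small volume increase caused by the addition itself."""
    if stock_mM <= final_mM:
        raise ValueError("stock must be more concentrated than the target")
    return final_mM * culture_ml / (stock_mM - final_mM)

# 0.3 mM IPTG in a 1000-ml culture from an assumed 1 M (1000 mM) stock
v = stock_volume_ml(final_mM=0.3, culture_ml=1000.0, stock_mM=1000.0)
print(f"add {v:.3f} ml of stock")
```

At this dilution the volume correction is negligible, so the result is essentially the familiar C1V1 = C2V2 answer of 0.3 ml.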

7. Determination of amino-terminal amino acid sequences
For each polypeptide to be sequenced, 50 pmol were separated by SDS-PAGE and transferred overnight at room temperature at 850 mA onto a hydrophobic PVDF [poly(vinylidene difluoride)] membrane (ProBlott, Applied Biosystems) pretreated with 100% methanol, using a Trans-Blot Cell (Bio-Rad) filled with 50 mM Tris base, 50 mM boric acid buffer. The bands were stained with 0.003% amido black and excised, and the amino-terminal sequences of the polypeptides were determined by Edman degradation on a 473A or Procise HT sequencer (Applied Biosystems).
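Edman sequencing used 50 pmol of each polypeptide. To express this as a mass of protein loaded on the gel, multiply the amount in moles by the molar mass. The sketch below uses the apparent molecular masses reported in the Results (60, 36, 35 and 24 kDa); calculated masses would shift the figures slightly. `load_ug` is a hypothetical helper name.

```python
def load_ug(pmol: float, mass_da: float) -> float:
    """Mass in micrograms of `pmol` picomoles of a protein of molar mass `mass_da` (g/mol).

    pmol * 1e-12 mol * mass_da g/mol * 1e6 ug/g simplifies to pmol * mass_da / 1e6."""
    return pmol * mass_da / 1e6

# Apparent masses (kDa) taken from the SDS-PAGE results
apparent_kda = {"SdbA": 60, "SdbA-C": 36, "SdbA-N": 35, "24-kDa fragment": 24}
for name, kda in apparent_kda.items():
    print(f"{name}: {load_ug(50, kda * 1000):.2f} ug")
```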

II. RESULTS
1. Cloning of a gene encoding a polypeptide that binds specifically to the dockerin domain of CipA
A total of 1,600 recombinant clones were screened for binding of 125I-labeled CelC-DsCipA. Eight independent clones were specifically labeled. Controls performed with 125I-labeled CelC-DsCelD showed that binding was specific for the dockerin domain of CipA (Figure 2).

All the cloned segments hybridize with the same region of the C. thermocellum genome (data not shown), whose map is shown in Figure 1. These restriction maps agree with the restriction fragments revealed by Southern blot analysis of C. thermocellum DNA (data not shown). The segments did not hybridize with, and share no restriction fragments with, the region comprising cipA and olpA (9). Within the region covered by the cloned fragments, a 1.6-kb segment lying between the PstI site and the left end of the insert carried by pCT1801 (Figure 1) is necessary and sufficient to encode a polypeptide capable of binding the dockerin domain of CipA. The corresponding gene was named sdbA.

2. Sequence analysis
The sequence of the sdbA gene is shown in Figure 3.

The coding sequence comprises 1,893 nucleotides. The ATG initiation codon is preceded by a putative ribosome-binding site. The encoded polypeptide, 631 amino acids long, has a calculated molecular mass of 68,577 Da.
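The figures above can be cross-checked with simple arithmetic: 1,893 nucleotides correspond to exactly 631 codons, and 68,577 Da spread over 631 residues gives an average residue mass close to the ~110 Da rule of thumb. The text does not say whether the stop codon is counted in the 1,893 nucleotides; the exact match with 631 codons suggests it is not. The sketch below only repeats these checks.

```python
CODING_NT = 1893   # length of the coding sequence given in the text
RESIDUES = 631     # length of the encoded polypeptide
MASS_DA = 68577    # calculated molecular mass

codons, leftover = divmod(CODING_NT, 3)
avg_residue_da = MASS_DA / RESIDUES

print(f"{codons} codons, {leftover} nt left over")
print(f"average residue mass: {avg_residue_da:.1f} Da")
```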

The domain structure of the protein is shown in Figures 1 and 3.

A putative signal peptide of 26 amino acid residues is located at the NH2-terminus of the polypeptide (36). Alignments with other proteins indicate the presence of three distinct regions in SdbA. The N-terminal region, comprising 156 amino acid residues, is similar to the repeated N-terminal segments of OlpB (previously called ORF1p) and ORF2p of C. thermocellum, two polypeptides whose genes lie immediately downstream of cipA (9) (Figure 4). A 56-residue spacer rich in Pro, Thr and Ser separates this region from the rest of the protein. The central region comprises 215 amino acids, including many Lys residues.
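If the regions described above are assumed to follow one another without gaps, starting right after the signal peptide, their residue boundaries follow from a running sum. That contiguity is an assumption (the text gives lengths, not coordinates), and the residues left over at the C-terminus are only presumed to form the SLH-containing region; `layout` is a hypothetical helper.

```python
TOTAL = 631  # length of the SdbA precursor

# (region, length in residues) in N- to C-terminal order; contiguity is assumed.
SEGMENTS = [
    ("signal peptide", 26),
    ("N-terminal region", 156),
    ("Pro/Thr/Ser-rich spacer", 56),
    ("central Lys-rich region", 215),
]

def layout(segments, total):
    """Return (name, first, last) tuples with 1-based inclusive residue numbers."""
    out, start = [], 1
    for name, length in segments:
        out.append((name, start, start + length - 1))
        start += length
    # Whatever remains is assigned to the C-terminal region.
    out.append(("C-terminal region (remainder)", start, total))
    return out

for name, first, last in layout(SEGMENTS, TOTAL):
    print(f"{name:32s} {first:4d}-{last:4d} ({last - first + 1} residues)")
```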

This region includes a short amino acid sequence similar to a segment present in the M proteins of Streptococcus pyogenes (Figure 5).

The COOH-terminal region consists of repeated segments closely resembling the so-called SLH (S-layer homologous) segments found in several proteins located on the cell surface of various bacteria (9, 18) (Figure 6).

3. Identification of the domain responsible for binding the dockerin domain of CipA
To identify the domain responsible for binding the dockerin domain of CipA, the binding properties of polypeptides derived from SdbA were compared. The sdbA gene and appropriate subfragments were fused in the His6-encoding expression vector pQE-30, and the corresponding polypeptides were purified by Ni affinity chromatography (24). The apparent molecular masses of intact SdbA and of the fragment containing the central and C-terminal regions are 60 kDa and 36 kDa, respectively, in agreement with the masses predicted from the sequence (Figure 7A). The apparent molecular mass of the NH2-terminal domain was 35 kDa, higher than the mass calculated from the sequence (22,715 Da). However, this fragment includes the Pro-rich junction segment, which may explain its slow migration in SDS-PAGE (10). The preparations of intact SdbA and of the COOH-terminal polypeptide both contained an additional 24-kDa polypeptide. In both cases, the NH2-terminal sequence of this polypeptide is SKYAVSY, indicating that it derives from the COOH-terminal region of SdbA containing the repeated SLH segments. Since this fragment carries no cluster of histidine residues, it was probably retained through association with the intact, His-tagged polypeptides. Indeed, polypeptides containing repeated SLH segments have been reported to self-associate (17).

Colony screening analysis with 125I-labeled CelC-DsCipA as the probe confirmed that the sdbA gene product binds the dockerin domain of CipA (Figure 7B). Binding to the NH2-terminal fragment is weaker but detectable. No binding to the C-terminal fragment could be detected. Since the NH2-terminal region of SdbA is similar to the repeated NH2-terminal segments of OlpB, we tested whether CelC-DsCipA binds MalE-ORF1p-N, a chimeric protein comprising the first repeated NH2-terminal segment of OlpB fused to the maltose-binding protein MalE (17). Lane 5 of Figure 7B shows that MalE-ORF1p-N was labeled. No binding was observed with MalE-ORF1p-C, which consists of the C-terminal SLH segments of OlpB fused to MalE. Neither SdbA, ORF1p-N nor ORF1p-C was labeled after incubation with 125I-labeled CelC-DsCelD (data not shown).
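The deletion analysis follows a simple logic: a binding determinant should lie in a region shared by every construct that binds the probe and absent from every construct that does not. The sketch encodes the three SdbA constructs by the regions they contain, following the descriptions of SdbA-N (which includes the Pro-rich junction segment) and SdbA-C (central and C-terminal regions). The region labels are hypothetical shorthand, and the weak SdbA-N signal is treated simply as binding.

```python
# construct -> (regions it contains, bound 125I-labeled CelC-DsCipA?)
CONSTRUCTS = {
    "SdbA":   ({"N-terminal", "spacer", "central", "SLH"}, True),
    "SdbA-N": ({"N-terminal", "spacer"}, True),   # weaker but detectable signal
    "SdbA-C": ({"central", "SLH"}, False),
}

def candidate_regions(constructs):
    """Regions present in every binder and in no non-binder."""
    binders = [regions for regions, bound in constructs.values() if bound]
    non_binders = [regions for regions, bound in constructs.values() if not bound]
    shared = set.intersection(*binders)
    excluded = set().union(*non_binders)
    return shared - excluded

print(sorted(candidate_regions(CONSTRUCTS)))
```

These three constructs alone cannot separate the N-terminal region from the spacer. The labeling of MalE-ORF1p-N, whose OlpB segment resembles the SdbA N-terminal region, is what points to the N-terminal region itself.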

Proteins carrying dockerin domains can be labeled with 125I and used as probes to detect proteins containing complementary cohesin domains (29, 32). In this way, clones expressing polypeptides that contain cohesin domains can be isolated and the cohesin domains identified (8). In the present invention, the same strategy was applied to clone the sdbA gene and to identify the cohesin domain responsible for binding the dockerin domain of CipA. Only one gene was obtained. Other genes encoding proteins with similar properties may have escaped detection because they were not spontaneously expressed.

Of the three polypeptides p170, p116 and p60 previously shown to bind the dockerin domain of CipA (29), p170 and p116 are too large to be encoded by sdbA, even allowing for post-translational modifications such as glycosylation. The p60 polypeptide therefore appears to be the only possible candidate.

Figure 7 indicates that the cohesin domain lies in the NH2-proximal region of SdbA. The signal detected with the NH2-terminal fragment is weaker than with the whole protein; however, no signal at all could be detected with a similar amount of the COOH-terminal fragment. Truncating SdbA may have affected the affinity or stability of the remaining NH2-terminal polypeptide. Alternatively, binding to nitrocellulose may alter the conformation of the cohesin domain, whereas the intact protein may attach to the membrane through regions of the polypeptide that are not required for binding the labeled probe.

Unlike the dockerin domain of CipA, which is clearly related to the dockerin domains present in the catalytic subunits, the cohesin domain of SdbA shows no obvious similarity to the cohesin domains of CipA and OlpA. It is, however, similar to the repeated segments located at the NH2-terminus of OlpB and ORF2p (9). Indeed, 125I-labeled CelC-DsCipA binds specifically to the first NH2-terminal repeated segment of OlpB. The NH2-terminal domains of SdbA, OlpB and most probably ORF2p thus represent a new type of cohesin domain. In the present invention they are therefore called "type II cohesin domains", whereas the cohesin domains found in CipA and at the NH2-terminus of OlpA are called "type I cohesin domains".

The three proteins OlpB, ORF2p and SdbA, which are known to contain type II cohesin domains, also carry SLH repeated segments. In all cases studied so far, SLH repeats occur in proteins associated with the bacterial cell surface, and biochemical evidence indicates that they bind components of the cell envelope (17). SdbA may thus be located on the cell surface, as are OlpA (28) and OlpB (17). The similarity between the central region of SdbA and a region present in the M proteins of Streptococcus supports this hypothesis, since this region of M proteins has been proposed to interact with cell-wall carbohydrates (34). Taken together, these considerations suggest that SdbA, OlpB and possibly ORF2p are cell-envelope components involved in anchoring cellulosomes to the cell surface.

Whereas SdbA carries a single cohesin domain, these domains are repeated twice in ORF2p and four times in OlpB. Up to four CipA molecules, each carrying bound catalytic subunits, could therefore be grouped around one OlpB molecule. This alone, however, cannot account for the formation of the very large aggregates (polycellulosomes) of up to 80 MDa reported in reference (5). Such aggregates must involve other interactions, possibly between SLH repeats, which are known to associate with one another (17).

TABLE 1
Bacterial strains and plasmids

Strain or plasmid | Relevant characteristics | Source or reference
Escherichia coli strains | |
TG1 | [Δ(lac-pro) thi supE hsdΔ5 / F′ traD36 proA+B+ lacIq lacZΔM15] | (12)
M15 (pREP4) | | (7, 35); QIAexpress kit, QIAGEN Inc.
Clostridium thermocellum | |
NCIB 10682 | |
Plasmids | |
pUC18 | | (38)
pBCSK- | | Stratagene
pQE-30 | | QIAexpress kit, QIAGEN Inc.
pCT1801 | pUC18 derivative containing a Sau3A fragment encoding SdbA | CNCM No. I-1684
pCT1830 | pQE-30 derivative encoding the cohesin domain of SdbA fused to 6 His residues | CNCM No. I-1684
pCT1831 | pQE-30 derivative encoding the central and COOH-terminal regions of SdbA fused to 6 His residues | This study
pCT1832 | pQE-30 derivative encoding SdbA fused to 6 His residues |
BIBLIOGRAPHY 1. FM AUSUBEL, R. BRENT, RE KINGSTON, DD MOORE, JG SEIDMAN, JA

SMITH, and K. STRUHL, 1990, Current Protocols in Molecular Biology, Greene
Publishing and Wiley Interscience, New York.SMITH, and K. STRUHL, 1990, Current Protocols in Molecular Biology, Greene
Publishing and Wiley Interscience, New York.

2. E.A. BAYER, E. MORAG, and R. LAMED, 1994, The cellulosome - a treasuretrove for biotechnology. Trends in Biotechnol. 12:379-386.2. E.A. BAYER, E. MORAG, and R. LAMED, 1994, The cellulosome - a treasuretrove for biotechnology. Trends in Biotechnol. 12: 379-386.

3. A. BERGE and U. SJOBRING, 1993, PAM, A novel plasminogen-binding protein from Streptococcus pyogenes. J. Biol. Chem. 268:25417-24.3. A. BERGE and U. SJOBRING, 1993, PAM, A novel plasminogen-binding protein from Streptococcus pyogenes. J. Biol. Chem. 268: 25417-24.

4. R.D. BOWDITCH, P. BAUMANN, and A.A. YOUSTEN, 1989, Cloning and sequencing of the gene encoding a 125-kilodalton surface-layer protein from Bacillus sphaericus 2362 and a related cryptic gene. J. Bacteriol.4. R.D. BOWDITCH, P. BAUMANN, and A.A. YOUSTEN, 1989, Cloning and sequencing of the gene encoding a 125-kilodalton surface-layer protein from Bacillus sphaericus 2362 and a related cryptic gene. J. Bacteriol.

171:41784188.171: 41784188.

5. COUGHLAN, M.P., K. HON-NAMI, H. HON-NAMI, L.G. LJUNGDAHL, J. J.5. COUGHLAN, M.P., K. HON-NAMI, H. HON-NAMI, L.G. LJUNGDAHL, J. J.

PAULIN, and W.E. RIGSBY, 1985, The cellulolytic enzyme complex of
Clostridium thermocellum is very large, Biochem. Biophys. Res. Commun.PAULIN, and WE RIGSBY, 1985, The cellulolytic enzyme complex of
Clostridium thermocellum is very large, Biochem. Biophys. Res. Common.

130:904909. 130: 904909.

6. DEVEREUX, J., P. HAEBERLI, and O. SMITHIES, 1984, A comprehensive set of sequence analysis programs for the VAS. Nucleic Acids Res. 12:387-395.6. DEVEREUX, J., P. HAEBERLI, and O. SMITHIES, 1984, A comprehensive set of sequence analysis programs for the VAS. Nucleic Acids Res. 12: 387-395.

7. FARABAUGH, P.J., 1978, Sequence of the laci gene, Nature 274:765-769.7. FARABAUGH, P.J., 1978, Sequence of the laci gene, Nature 274: 765-769.

8. FUJINO, T., P. BEGUIN, and J.P. AUBERT, 1992, Cloning of a Clostridium thermocellum DNA fragment encoding polypeptides that bind the catalytic components of the cellulosome, FEMS Microbiol. Lett. 94:165-170.8. FUJINO, T., P. BEGUIN, and J.P. AUBERT, 1992, Cloning of a Clostridium thermocellum DNA fragment encoding polypeptides that bind the catalytic components of the cellulosome, FEMS Microbiol. Lett. 94: 165-170.

9. FUJINO, T., P. BEGUIN and J.P. AUBERT, 1993, Organization of a Clostridium thermocellum gene cluster encoding the cellulosomal scaffolding protein
CipA and a protein possibly involved in the attachment of the cellulosome to the cell surface. J. Bacteriol. 175:1891-1899. 9. FUJINO, T., P. BEGUIN and JP AUBERT, 1993, Organization of a Clostridium thermocellum gene cluster encoding the cellulosomal scaffolding protein
CipA and a protein possibly involved in the attachment of the cellulosome to the cell surface. J. Bacteriol. 175: 1891-1899.

10. FURTHMAYR, H., and R. TltvIPL, 1971, Characterization of collagen peptides by sodium dodecylsulfate-polyac rylamide electrophoresis. Anal.10. FURTHMAYR, H., and R. TltvIPL, 1971, Characterization of collagen peptides by sodium dodecylsulfate-polyac rylamide electrophoresis. Anal.

Biochem. 41:51S516. Biochem. 41: 51S516.

11. GERNGROSS, U.T., M.P.M. ROMANIEC, N.S. HUSKISSON, and A.L DEMAIN, 1993, Sequencing of a Clostridium thermocellum gene (CipA) encoding the cellulosomal SL-protein reveals an unusual degree of internal homology.11. GERNGROSS, U.T., M.P.M. ROMANIEC, N.S. HUSKISSON, and A.L DEMAIN, 1993, Sequencing of a Clostridium thermocellum gene (CipA) encoding the cellulosomal SL-protein reveals an unusual degree of internal homology.

Mol. Microbiol. 8:325-334.Mol. Microbiol. 8: 325-334.

12. GIBSON, T.J. 1984, Studies on the Epsteîn-Barr virus genome, 1984,
University of Cambridge, Cambridge, UK.12. GIBSON, TJ 1984, Studies on the Epsteîn-Barr virus genome, 1984,
University of Cambridge, Cambridge, UK.

13. JANKNECHT, R., G. DE MARTYNOFF, J. LOU, R. A. HIPSKIND, A. NORDHEIM, and G.G. STUNNENBERG, 1991, Rapid and efficient purification of native histidine-tagged protein expressed by recombinant vaccinia virus. Proc.13. JANKNECHT, R., G. DE MARTYNOFF, J. LOU, R. A. HIPSKIND, A. NORDHEIM, and G.G. STUNNENBERG, 1991, Rapid and efficient purification of native histidine-tagged protein expressed by recombinant vaccinia virus. Proc.

Natl. Acad. Sci. USA 88:8972-8976.Natl. Acad. Sci. USA 88: 8972-8976.

14. LAEMMLI, U.K., 1970, Cleavage of structural proteins during the assembly of the head of bacteriophage T4, Nature 227:680-685.14. LAEMMLI, U.K., 1970, Cleavage of structural proteins during the assembly of the head of bacteriophage T4, Nature 227: 680-685.

15. LAMED R., R. KENIG, E. SETTER, and E.A. BAYER, 1985, Major characteristics of the cellulolytic system of Clostridium thermocellum coincide with those of the purified cellulosome, Enzyme Microb. Technol.15. LAMED R., R. KENIG, E. SETTER, and E.A. BAYER, 1985, Major characteristics of the cellulolytic system of Clostridium thermocellum coincide with those of the purified cellulosome, Enzyme Microb. Technol.

7:3741.7: 3741.

16. LAMED R., E. SETTER, R. KENIG, and E.A. BAYER, 1983, The cellulosome : a discrete cell surface organelle of Clostridium thermocellum with eshibits separate antigenic, cellulose-binding and various cellulolytic activities.16. LAMED R., E. SETTER, R. KENIG, and E.A. BAYER, 1983, The cellulosome: a discrete cell surface organelle of Clostridium thermocellum with eshibits separate antigenic, cellulose-binding and various cellulolytic activities.

Biotechnol. Bioeng. Symp. 13:163 - 181. Biotechnol. Bioeng. Nice. 13: 163-181.

17. LEMAIRE M., H. OHAYON, P. GOUNON, T. FUJINO and P. BEGUIN, 1995, OlpB, a new outer layer protein of Clostridium thermocellum, and binding of its Slayer-like domain to components of the cell envelope, J. Bioteriol. 77:24512459. 17. LEMAIRE M., H. OHAYON, P. GOUNON, T. FUJINO and P. BEGUIN, 1995, OlpB, a new outer layer protein of Clostridium thermocellum, and binding of its Slayer-like domain to components of the cell envelope, J. Bioteriol. 77: 24512459.

18. LUPAS A., H. ENGELHARDT, J. PETERS, U. SANTARIUS, S. VOLKER, and W.18. LUPAS A., H. ENGELHARDT, J. PETERS, U. SANTARIUS, S. VOLKER, and W.

BAUMEISTER, 1994, Domain structure of the Acetogeniurn kivui surface layer revealed by electron crystallography and sequence analysis, J.BAUMEISTER, 1994, Domain structure of the Acetogeniurn kivui surface layer revealed by electron crystallography and sequence analysis, J.

Bacteriol. 176:12241233. Bacteriol. 176: 12241233.

18b. LYTLE B., C. HYERS, K. KRUUS and J. H. WU, 1996, Interactions of the
CelS' binding ligand with various receptor domains of the clost ridium the mocellum cellulosomal scaffolding protein, CipA, 25 J. Bacteriol. 178: 12001203.18b. LYTLE B., C. HYERS, K. KRUUS and JH WU, 1996, Interactions of the
CelS 'binding ligand with various receptor domains of the clost ridium the mocellum cellulosomal scaffolding protein, CipA, 25 J. Bacteriol. 178: 12001203.

19. MANIATIS T., E.F. FRITSCH, and J. SAMBROOK, 1982, Molecular Cloning: a Laboratory Manual, Cold Spring Harbor Laboratory, N.Y.

20. MATUSCHEK M., G. BURCHHARDT, K. SAHM, and H. BAHL, 1994, Pullulanase of Thermoanaerobacter thermosulfurigenes EM1 (Clostridium thermosulfurogenes): molecular analysis of the gene, composite structure of the enzyme, and a common model for its attachment to the cell surface, J. Bacteriol. 176:3295-3302.

21. McBEE R.H., 1948, The culture and physiology of a thermophilic cellulose-fermenting bacterium, J. Bacteriol. 56:653-663.

22. MORAG E., E.A. BAYER, and R. LAMED, 1990, Relationship of cellulosomal and non-cellulosomal xylanases of Clostridium thermocellum to cellulose-degrading enzymes, J. Bacteriol. 172:6098-6105.

23. MORAG E., I. HALEVY, E.A. BAYER, and R. LAMED, 1991, Isolation and properties of a major cellobiohydrolase from the cellulosome of Clostridium thermocellum, J. Bacteriol. 173:4155-4162.

24. PODBIELSKI A., J. HAWLITZKY, T.D. PACK, A. FLOSDORFF, and M.D. BOYLE, 1994, A group A streptococcal Enn protein potentially resulting from intergenomic recombination exhibits atypical immunoglobulin-binding characteristics, Mol. Microbiol. 12:725-736.

25. QUIVIGER B., C. FRANCHE, G. LUTFALLA, R.D., R. HASELKORN, and C. ELMERICH, 1982, Cloning of a nitrogen fixation (nif) gene cluster of Azospirillum brasilense, Biochimie 64:495-502.

26. ROBBINS J.C., J.G. SPANIER, S.J. JONES, W.J. SIMPSON, and P.P. CLEARY, 1987, Streptococcus pyogenes type 12 M protein gene regulation by upstream sequences, J. Bacteriol. 169:5633-5640.

27. SAIKI R.K., D.H. GELFAND, S. STOFFEL, S.J. SCHARF, R. HIGUCHI, G.T. HORN, K.B. MULLIS and H.A. ERLICH, 1988, Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase, Science 239:487-491.

28. SALAMITOU S., M. LEMAIRE, T. FUJINO, H. OHAYON, P. GOUNON, P. BEGUIN, and J.P. AUBERT, 1994, Subcellular localization of Clostridium thermocellum ORF3p, a protein carrying a receptor for the docking sequence borne by the catalytic components of the cellulosome, J. Bacteriol. 176:2828-2834.

29. SALAMITOU S., O. RAYNAUD, M. LEMAIRE, M. COUGHLAN, P. BEGUIN and J.P. AUBERT, 1994, Recognition specificity of the duplicated segments present in Clostridium thermocellum endoglucanase CelD and in the cellulosome-integrating protein CipA, J. Bacteriol. 176:2822-2827.

30. SANGER F., S. NICKLEN, and A.R. COULSON, 1977, DNA sequencing with chain-terminating inhibitors, Proc. Natl. Acad. Sci. USA 74:5463-5467.

31. TAILLIEZ P., H. GIRARD, J. MILLET, and P. BEGUIN, 1989, Enhanced cellulose fermentation by an asporogenous and ethanol-tolerant mutant of Clostridium thermocellum, Appl. Environ. Microbiol. 55:207-211.

32. TOKATLIDIS K., P. DHURJATI, and P. BEGUIN, 1993, Properties conferred on Clostridium thermocellum endoglucanase CelC by grafting the duplicated segment of endoglucanase CelD, Protein Engng 6:947-952.

33. TOKATLIDIS K., S. SALAMITOU, P. BEGUIN, P. DHURJATI, and J.P. AUBERT, 1991, Interaction of the duplicated segment carried by Clostridium thermocellum cellulases with cellulosome components, FEBS Lett. 291:185-188.

34. VIJAYKUMAR P. and V.A. FISCHETTI, 1988, Isolation and characterization of the cell-associated region of group A streptococcal M6 proteins, J. Bacteriol. 170.

35. VILLAREJO M.R., and I. ZABIN, 1974, β-galactosidase from termination and deletion mutant strains, J. Bacteriol. 120:466-474.

36. VON HEIJNE G., 1983, Patterns of amino acids near signal-sequence cleavage sites, Eur. J. Biochem. 133:17-21.

37. WU J.H.D. and A.L. DEMAIN, 1988, Proteins of the Clostridium thermocellum cellulase complex responsible for degradation of crystalline cellulose, p. 117-131, In J.P. AUBERT, P. BEGUIN, and J. MILLET (ed.), FEMS Symposium

SEQUENCE LISTING
(1) GENERAL INFORMATION:
(i) APPLICANT:
(A) NAME: INSTITUT PASTEUR
(B) STREET: 28 Rue du Docteur Roux
(C) CITY: PARIS
(E) COUNTRY: FRANCE
(F) POSTAL CODE: 75724 CEDEX 15
(ii) TITLE OF INVENTION: "POLYPEPTIDE COMPRISING A COHESIN DOMAIN, ENZYMATIC COMPOSITION COMPRISING IT, AND DNA FRAGMENTS ENCODING THESE POLYPEPTIDES"
(iii) NUMBER OF SEQUENCES: 4
(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.3 (EPO)
(2) INFORMATION FOR SEQ ID NO: 1:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1893 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: SdbA of Clostridium thermocellum
(B) LOCATION: 1..1893
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
ATG AGG AAG AAA AAA AGA TTA ATA TCA TTA CTG CTT GCG GTT TTT ATC 48
Met Arg Lys Lys Lys Arg Leu Ile Ser Leu Leu Leu Ala Val Phe Ile
1 5 10 15
GCC GTT GCA TGT CTG CCG GCG GGA ATT GCA AGG GCA GAT AAA GCC TCG 96
Ala Val Ala Cys Leu Pro Ala Gly Ile Ala Arg Ala Asp Lys Ala Ser
20 25 30
AGC ATT GAG CTT AAG TTT GAC CGC AAT AAG GGA GAA GTT GGA GAT ATA 144
Ser Ile Glu Leu Lys Phe Asp Arg Asn Lys Gly Glu Val Gly Asp Ile
35 40 45
CTT ATT GGT ACC GTA AGG ATA AAC AAT ATC AAG AAT TTC GCA GGA TTT 192
Leu Ile Gly Thr Val Arg Ile Asn Asn Ile Lys Asn Phe Ala Gly Phe
50 55 60
CAG GTA AAC ATT GTA TAT GAT CCA AAA GTC TTA ATG GCT GTT GAC CCT 240
Gln Val Asn Ile Val Tyr Asp Pro Lys Val Leu Met Ala Val Asp Pro
65 70 75 80
GAA ACG GGG AAA GAA TTT ACT TCT TCA ACA TTT CCG CCA GGA CGC ACT 288
Glu Thr Gly Lys Glu Phe Thr Ser Ser Thr Phe Pro Pro Gly Arg Thr
85 90 95
GTA CTG AAA AAC AAT GCT TAC GGC CCA ATA CAG ATT GCG GAC AAT GAT 336
Val Leu Lys Asn Asn Ala Tyr Gly Pro Ile Gln Ile Ala Asp Asn Asp
100 105 110
CCG GAA AAA GGG ATA CTG AAC TTC GCG CTT GCA TAT TCA TAT ATT GCG 384
Pro Glu Lys Gly Ile Leu Asn Phe Ala Leu Ala Tyr Ser Tyr Ile Ala
115 120 125
GGA TAC AAA GAA ACA GGA GTA GCG GAG GAA AGC GGC ATA ATT GCG AAA 432
Gly Tyr Lys Glu Thr Gly Val Ala Glu Glu Ser Gly Ile Ile Ala Lys
130 135 140
ATT GGA TTT AAA ATA CTC CAG AAA AAG AGC ACT GCC GTA AAA TTC CAG 480
Ile Gly Phe Lys Ile Leu Gln Lys Lys Ser Thr Ala Val Lys Phe Gln
145 150 155 160
GAT ACA TTA AGC ATG CCC GGA GCT ATT TCG GGA ACA CAG CTG TTT GAC 528
Asp Thr Leu Ser Met Pro Gly Ala Ile Ser Gly Thr Gln Leu Phe Asp
165 170 175
TGG GAC GGA GAA GTT ATT ACC GGA TAT GAG GTA ATA CAG CCG GAT GTG 576
Trp Asp Gly Glu Val Ile Thr Gly Tyr Glu Val Ile Gln Pro Asp Val
180 185 190
CTG AGT TTG GGT GAC GAG CCT TAT GAG ACA CCG GGA ACG GAT ATT CCG 624
Leu Ser Leu Gly Asp Glu Pro Tyr Glu Thr Pro Gly Thr Asp Ile Pro
195 200 205
ATA TCC GAC AAT CCG GCA GCA ACT CCG TCA TCC ACG CCG TCA GTT ACT 672
Ile Ser Asp Asn Pro Ala Ala Thr Pro Ser Ser Thr Pro Ser Val Thr
210 215 220
CCT TCA CCG GAA GTT AAA CCG ACT CAG ACG CCT TCG CCT GCA GAA AAT 720
Pro Ser Pro Glu Val Lys Pro Thr Gln Thr Pro Ser Pro Ala Glu Asn
225 230 235 240
TCT GCA AAA GTG GAG CTT GAA CCT GTG TTG GAT AAT GCA ACA GGA GAA 768
Ser Ala Lys Val Glu Leu Glu Pro Val Leu Asp Asn Ala Thr Gly Glu
245 250 255
GCA AAG GCG GCA ATA GAT GAA GAA AAA TTA AAC AAG GCT CTT GAT GAA 816
Ala Lys Ala Ala Ile Asp Glu Glu Lys Leu Asn Lys Ala Leu Asp Glu
260 265 270
GCG AAA AAA TCG GAA GAT GAC AAA CTT GTG GAA CTT AAC ATA AAG AAG 864
Ala Lys Lys Ser Glu Asp Asp Lys Leu Val Glu Leu Asn Ile Lys Lys
275 280 285
GTT GAA AAT GCC GAT GCT TAC ATA CAA CAG CTT CCG GCG AAA TTC CTG 912
Val Glu Asn Ala Asp Ala Tyr Ile Gln Gln Leu Pro Ala Lys Phe Leu
290 295 300
ATA AAA AGT GAC GCC GAA TAT AAG CTG AGA ATA GCT ACA GAG CAG GGA 960
Ile Lys Ser Asp Ala Glu Tyr Lys Leu Arg Ile Ala Thr Glu Gln Gly
305 310 315 320
ATT ATA GAA GTA CCG GCC AAC ATG CTG AAT ACT GCG GAT ATT TCA AAG 1008
Ile Ile Glu Val Pro Ala Asn Met Leu Asn Thr Ala Asp Ile Ser Lys
325 330 335
CTT GTA AAA AAT GAC TCC GTT GTT GAA TTC GTC ATA AGA AAA GTA AAA 1056
Leu Val Lys Asn Asp Ser Val Val Glu Phe Val Ile Arg Lys Val Lys
340 345 350
GTC GAT GAA CTT GGT GCA GAG CTC AAA GAG AAG ATA GGC AAC AGG CCG 1104
Val Asp Glu Leu Gly Ala Glu Leu Lys Glu Lys Ile Gly Asn Arg Pro
355 360 365
GTG ATT GAC ATA AGC GTG GTT GTT GAC GGC AAA AAA GTT GAA TGG AGC 1152
Val Ile Asp Ile Ser Val Val Val Asp Gly Lys Lys Val Glu Trp Ser
370 375 380
AAT TAC AAA GCC AAG GTT AAA ATA TCA ATT CCT TAC AAG CCT GAT GCA 1200
Asn Tyr Lys Ala Lys Val Lys Ile Ser Ile Pro Tyr Lys Pro Asp Ala
385 390 395 400
AAA GAG CTG GAG AAC CAC GAG CAT ATT GTT GTA CTC CAT ATT GAT GAC 1248
Lys Glu Leu Glu Asn His Glu His Ile Val Val Leu His Ile Asp Asp
405 410 415
GCC GGC AAG GCA GTT TCC GTA CCC AGC GGA AAA TAT GAA CCT TCT TTG 1296
Ala Gly Lys Ala Val Ser Val Pro Ser Gly Lys Tyr Glu Pro Ser Leu
420 425 430
GGC GTC GTT ACG TTT GAG ACG AAT CAT TTA AGC AAG TAT GCG GTT TCA 1344
Gly Val Val Thr Phe Glu Thr Asn His Leu Ser Lys Tyr Ala Val Ser
435 440 445
TAT GTT TAC AAG ACT TTC GCG GAT ATT GGT TCA TAT GCC TGG GCT AAA 1392
Tyr Val Tyr Lys Thr Phe Ala Asp Ile Gly Ser Tyr Ala Trp Ala Lys
450 455 460
AAG CAG ATA GAG GTT TTG GCT TCC AAA GGA GTA ATT AAC GGT ACA TCC 1440
Lys Gln Ile Glu Val Leu Ala Ser Lys Gly Val Ile Asn Gly Thr Ser
465 470 475 480
GAT ACC ACT TTT ACG CCC CAG GCA GAC ATA ACA AGG GCG GAT TTC ATG 1488
Asp Thr Thr Phe Thr Pro Gln Ala Asp Ile Thr Arg Ala Asp Phe Met
485 490 495
ATA CTT CTT GTA AAG GCA CTG GGA TTG ACT GCC GAG GTT ACT TCC AAT 1536
Ile Leu Leu Val Lys Ala Leu Gly Leu Thr Ala Glu Val Thr Ser Asn
500 505 510
TTT GAT GAT GTG TCC GAA AAA GAC TAC TAT TAT GAA TAC GTG GGA ATT 1584
Phe Asp Asp Val Ser Glu Lys Asp Tyr Tyr Tyr Glu Tyr Val Gly Ile
515 520 525
GCA AAA GAG CTT GGA ATT ACG ACA GGA GTC GGA AAC AAC AAG TTC AAT 1632
Ala Lys Glu Leu Gly Ile Thr Thr Gly Val Gly Asn Asn Lys Phe Asn
530 535 540
CCG AAA GCC AAA ATT ACA AGA CAG GAT ATG ATG GTA CTT ACA ACA AAT 1680
Pro Lys Ala Lys Ile Thr Arg Gln Asp Met Met Val Leu Thr Thr Asn
545 550 555 560
GCT CTC AGG ATT GCA GGA AAA ATA TCG AGC ACA GGA ACC CGC GCT GAT 1728
Ala Leu Arg Ile Ala Gly Lys Ile Ser Ser Thr Gly Thr Arg Ala Asp
565 570 575
GTT GAA AGA TTT TCG GAC AAG GAC CAG ATA GCT TCA TAT GCG GTT GAA 1776
Val Glu Arg Phe Ser Asp Lys Asp Gln Ile Ala Ser Tyr Ala Val Glu
580 585 590
GGC GTT GCA ACC TTG GTA AAA GAA GGT ATT GTA GTG GGA AGC GGC GAT 1824
Gly Val Ala Thr Leu Val Lys Glu Gly Ile Val Val Gly Ser Gly Asp
595 600 605
ATT ATA AAT CCA AGG GGA AAT GCT TCA AGA GCC GAA CTT GCA GCA ATC 1872
Ile Ile Asn Pro Arg Gly Asn Ala Ser Arg Ala Glu Leu Ala Ala Ile
610 615 620
ATA TAC AAG ATT TAC TAC AAG 1893
Ile Tyr Lys Ile Tyr Tyr Lys
625 630
(3) INFORMATION FOR SEQ ID NO: 2:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4992 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: OlpB of Clostridium thermocellum
(B) LOCATION: 1..4992
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
ATG AAA CGA AAA AAT AAA GTA TTA TCA ATT TTG TTA ACT CTG CTG CTA 48
Met Lys Arg Lys Asn Lys Val Leu Ser Ile Leu Leu Thr Leu Leu Leu
1 5 10 15
ATA ATC TCT ACC ACA TCC GTA AAC ATG TCT TTT GCT GAA GCA ACT CCA 96
Ile Ile Ser Thr Thr Ser Val Asn Met Ser Phe Ala Glu Ala Thr Pro
20 25 30
AGT ATT GAA ATG GTT CTT GAT AAA ACT GAA GTC CAT GTA GGA GAT GTA 144
Ser Ile Glu Met Val Leu Asp Lys Thr Glu Val His Val Gly Asp Val
35 40 45
ATA ACG GCC ACA ATA AAA GTC AAT AAC ATT AGA AAA TTG GCG GGA TAT 192
Ile Thr Ala Thr Ile Lys Val Asn Asn Ile Arg Lys Leu Ala Gly Tyr
50 55 60
CAG CTA AAT ATC AAA TTT GAC CCT GAA GTT TTA CAG CCG GTA GAC CCT 240
Gln Leu Asn Ile Lys Phe Asp Pro Glu Val Leu Gln Pro Val Asp Pro
65 70 75 80
GCA ACA GGA GAG GAA TTT ACT GAT AAG TCC ATG CCG GTA AAT AGG GTT 288
Ala Thr Gly Glu Glu Phe Thr Asp Lys Ser Met Pro Val Asn Arg Val
85 90 95
TTG CTG ACA AAC AGC AAA TAT GGA CCT ACT CCT GTG GCG GGT AAC GAT 336
Leu Leu Thr Asn Ser Lys Tyr Gly Pro Thr Pro Val Ala Gly Asn Asp
100 105 110
ATA AAG TCA GGA ATT ATT AAT TTT GCT ACG GGA TAT AAC AAT TTA ACA 384
Ile Lys Ser Gly Ile Ile Asn Phe Ala Thr Gly Tyr Asn Asn Leu Thr
115 120 125
GCG TAC AAA TCC AGC GGA ATA GAC GAA CAT ACA GGA ATA ATA GGA GAG 432
Ala Tyr Lys Ser Ser Gly Ile Asp Glu His Thr Gly Ile Ile Gly Glu
130 135 140
ATT GGT TTT AAA GTT TTA AAG AAA CAA AAT ACG TCT ATT AGG TTT GAA 480
Ile Gly Phe Lys Val Leu Lys Lys Gln Asn Thr Ser Ile Arg Phe Glu
145 150 155 160
GAT ACA TTA TCG ATG CCC GGG GCA ATA TCG GGA ACA AGT TTG TTT GAC 528
Asp Thr Leu Ser Met Pro Gly Ala Ile Ser Gly Thr Ser Leu Phe Asp
165 170 175
TGG GAT GCA GAA ACT ATA ACA GGA TAT GAG GTA ATA CAG CCG GAT CTT 576
Trp Asp Ala Glu Thr Ile Thr Gly Tyr Glu Val Ile Gln Pro Asp Leu
180 185 190
ATA GTT GTA GAG GCA GAA CCG TTA AAA GAC GCC AGC GTG GCT CTG GAA 624
Ile Val Val Glu Ala Glu Pro Leu Lys Asp Ala Ser Val Ala Leu Glu
195 200 205
CTG GAT AAG ACG AAG GTA AAA GTA GGG GAC ATA ATA ACA GCG ACG ATA 672
Leu Asp Lys Thr Lys Val Lys Val Gly Asp Ile Ile Thr Ala Thr Ile
210 215 220
AAG ATA GAG AAC ATG AAG AAT TTT GCA GGG TAC CAG TTG AAT ATC AAG 720
Lys Ile Glu Asn Met Lys Asn Phe Ala Gly Tyr Gln Leu Asn Ile Lys
225 230 235 240
TAT GAC CCG ACC ATG TTG GAG GCA ATA GAA CTG GAG ACA GGA AGT GCG 768
Tyr Asp Pro Thr Met Leu Glu Ala Ile Glu Leu Glu Thr Gly Ser Ala
245 250 255
ATA GCG AAG AGG ACA TGG CCG GTT ACA GGA GGT ACT GTT CTG CAA AGT 816
Ile Ala Lys Arg Thr Trp Pro Val Thr Gly Gly Thr Val Leu Gln Ser
260 265 270
GAC AAT TAT GGA AAG ACG ACT GCG GTA GCG AAT GAT GTA GGA GCA GGT 864
Asp Asn Tyr Gly Lys Thr Thr Ala Val Ala Asn Asp Val Gly Ala Gly
275 280 285
ATA ATA AAC TTT GCT GAG GCA TAC TCG AAC CTT ACC AAA TAC AGA GAG 912
Ile Ile Asn Phe Ala Glu Ala Tyr Ser Asn Leu Thr Lys Tyr Arg Glu
290 295 300
ACA GGT GTG GCA GAG GAG ACA GGT ATA ATA GGA AAG ATA GGC TTC AGA 960
Thr Gly Val Ala Glu Glu Thr Gly Ile Ile Gly Lys Ile Gly Phe Arg
305 310 315 320
GTA CTG AAG GCA GGA AGT ACG GCT ATA AGA TTT GAG GAT ACG ACA GCG 1008
Val Leu Lys Ala Gly Ser Thr Ala Ile Arg Phe Glu Asp Thr Thr Ala
325 330 335
ATG CCG GGA GCA ATA GAA GGA ACA TAC ATG TTC GAC TGG TAT GGC GAG 1056
Met Pro Gly Ala Ile Glu Gly Thr Tyr Met Phe Asp Trp Tyr Gly Glu
340 345 350
AAC ATC AAA GGG TAT AGC GTA GTA CAG CCT GGG GAA ATA GTG GCA GAA 1104
Asn Ile Lys Gly Tyr Ser Val Val Gln Pro Gly Glu Ile Val Ala Glu
355 360 365
GGA GAA GAG CCG GGT GAA GAG CCG ACA GAA GAG CCT GTA CCG ACA GAG 1152
Gly Glu Glu Pro Gly Glu Glu Pro Thr Glu Glu Pro Val Pro Thr Glu
370 375 380
ACA CCA GTA GAT CCC ACA CCG ACA GTG ACA GAA GAG CCT GTA CCT TCA 1200
Thr Pro Val Asp Pro Thr Pro Thr Val Thr Glu Glu Pro Val Pro Ser
385 390 395 400
GAG CTT CCA GAT TCC TAT GTA ATA ATG GAA CTG GAT AAG ACG AAG GTA 1248
Glu Leu Pro Asp Ser Tyr Val Ile Met Glu Leu Asp Lys Thr Lys Val
405 410 415
AAA GTA GGG GAC ATA ATA ACA GCG ACG ATA AAG ATA GAG AAC ATG AAG 1296
Lys Val Gly Asp Ile Ile Thr Ala Thr Ile Lys Ile Glu Asn Met Lys
420 425 430
AAT TTT GCA GGG TAC CAG TTG AAT ATC AAG TAT GAC CCG ACC ATG TTG 1344
Asn Phe Ala Gly Tyr Gln Leu Asn Ile Lys Tyr Asp Pro Thr Met Leu
435 440 445
GAG GCA ATA GAA CTG GAG ACA GGA AGT GCG ATA GCG AAG AGG ACA TGG 1392
Glu Ala Ile Glu Leu Glu Thr Gly Ser Ala Ile Ala Lys Arg Thr Trp
450 455 460
CCG GTT ACA GGA GGT ACT GTT CTG CAA AGT GAC AAT TAT GGA AAG ACG 1440
Pro Val Thr Gly Gly Thr Val Leu Gln Ser Asp Asn Tyr Gly Lys Thr
465 470 475 480
ACT GCG GTA GCG AAT GAT GTA GGA GCA GGT ATA ATA AAC TTT GCT GAG 1488
Thr Ala Val Ala Asn Asp Val Gly Ala Gly Ile Ile Asn Phe Ala Glu
485 490 495
GCA TAC TCG AAC CTT ACC AAA TAC AGA GAG ACA GGT GTG GCA GAG GAG 1536
Ala Tyr Ser Asn Leu Thr Lys Tyr Arg Glu Thr Gly Val Ala Glu Glu
500 505 510
ACA GGT ATA ATA GGA AAG ATA GGC TTC AGA GTA CTG AAG GCA GGA AGT 1584
Thr Gly Ile Ile Gly Lys Ile Gly Phe Arg Val Leu Lys Ala Gly Ser
515 520 525
ACG GCT ATA AGA TTT GAG GAT ACG ACA GCG ATG CCG GGA GCA ATA GAA 1632
Thr Ala Ile Arg Phe Glu Asp Thr Thr Ala Met Pro Gly Ala Ile Glu
530 535 540
GGA ACA TAC ATG TTC GAC TGG TAT GGC GAG AAC ATC AAA GGG TAT AGC 1680
Gly Thr Tyr Met Phe Asp Trp Tyr Gly Glu Asn Ile Lys Gly Tyr Ser
545 550 555 560
GTA GTA CAG CCT GGG GAA ATA GTG GCG GAA GGA GAA GAG CCG ACA GAA 1728
Val Val Gln Pro Gly Glu Ile Val Ala Glu Gly Glu Glu Pro Thr Glu
565 570 575
GAG CCT GTA CCG ACA GAG ACA CCA GTA GAT CCC ACA CCG ACA GTG ACA 1776
Glu Pro Val Pro Thr Glu Thr Pro Val Asp Pro Thr Pro Thr Val Thr
580 585 590
GAA GAG CCT GTA CCT TCA GAG CTT CCA GAT TCC TAT GTG ATA ATG GAA 1824
Glu Glu Pro Val Pro Ser Glu Leu Pro Asp Ser Tyr Val Ile Met Glu
595 600 605
TTG GAT AAG ACG AAG GTA AAA GAA GGC GAC GTA ATA ATA GCA ACA ATA 1872
Leu Asp Lys Thr Lys Val Lys Glu Gly Asp Val Ile Ile Ala Thr Ile
610 615 620
AGA GTA AAT AAC ATA AAG AAT CTT GCC GGA TAT CAG ATA GGC ATC AAA 1920
Arg Val Asn Asn Ile Lys Asn Leu Ala Gly Tyr Gln Ile Gly Ile Lys
625 630 635 640
TAT GAC CCG AAA GTA TTA GAG GCA TTT AAT ATC GAG ACA GGG GAC CCA 1968
Tyr Asp Pro Lys Val Leu Glu Ala Phe Asn Ile Glu Thr Gly Asp Pro
645 650 655
ATA GAT GAA GGA ACA TGG CCT GCA GTA GGG GGA ACA ATA CTG AAG AAT 2016
Ile Asp Glu Gly Thr Trp Pro Ala Val Gly Gly Thr Ile Leu Lys Asn
660 665 670
AGA GAT TAC CTG CCG ACT GGG GTA GCA ATA AAC AAT GTA TCT AAA GGA 2064
Arg Asp Tyr Leu Pro Thr Gly Val Ala Ile Asn Asn Val Ser Lys Gly
675 680 685
ATA CTG AAT TTT GCT GCT TAT TAC GTT TAC TTC GAT GAC TAT AGA GAG 2112
Ile Leu Asn Phe Ala Ala Tyr Tyr Val Tyr Phe Asp Asp Tyr Arg Glu
690 695 700
GAA GGA AAG TCA GAA GAT ACA GGA ATT ATA GGA AAT ATA GGC TTT AGA 2160
Glu Gly Lys Ser Glu Asp Thr Gly Ile Ile Gly Asn Ile Gly Phe Arg
705 710 715 720
GTA CTG AAG GCG GAA GAT ACA ACG ATA AGA TTT GAA GAG CTG GAG TCA 2208
Val Leu Lys Ala Glu Asp Thr Thr Ile Arg Phe Glu Glu Leu Glu Ser
725 730 735
ATG CCG GGT TCA ATA GAC GGA ACA TAT ATG TTG GAT TGG TAT CTT AAT 2256
Met Pro Gly Ser Ile Asp Gly Thr Tyr Met Leu Asp Trp Tyr Leu Asn
740 745 750
AGA ATC TCT GGC TAT GTA GTA ATA CAA CCG GCG CCT ATA AAG GCG GCT 2304
Arg Ile Ser Gly Tyr Val Val Ile Gln Pro Ala Pro Ile Lys Ala Ala
755 760 765
AGT GAC GAA CCA ATA CCA ACG GAT ACA CCA TCA GAT GAA CCG ACA CCG 2352
Ser Asp Glu Pro Ile Pro Thr Asp Thr Pro Ser Asp Glu Pro Thr Pro
770 775 780
TCA GAC GAG CCA ACG CCA TCT GAC GAA CCG ACA CCG TCT GAT GAG CCA 2400
Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro
785 790 795 800
ACA CCG TCA GAT GAA CCG ACT CCG TCA GAG ACA CCT GAG GAG CCG ATA 2448
Thr Pro Ser Asp Glu Pro Thr Pro Ser Glu Thr Pro Glu Glu Pro Ile
805 810 815
CCG ACG GAT ACA CCA TCA GAT GAA CCG ACA CCA TCA GAC GAG CCA ACG 2496
Pro Thr Asp Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr
820 825 830
CCA TCT GAT GAA CCA ACA CCG TCT GAT GAG CCA ACA CCA TCT GAT GAA 2544
Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu
835 840 845
CCG ACT CCG TCA GAG ACA CCT GAG GAG CCG ATA CCG ACG GAT ACA CCA 2592
Pro Thr Pro Ser Glu Thr Pro Glu Glu Pro Ile Pro Thr Asp Thr Pro
850 855 860
TCA GAT GAA CCG ACA CCG TCA GAC GAG CCA ACG CCA TCT GAC GAA CCA 2640
Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro
865 870 875 880
ACA CCG TCT GAT GAG CCA ACA CCG TCA GAT GAA CCG ACT CCG TCA GAG 2688
Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Glu
885 890 895
ACA CCT GAG GAG CCG ATA CCG ACG GAT ACA CCA TCA GAT GAA CCG ACA 2736
Thr Pro Glu Glu Pro Ile Pro Thr Asp Thr Pro Ser Asp Glu Pro Thr
900 905 910
CCG TCA GAC GAG CCA ACG CCA TCT GAC GAA CCA ACA CCG TCT GAT GAG 2784
Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu
915 920 925
CCA ACA CCG TCA GAT GAA CCG ACT CCG TCA GAG ACA CCT GAG GAG CCG 2832
Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Glu Thr Pro Glu Glu Pro
930 935 940
ATA CCG ACG GAT ACA CCA TCA GAT GAA CCG ACA CCG TCA GAC GAG CCG 2880
Ile Pro Thr Asp Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro
945 950 955 960
ATA CCA TCT GAC GAA CCA ACA CCG TCA GAC GAG CCA ACG CCA TCT GAC 2928
Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp
965 970 975
GAA CCG ACA CCG TCT GAT GAG CCA ACA CCA TCT GAT GAA CCG ACT CCG 2976
Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro
980 985 990
TCA GAG ACA CCT GAG GAG CCG ATA CCG ACG GAT ACA CCA TCA GAT GAA 3024
Ser Glu Thr Pro Glu Glu Pro Ile Pro Thr Asp Thr Pro Ser Asp Glu
995 1000 1005
CCG ACA CCG TCA GAC GAG CCG ACA CCA TCT GAC GAA CCA ACA CCG TCA 3072
Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser
1010 1015 1020
GAC GAG CCA ACG CCA TCT GAC GAA CCG ACA CCG TCT GAT GAG CCA ACA 3120
Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr
1025 1030 1035 1040
CCA TCT GAT GAA CCG ACT CCG TCA GAG ACA CCT GAG GAG CCG ATA CCG 3168
Pro Ser Asp Glu Pro Thr Pro Ser Glu Thr Pro Glu Glu Pro Ile Pro
1045 1050 1055
ACG GAT ACA CCA TCA GAT GAA CCG ACA CCG TCA GAC GAG CCG ACA CCA 3216
Thr Asp Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro
1060 1065 1070
TCT GAC GAA CCA ACA CCG TCT GAT GAG CCA ACA CCG TCA GAT GAA CCG 3264
Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro
1075 1080 1085
ACT CCG TCA GAG ACA CCT GAG GAG CCG ATA CCG ACG GAT ACA CCA TCA 3312
Thr Pro Ser Glu Thr Pro Glu Glu Pro Ile Pro Thr Asp Thr Pro Ser
1090 1095 1100
GAT GAA CCG ACA CCG TCA GAC GAG CCA ACG CCA TCT GAC GAA CCG ACA 3360
Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr
1105 1110 1115 1120
CCG TCT GAT GAG CCA ACA CCG TCA GAT GAA CCG ACT CCG TCA GAG ACA 3408
Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Glu Thr
1125 1130 1135
CCT GAG GAG CCG ATA CCG ACG GAT ACA CCA TCA GAT GAA CCC ACA CCC 3456
Pro Glu Glu Pro Ile Pro Thr Asp Thr Pro Ser Asp Glu Pro Thr Pro
1140 1145 1150
TCA GAC GAG CCA ACG CCA TCT GAC GAA CCG ACA CCG TCT GAT GAG CCA 3504
Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro
1155 1160 1165
ACA CCG TCA GAT GAA CCG ACT CCG TCA GAG ACA CCT GAG GAG CCG ATA 3552
Thr Pro Ser Asp Glu Pro Thr Pro Ser Glu Thr Pro Glu Glu Pro Ile
1170 1175 1180
CCG ACG GAT ACA CCA TCA GAT GAA CCG ACA CCA TCA GAC GAG CCA ACG 3600
Pro Thr Asp Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr
1185 1190 1195 1200
CCA TCT GAT GAA CCA ACA CCG TCT GAT GAG CCA ACA CCA TCT GAT GAA 3648
Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu
1205 1210 1215
CCG ACT CCG TCA GAG ACA CCT GAG GAG CCG ATA CCG ACG GAT ACA CCA 3696
Pro Thr Pro Ser Glu Thr Pro Glu Glu Pro Ile Pro Thr Asp Thr Pro
1220 1225 1230
TCA GAT GAA CCG ACA CCG TCA GAC GAG CCA ACG CCA TCT GAC GAA CCA 3744
Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro
1235 1240 1245
ACA CCG TCT GAT GAG CCA ACA CCG TCA GAT GAA CCG ACT CCG TCA GAG 3792
Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Glu
1250 1255 1260
ACA CCT GAG GAG CCG ATA CCG ACG GAT ACA CCA TCA GAT GAA CCG ACA 3840
Thr Pro Glu Glu Pro Ile Pro Thr Asp Thr Pro Ser Asp Glu Pro Thr
1265 1270 1275 1280
CCG TCA GAC GAG CCG ACA CCA TCT GAC GAA CCA ACA CCG TCA GAC GAG 3888
Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu
1285 1290 1295
CCA ACG CCA TCT GAC GAA CCG ACA CCG TCT GAT GAG CCA ACA CCA TCT 3936
Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser
1300 1305 1310
GAT GAA CCG ACT CCG TCA GAG ACA CCT GAG GAG CCG ATA CCG ACG GAT 3984
Asp Glu Pro Thr Pro Ser Glu Thr Pro Glu Glu Pro Ile Pro Thr Asp
1315 1320 1325
ACA CCA TCA GAT GAA CCG ACA CCG TCA GAC GAG CCG ACA CCA TCT GAC 4032
Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp
1330 1335 1340
GAA CCA ACA CCG TCA GAC GAG CCA ACG CCA TCT GAC GAA CCG ACA CCG 4080
Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro
1345 1350 1355 1360
TCT GAT GAG CCA ACA CCA TCT GAT GAA CCG ACT CCG TCA GAG ACA CCT 4128
Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Glu Thr Pro
1365 1370 1375
GAG GAG CCG ACA CCG ACT ACT ACA CCG ACA CCA ACA CCG TCG ACA ACG 4176
Glu Glu Pro Thr Pro Thr Thr Thr Pro Thr Pro Thr Pro Ser Thr Thr
1380 1385 1390
CCT ACA AGT GGC AGC GGA GGC AGT GGT GGA AGC GGT GGT GGC GGC GGA 4224
Pro Thr Ser Gly Ser Gly Gly Ser Gly Gly Ser Gly Gly Gly Gly Gly
1395 1400 1405
GGT GGT GGA GGA ACT GTA CCT ACA TCT CCA ACA CCG ACA CCG ACA TCT 4272
Gly Gly Gly Gly Thr Val Pro Thr Ser Pro Thr Pro Thr Pro Thr Ser
1410 1415 1420
AAA CCG ACG TCT ACA CCT GCA CCG ACA GAA ATC GAA GAG CCT ACA CCA 4320
Lys Pro Thr Ser Thr Pro Ala Pro Thr Glu Ile Glu Glu Pro Thr Pro
1425 1430 1435 1440
TCT GAT GTG CCT GGT GCA ATC GGT GGA GAA CAT AGA GCA TAC TTA AGA 4368
Ser Asp Val Pro Gly Ala Ile Gly Gly Glu His Arg Ala Tyr Leu Arg
1445 1450 1455
GGA TAT CCG GAT GGA AGC TTC AGG CCT GAA AGA AAT ATA ACA AGA GCT 4416
Gly Tyr Pro Asp Gly Ser Phe Arg Pro Glu Arg Asn Ile Thr Arg Ala
1460 1465 1470
GAA GCG GCG GTA ATC TTT GCT AAG TTG CTT GGA GCC GAT GAA AGC TAT 4464
Glu Ala Ala Val Ile Phe Ala Lys Leu Leu Gly Ala Asp Glu Ser Tyr
1475 1480 1485
GGA GCT CAG TCT GCA AGT CCA TAT AGT GAT TTG GCT GAT ACT CAC TGG 4512
Gly Ala Gln Ser Ala Ser Pro Tyr Ser Asp Leu Ala Asp Thr His Trp
1490 1495 1500
GCT GCA TGG GCA ATC AAA TTT GCA ACA AGC CAG GGC TTG TTC AAA GGA 4560
Ala Ala Trp Ala Ile Lys Phe Ala Thr Ser Gln Gly Leu Phe Lys Gly
1505 1510 1515 1520
TAT CCG GAC GGT ACG TTT AAA CCT GAT CAG AAC ATA ACG AGA GCG GAA 4608
Tyr Pro Asp Gly Thr Phe Lys Pro Asp Gln Asn Ile Thr Arg Ala Glu
1525 1530 1535
TTC GCA ACT GTG GTA CTC CAC TTC CTG ACA AAA GTT AAG GGT CAG GAA 4656
Phe Ala Thr Val Val Leu His Phe Leu Thr Lys Val Lys Gly Gln Glu
1540 1545 1550
ATA ATG AGC AAG CTT GCA ACA ATA GAT ATA AGT AAT CCG AAG TTT GAC 4704
Ile Met Ser Lys Leu Ala Thr Ile Asp Ile Ser Asn Pro Lys Phe Asp
1555 1560 1565
GAT TGT GTC GGA CAT TGG GCA CAA GAG TTT ATT GAG AAA TTG ACA AGC 4752
Asp Cys Val Gly His Trp Ala Gln Glu Phe Ile Glu Lys Leu Thr Ser
1570 1575 1580
TTG GGT TAT ATT AGT GGC TAT CCT GAC GGA ACG TTC AAG CCG CAA AAC 4800
Leu Gly Tyr Ile Ser Gly Tyr Pro Asp Gly Thr Phe Lys Pro Gln Asn
1585 1590 1595 1600
TAT ATT AAA CGT TCC GAA AGT GTG GCA CTG ATT AAC AGA GCT CTG GAG 4848
Tyr Ile Lys Arg Ser Glu Ser Val Ala Leu Ile Asn Arg Ala Leu Glu
1605 1610 1615
AGA GGT CCG CTT AAT GGA GCG CCG AAG CTC TTC CCG GAT GTT AAC GAA 4896
Arg Gly Pro Leu Asn Gly Ala Pro Lys Leu Phe Pro Asp Val Asn Glu
1620 1625 1630
TCA TAC TGG GCA TTT GGC GAC ATT ATG GAC GGT GCT CTC GAC CAC AGT 4944
Ser Tyr Trp Ala Phe Gly Asp Ile Met Asp Gly Ala Leu Asp His Ser
1635 1640 1645
TAC ATT ATC GAA GAT GAG AAA GAA AAA TTC GTT AAA TTG CTC GAA GAT 4992
Tyr Ile Ile Glu Asp Glu Lys Glu Lys Phe Val Lys Leu Leu Glu Asp
1650 1655 1660
(4) INFORMATION FOR SEQ ID NO: 3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2064 base pairs
(B) TYPE: nucleotide
(C) STRANDEDNESS: single
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: ORF2p of Clostridium thermocellum
(B) LOCATION: 1..2064
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
ATG AAA AAA AAC AAT GTA TTA ACA ATA GCA GCT ATG ATA GCG CTT CTT 48
Met Lys Lys Asn Asn Val Leu Thr Ile Ala Ala Met Ile Ala Leu Leu
1 5 10 15
CTA ACC AGC TTA CTT ACA AGT ATA ACT TTT GGG GAG ACT TCG AGT ATA 96
Leu Thr Ser Leu Leu Thr Ser Ile Thr Phe Gly Glu Thr Ser Ser Ile
20 25 30
CCT TCA AGA ATA TCT ATG GAG CTT GAC AAG ACA AAA GCA AAC ATA GGC 144
Pro Ser Arg Ile Ser Met Glu Leu Asp Lys Thr Lys Ala Asn Ile Gly
35 40 45
GAC ATA ATT ATA GCC ACA ATA AGA ATT GAC AAT ATC AAT AAC TTT AGC 192
Asp Ile Ile Ile Ala Thr Ile Arg Ile Asp Asn Ile Asn Asn Phe Ser
50 55 60
GGA TAT CAA TTA AAT ATA AAG TAT GAT CCG TCA TAC CTC CAG GCA GTT 240
Gly Tyr Gln Leu Asn Ile Lys Tyr Asp Pro Ser Tyr Leu Gln Ala Val
65 70 75 80
AAT CCT TTG ACA GGA GAA CCG ATA AAA AAG AGA ACA ATG CCG GCA GTG 288
Asn Pro Leu Thr Gly Glu Pro Ile Lys Lys Arg Thr Met Pro Ala Val
85 90 95
AAC GGC ACG GTG TTG TTA AAG GGA GAT CAG TAC AGT ATT ACT GAG GTT 336
Asn Gly Thr Val Leu Leu Lys Gly Asp Gln Tyr Ser Ile Thr Glu Val
100 105 110
GTA GAA AAT AAC GTC GAT GAA GGG ATT TTA AAT TTT GGC AAG GGA TAT 384
Val Glu Asn Asn Val Asp Glu Gly Ile Leu Asn Phe Gly Lys Gly Tyr
115 120 125
GCA AAT TTA ACT GAA TAC AGG AAA AGC GGA AAA CCT GAA ACA ACC GGA 432
Ala Asn Leu Thr Glu Tyr Arg Lys Ser Gly Lys Pro Glu Thr Thr Gly
130 135 140
ATT ATT GGC AAG ATA GGA TTT AAA GCC TTA AAG CTT GGC AAG ACG GAG 480
Ile Ile Gly Lys Ile Gly Phe Lys Ala Leu Lys Leu Gly Lys Thr Glu
145 150 155 160
ATC AAA TTT GAG AAC ACA CCC GTC ATG CCT GGG GCA AAA GAA GGA ACA 528
Ile Lys Phe Glu Asn Thr Pro Val Met Pro Gly Ala Lys Glu Gly Thr
165 170 175
CTG CTG TTT GAC TGG GAT GCA GAA ACT ATA ACG GAA TAT AAT GTA ATT 576
Leu Leu Phe Asp Trp Asp Ala Glu Thr Ile Thr Glu Tyr Asn Val Ile
180 185 190
CAG CCT AAA GAA CTT GCA ATA ACG TTA CCG GAC GAT GCA CAC ATT GCT 624
Gln Pro Lys Glu Leu Ala Ile Thr Leu Pro Asp Asp Ala His Ile Ala
195 200 205
TTG GAA CTT GAC AAG ACA AAA GTG AAA GTG GGA GAT GTA ATT GTT GCG 672
Leu Glu Leu Asp Lys Thr Lys Val Lys Val Gly Asp Val Ile Val Ala
210 215 220
ACA GTA AAA GCA AAG AAT ATG ACT AGT ATG GCG GGA ATT CAG GTA AAT 720
Thr Val Lys Ala Lys Asn Met Thr Ser Met Ala Gly Ile Gln Val Asn 225 230 235 240
ATT AAA TAT GAC CCT GAA GTA TTG CAG GCG ATT GAT CCT GCG ACG GGA 768 île Lys Tyr Asp Pro Glu Val Leu Gln Ala Ile Asp Pro Ala Thr Gly
245 250 255
AAA CCG TTT ACA AAA GAA ACA TTA CTT GTG GAC CCG GAA CTG TTA TCA 816
Lys Pro Phe Thr Lys Glu Thr Leu Leu Val Asp Pro Glu Leu Leu Ser
260 265 270
AAC AGA GAA TAT AAT CCG TTG TTA ACA GCA GTT AAT GAC ATA AAT TCC 864
Asn Arg Glu Tyr Asn Pro Leu Leu Thr Ala Val Asn Asp Ile Asn Ser
275 280 285
GGC ATT ATA AAT TAT GCA TCT TGT TAT GTA TAT TGG GAT TCC TAC AGA 912
Gly Ile Ile Asn Tyr Ala Ser Cys Tyr Val Tyr Trp Asp Ser Tyr Arg
290 295 300
GAA TCA GGA GTA TCT GAA AGC ACC GGA ATA ATT GGA AAG GTT GGC TTT 960
Glu Ser Gly Val Ser Glu Ser Thr Gly Ile Ile Gly Lys Val Gly Phe
305 310 315 320
AAA GTG CTG AAA GCT GCC AAC ACC ACA GTA AAA CTG GAA GAA ACA AGA 1008
Lys Val Leu Lys Ala Ala Asn Thr Thr Val Lys Leu Glu Glu Thr Arg
325 330 335
TTT ACA CCA AAT TCG ATA GAC GGT ACT TTG GTA ATT GAT TGG TAT GGC 1056
Phe Thr Pro Asn Ser Ile Asp Gly Thr Leu Val Ile Asp Trp Tyr Gly
340 345 350
CAA CAG ATA GTT GGT TAT AAA GTA ATA CAG CCC GAC AAA ATT ACT GTG 1104
Gln Gln Ile Val Gly Tyr Lys Val Ile Gln Pro Asp Lys Ile Thr Val
GAA ATA GAA GAA CCC GAA CCT GAA ATA CCG GGC ACT GTT GGA ATA CAT 1440
Glu Ile Glu Glu Pro Glu Pro Glu Ile Pro Gly Thr Val Gly Ile His
465 470 475 480
TAT TCA TAC CTG ACA GGT TAT CCG GAC AAA ATG TTC AGA CCT GAA AAG 1488
Tyr Ser Tyr Leu Thr Gly Tyr Pro Asp Lys Met Phe Arg Pro Glu Lys
485 490 495
AGT ATT ACA AGA GCT GAA GCA GCC GTG ATT TTT GCA AAA CTT TTG GGA 1536
Ser Ile Thr Arg Ala Glu Ala Ala Val Ile Phe Ala Lys Leu Leu Gly
500 505 510
GCA AAC GAA AAT ACA AAG ATA AAC TAT AAT GTT TCA TAC ACC GAT GTT 1584
Ala Asn Glu Asn Thr Lys Ile Asn Tyr Asn Val Ser Tyr Thr Asp Val
515 520 525
GAC AGC TCC CAT TGG GCA AGT TGG GCA ATC AAA TTT GTA TCA TAC AAG 1632
Asp Ser Ser His Trp Ala Ser Trp Ala Ile Lys Phe Val Ser Tyr Lys
530 535 540
AAA CTG TTT ACC GGA TAT CCT GAT GGC TCG TTC AAG CCT AAT CAG AAT 1680
Lys Leu Phe Thr Gly Tyr Pro Asp Gly Ser Phe Lys Pro Asn Gln Asn
545 550 555 560
ATA ACG AGA GCC GAA TTT TCA ACG GTT GTG TTT AAG CTT CTT GTA TCT 1728
Ile Thr Arg Ala Glu Phe Ser Thr Val Val Phe Lys Leu Leu Val Ser
565 570 575
GAG AAA GGT CTA AAA GAA GAA AAG ATT GAA AAG TCC AAG TTT GGT GAT 1776
Glu Lys Gly Leu Lys Glu Glu Lys Ile Glu Lys Ser Lys Phe Gly Asp
580 585 590
ACA AAG GGC CAC TGG GCA CAA CAG TTT ATT GAA CAG CTG TCA GAC CTT 1824
Thr Lys Gly His Trp Ala Gln Gln Phe Ile Glu Gln Leu Ser Asp Leu
595 600 605
GGA TAC ATC AAC GGA TAT CCT GAT GGT ACA TTC AAG CCC AAC AAC AAT 1872
Gly Tyr Ile Asn Gly Tyr Pro Asp Gly Thr Phe Lys Pro Asn Asn Asn
610 615 620
ATC AAA CGA TCA GAA AGT GTT GCC CTG ATA AAC AGA GCT ATG GGA AGA 1920
Ile Lys Arg Ser Glu Ser Val Ala Leu Ile Asn Arg Ala Met Gly Arg
625 630 635 640
GGG CCT TTG CAT GGC GCA CCG CAG GTA TTC GAG GAT GTT CCT CAG ACA 1968
Gly Pro Leu His Gly Ala Pro Gln Val Phe Glu Asp Val Pro Gln Thr
645 650 655
CAC TGG GCT TTC AAA GAT ATT GCA GAG GGC GTG CTC AAT CAC AGA TAC 2016
His Trp Ala Phe Lys Asp Ile Ala Glu Gly Val Leu Asn His Arg Tyr
660 665 670
AAA CTG GAC AAT GAG GGC AAA GAA CAA TTG CTG GAG ATA ATT GAT AAC 2064
Lys Leu Asp Asn Glu Gly Lys Glu Gln Leu Leu Glu Ile Ile Asp Asn
675 680 685
DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 4:
NUCLEOTIDE SEQUENCE OF THE CipA PROTEIN
ATGAGAAAAGTCATCAGTATGCTCTTAGTTGTGGCTATGCTGACGACGATTTTTGCGGCGATGATAC
CGCAGACAGTATCGGCGGCCACAATGACAGTCGAGATCGGCAAAGTTACAGCAGCCGTTGGATCAAAA
GTAGA
AATACCTATAACCCTGAAAGGAGTGCCATCCAAAGGAATGGCCAATTGCGACTTCGTATTGGGTTA
TGAT
CCAAATGTGCTGGAAGTAACAGAAGTAAAACCAGGAAGCATAATAAAAGATCCGGATCCTAGCAA
GAGCTTTGATAGCGCAATATATCCGGATCGAAAGATGATTGTATTTCTGTTTGCAGAAGACAGTGGAAGAG
GAAC
GTATGCAATAACTCAGGATGGAGTATTTGCAACAATTGTAGCCACTGTCAAATCAGCTGCAGCGGC
ACCG
ATTACTTTGCTTGAAGTAGGTGCATTTGCGGACAACGATTTAGTAGAAATAAGCACAACTTTTGTCG
CGG
GCGGAGTAAATCTTGGTAGTTCCGTACCGACAACACAGCCAAATGTTCCGTCAGACGGTGTGGTAG
TAGA
AATTGGCAAAGTTACGGGATCTGTTGGAACTACAGTTGAAATACCTGTATATTTCAGAGGAGTTCC
ATCC
AAAGGAATAGCAAACTGCGACTTTGTGTTCAGATATGATCCGAATGTATTGGAAATTATAGGGATA
GATC
CCGGAGACATAATAGTTGACCCGAATCCTACCAAGAGCTTTGATACTGCAATATATCCTGACAGAA
AGAT
AATAGTATTCCTGTTTGCGGAAGACAGCGGAACAGGAGCGTATGCAATAACTAAAGACGGAGTATT
TGCAAAAATAAGAGCAACTGTAAAATCAAGTGCTCCGGGCTATATTACTTTCGACGAAGTAGGTGGATTT
GCAG
ATAATGACCTGGTAGAACAGAAGGTATCATTTATAGACGGTGGTGTTAACGTTGGCAATGCAACAC
CGAC
CAAGGGAGCAACACCAACAAATACAGCTACGCCGACAAAATCAGCTACGGCTACGCCCACCAGGC
CATCG
GTACCGACAAACACACCGACAAACACACCGGCAAATACACCGGTATCAGGCAATTTGAAGGTTGA
ATTCT
ACAACAGCAATCCTTCAGATACTACTAACTCAATCAATCCTCAGTTCAAGGTTACTAATACCGGAA
GCAG
TGCAATTGATTTGTCCAAACTCACATTGAGATATTATTATACAGTAGACGGACAGAAAGATCAGAC
CTTC
TGGTGTGACCATGCTGCAATAATCGGCAGTAACGGCAGCTACAACGGAATTACTTCAAATGTAAAA
GGAA
CATTTGTAAAAATGAGTTCCTCAACAAATAACGCAGACACCTACCTTGAAATAAGCTTTACAGGCG
GAAC
TCTTGAACCGGGTGCACATGTTCAGATACAAGGTAGATTTGCAAAGAATGACTGGAGTAACTATAC
ACAGTCAAATGACTACTCATTCAAGTCTGCTTCACAGTTTGTTGAATGGGATCAGGTAACAGCATACTTGA
ACG
GTGTTCTTGTATGGGGTAAAGAACCCGGTGGCAGTGTAGTACCATCAACACAGCCTGTAACAACAC
CACCTGCAACAACAAAACCACCTGCAACAACAAAACCACCTGCAACAACAATACCGCCGTCAGATGATCC
GAAT
GCAATAAAGATTAAGGTGGACACAGTAAATGCAAAACCGGGAGACACAGTAAATATACCTGTAAG
ATTCA
GTGGTATACCATCCAAGGGAATAGCAAACTGTGACTTTGTATACAGCTATGACCCGAATGTACTTG
AGAT
AATAGAGATAAAACCGGGAGAATTGATAGTTGACCCGAATCCTGACAAGAGCTTTGATACTGCAGT
ATAT
CCTGACAGAAAGATAATAGTATTCCTGTTTGCAGAAGACAGCGGAACAGGAGCGTATGCAATAACT
AAAG
ACGGAGTATTTGCTACGATAGTAGCGAAAGTAAAATCCGGAGCACCTAACGGACTCAGTGTAATCA
AATT
TGTAGAAGTAGGCGGATTTGCGAACAATGACCTTGTAGAACAGAGGACACAGTTCTTTGACGGTGG
AGTA
AATGTTGGAGATACAACAGTACCTACAACACCTACAACACCTGTAACAACACCGACAGATGATTCG
AATG
CAGTAAGGATTAAGGTGGACACAGTAAATGCAAAACCGGGAGACACAGTAAGAATACCTGTAAGA
TTCAG
CGGTATACCATCCAAGGGAATAGCAAACTGTGACTTTGTATACAGCTATGACCCGAATGTACTTGA
GATA
ATAGAGATAGAACCGGGAGACATAATAGTTGACCCGAATCCTGACAAGAGCTTTGATACTGCAGTA
TATCCTGACAGAAAGATAATAGTATTCCTGTTTGCGGAAGACAGCGGAACAGGAGCGTATGCAATAACTA
AAGA
CGGAGTATTTGCTACGATAGTAGCGAAAGTAAAATCCGGAGCACCTAACGGACTCAGTGTAATCAA
ATTT
GTAGAAGTAGGCGGATTTGCGAACAATGACCTTGTAGAACAGAAGACACAGTTCTTTGACGGTGGA
GTAA
ATGTTGGAGATACAACAGAACCTGCAACACCTACAACACCTGTAACAACACCGACAACAACAGAT
GATCT
GGATGCAGTAAGGATTAAAGTGGACACAGTAAATGCAAAACCGGGAGACACAGTAAGAATACCTG
TAAGA
TTCAGCGGTATACCATCCAAGGGAATAGCAAACTGTGACTTTGTATACAGCTATGACCCGAATGTA
CTTGAGATAATAGAGATAGAACCGGGAGACATAATAGTTGACCCGAATCCTGACAAGAGCTTTGATACTG
CAGTATATCCTGACAGAAAGATAATAGTATTCCTGTTTGCGGAAGACAGCGGAACAGGAGCGTATGCAAT
AACT
AAAGACGGAGTATTTGCTACGATAGTAGCGAAAGTAAAATCCGGAGCACCTAACGGACTCAGTGT
AATCA
AATTTGTAGAAGTAGGCGGATTTGCGAACAATGACCTTGTAGAACAGAAGACACAGTTCTTTGACG
GTGGAGTAAATGTTGGAGATACAACAGAACCTGCAACACCTACAACACCTGTAACAACACCGACAACAA
CAGATGATCTGGATGCAGTAAGGATTAAAGTGGACACAGTAAATGCAAAACCGGGAGACACAGTAAGAAT
ACCTG
TAAGATTCAGCGGTATACCATCCAAGGGAATAGCAAACTGTGACTTTGTATACAGCTATGACCCGA
ATGTACTTGAGATAATAGAGATAGAACCGGGAGACATAATAGTTGACCCGAATCCTGACAAGAGCTTTGA
TACT
GCAGTATATCCTGACAGAAAGATAATAGTATTCCTGTTTGCAGAAGACAGCGGAACAGGAGCGTAT
GCAA
TAACTAAAGACGGAGTATTTGCTACGATAGTAGCGAAAGTAAAAGAAGGAGCACCTAACGGACTC
AGTGT
AATCAAATTTGTAGAAGTAGGCGGATTTGCGAACAATGACCTTGTAGAACAGAAGACACAGTTCTT
TGAC
GGTGGAGTAAATGTTGGAGATACAACAGAACCTGCAACACCTACAACACCTGTAACAACACCGAC
AACAA
CAGATGATCTGGATGCAGTAAGGATTAAAGTGGACACAGTAAATGCAAAACCGGGAGACACAGTA
AGAAT
ACCTGTAAGATTCAGCGGTATACCATCCAAGGGAATAGCAAACTGTGACTTTGTATACAGCTATGA
CCCG
AATGTACTTGAGATAATAGAGATAGAACCGGGAGAATTGATAGTTGACCCGAATCCTACCAAGAGC
TTTGATACTGCAGTATATCCTGACAGAAAGATGATAGTATTCCTGTTTGCGGAAGACAGCGGAACAGGAG
CGTA
TGCAATAACTGAAGATGGAGTATTTGCTACGATAGTAGCGAAAGTAAAATCCGGAGCACCTAACGG
ACTC
AGTGTAATCAAATTTGTAGAAGTAGGCGGATTTGCGAACAATGACCTTGTAGAACAGAAGACACAG
TTCT
TTGACGGTGGAGTAAATGTTGGAGATACAACAGAACCTGCAACACCTACAACACCTGTAACAACAC
CGAC
AACAACAGATGATCTGGATGCAGTAAGGATTAAAGTGGACACAGTAAATGCAAAACCGGGAGACA
CAGTA
AGAATACCTGTAAGATTCAGCGGTATACCATCCAAGGGAATAGCAAACTGTGACTTTGTATACAGC
TATG
ACCCGAATGTACTTGAGATAATAGAGATAGAACCGGGAGACATAATAGTTGACCCGAATCCTGACA
AGAG
CTTTGATACTGCAGTATATCCTGACAGAAAGATAATAGTATTCCTGTTTGCAGAAGACAGCGGAAC
GGGA
GCGTATGCAATAACTAAAGACGGAGTATTTGCTACGATAGTAGCGAAAGTAAAAGAAGGAGCACC
TAACGGACTCAGTGTAATCAAATTTGTAGAAGTAGGCGGATTTGCGAACAATGACCTTGTAGAACAGAAGA
CACA
GTTCTTTGACGGTGGAGTAAATGTTGGAGATACAACAGTACCTACAACATCGCCGACAACAACACC
GCCAGAGCCGACGATAACTCCGAACAAGTTGACACTTAAGATAGGCAGAGCAGAAGGAAGACCTGGAGA
CACGG
TGGAAATACCGGTTAACTTGTATGGAGTACCTCAAAAAGGAATAGCAAGCGGTGACTTCGTAGTAA
GCTA
TGACCCGAATGTACTTGAGATAATAGAGATAGAACCGGGAGAATTGATAGTTGACCCGAATCCTAC
CAAG
AGCTTTGATACTGCAGTATATCCTGACAGAAAGATGATAGTATTCCTGTTTGCGGAAGACAGCGGA
ACAG
GAGCGTATGCAATAACTGAAGATGGAGTATTTGCTACGATAGTAGCGAAAGTAAAAGAAGGAGCA
CCTGA
AGGATTCAGTGCAATAGAAATTTCTGAGTTTGGTGCATTTGCAGATAATGATCTGGTAGAAGTGGA
AACT
GACCTTATCAATGGTGGAGTACTTGTAACTAATAAACCTGTAATAGAAGGATATAAAGTATCCGGA
TACA
TTTTGCCAGACTTCTCCTTCGACGCTACTGTTGCACCACTTGTAAAGGCCGGATTCAAAGTTGAAAT
AGT
AGGAACAGAATTGTATGCAGTAACAGATGCAAACGGATACTTTGAAATAACCGGAGTACCTGCAA
ATGCA
AGCGGATATACATTGAAGATTTCAAGAGCAACTTACTTGGACAGAGTAATTGCAAATGTTGTAGTA
ACGG
GAGATACTTCAGTTTCAACTTCACAGGCTCCAATAATGATGTGGGTAGGAGACATAGTGAAAGACA
ATTC
TATCAACCTGTTGGACGTTGCAGAAGTTATCCGTTGCTTCAACGCTACTAAAGGAAGCGCAAACTA
CGTA
GAAGAACTTGACATTAATAGAAACGGCGCAATTAACATGCAAGACATAATGATTGTTCATAAGCAC
TTTG
GAGCTACATCAAGTGATTACGACGCACAGTAA
AMINO ACID SEQUENCE OF THE CipA PROTEIN
MRKVISMLLVVAMLTTIFAAMIPQTVSAATMTVEIGKVTAAVGSKVEIPITLKGVPSKGMANCDFVLGYDPNVLEVTEVKPGSIIKDPDPSKSFDSAIYPDRKMIVFLFAEDSGRGTYAITQDGVFATIVATVKSAAAAP
ITLLEVGAFADNDLVEISTTFVAGGVNLGSSVPTTQPNVPSDGVVVEIGKVTGSVGTTVEIPVYFRGVPSKGIANCDFVFRYDPNVLEIIGIDPGDIIVDPNPTKSFDTAIYPDRKIIVFLFAEDSGTGAYAITKDGVFAKIR
ATVKSSAPGYITFDEVGGFADNDLVEQKVSFIDGGVNVGNATPTKGATPTNTATPTKSATATPTRPSVPT
NTPTNTPANTPVSGNLKVEFYNSNPSDTTNSINPQFKVTNTGSSAIDLSKLTLRYYYTVDGQKDQTFWCDHAAIIGSNGSYNGITSNVKGTFVKMSSSTNNADTYLEISFTGGTLEPGAHVQIQGRFAKNDWSNYTQS
NDYSFKSASQFVEWDQVTAYLNGVLVWGKEPGGSVVPSTQPVTTPPATTKPPATTKPPATTIPPSDDPN
AIKIKVDTVNAKPGDTVNIPVRFSGIPSKGIANCDFVYSYDPNVLEIIEIKPGELIVDPNPDKSFDTAVYPD
RKIIVFLFAEDSGTGAYAITKDGVFATIVAKVKSGAPNGLSVIKFVEVGGFANNDLVEQRTQFFDGGVN
VGDTTVPTTPTTPVTTPTDDSNAVRIKVDTVNAKPGDTVRIPVRFSGIPSKGIANCDFVYSYDPNVLEIIEI
EPGDIIVDPNPDKSFDTAVYPDRKIIVFLFAEDSGTGAYAITKDGVFATIVAKVKSGAPNGLSVIKFVEVG
GFANNDLVEQKTQFFDGGVNVGDTTEPATPTTPVTTPTTTDDLDAVRIKVDTVNAKPGDTVRIPVRFSG
IPSKGIANCDFVYSYDPNVLEIIEIEPGDIIVDPNPDKSFDTAVYPDRKIIVFLFAEDSGTGAYAITKDGVFA
TIVAKVKSGAPNGLSVIKFVEVGGFANNDLVEQKTQFFDGGVNVGDTIEPATPTTPVTTPTTTDDLDAVRIKVDTVNAKPGDTVRIPVRFSGIPSKGIANCDFVYSYDPNVLEIIEIEPGDIIVDPNPDKSFDTAVYPDRKIIVFLFAEDSGTGAYAITKDGVFATIVAKVKEGAPNGLSVIKFVEVGGFANNDLVEQKTQFFDGGVNVGD
TTEPATPTTPVTTPTTTDDLDAVRIKVDTVNAKPGDTVRIPVRFSGIPSKGIANCDFVYSYDPNVLEIIEIEPGELIVDPNPTKSFDTAVYPDRKMIVFLFAEDSGTGAYAITEDGVFATIVAKVKSGAPNGLSVIKFVEVGG
FANNDLVEQKTQFFDGGVNVGDTTEPATPTTPVTTPEDDLDAVRIKVDTVNAKPGDTVRIPVRFSGIP
SKGIANCDFVYSYDPNVLEIIEIEPGDIIVDPNPDKSFDTAVYPDRKIIVFLFAEDSGTGAYAITKDGVFATI
VAKVKEGAPNGLSVIKFVEVGGFANNDLVEQKTQFFDGGVNVGDTTVPTTSPTTTPPEPTITPNKLTLKI
GRAEGRPGDTVEIPVNLYGVPQKGIASGDFVVSYDPNVLEIIEIEPGELIVDPNPTKSFDTAVYPDRKMIV
FLFAEDSGTGAYAITEDGVFATIVAKVKEGAPEGFSAIEISEFGAFADNDLVEVETDLINGGVLVTNKPVI
EGYKVSGYILPDFSFDATVAPLVKAGFKVEIVGTELYAVTDANGYFEITGVPANASGYTLKISRATYLDR
VIANVVVTGDTSVSTSQAPIMMWVGDIVKDNSINLLDVAEVIRCFNATKGSANYVEELDINRNGAINMQ
DIMIVHKHFGATSSDYDAQ
LIST OF SEQUENCES
(1) GENERAL INFORMATION:
(i) DEPOSITOR:
(A) NAME: INSTITUT PASTEUR
(B) STREET: 28 Rue du Docteur Roux
(C) CITY: PARIS
(E) COUNTRY: FRANCE
(F) POSTAL CODE: 75724 CEDEX 15
(ii) TITLE OF THE INVENTION: "POLYPEPTIDE HAVING A COHESIN
DOMAIN, ENZYMATIC COMPOSITION COMPRISING SAME AND DNA FRAGMENTS
CODING FOR THESE POLYPEPTIDES"
(iii) NUMBER OF SEQUENCES: 4
(iv) COMPUTER-READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS / MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.3 (EPO)
(2) INFORMATION FOR SEQ ID NO: 1:
(i) CHARACTERISTICS OF THE SEQUENCE:
(A) LENGTH: 1893 base pairs
(B) TYPE: nucleotide
(C) NUMBER OF STRANDS: single
(ii) TYPE OF MOLECULE: DNA
(ix) FEATURE:
(A) NAME / KEY: Clostridium thermocellum SdbA
(B) LOCATION: 1..1893
(xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 1:
ATG AGG AAG AAA AAA AGA TTA ATA TCA TTA CTG CTT GCG GTT TTT ATC 48
Met Arg Lys Lys Lys Arg Leu Ile Ser Leu Leu Leu Ala Val Phe Ile
1 5 10 15
GCC GTT GCA TGT CTG CCG GCG GGA ATT GCA AGG GCA GAT AAA GCC TCG 96
Ala Val Ala Cys Leu Pro Ala Gly Ile Ala Arg Ala Asp Lys Ala Ser
20 25 30
AGC ATT GAG CTT AAG TTT GAC CGC AAT AAG GGA GAA GTT GGA GAT ATA 144
Ser Ile Glu Leu Lys Phe Asp Arg Asn Lys Gly Glu Val Gly Asp Ile
35 40 45
CTT ATT GGT ACC GTA AGG ATA AAC AAT ATC AAG AAT TTC GCA GGA TTT 192
Leu Ile Gly Thr Val Arg Ile Asn Asn Ile Lys Asn Phe Ala Gly Phe
50 55 60
CAG GTA AAC ATT GTA TAT GAT CCA AAA GTC TTA ATG GCT GTT GAC CCT 240
Gln Val Asn Ile Val Tyr Asp Pro Lys Val Leu Met Ala Val Asp Pro
65 70 75 80
GAA ACG GGG AAA GAA TTT ACT TCT TCA ACA TTT CCG CCA GGA CGC ACT 288
Glu Thr Gly Lys Glu Phe Thr Ser Ser Thr Phe Pro Pro Gly Arg Thr
85 90 95
GTA CTG AAA AAC AAT GCT TAC GGC CCA ATA CAG ATT GCG GAC AAT GAT 336
Val Leu Lys Asn Asn Ala Tyr Gly Pro Ile Gln Ile Ala Asp Asn Asp
100 105 110
CCG GAA AAA GGG ATA CTG AAC TTC GCG CTT GCA TAT TCA TAT ATT GCG 384
Pro Glu Lys Gly Ile Leu Asn Phe Ala Leu Ala Tyr Ser Tyr Ile Ala
115 120 125
GGA TAC AAA GAA ACA GGA GTA GCG GAG GAA AGC GGC ATA ATT GCG AAA 432
Gly Tyr Lys Glu Thr Gly Val Ala Glu Glu Ser Gly Ile Ile Ala Lys
130 135 140
ATT GGA TTT AAA ATA CTC CAG AAA AAG AGC ACT GCC GTA AAA TTC CAG 480
Ile Gly Phe Lys Ile Leu Gln Lys Lys Ser Thr Ala Val Lys Phe Gln
145 150 155 160
GAT ACA TTA AGC ATG CCC GGA GCT ATT TCG GGA ACA CAG CTG TTT GAC 528
Asp Thr Leu Ser Met Pro Gly Ala Ile Ser Gly Thr Gln Leu Phe Asp
165 170 175
TGG GAC GGA GAA GTT ATT ACC GGA TAT GAG GTA ATA CAG CCG GAT GTG 576
Trp Asp Gly Glu Val Ile Thr Gly Tyr Glu Val Ile Gln Pro Asp Val
180 185 190
CTG AGT TTG GGT GAC GAG CCT TAT GAG ACA CCG GGA ACG GAT ATT CCG 624
Leu Ser Leu Gly Asp Glu Pro Tyr Glu Thr Pro Gly Thr Asp Ile Pro
195 200 205
ATA TCC GAC AAT CCG GCA GCA ACT CCG TCA TCC ACG CCG TCA GTT ACT 672
Ile Ser Asp Asn Pro Ala Ala Thr Pro Ser Ser Thr Pro Ser Val Thr
210 215 220
CCT TCA CCG GAA GTT AAA CCG ACT CAG ACG CCT TCG CCT GCA GAA AAT 720
Pro Ser Pro Glu Val Lys Pro Thr Gln Thr Pro Ser Pro Ala Glu Asn
225 230 235 240
TCT GCA AAA GTG GAG CTT GAA CCT GTG TTG GAT AAT GCA ACA GGA GAA 768
Ser Ala Lys Val Glu Leu Glu Pro Val Leu Asp Asn Ala Thr Gly Glu
245 250 255
GCA AAG GCG GCA ATA GAT GAA GAA AAA TTA AAC AAG GCT CTT GAT GAA 816
Ala Lys Ala Ala Ile Asp Glu Glu Lys Leu Asn Lys Ala Leu Asp Glu
260 265 270
GCG AAA AAA TCG GAA GAT GAC AAA CTT GTG GAA CTT AAC ATA AAG AAG 864
Ala Lys Lys Ser Glu Asp Asp Lys Leu Val Glu Leu Asn Ile Lys Lys
275 280 285
GTT GAA AAT GCC GAT GCT TAC ATA CAA CAG CTT CCG GCG AAA TTC CTG 912
Val Glu Asn Ala Asp Ala Tyr Ile Gln Gln Leu Pro Ala Lys Phe Leu
290 295 300
ATA AAA AGT GAC GCC GAA TAT AAG CTG AGA ATA GCT ACA GAG CAG GGA 960
Ile Lys Ser Asp Ala Glu Tyr Lys Leu Arg Ile Ala Thr Glu Gln Gly
305 310 315 320
ATT ATA GAA GTA CCG GCC AAC ATG CTG AAT ACT GCG GAT ATT TCA AAG 1008
Ile Ile Glu Val Pro Ala Asn Met Leu Asn Thr Ala Asp Ile Ser Lys
325 330 335
CTT GTA AAA AAT GAC TCC GTT GTT GAA TTC GTC ATA AGA AAA GTA AAA 1056
Leu Val Lys Asn Asp Ser Val Val Glu Phe Val Ile Arg Lys Val Lys
340 345 350
GTC GAT GAA CTT GGT GCA GAG CTC AAA GAG AAG ATA GGC AAC AGG CCG 1104
Val Asp Glu Leu Gly Ala Glu Leu Lys Glu Lys Ile Gly Asn Arg Pro
355 360 365
GTG ATT GAC ATA AGC GTG GTT GTT GAC GGC AAA AAA GTT GAA TGG AGC 1152
Val Ile Asp Ile Ser Val Val Asp Asp Gly Lys Lys Val Glu Trp Ser
370 375 380
AAT TAC AAA GCC AAG GTT AAA ATA TCA ATT CCT TAC AAG CCT GAT GCA 1200
Asn Tyr Lys Ala Lys Val Lys Ile Ser Ile Pro Tyr Lys Pro Asp Ala
385 390 395 400
AAA GAG CTG GAG AAC CAC GAG CAT ATT GTT GTA CTC CAT ATT GAT GAC 1248
Lys Glu Leu Glu Asn His Glu His Ile Val Val Leu His Ile Asp Asp
405 410 415
GCC GGC AAG GCA GTT TCC GTA CCC AGC GGA AAA TAT GAA CCT TCT TTG 1296
Ala Gly Lys Ala Val Ser Val Pro Ser Gly Lys Tyr Glu Pro Ser Leu
420 425 430
GGC GTC GTT ACG TTT GAG ACG AAT CAT TTA AGC AAG TAT GCG GTT TCA 1344
Gly Val Val Thr Phe Glu Thr Asn His Leu Ser Lys Tyr Ala Val Ser
435 440 445
TAT GTT TAC AAG ACT TTC GCG GAT ATT GGT TCA TAT GCC TGG GCT AAA 1392
Tyr Val Tyr Lys Thr Phe Ala Asp Ile Gly Ser Tyr Ala Trp Ala Lys
450 455 460
AAG CAG ATA GAG GTT TTG GCT TCC AAA GGA GTA ATT AAC GGT ACA TCC 1440
Lys Gln Ile Glu Val Leu Ala Ser Lys Gly Val Ile Asn Gly Thr Ser
465 470 475 480
GAT ACC ACT TTT ACG CCC CAG GCA GAC ATA ACA AGG GCG GAT TTC ATG 1488
Asp Thr Thr Phe Thr Pro Gln Ala Asp Ile Thr Arg Ala Asp Phe Met
485 490 495
ATA CTT CTT GTA AAG GCA CTG GGA TTG ACT GCC GAG GTT ACT TCC AAT 1536
Ile Leu Leu Val Lys Ala Leu Gly Leu Thr Ala Glu Val Thr Ser Asn
500 505 510
TTT GAT GAT GTG TCC GAA AAA GAC TAC TAT TAT GAA TAC GTG GGA ATT 1584
Phe Asp Asp Val Ser Glu Lys Asp Tyr Tyr Tyr Glu Tyr Val Gly Ile
515 520 525
GCA AAA GAG CTT GGA ATT ACG ACA GGA GTC GGA AAC AAC AAG TTC AAT 1632
Ala Lys Glu Leu Gly Ile Thr Thr Gly Val Gly Asn Asn Lys Phe Asn
530 535 540
CCG AAA GCC AAA ATT ACA AGA CAG GAT ATG ATG GTA CTT ACA ACA AAT 1680
Pro Lys Ala Lys Ile Thr Arg Gln Asp Met Met Val Leu Thr Thr Asn
545 550 555 560
GCT CTC AGG ATT GCA GGA AAA ATA TCG AGC ACA GGA ACC CGC GCT GAT 1728
Ala Leu Arg Ile Ala Gly Lys Ile Ser Ser Thr Gly Thr Arg Ala Asp
565 570 575
GTT GAA AGA TTT TCG GAC AAG GAC CAG ATA GCT TCA TAT GCG GTT GAA 1776
Val Glu Arg Phe Ser Asp Lys Asp Gln Ile Ala Ser Tyr Ala Val Glu
580 585 590
GGC GTT GCA ACC TTG GTA AAA GAA GGT ATT GTA GTG GGA AGC GGC GAT 1824
Gly Val Ala Thr Leu Val Lys Glu Gly Ile Val Val Gly Ser Gly Asp
595 600 605
ATT ATA AAT CCA AGG GGA AAT GCT TCA AGA GCC GAA CTT GCA GCA ATC 1872
Ile Ile Asn Pro Arg Gly Asn Ala Ser Arg Ala Glu Leu Ala Ala Ile
610 615 620
ATA TAC AAG ATT TAC TAC AAG 1893
Ile Tyr Lys Ile Tyr Tyr Lys
625 630
(3) INFORMATION FOR SEQ ID NO: 2:
(i) CHARACTERISTICS OF THE SEQUENCE:
(A) LENGTH: 4992 base pairs
(B) TYPE: nucleotide
(C) NUMBER OF STRANDS: single
(ii) TYPE OF MOLECULE: DNA
(ix) FEATURE:
(A) NAME / KEY: Clostridium thermocellum OlpB
(B) LOCATION: 1..4992
(xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 2:
ATG AAA CGA AAA AAT AAA GTA TTA TCA ATT TTG TTA ACT CTG CTG CTA 48
Met Lys Arg Lys Asn Lys Val Leu Ser Ile Leu Leu Thr Leu Leu Leu
1 5 10 15
ATA ATC TCT ACC ACA TCC GTA AAC ATG TCT TTT GCT GAA GCA ACT CCA 96
Ile Ile Ser Thr Thr Ser Val Asn Met Ser Phe Ala Glu Ala Thr Pro
20 25 30
AGT ATT GAA ATG GTT CTT GAT AAA ACT GAA GTC CAT GTA GGA GAT GTA 144
Ser Ile Glu Met Val Leu Asp Lys Thr Glu Val His Val Gly Asp Val
35 40 45
ATA ACG GCC ACA ATA AAA GTC AAT AAC ATT AGA AAA TTG GCG GGA TAT 192
Ile Thr Ala Thr Ile Lys Val Asn Asn Ile Arg Lys Leu Ala Gly Tyr
50 55 60
CAG CTA AAT ATC AAA TTT GAC CCT GAA GTT TTA CAG CCG GTA GAC CCT 240
Gln Leu Asn Ile Lys Phe Asp Pro Glu Val Leu Gln Pro Val Asp Pro
65 70 75 80
GCA ACA GGA GAG GAA TTT ACT GAT AAG TCC ATG CCG GTA AAT AGG GTT 288
Ala Thr Gly Glu Glu Phe Thr Asp Lys Ser Met Pro Val Asn Arg Val
85 90 95
TTG CTG ACA AAC AGC AAA TAT GGA CCT ACT CCT GTG GCG GGT AAC GAT 336
Leu Leu Thr Asn Ser Lys Tyr Gly Pro Thr Pro Val Ala Gly Asn Asp
100 105 110
ATA AAG TCA GGA ATT ATT AAT TTT GCT ACG GGA TAT AAC AAT TTA ACA 384
Ile Lys Ser Gly Ile Ile Asn Phe Ala Thr Gly Tyr Asn Asn Leu Thr
115 120 125
GCG TAC AAA TCC AGC GGA ATA GAC GAA CAT ACA GGA ATA ATA GGA GAG 432
Ala Tyr Lys Ser Ser Gly Ile Asp Glu His Thr Gly Ile Ile Gly Glu
130 135 140
ATT GGT TTT AAA GTT TTA AAG AAA CAA AAT ACG TCT ATT AGG TTT GAA 480
Ile Gly Phe Lys Val Leu Lys Lys Gln Asn Thr Ser Ile Arg Phe Glu
145 150 155 160
GAT ACA TTA TCG ATG CCC GGG GCA ATA TCG GGA ACA AGT TTG TTT GAC 528
Asp Thr Leu Ser Met Pro Gly Ala Ile Ser Gly Thr Ser Leu Phe Asp
165 170 175
TGG GAT GCA GAA ACT ATA ACA GGA TAT GAG GTA ATA CAG CCG GAT CTT 576
Trp Asp Ala Glu Thr Ile Thr Gly Tyr Glu Val Ile Gln Pro Asp Leu
180 185 190
ATA GTT GTA GAG GCA GAA CCG TTA AAA GAC GCC AGC GTG GCT CTG GAA 624
Ile Val Val Glu Ala Glu Pro Leu Lys Asp Ala Ser Val Ala Leu Glu
195 200 205
CTG GAT AAG ACG AAG GTA AAA GTA GGG GAC ATA ATA ACA GCG ACG ATA 672
Leu Asp Lys Thr Lys Val Lys Val Gly Asp Ile Ile Thr Ala Thr Ile
210 215 220
AAG ATA GAG AAC ATG AAG AAT TTT GCA GGG TAC CAG TTG AAT ATC AAG 720
Lys Ile Glu Asn Met Lys Asn Phe Ala Gly Tyr Gln Leu Asn Ile Lys
225 230 235 240
TAT GAC CCG ACC ATG TTG GAG GCA ATA GAA CTG GAG ACA GGA AGT GCG 768
Tyr Asp Pro Thr Met Leu Glu Ala Ile Glu Leu Glu Thr Gly Ser Ala
245 250 255
ATA GCG AAG AGG ACA TGG CCG GTT ACA GGA GGT ACT GTT CTG CAA AGT 816
Ile Ala Lys Arg Thr Trp Pro Val Thr Gly Gly Thr Val Leu Gln Ser
260 265 270
GAC AAT TAT GGA AAG ACG ACT GCG GTA GCG AAT GAT GTA GGA GCA GGT 864
Asp Asn Tyr Gly Lys Thr Thr Ala Val Ala Asn Asp Val Gly Ala Gly
275 280 285
ATA ATA AAC TTT GCT GAG GCA TAC TCG AAC CTT ACC AAA TAC AGA GAG 912
Ile Ile Asn Phe Ala Glu Ala Tyr Ser Asn Leu Thr Lys Tyr Arg Glu
290 295 300
ACA GGT GTG GCA GAG GAG ACA GGT ATA ATA GGA AAG ATA GGC TTC AGA 960
Thr Gly Val Ala Glu Glu Thr Gly Ile Ile Gly Lys Ile Gly Phe Arg
305 310 315 320
GTA CTG AAG GCA GGA AGT ACG GCT ATA AGA TTT GAG GAT ACG ACA GCG 1008
Val Leu Lys Ala Gly Ser Thr Ala Ile Arg Phe Glu Asp Thr Thr Ala
325 330 335
ATG CCG GGA GCA ATA GAA GGA ACA TAC ATG TTC GAC TGG TAT GGC GAG 1056
Met Pro Gly Ala Ile Glu Gly Thr Tyr Met Phe Asp Trp Tyr Gly Glu
340 345 350
AAC ATC AAA GGG TAT AGC GTA GTA CAG CCT GGG GAA ATA GTG GCA GAA 1104
Asn Ile Lys Gly Tyr Ser Val Val Gln Pro Gly Glu Ile Val Ala Glu
355 360 365
GGA GAA GAG CCG GGT GAA GAG CCG ACA GAA GAG CCT GTA CCG ACA GAG 1152
Gly Glu Glu Pro Gly Glu Glu Pro Thr Glu Glu Pro Val Pro Thr Glu
370 375 380
ACA CCA GTA GAT CCC ACA CCG ACA GTG ACA GAA GAG CCT GTA CCT TCA 1200
Thr Pro Val Asp Pro Thr Pro Thr Val Thr Glu Glu Pro Val Pro Ser
385 390 395 400
GAG CTT CCA GAT TCC TAT GTA ATA ATG GAA CTG GAT AAG ACG AAG GTA 1248
Glu Leu Pro Asp Ser Tyr Val Ile Met Glu Leu Asp Lys Thr Lys Val
405 410 415
AAA GTA GGG GAC ATA ATA ACA GCG ACG ATA AAG ATA GAG AAC ATG AAG 1296
Lys Val Gly Asp Ile Ile Thr Ala Thr Ile Lys Ile Glu Asn Met Lys
420 425 430
AAT TTT GCA GGG TAC CAG TTG AAT ATC AAG TAT GAC CCG ACC ATG TTG 1344
Asn Phe Ala Gly Tyr Gln Leu Asn Ile Lys Tyr Asp Pro Thr Met Leu
435 440 445
GAG GCA ATA GAA CTG GAG ACA GGA AGT GCG ATA GCG AAG AGG ACA TGG 1392
Glu Ala Ile Glu Leu Glu Thr Gly Ser Ala Ile Ala Lys Arg Thr Trp
450 455 460
CCG GTT ACA GGA GGT ACT GTT CTG CAA AGT GAC AAT TAT GGA AAG ACG 1440
Pro Val Thr Gly Gly Thr Val Leu Gln Ser Asp Asn Tyr Gly Lys Thr
465 470 475 480
ACT GCG GTA GCG AAT GAT GTA GGA GCA GGT ATA ATA AAC TTT GCT GAG 1488
Thr Ala Val Ala Asn Asp Val Gly Ala Gly Ile Ile Asn Phe Ala Glu
485 490 495
GCA TAC TCG AAC CTT ACC AAA TAC AGA GAG ACA GGT GTG GCA GAG GAG 1536
Ala Tyr Ser Asn Leu Thr Lys Tyr Arg Glu Thr Gly Val Ala Glu Glu
500 505 510
ACA GGT ATA ATA GGA AAG ATA GGC TTC AGA GTA CTG AAG GCA GGA AGT 1584
Thr Gly Ile Ile Gly Lys Ile Gly Phe Arg Val Leu Lys Ala Gly Ser
515 520 525
ACG GCT ATA AGA TTT GAG GAT ACG ACA GCG ATG CCG GGA GCA ATA GAA 1632
Thr Ala Ile Arg Phe Glu Asp Thr Thr Ala Met Pro Gly Ala Ile Glu
530 535 540
GGA ACA TAC ATG TTC GAC TGG TAT GGC GAG AAC ATC AAA GGG TAT AGC 1680
Gly Thr Tyr Met Phe Asp Trp Tyr Gly Glu Asn Ile Lys Gly Tyr Ser
545 550 555 560
GTA GTA CAG CCT GGG GAA ATA GTG GCG GAA GGA GAA GAG CCG ACA GAA 1728
Val Val Gln Pro Gly Glu Ile Val Ala Glu Gly Glu Glu Pro Thr Glu
565 570 575
GAG CCT GTA CCG ACA GAG ACA CCA GTA GAT CCC ACA CCG ACA GTG ACA 1776
Glu Pro Val Pro Thr Glu Thr Pro Val Asp Pro Thr Pro Thr Val Thr
580 585 590
GAA GAG CCT GTA CCT TCA GAG CTT CCA GAT TCC TAT GTG ATA ATG GAA 1824
Glu Glu Pro Val Pro Ser Glu Leu Pro Asp Ser Tyr Val Ile Met Glu
595 600 605
TTG GAT AAG ACG AAG GTA AAA GAA GGC GAC GTA ATA ATA GCA ACA ATA 1872
Leu Asp Lys Thr Lys Val Lys Glu Gly Asp Val Ile Ile Ala Thr Ile
610 615 620
AGA GTA AAT AAC ATA AAG AAT CTT GCC GGA TAT CAG ATA GGC ATC AAA 1920
Arg Val Asn Asn Ile Lys Asn Leu Ala Gly Tyr Gln Ile Gly Ile Lys
625 630 635 640
TAT GAC CCG AAA GTA TTA GAG GCA TTT AAT ATC GAG ACA GGG GAC CCA 1968
Tyr Asp Pro Lys Val Leu Glu Ala Phe Asn Ile Glu Thr Gly Asp Pro
645 650 655
ATA GAT GAA GGA ACA TGG CCT GCA GTA GGG GGA ACA ATA CTG AAG AAT 2016
Ile Asp Glu Gly Thr Trp Pro Ala Val Gly Gly Thr Ile Leu Lys Asn
660 665 670
AGA GAT TAC CTG CCG ACT GGG GTA GCA ATA AAC AAT GTA TCT AAA GGA 2064
Arg Asp Tyr Leu Pro Thr Gly Val Ala Ile Asn Asn Val Ser Lys Gly
675 680 685
ATA CTG AAT TTT GCT GCT TAT TAC GTT TAC TTC GAT GAC TAT AGA GAG 2112
Ile Leu Asn Phe Ala Ala Tyr Tyr Val Tyr Phe Asp Asp Tyr Arg Glu
690 695 700
GAA GGA AAG TCA GAA GAT ACA GGA ATT ATA GGA AAT ATA GGC TTT AGA 2160
Glu Gly Lys Ser Glu Asp Thr Gly Ile Ile Gly Asn Ile Gly Phe Arg
705 710 715 720
GTA CTG AAG GCG GAA GAT ACA ACG ATA AGA TTT GAA GAG CTG GAG TCA 2208
Val Leu Lys Ala Glu Asp Thr Thr Ile Arg Phe Glu Glu Leu Glu Ser
725 730 735
ATG CCG GGT TCA ATA GAC GGA ACA TAT ATG TTG GAT TGG TAT CTT AAT 2256
Met Pro Gly Ser Ile Asp Gly Thr Tyr Met Leu Asp Trp Tyr Leu Asn
740 745 750
AGA ATC TCT GGC TAT GTA GTA ATA CAA CCG GCG CCT ATA AAG GCG GCT 2304
Arg Ile Ser Gly Tyr Val Val Ile Gln Pro Ala Pro Ile Lys Ala Ala
755 760 765
AGT GAC GAA CCA ATA CCA ACG GAT ACA CCA TCA GAT GAA CCG ACA CCG 2352
Ser Asp Glu Pro Ile Pro Thr Asp Thr Pro Ser Asp Glu Pro Thr Pro
770 775 780
TCA GAC GAG CCA ACG CCA TCT GAC GAA CCG ACA CCG TCT GAT GAG CCA 2400
Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro
785 790 795 800
ACA CCG TCA GAT GAA CCG ACT CCG TCA GAG ACA CCT GAG GAG CCG ATA 2448
Thr Pro Ser Asp Glu Pro Thr Pro Ser Glu Thr Pro Glu Glu Pro Ile
805 810 815
CCG ACG GAT ACA CCA TCA GAT GAA CCG ACA CCA TCA GAC GAG CCA ACG 2496
Pro Thr Asp Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr
820 825 830
CCA TCT GAT GAA CCA ACA CCG TCT GAT GAG CCA ACA CCA TCT GAT GAA 2544
Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu
835 840 845
CCG ACT CCG TCA GAG ACA CCT GAG GAG CCG ATA CCG ACG GAT ACA CCA 2592
Pro Thr Pro Ser Glu Thr Pro Glu Glu Pro Ile Pro Thr Asp Thr Pro
850 855 860
TCA GAT GAA CCG ACA CCG TCA GAC GAG CCA ACG CCA TCT GAC GAA CCA 2640
Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro
865 870 875 880
ACA CCG TCT GAT GAG CCA ACA CCG TCA GAT GAA CCG ACT CCG TCA GAG 2688
Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Glu
885 890 895
ACA CCT GAG GAG CCG ATA CCG ACG GAT ACA CCA TCA GAT GAA CCG ACA 2736
Thr Pro Glu Glu Pro Ile Pro Thr Asp Thr Pro Ser Asp Glu Pro Thr
900 905 910
CCG TCA GAC GAG CCA ACG CCA TCT GAC GAA CCA ACA CCG TCT GAT GAG 2784
Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu
915 920 925
CCA ACA CCG TCA GAT GAA CCG ACT CCG TCA GAG ACA CCT GAG GAG CCG 2832
Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Glu Thr Pro Glu Glu Pro
930 935 940
ATA CCG ACG GAT ACA CCA TCA GAT GAA CCG ACA CCG TCA GAC GAG CCG 2880
Ile Pro Thr Asp Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro
945 950 955 960
ATA CCA TCT GAC GAA CCA ACA CCG TCA GAC GAG CCA ACG CCA TCT GAC 2928
Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp
965 970 975
GAA CCG ACA CCG TCT GAT GAG CCA ACA CCA TCT GAT GAA CCG ACT CCG 2976
Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro
980 985 990
TCA GAG ACA CCT GAG GAG CCG ATA CCG ACG GAT ACA CCA TCA GAT GAA 3024
Ser Glu Thr Pro Glu Glu Pro Ile Pro Thr Asp Thr Pro Ser Asp Glu
995 1000 1005
CCG ACA CCG TCA GAC GAG CCG ACA CCA TCT GAC GAA CCA ACA CCG TCA 3072
Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser
1010 1015 1020
GAC GAG CCA ACG CCA TCT GAC GAA CCG ACA CCG TCT GAT GAG CCA ACA 3120
Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr
1025 1030 1035 1040
CCA TCT GAT GAA CCG ACT CCG TCA GAG ACA CCT GAG GAG CCG ATA CCG 3168
Pro Ser Asp Glu Pro Thr Pro Ser Glu Thr Pro Glu Glu Pro Ile Pro
1045 1050 1055
ACG GAT ACA CCA TCA GAT GAA CCG ACA CCG TCA GAC GAG CCG ACA CCA 3216
Thr Asp Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro
1060 1065 1070
TCT GAC GAA CCA ACA CCG TCT GAT GAG CCA ACA CCG TCA GAT GAA CCG 3264
Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro
1075 1080 1085
ACT CCG TCA GAG ACA CCT GAG GAG CCG ATA CCG ACG GAT ACA CCA TCA 3312
Thr Pro Ser Glu Thr Pro Glu Glu Pro Ile Pro Thr Asp Thr Pro Ser
1090 1095 1100
GAT GAA CCG ACA CCG TCA GAC GAG CCA ACG CCA TCT GAC GAA CCG ACA 3360
Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr
1105 1110 1115 1120
CCG TCT GAT GAG CCA ACA CCG TCA GAT GAA CCG ACT CCG TCA GAG ACA 3408
Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Glu Thr
1125 1130 1135
CCT GAG GAG CCG ATA CCG ACG GAT ACA CCA TCA GAT GAA CCC ACA CCC 3456
Pro Glu Glu Pro Ile Pro Thr Asp Thr Pro Ser Asp Glu Pro Thr Pro
1140 1145 1150
TCA GAC GAG CCA ACG CCA TCT GAC GAA CCG ACA CCG TCT GAT GAG CCA 3504
Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro
1155 1160 1165
ACA CCG TCA GAT GAA CCG ACT CCG TCA GAG ACA CCT GAG GAG CCG ATA 3552
Thr Pro Ser Asp Glu Pro Thr Pro Ser Glu Thr Pro Glu Glu Pro Ile
1170 1175 1180
CCG ACG GAT ACA CCA TCA GAT GAA CCG ACA CCA TCA GAC GAG CCA ACG 3600
Pro Thr Asp Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr
1185 1190 1195 1200
CCA TCT GAT GAA CCA ACA CCG TCT GAT GAG CCA ACA CCA TCT GAT GAA 3648
Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu
1205 1210 1215
CCG ACT CCG TCA GAG ACA CCT GAG GAG CCG ATA CCG ACG GAT ACA CCA 3696
Pro Thr Pro Ser Glu Thr Pro Glu Glu Pro Ile Pro Thr Asp Thr Pro
1220 1225 1230
TCA GAT GAA CCG ACA CCG TCA GAC GAG CCA ACG CCA TCT GAC GAA CCA 3744
Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro
1235 1240 1245
ACA CCG TCT GAT GAG CCA ACA CCG TCA GAT GAA CCG ACT CCG TCA GAG 3792
Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Glu
1250 1255 1260
ACA CCT GAG GAG CCG ATA CCG ACG GAT ACA CCA TCA GAT GAA CCG ACA 3840
Thr Pro Glu Glu Pro Ile Pro Thr Asp Thr Pro Ser Asp Glu Pro Thr
1265 1270 1275 1280
CCG TCA GAC GAG CCG ACA CCA TCT GAC GAA CCA ACA CCG TCA GAC GAG 3888
Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu
1285 1290 1295
CCA ACG CCA TCT GAC GAA CCG ACA CCG TCT GAT GAG CCA ACA CCA TCT 3936
Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser
1300 1305 1310
GAT GAA CCG ACT CCG TCA GAG ACA CCT GAG GAG CCG ATA CCG ACG GAT 3984
Asp Glu Pro Thr Pro Ser Glu Thr Pro Glu Glu Pro Ile Pro Thr Asp
1315 1320 1325
ACA CCA TCA GAT GAA CCG ACA CCG TCA GAC GAG CCG ACA CCA TCT GAC 4032
Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp
1330 1335 1340
GAA CCA ACA CCG TCA GAC GAG CCA ACG CCA TCT GAC GAA CCG ACA CCG 4080
Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro
1345 1350 1355 1360
TCT GAT GAG CCA ACA CCA TCT GAT GAA CCG ACT CCG TCA GAG ACA CCT 4128
Ser Asp Glu Pro Thr Pro Ser Asp Glu Pro Thr Pro Ser Glu Thr Pro
1365 1370 1375
GAG GAG CCG ACA CCG ACT ACT ACA CCG ACA CCA ACA CCG TCG ACA ACG 4176
Glu Glu Pro Thr Pro Thr Thr Thr Pro Thr Pro Thr Pro Ser Thr Thr
1380 1385 1390
CCT ACA AGT GGC AGC GGA GGC AGT GGT GGA AGC GGT GGT GGC GGC GGA 4224
Pro Thr Ser Gly Ser Gly Gly Ser Gly Gly Ser Gly Gly Gly Gly Gly
1395 1400 1405
GGT GGT GGA GGA ACT GTA CCT ACA TCT CCA ACA CCG ACA CCG ACA TCT 4272
Gly Gly Gly Gly Thr Val Pro Thr Ser Pro Thr Pro Thr Pro Thr Ser
1410 1415 1420
AAA CCG ACG TCT ACA CCT GCA CCG ACA GAA ATC GAA GAG CCT ACA CCA 4320
Lys Pro Thr Ser Thr Pro Ala Pro Thr Glu Ile Glu Glu Pro Thr Pro
1425 1430 1435 1440
TCT GAT GTG CCT GGT GCA ATC GGT GGA GAA CAT AGA GCA TAC TTA AGA 4368
Ser Asp Val Pro Gly Ala Ile Gly Gly Glu His Arg Ala Tyr Leu Arg
1445 1450 1455
GGA TAT CCG GAT GGA AGC TTC AGG CCT GAA AGA AAT ATA ACA AGA GCT 4416
Gly Tyr Pro Asp Gly Ser Phe Arg Pro Glu Arg Asn Ile Thr Arg Ala
1460 1465 1470
GAA GCG GCG GTA ATC TTT GCT AAG TTG CTT GGA GCC GAT GAA AGC TAT 4464
Glu Ala Ala Val Ile Phe Ala Lys Leu Leu Gly Ala Asp Glu Ser Tyr
1475 1480 1485
GGA GCT CAG TCT GCA AGT CCA TAT AGT GAT TTG GCT GAT ACT CAC TGG 4512
Gly Ala Gln Ser Ala Ser Pro Tyr Ser Asp Leu Ala Asp Thr His Trp
1490 1495 1500
GCT GCA TGG GCA ATC AAA TTT GCA ACA AGC CAG GGC TTG TTC AAA GGA 4560
Ala Ala Trp Ala Ile Lys Phe Ala Thr Ser Gln Gly Leu Phe Lys Gly
1505 1510 1515 1520
TAT CCG GAC GGT ACG TTT AAA CCT GAT CAG AAC ATA ACG AGA GCG GAA 4608
Tyr Pro Asp Gly Thr Phe Lys Pro Asp Gln Asn Ile Thr Arg Ala Glu
1525 1530 1535
TTC GCA ACT GTG GTA CTC CAC TTC CTG ACA AAA GTT AAG GGT CAG GAA 4656
Phe Ala Thr Val Val Leu His Phe Leu Thr Lys Val Lys Gly Gln Glu
1540 1545 1550
ATA ATG AGC AAG CTT GCA ACA ATA GAT ATA AGT AAT CCG AAG TTT GAC 4704
Ile Met Ser Lys Leu Ala Thr Ile Asp Ile Ser Asn Pro Lys Phe Asp
1555 1560 1565
GAT TGT GTC GGA CAT TGG GCA CAA GAG TTT ATT GAG AAA TTG ACA AGC 4752
Asp Cys Val Gly His Trp Ala Gln Glu Phe Ile Glu Lys Leu Thr Ser
1570 1575 1580
TTG GGT TAT ATT AGT GGC TAT CCT GAC GGA ACG TTC AAG CCG CAA AAC 4800
Leu Gly Tyr Ile Ser Gly Tyr Pro Asp Gly Thr Phe Lys Pro Gln Asn
1585 1590 1595 1600
TAT ATT AAA CGT TCC GAA AGT GTG GCA CTG ATT AAC AGA GCT CTG GAG 4848
Tyr Ile Lys Arg Ser Glu Ser Val Ala Leu Ile Asn Arg Ala Leu Glu
1605 1610 1615
AGA GGT CCG CTT AAT GGA GCG CCG AAG CTC TTC CCG GAT GTT AAC GAA 4896
Arg Gly Pro Leu Asn Gly Ala Pro Lys Leu Phe Pro Asp Val Asn Glu
1620 1625 1630
TCA TAC TGG GCA TTT GGC GAC ATT ATG GAC GGT GCT CTC GAC CAC AGT 4944
Ser Tyr Trp Ala Phe Gly Asp Ile Met Asp Gly Ala Leu Asp His Ser
1635 1640 1645
TAC ATT ATC GAA GAT GAG AAA GAA AAA TTC GTT AAA TTG CTC GAA GAT 4992
Tyr Ile Ile Glu Asp Glu Lys Glu Lys Phe Val Lys Leu Leu Glu Asp
1650 1655 1660
(4) INFORMATION FOR SEQ ID NO: 3:
(i) CHARACTERISTICS OF THE SEQUENCE:
(A) LENGTH: 2064 base pairs
(B) TYPE: nucleotide
(C) NUMBER OF STRANDS: single
(ii) TYPE OF MOLECULE: DNA
(ix) CHARACTERISTIC:
(A) NAME / KEY: ORF2p of Clostridium thermocellum
(B) LOCATION: 1..2064
(xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 3:
ATG AAA AAA AAC AAT GTA TTA ACA ATA GCA GCT ATG ATA GCG CTT CTT 48
Met Lys Lys Asn Asn Val Leu Thr Ile Ala Ala Met Ile Ala Leu Leu
1 5 10 15
CTA ACC AGC TTA CTT ACA AGT ATA ACT TTT GGG GAG ACT TCG AGT ATA 96
Leu Thr Ser Leu Leu Thr Ser Ile Thr Phe Gly Glu Thr Ser Ser Ile
20 25 30
CCT TCA AGA ATA TCT ATG GAG CTT GAC AAG ACA AAA GCA AAC ATA GGC 144
Pro Ser Arg Ile Ser Met Glu Leu Asp Lys Thr Lys Ala Asn Ile Gly
35 40 45
GAC ATA ATT ATA GCC ACA ATA AGA ATT GAC AAT ATC AAT AAC TTT AGC 192
Asp Ile Ile Ile Ala Thr Ile Arg Ile Asp Asn Ile Asn Asn Phe Ser
50 55 60
GGA TAT CAA TTA AAT ATA AAG TAT GAT CCG TCA TAC CTC CAG GCA GTT 240
Gly Tyr Gln Leu Asn Ile Lys Tyr Asp Pro Ser Tyr Leu Gln Ala Val
65 70 75 80
AAT CCT TTG ACA GGA GAA CCG ATA AAA AAG AGA ACA ATG CCG GCA GTG 288
Asn Pro Leu Thr Gly Glu Pro Ile Lys Lys Arg Thr Met Pro Ala Val
85 90 95
AAC GGC ACG GTG TTG TTA AAG GGA GAT CAG TAC AGT ATT ACT GAG GTT 336
Asn Gly Thr Val Leu Leu Lys Gly Asp Gln Tyr Ser Ile Thr Glu Val
100 105 110
GTA GAA AAT AAC GTC GAT GAA GGG ATT TTA AAT TTT GGC AAG GGA TAT 384
Val Glu Asn Asn Val Asp Glu Gly Ile Leu Asn Phe Gly Lys Gly Tyr
115 120 125
GCA AAT TTA ACT GAA TAC AGG AAA AGC GGA AAA CCT GAA ACA ACC GGA 432
Ala Asn Leu Thr Glu Tyr Arg Lys Ser Gly Lys Pro Glu Thr Thr Gly
130 135 140
ATT ATT GGC AAG ATA GGA TTT AAA GCC TTA AAG CTT GGC AAG ACG GAG 480
Ile Ile Gly Lys Ile Gly Phe Lys Ala Leu Lys Leu Gly Lys Thr Glu
145 150 155 160
ATC AAA TTT GAG AAC ACA CCC GTC ATG CCT GGG GCA AAA GAA GGA ACA 528
Ile Lys Phe Glu Asn Thr Pro Val Met Pro Gly Ala Lys Glu Gly Thr
165 170 175
CTG CTG TTT GAC TGG GAT GCA GAA ACT ATA ACG GAA TAT AAT GTA ATT 576
Leu Leu Phe Asp Trp Asp Ala Glu Thr Ile Thr Glu Tyr Asn Val Ile
180 185 190
CAG CCT AAA GAA CTT GCA ATA ACG TTA CCG GAC GAT GCA CAC ATT GCT 624
Gln Pro Lys Glu Leu Ala Ile Thr Leu Pro Asp Asp Ala His Ile Ala
195 200 205
TTG GAA CTT GAC AAG ACA AAA GTG AAA GTG GGA GAT GTA ATT GTT GCG 672
Leu Glu Leu Asp Lys Thr Lys Val Lys Val Gly Asp Val Ile Val Ala
210 215 220
ACA GTA AAA GCA AAG AAT ATG ACT AGT ATG GCG GGA ATT CAG GTA AAT 720
Thr Val Lys Ala Lys Asn Met Thr Ser Met Ala Gly Ile Gln Val Asn
225 230 235 240
ATT AAA TAT GAC CCT GAA GTA TTG CAG GCG ATT GAT CCT GCG ACG GGA 768
Ile Lys Tyr Asp Pro Glu Val Leu Gln Ala Ile Asp Pro Ala Thr Gly
245 250 255
AAA CCG TTT ACA AAA GAA ACA TTA CTT GTG GAC CCG GAA CTG TTA TCA 816
Lys Pro Phe Thr Lys Glu Thr Leu Leu Val Asp Pro Glu Leu Leu Ser
260 265 270
AAC AGA GAA TAT AAT CCG TTG TTA ACA GCA GTT AAT GAC ATA AAT TCC 864
Asn Arg Glu Tyr Asn Pro Leu Leu Thr Ala Val Asn Asp Ile Asn Ser
275 280 285
GGC ATT ATA AAT TAT GCA TCT TGT TAT GTA TAT TGG GAT TCC TAC AGA 912
Gly Ile Ile Asn Tyr Ala Ser Cys Tyr Val Tyr Trp Asp Ser Tyr Arg
290 295 300
GAA TCA GGA GTA TCT GAA AGC ACC GGA ATA ATT GGA AAG GTT GGC TTT 960
Glu Ser Gly Val Ser Glu Ser Thr Gly Ile Ile Gly Lys Val Gly Phe
305 310 315 320
AAA GTG CTG AAA GCT GCC AAC ACC ACA GTA AAA CTG GAA GAA ACA AGA 1008
Lys Val Leu Lys Ala Ala Asn Thr Thr Val Lys Leu Glu Glu Thr Arg
325 330 335
TTT ACA CCA AAT TCG ATA GAC GGT ACT TTG GTA ATT GAT TGG TAT GGC 1056
Phe Thr Pro Asn Ser Ile Asp Gly Thr Leu Val Ile Asp Trp Tyr Gly
340 345 350
CAA CAG ATA GTT GGT TAT AAA GTA ATA CAG CCC GAC AAA ATT ACT GTG 1104
Gln Gln Ile Val Gly Tyr Lys Val Ile Gln Pro Asp Lys Ile Thr Val
355 360 365
GAA ATA GAA GAA CCC GAA CCT GAA ATA CCG GGC ACT GTT GGA ATA CAT 1440
Glu Ile Glu Glu Pro Glu Pro Glu Ile Pro Gly Thr Val Gly Ile His
465 470 475 480
TAT TCA TAC CTG ACA GGT TAT CCG GAC AAA ATG TTC AGA CCT GAA AAG 1488
Tyr Ser Tyr Leu Thr Gly Tyr Pro Asp Lys Met Phe Arg Pro Glu Lys
485 490 495
AGT ATT ACA AGA GCT GAA GCA GCC GTG ATT TTT GCA AAA CTT TTG GGA 1536
Ser Ile Thr Arg Ala Glu Ala Ala Val Ile Phe Ala Lys Leu Leu Gly
500 505 510
GCA AAC GAA AAT ACA AAG ATA AAC TAT AAT GTT TCA TAC ACC GAT GTT 1584
Ala Asn Glu Asn Thr Lys Ile Asn Tyr Asn Val Ser Tyr Thr Asp Val
515 520 525
GAC AGC TCC CAT TGG GCA AGT TGG GCA ATC AAA TTT GTA TCA TAC AAG 1632
Asp Ser Ser His Trp Ala Ser Trp Ala Ile Lys Phe Val Ser Tyr Lys
530 535 540
AAA CTG TTT ACC GGA TAT CCT GAT GGC TCG TTC AAG CCT AAT CAG AAT 1680
Lys Leu Phe Thr Gly Tyr Pro Asp Gly Ser Phe Lys Pro Asn Gln Asn
545 550 555 560
ATA ACG AGA GCC GAA TTT TCA ACG GTT GTG TTT AAG CTT CTT GTA TCT 1728
Ile Thr Arg Ala Glu Phe Ser Thr Val Val Phe Lys Leu Leu Val Ser
565 570 575
GAG AAA GGT CTA AAA GAA GAA AAG ATT GAA AAG TCC AAG TTT GGT GAT 1776
Glu Lys Gly Leu Lys Glu Glu Lys Ile Glu Lys Ser Lys Phe Gly Asp
580 585 590
ACA AAG GGC CAC TGG GCA CAA CAG TTT ATT GAA CAG CTG TCA GAC CTT 1824
Thr Lys Gly His Trp Ala Gln Gln Phe Ile Glu Gln Leu Ser Asp Leu
595 600 605
GGA TAC ATC AAC GGA TAT CCT GAT GGT ACA TTC AAG CCC AAC AAC AAT 1872
Gly Tyr Ile Asn Gly Tyr Pro Asp Gly Thr Phe Lys Pro Asn Asn Asn
610 615 620
ATC AAA CGA TCA GAA AGT GTT GCC CTG ATA AAC AGA GCT ATG GGA AGA 1920
Ile Lys Arg Ser Glu Ser Val Ala Leu Ile Asn Arg Ala Met Gly Arg
625 630 635 640
GGG CCT TTG CAT GGC GCA CCG CAG GTA TTC GAG GAT GTT CCT CAG ACA 1968
Gly Pro Leu His Gly Ala Pro Gln Val Phe Glu Asp Val Pro Gln Thr
645 650 655
CAC TGG GCT TTC AAA GAT ATT GCA GAG GGC GTG CTC AAT CAC AGA TAC 2016
His Trp Ala Phe Lys Asp Ile Ala Glu Gly Val Leu Asn His Arg Tyr
660 665 670
AAA CTG GAC AAT GAG GGC AAA GAA CAA TTG CTG GAG ATA ATT GAT AAC 2064
Lys Leu Asp Asn Glu Gly Lys Glu Gln Leu Leu Glu Ile Ile Asp Asn
675 680 685
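In the listing above, each amino-acid line is the translation of the codon line directly above it under the standard genetic code, and the number ending a codon line is the position of its last nucleotide. The Python sketch below reproduces an amino-acid line from its codon line. It assumes the standard code (NCBI translation table 1) and that trailing position numbers are the only non-codon tokens; the names `CODON_TABLE`, `THREE_LETTER` and `translate_codon_line` are illustrative and not part of the patent.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order,
# so codon index = 16*i + 4*j + k for bases i, j, k taken from "TCAG".
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One-letter to three-letter residue codes, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter",
}


def translate_codon_line(line: str) -> str:
    """Translate one codon line of the listing into three-letter residue codes.

    Purely numeric tokens (nucleotide positions such as "48") are skipped.
    Any other token that is not a valid codon raises KeyError, which makes
    extraction debris such as "HT" easy to spot.
    """
    codons = [tok for tok in line.split() if not tok.isdigit()]
    return " ".join(THREE_LETTER[CODON_TABLE[c]] for c in codons)


if __name__ == "__main__":
    first_line = "ATG AAA AAA AAC AAT GTA TTA ACA ATA GCA GCT ATG ATA GCG CTT CTT 48"
    print(translate_codon_line(first_line))
    # Met Lys Lys Asn Asn Val Leu Thr Ile Ala Ala Met Ile Ala Leu Leu
```

Comparing the output of such a helper with the printed amino-acid line is a quick way to check a transcribed listing line by line.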
DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 4:
NUCLEOTIDE SEQUENCE OF THE CipA PROTEIN
ATGAGAAAAGTCATCAGTATGCTCTTAGTTGTGGCTATGCTGACGACGATTTTTGCGGCGATGATAC
CGC AGACAGTATCGGCGGCCACAATGACAGTCG AGATCGGCAAAGTTACAGCAGCCGTTGGATCAAAA
GTAGA
AATACCTATAACCCTGAAAGGAGTGCCATCCAAAGGAATGGCCAATTGCGACTTCGTATTGGGTTA
TGAT
CCAAATGTGCTGGAAGTAACAGAAGTAAAACCAGGAAGCATAATAAAAGATCCGGATCCTAGCAA
GAGCT TTGATAGCGCAATATATCCGGATCGAAAGATGATTGTATTTCTGTTTGCAGAAGACAGTGGAAGAG
GAAC
GTATGCAATAACTCAGGATGGAGTATTTGCAACAATTGTAGCCACTGTCAAATCAGCTGCAGCGGC
ACCG
ATTACTTTGCTTGAAGTAGGTGCATTTGCGGACAACGATTTAGTAGAAATAAGCACAACTTTTGTCG
CGG
GCGGAGTAAATCTTGGTAGTTCCGTACCGACAACACAGCCAAATGTTCCGTCAGACGGTGTGGTAG
TAGA
AATTGGCAAAGTTACGGGATCTGTTGGAACTACAGTTGAAATACCTGTATATTTCAGAGGAGTTCC
ATCC
AAAGGAATAGCAAACTGCGACTTTGTGTTCAGATATGATCCGAATGTATTGGAAATTATAGGGATA
GATC
CCGGAGACATAATAGTTGACCCGAATCCTACCAAGAGCTTTGATACTGCAATATATCCTGACAGAA
AGAT
AATAGTATTCCTGTTTGCGGAAGACAGCGGAACAGGAGCGTATGCAATAACTAAAGACGGAGTATT
TGCA AAAATAAGAGCAACTGTAAAATCAAGTGCTCCGGGCTATATTACTTTCGACGAAGTAGGTGGATTT
GCAG
ATAATGACCTGGTAGAACAGAAGGTATCATTTATAGACGGTGGTGTTAACGTTGGCAATGCAACAC
CGAC
CAAGGGAGCAACACCAACAAATACAGCTACGCCGACAAAATCAGCTACGGCTACGCCCACCAGGC
CATCG
GTACCGACAAACACACCGACAAACACACCGGCAAATACACCGGTATCAGGCAATTTGAAGGTTGA
ATTCT
ACAACAGCAATCCTTCAGATACTACTAACTCAATCAATCCTCAGTTCAAGGTTACTAATACCGGAA
GCAG
TGCAATTGATTTGTCCAAACTCACATTGAGATATTATTATACAGTAGACGGACAGAAAGATCAGAC
CTTC
TGGTGTGACCATGCTGCAATAATCGGCAGTAACGGCAGCTACAACGGAATTACTTCAAATGTAAAA
GGAA
CATTTGTAAAAATGAGTTCCTCAACAAATAACGCAGACACCTACCTTGAAATAAGCTTTACAGGCG
GAAC
TCTTGAACCGGGTGCACATGTTCAGATACAAGGTAGATTTGCAAAGAATGACTGGAGTAACTATAC
ACAG TCAAATGACTACTCATTCAAGTCTGCTTCACAGTTTGTTGAATGGGATCAGGTAACAGCATACTTGA
ACG
GTGTTCTTGTATGGGGTAAAGAACCCGGTGGCAGTGTAGTACCATCAACACAGCCTGTAACAACAC
CACC TGCAACAACAAAACCACCTGCAACAACAAAACCACCTGCAACAACAATACCGCCGTCAGATGATCC
GAAT
GCAATAAAGATTAAGGTGGACACAGTAAATGCAAAACCGGGAGACACAGTAAATATACCTGTAAG
ATTCA
GTGGTATACCATCCAAGGGAATAGCAAACTGTGACTTTGTATACAGCTATGACCCGAATGTACTTG
AGAT
AATAGAGATAAAACCGGGAGAATTGATAGTTGACCCGAATCCTGACAAGAGCTTTGATACTGCAGT
ATAT
CCTGACAGAAAGATAATAGTATTCCTGTTTGCAGAAGACAGCGGAACAGGAGCGTATGCAATAACT
AAAG
ACGGAGTATTTGCTACGATAGTAGCGAAAGTAAAATCCGGAGCACCTAACGGACTCAGTGTAATCA
AATT
TGTAGAAGTAGGCGGATTTGCGAACAATGACCTTGTAGAACAGAGGACACAGTTCTTTGACGGTGG
AGTA
AATGTTGGAGATACAACAGTACCTACAACACCTACAACACCTGTAACAACACCGACAGATGATTCG
AATG
CAGTAAGGATTAAGGTGGACACAGTAAATGCAAAACCGGGAGACACAGTAAGAATACCTGTAAGA
TTCAG
CGGTATACCATCCAAGGGAATAGCAAACTGTGACTTTGTATACAGCTATGACCCGAATGTACTTGA
GATA
ATAGAGATAGAACCGGGAGACATAATAGTTGACCCGAATCCTGACAAGAGCTTTGATACTGCAGTA
TATC CTGACAGAAAGATAATAGTATTCCTGTTTGCGGAAGACAGCGGAACAG GAGCGTATGCAATAACTA
AAGA
CGGAGTATTTGCTACGATAGTAGCGAAAGTAAAATCCGGAGCACCTAACGGACTCAGTGTAATCAA
ATTT
GTAGAAGTAGGCGGATTTGCGAACAATGACCTTGTAGAACAGAAGACACAGTTCTTTGACGGTGGA
GTAA
ATGTTGGAGATACAACAGAACCTGCAACACCTACAACACCTGTAACAACACCGACAACAACAGAT
GATCT
GGATGCAGTAAGGATTAAAGTGGACACAGTAAATGCAAAACCGGGAGACACAGTAAGAATACCTG
TAAGA
TTCAGCGGTATACCATCCAAGGGAATAGCAAACTGTGACTTTGTATACAGCTATGACCCGAATGTA
CTTG AGATAATAGAGATAGAACCGGGAGACATAATAGTTGACCCGAATCCTGACAAGAGCTTTGATACTG
CAGT ATATCCTGACAGAAAGATAATAGTATTCCTGTTTGCGGAAGACAGCGGAACAGGAGCGTATGCAAT
AACT
AAAGACGGAGTATTTGCTACGATAGTAGCGAAAGTAAAATCCGGAGCACCTAACGGACTCAGTGT
AATCA
AATTTGTAGAAGTAGGCGGATTTGCGAACAATGACCTTGTAGAACAGAAGACACAGTTCTTTGACG
GTGG AGTAAATGTTGGAGATACAACAGAACCTGCAACACCTACAACACCTGTAACAACACCGACAACAA
CAGAT GATCTGGATGCAGTAAGGATTAAAGTGGACACAGTAAATGCAAAACCGGGAGACACAGTAAGAAT
ACCTG
TAAGATTCAGCGGTATACCATCCAAGGGAATAGCAAACTGTGACTTTGTATACAGCTATGACCCGA
ATGT ACTTGAGATAATAGAGATAGAACCGGGAGACATAATAGTTGACCCGAATCCTGACAAGAGCTTTGA
TACT
GCAGTATATCCTGACAGAAAGATAATAGTATTCCTGTTTGCAGAAGACAGCGGAACAGGAGCGTAT
GCAA
TAACTAAAGACGGAGTATTTGCTACGATAGTAGCGAAAGTAAAAGAAGGAGCACCTAACGGACTC
AGTGT
AATCAAATTTGTAGAAGTAGGCGGATTTGCGAACAATGACCTTGTAGAACAGAAGACACAGTTCTT
TGAC
GGTGGAGTAAATGTTGGAGATACAACAGAACCTGCAACACCTACAACACCTGTAACAACACCGAC
AACAA
CAGATGATCTGGATGCAGTAAGGATTAAAGTGGACACAGTAAATGCAAAACCGGGAGACACAGTA
AGAAT
ACCTGTAAGATTCAGCGGTATACCATCCAAGGGAATAGCAAACTGTGACTTTGTATACAGCTATGA
CCCG
AATGTACTTGAGATAATAGAGATAGAACCGGGAGAATTGATAGTTGACCCGAATCCTACCAAGAGC
TTTG ATACTGCAGTATATCCTGACAGAAAGATGATAGTATTCCTGTTTGCGGAAGACAGCGGAACAGGAG
CGTA
TGCAATAACTGAAGATGGAGTATTTGCTACGATAGTAGCGAAAGTAAAATCCGGAGCACCTAACGG
ACTC
AGTGTAATCAAATTTGTAGAAGTAGGCGGATTTGCGAACAATGACCTTGTAGAACAGAAGACACAG
TTCT
TTGACGGTGGAGTAAATGTTGGAGATACAACAGAACCTGCAACACCTACAACACCTGTAACAACAC
CGAC
AACAACAGATGATCTGGATGCAGTAAGGATTAAAGTGGACACAGTAAATGCAAAACCGGGAGACA
CAGTA
AGAATACCTGTAAGATTCAGCGGTATACCATCCAAGGGAATAGCAAACTGTGACTTTGTATACAGC
TATG
ACCCGAATGTACTTGAGATAATAGAGATAGAACCGGGAGACATAATAGTTGACCCG AATCCTGACA
AGAG
CTTTGATACTGCAGTATATCCTGACAGAAAGATAATAGTATTCCTGTTTGCAGAAGACAGCGGAAC
GGGA
GCGTATGCAATAACTAAAGACGGAGTATTTGCTACGATAGTAGCGAAAGTAAAAGAAGGAGCACC
TAACG GACTCAGTGTAATCAAATTTGTAGAAGTAGGCGGATTTGCGAACAATGACCTTGTAGAACAGAAGA
CACA
GTTCTTTGACGGTGGAGTAAATGTTGGAGATACAACAGTACCTACAACATCGCCGACAACAACACC
GCCA GAGCCGACGATAACTCCGAACAAGTTGACACTTAAGATAGGCAGAGCAGAAGGAAGACCTGGAGA
CACGG
TGGAAATACCGGTTAACTTGTATGGAGTACCTCAAAAAGGAATAGCAAGCGGTGACTTCGTAGTAA
GCTA
TGACCCGAATGTACTTGAGATAATAGAGATAGAACCGGGAGAATTGATAGTTGACCCGAATCCTAC
CAAG
AGCTTTGATACTGCAGTATATCCTGACAGAAAGATGATAGTATTCCTGTTTGCGGAAGACAGCGGA
ACAG
GAGCGTATGCAATAACTGAAGATGGAGTATTTGCTACGATAGTAGCGAAAGTAAAAGAAGGAGCA
CCTGA
AGGATTCAGTGCAATAGAAATTTCTGAGTTTGGTGCATTTGCAGATAATGATCTGGTAGAAGTGGA
AACT
GACCTTATCAATGGTGGAGTACTTGTAACTAATAAACCTGTAATAGAAGGATATAAAGTATCCGGA
TACA
TTTTGCCAGACTTCTCCTTCGACGCTACTGTTGCACCACTTGTAAAGGCCGGATTCAAAGTTGAAAT
AGT
AGGAACAGAATTGTATGCAGTAACAGATGCAAACGGATACTTTGAAATAACCGGAGTACCTGCAA
ATGCA
AGCGGATATACATTGAAGATTTCAAGAGCAACTTACTTGGACAGAGTAATTGCAAATGTTGTAGTA
ACGG
GAGATACTTCAGTTTCAACTTCACAGGCTCCAATAATGATGTGGGTAGGAGACATAGTGAAAGACA
ATTC
TATCAACCTGTTGGACGTTGCAGAAGTTATCCGTTGCTTCAACGCTACTAAAGGAAGCGCAAACTA
CGTA
GAAGAACTTGACATTAATAGAAACGGCGCAATTAACATGCAAGACATAATGATTGTTCATAAGCAC
TTTG
GAGCTACATCAAGTGATTACGACGCACAGTAA
PROTEIN SEQUENCE CipA
MRKVISMLLVVAMLTTIFAAMIPQTVSAATMTVEIGKVTAAVGSKVEIPITLKGVPSKGMANCDFVLGYDPNVLEVTEVKPGSIIKDPDPSKSFDSAIYPDRKMIVFLFAEDSGRGTYAITQDGVFATIVATVKSAAAAP
ITLLEVGAFADNDLVEISTTFVAGGVNLGSSVPTTQPNVPSDGVVVEIGKVTGSVGTTVEIPVYFRGVPS KGIANCDFVFRYDPNVLEIIGIDPGDIIVDPNPTKSFDTAIYPDRKIIVFLFAEDSGTGAYAITKDGVAK
ATVKSSAPGYITFDEVGGFADNDLVEQKVSFI DGGVNVGNATPTKGATPTNTATPTKSATATPTRPSVPT
NTPTNTPANTPVSGNLKVEFYNSNPSDTTNSINPQFKVTNTGSSAIDLSKLTLRYYYTVDGQKDQTFWC DHAAIIGSNGSYNGITSNVKGTFVKMSSSTNNADTYLEISFTGGTLEPGAHVQ IQGRAK
NDYSFKSASQFVEWDQVTAYLNGVLVWGKEPGGSVVPSTQPVTTPPATTKPPATTKPPATTIPPSDDPN
AIKIKVDTVNAKPGDTVNIPVRFSGIPSKGIANCDFVYSYDPNVLEIIEIKPGELIVDPNPDKSFDTAVYPD
RKIIVFLFAEDSGTGAYAITKDGVFATIVAKVKSGAPNGLSVIKFVEVGGFANNDLVEQRTQFFDGGVN
VGDTTVPTTPTTPVTTPTDDSNAVRIKVDTVNAKPGDTVRIPVRFSGIPSKGIANCDFVYSYDPNVLEIIEI
EPGDIIVDPNPDKSFDTAVYPDRKIIVFLFAEDSGTGAYAITKDGVFATIVAKVKSGAPNGLSVIKFVEVG
GFANNDLVEQKTQFFDGGVNVGDTTEPATPTTPVTTPTTTDDLDAVRIKVDTVNAKPGDTVRIPVRFSG
IPSKGIANCDFVYSYDPNVLEIIEIEPGDIIVDPNPDKSFDTAVYPDRKIIVFLFAEDSGTGAYAITKDGVFA
TIVAKVKSGAPNGLSVIKFVEVGGFANNDLVEQKTQFFDGGVNVGDTIEPATPTTPVTTPTTTDDLDAVRIKVDTVNAKPGDTVRIPVRFSGIPSKGIANCDFVYSYDPNVLEIIEIEPGDIIVDPNPDKSFDTAVYPDRKIIVFLFAEDSGTGAYAITKDGVFATIVAKVKEGAPNGLSVIKFVEVGGFANNDLVEQKTQFFDGGVNVGD
TTEPATPTTPVTTPTTTDDLDAVRIKVDTVNAKPGDTVRIPVRFSGIPSKGIANCDFVYSYDPNVLEIIEIEP GELIVDPNPTKSFDTAVYPDRKMIVFLFAEDSGTGAYAITEDGVFATIVAKVKSGAPNGLSVIKFVEVGG
FANNDLVEQKTQFFDGGVNVGDTTEPATPTTPVTTPEDDLDAVRIKVDTVNAKPGDTVRIPVRFSGIP
SKGIANCDFVYSYDPNVLEIIEIEPGDIIVDPNPDKSFDTAVYPDRKIIVFLFAEDSGTGAYAITKDGVFATI
VAKVKEGAPNGLSVIKFVEVGGFANNDLVEQKTQFFDGGVNVGDTTVPTTSPTTTPPEPTITPNKLTLKI
GRAEGRPGDTVEIPVNLYGVPQKGIASGDFVVSYDPNVLEIIEIEPGELIVDPNPTKSFDTAVYPDRKMIV
FLFAEDSGTGAYAITEDGVFATIVAKVKEGAPEGFSAIEISEFGAFADNDLVEVETDLINGGVLVTNKPVI
EGYKVSGYILPDFSFDATVAPLVKAGFKVEIVGTELYAVTDANGYFEITGVPANASGYTLKISRATYLDR
VIANVVVTGDTSVSTSQAPIMMWVGDIVKDNSINLLDVAEVIRCFNATKGSANYVEELDINRNGAINMQ
DIMIVHKHFGATSSDYDAQ
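SEQ ID NO: 4 gives the CipA coding sequence as a continuous string, and the protein sequence above is its translation in one-letter code. The Python sketch below (the helper name `translate` is hypothetical; the standard genetic code is assumed) removes whitespace, reads codons from the first base and stops at the first stop codon. Applied to the first 66 nucleotides of SEQ ID NO: 4, it returns MRKVISMLLVVAMLTTIFAAMI, the first 22 residues of the CipA protein sequence.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string, read from its first base, into one-letter codes.

    Whitespace and line breaks are removed first, translation stops at the
    first stop codon, and a trailing partial codon is ignored. Characters
    other than T, C, A, G (e.g. OCR debris) raise KeyError.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon: end of the open reading frame
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    start = "ATGAGAAAAGTCATCAGTATGCTCTTAGTTGTGGCTATGCTGACGACGATTTTTGCGGCGATGATA"
    print(translate(start))  # MRKVISMLLVVAMLTTIFAAMI
```

Because the wrapped lines of SEQ ID NO: 4 are joined before translation, the layout of the printed sequence does not affect the result; only the order of the bases matters.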

Claims

1. Compound to which at least one type II cohesin domain can be bound, covalently or non-covalently.

2. Compound according to claim 1, characterized in that the type II cohesin domain comes from a cellulolytic bacterium.

3. Compound according to claim 2, characterized in that the type II cohesin domain comes from a strain of Clostridium, and in particular Clostridium thermocellum.

4. Compound according to claim 3, characterized in that the type II cohesin domain comes from a Clostridium thermocellum protein, or from a fragment thereof comprising between 50 and 600 amino acids.

5. Compound according to claim 3, characterized in that the type II cohesin domain comes from a Clostridium thermocellum protein chosen from SdbA, OlpB and ORF2p or from a homologous protein.

6. Compound according to claim 5, characterized in that it comprises the sequence of 165 amino acids substantially as represented in SEQ ID NO: 1 from amino acid 27 to amino acid 210 of the SdbA protein sequence, or a homologous sequence, or a fragment of this sequence or of a homologous sequence having type II cohesin activity.

7. Compound according to claim 5, characterized in that it comprises, as type II cohesin domain, one of the sequences of the OlpB protein chosen from amino acids 28 to 190, amino acids 207 to 362, amino acids 409 to 564 and amino acids 607 to 762 of SEQ ID NO: 2, or a sequence homologous to one of these sequences, or a fragment of these sequences of at least 50 amino acids having type II cohesin activity.

8. Compound according to claim 5, characterized in that it comprises, as type II cohesin domain, a sequence of the ORF2p protein chosen from amino acids 38 to 194 and amino acids 209 to 364 of SEQ ID NO: 3, or a sequence homologous to these sequences, or a fragment of these sequences of at least 50 amino acids having type II cohesin activity.

9. Compound according to one of claims 1 to 8, characterized in that it is essentially a polypeptide or a protein.

10. Compound according to claim 9, characterized in that it is a protein with enzymatic activity.

11. Compound according to one of claims 1 to 10, characterized in that it comprises at least one other cohesin domain which is not of type II and/or a dockerin domain.

12. SdbA protein from Clostridium thermocellum, the amino acid sequence of which is the complete 631-amino-acid sequence substantially as shown in SEQ ID NO: 1.

13. Fragment of a protein according to one of claims 1 to 12 or of a homologous protein, characterized in that it is a type II cohesin domain.

14. Compound according to one of claims 1 to 11, characterized in that it comprises at least one non-protein fragment.

15. DNA fragment, characterized in that it comprises at least one sequence coding for a type II cohesin domain.

16. DNA fragment according to claim 15, coding for the protein SdbA or fragment thereof.

17. DNA fragment according to claim 15, characterized in that it substantially comprises nucleotides 82 to 573 of SEQ ID NO: 1, coding for the type II cohesin domain of SdbA.

18. DNA fragment according to claim 15, substantially comprising nucleotides 1 to 1893 of SEQ ID NO: 1, encoding the SdbA protein.

19. DNA fragment according to claim 15, characterized in that its sequence is substantially one of the sequences coding for a cohesin domain of OlpB, chosen from nucleotides 85 to 570, nucleotides 619 to 1095, nucleotides 1225 to 1689 and nucleotides 1819 to 2189 of SEQ ID NO: 2.

20. DNA fragment according to claim 15, characterized in that its sequence is substantially one of the sequences coding for a cohesin domain of ORF2, chosen from nucleotides 109 to 582 and nucleotides 625 to 1092 of SEQ ID NO: 3.
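Claims 17 to 20 locate the cohesin-coding regions by nucleotide positions, while claims 6 to 8 and 12 use amino-acid positions. The Python sketch below (the function name `nt_range_to_residues` is hypothetical) converts between the two, assuming, as in the listings, that nucleotide 1 is the first base of the start codon and that the reading frame is continuous, so that residue k is encoded by nucleotides 3k-2 to 3k. As a consistency check, nucleotides 1 to 1893 (claim 18) span 1893 / 3 = 631 codons, the length of the SdbA sequence in claim 12.

```python
from typing import Tuple


def nt_range_to_residues(nt_start: int, nt_end: int) -> Tuple[int, int]:
    """Return the (first, last) residues whose codons overlap a nucleotide range.

    Assumes 1-based, inclusive coordinates, nucleotide 1 being the first base
    of the start codon and no frame shift, so that residue k is encoded by
    nucleotides 3k - 2 to 3k.
    """
    if nt_start < 1 or nt_end < nt_start:
        raise ValueError("expected 1 <= nt_start <= nt_end")
    first = (nt_start + 2) // 3  # codon containing nt_start
    last = (nt_end + 2) // 3     # codon containing nt_end
    return first, last


if __name__ == "__main__":
    # Claim 18: nucleotides 1 to 1893 of SEQ ID NO: 1 encode SdbA.
    print(nt_range_to_residues(1, 1893))  # (1, 631)
```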

21. DNA fragment, characterized in that its sequence is complementary or homologous to that of a fragment according to one of claims 15 to 20, or complementary to such a homologous sequence.

22. DNA fragment, characterized in that it is capable of hybridizing under low-stringency conditions with a fragment according to one of claims 15 to 21.

23. E. coli strain transformed by the plasmid pCT 1801, deposited at the CNCM of the Institut Pasteur under number I-1683.

24. E. coli strain transformed by the plasmid pCT 1830, deposited at the CNCM of the Institut Pasteur under number I-1684.

25. Compound, characterized in that it comprises at least one type II dockerin domain.

26. Complex comprising at least one compound according to one of claims 1 to 14 linked by a type II cohesin/dockerin (C/D) interaction to a compound comprising at least one type II dockerin domain, each compound constituting an element of the complex.

27. Complex according to claim 26, characterized in that the affinity of the complex is at least 10⁵ M⁻¹.

28. Complex according to claim 26 or 27, characterized in that it comprises at least three elements, two of which are linked by a C/D interaction other than a type II interaction.

29. Complex according to claim 28, characterized in that two elements are linked by a type I C/D interaction.

30. Multimeric complex according to claim 28 or 29, characterized in that it comprises between 1 and 50 elements associated with one another, preferably between 1 and 20.

31. Complex according to claim 30, characterized in that it comprises at least two type II C/D interaction domains.

32. Complex according to claim 30, characterized in that it comprises at least one type I C/D interaction associated with a type II C/D interaction.

33. Multimeric complex according to one of claims 28 to 32, characterized in that the elements of the complex are essentially proteins.

34. Complex according to one of claims 26 to 33, characterized in that at least one of the elements comprises a protein fragment rich in proline and/or in hydroxylated amino acids.

35. Multimeric complex according to claim 33 or 34, characterized in that some of the elements of the complex are enzymes.

36. Expression vector comprising a DNA fragment according to one of claims 15 to 22, placed under the control of elements ensuring its expression in a host cell.

37. A strain of E. coli transformed by a vector according to claim 36.

38. Process for the preparation of a polypeptide according to one of claims 1 to 14, characterized in that it comprises culturing host cells transformed with a vector according to claim 36, or culturing a strain according to claim 37.

39. Enzymatic composition comprising a complex according to one of claims 26 to 35.

40. Enzymatic composition according to claim 39, comprising two enzymes linked by a type II C/D interaction.

41. Composition according to claim 39, characterized in that the multimeric complex comprises a first compound according to one of claims 1 to 11, linked to a dockerin domain of the CipA protein and to a first enzyme, and a second compound comprising a dockerin domain of a catalytic subunit of the cellulolytic complex of Clostridium thermocellum, linked to a second enzyme.

42. Use of the multimeric complex according to one of claims 26 to 35, characterized in that said multimeric complex potentiates the synergy of the elements of the complex.

43. Use of the multimeric complex according to one of claims 39 to 42, characterized in that said complex ensures the potentiation of the enzymatic composition.

44. Method for detecting an antigen or an antibody, characterized in that a multimeric complex according to one of claims 26 to 35 is brought into contact with a solution containing an antibody or an antigen of interest, and the reaction between the multimeric complex and the antigen or antibody is revealed.