WO1998049274A1

WO1998049274A1 - Thermostable dna polymerase and inteins of the thermococcus fumicolans species

Info

Publication number: WO1998049274A1
Application number: PCT/FR1997/000761
Authority: WO
Inventors: Joël QUERELLOU; Marie-Anne Cambon
Original assignee: Appligene-Oncor
Priority date: 1997-04-29
Filing date: 1997-04-29
Publication date: 1998-11-05
Also published as: WO1998049275A3; WO1998049275A2

Abstract

The invention concerns a purified thermostable DNA polymerase from an archaebacterium of the species Thermococcus fumicolans, having a molecular weight of the order of 89,000 daltons, and its thermostable inteins.

Description

THERMOSTABLE DNA POLYMERASE AND INTEINS OF THE THERMOCOCCUS FUMICOLANS SPECIES

The present invention relates to a new thermostable DNA polymerase and its two inteins, originating from an archaebacterium of the species Thermococcus fumicolans.

DNA polymerases are enzymes involved in the replication and repair of DNA in any living cell. Many DNA polymerases isolated from microorganisms such as E. coli (DNA polymerase I) or phage T4 are known today. DNA polymerases have also been identified and purified from thermophilic microorganisms such as Thermus aquaticus (Taq polymerase; Chien, A. et al., J. Bact. 1976, 127: 1550-1557; Kaledin et al., Biokhimiya 1980, 45: 644-651), Thermus thermophilus, or species of the genera Bacillus (European patent application published under No. 699 760), Thermococcus (European patent application No. 455 430), Sulfolobus and Pyrococcus (European patent application published under No. 547 359). Among the DNA polymerases originating from archaebacteria, mention may be made of Pfu, isolated from Pyrococcus furiosus (18), Vent™ polymerase from Thermococcus litoralis (10), 9°N from Pyrococcus sp. 9°N (15) and Deep Vent™ from Pyrococcus GB-D, the first two from littoral strains (Bay of Naples), the next two from deep underwater strains.

The mechanism of action of DNA polymerases is relatively well known today and consists of copying DNA identically in a semi-conservative mode. The copied strand serves as a template and the four nucleotide triphosphates are the substrates of this polymerization. Enzymes with DNA polymerase activity are increasingly used in vitro in molecular biology for various purposes such as cloning, error detection, sequencing, labeling and, more generally, amplification of nucleic acid sequences.

RECTIFIED SHEET (RULE 91) ISA / EP

This in vitro amplification of deoxyribonucleic acid sequences relies on the polymerase chain reaction (PCR) technique described in European patents Nos. 200 362 and 201 184. The principle of this technique is to carry out successive cycles of primer extension using the four nucleotide triphosphates, a DNA polymerase and a template DNA to be copied. At each cycle, the enzyme doubles the number of DNA strands available, and between cycles a thermodenaturation step is necessary to open the DNA double helix for the next cycle. The temperatures used for this thermodenaturation step are not compatible with preserving the activity of most known DNA polymerases, such as the Klenow fragment. Much research effort has therefore gone into finding enzymes that can withstand these temperatures. However, although thermophilic microorganisms are known today, it remains difficult to obtain these thermostable enzymes from them with sufficient production yields. Molecular biology and genetic engineering overcome this drawback: once located in the genome, the gene coding for the DNA polymerase is cloned, sequenced and then recloned in an expression vector in order to produce the so-called recombinant protein in a mesophilic host that is easier to cultivate, such as E. coli or S. cerevisiae. This method of expression in E. coli has in particular been described in the international PCT patent application published under No. WO 89/06691 for producing the DNA polymerase of Thermus aquaticus.
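The doubling described above can be illustrated with a minimal sketch of idealized amplification. The function name and the `efficiency` parameter are assumptions introduced here for illustration; the text only states that each cycle doubles the number of strands, which corresponds to an efficiency of 1.

```python
def ideal_pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Copies present after `cycles` rounds of amplification.

    Each cycle multiplies the copy number by (1 + efficiency); an efficiency
    of 1.0 is the perfect doubling described in the text. The efficiency
    parameter is an illustrative assumption, not taken from the source.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# Perfect doubling: one template molecule after 10 and 20 cycles.
copies_after_10 = ideal_pcr_copies(1, 10)
copies_after_20 = ideal_pcr_copies(1, 20)
```

In practice the yield falls short of this ideal curve in late cycles, which is why the sketch exposes the efficiency as a parameter rather than fixing it.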

The DNA polymerase of the invention comes from an archaebacterium of the species Thermococcus fumicolans. In addition to its thermostability, which makes it particularly effective in a PCR process, this DNA polymerase is remarkable in that its precursor polypeptide contains two "protein introns", also called "inteins".

The nucleotide sequence of its inteins is inserted into that of DNA polymerase, generally at the level of conserved sites involved, after translation, in catalytic reactions. These sequences are transcribed and translated at the same time as that of DNA polymerase and the autocatalytic splicing of the inteins then produces three enzymes: two inteins and one DNA polymerase.

Such inteins are also found in other molecules such as the vacuolar ATPase of S. cerevisiae (4), GyrA of Mycobacterium leprae (7) and RecA of Mycobacterium tuberculosis (5, 6). Most inteins belong to the family of "homing endonucleases", since they cut DNA at a recognized site, at the very place where their nucleotide sequence is inserted.

The development of biotechnologies, both in research and in medicine or the agri-food sector, requires various types of DNA polymerases capable of improving, quantitatively and qualitatively, techniques as diverse as cloning, detection and amplification of DNA sequences. The present invention aims precisely to offer a new thermostable DNA polymerase derived from a recently described species, Thermococcus fumicolans (8). This species was isolated from fragments of chimneys collected in the North Fiji Basin during the Franco-Japanese STARMER campaign in 1989. It is a strict anaerobe with an optimal growth temperature of 90 °C, which is relatively high for a Thermococcus. Its optimum pH is 8.8, and its optimum salinity is 20 g/l to 40 g/l.

The subject of the invention is therefore a purified thermostable DNA polymerase of archaebacteria of the species Thermococcus fumicolans, having a molecular weight of the order of 89,000 Da, as well as its thermostable inteins; the gene, which comprises the two sequences coding for said inteins, has been cloned.

The research work which made it possible to identify, sequence and study the DNA polymerase of the invention was carried out using the strain Thermococcus fumicolans ST557, deposited in the Collection of the Institut Pasteur (CIP) under the number CIP 104680. This DNA polymerase is referred to below as Tfu. Its 774 amino acid sequence is represented in the annexed sequence list under the number SEQ ID NO: 2. A molecular weight of 89,797 Da and a pI of 8.1 have been deduced from this sequence.

The invention therefore relates to a purified thermostable DNA polymerase of archaebacteria of the species Thermococcus fumicolans having a molecular weight of the order of 89,000 daltons, as well as its enzymatically equivalent derivatives. The term “enzymatically equivalent derivatives” is understood to mean the polypeptides and proteins constituted by or comprising the amino acid sequence represented in the annexed sequence list under the number SEQ ID NO: 2, provided that they exhibit the properties of the DNA polymerase of Thermococcus fumicolans. As such, the invention more particularly contemplates a DNA polymerase whose amino acid sequence is represented in the annexed sequence list under the number SEQ ID NO: 1, or a fragment thereof or an assembly of such fragments, such as the 774 amino acid sequence represented in the annexed sequence list under the number SEQ ID NO: 2.

Indeed, the presence of the two inteins I-Tfu-1 and I-Tfu-2 in the sequence SEQ ID NO: 1 is likely to lead, during preparation by chemical synthesis or by genetic engineering, to truncated T. fumicolans DNA polymerase sequences whose enzymatic properties are equivalent to those of purified T. fumicolans DNA polymerase.

The term “derivatives” is also understood to mean the above amino acid sequences modified by insertion and/or deletion and/or substitution of one or more amino acids, provided that the resulting properties of the DNA polymerase of T. fumicolans are not significantly changed.

The invention also relates to a DNA sequence consisting of or comprising the sequence coding for a DNA polymerase of the invention.

The DNA sequence represented in the annexed sequence list under the number SEQ ID NO: 1 is such a sequence. The DNA coding for the DNA polymerase of T. fumicolans and its two inteins consists of nucleotides 357 to 5028. Nucleotides 1 to 356 correspond to the promoter region of this gene. Consequently, the subject of the invention is a DNA sequence constituted by or comprising the sequence between nucleotides 357 and 5028 of SEQ ID NO: 1, or a fragment thereof, or an assembly of such fragments.

The invention relates more particularly to a DNA sequence constituted by or comprising nucleotides 357 to 1674, 2755 to 3156 and 4324 to 5028 of the DNA sequence represented in the annexed sequence list under the number SEQ ID NO: 1.

This sequence codes for the DNA polymerase of T. fumicolans whose 774 amino acid sequence is represented in the annexed sequence list under the number SEQ ID NO: 2.

The invention relates as much to the DNA polymerase isolated and purified from the Thermococcus fumicolans strain as to the DNA polymerase prepared by chemical synthesis, for example by ligation of polypeptide fragments, or by genetic engineering methods. Within the framework of these genetic engineering methods, the invention also relates to a vector comprising a DNA sequence defined above, as well as a process for the production or expression in a cellular host of the thermostable DNA polymerase of the invention.

A process for the production of a DNA polymerase according to the invention consists in:

- transferring a nucleic acid sequence coding for the DNA polymerase, or a vector containing said sequence, into a cell host,

- culturing the cell host obtained in the previous step under conditions allowing the production of the DNA polymerase,

- isolating said DNA polymerase by any appropriate means.

The cell host used in the above methods can be chosen from prokaryotes or eukaryotes and in particular from bacteria, yeasts, mammalian, plant or insect cells.

The vector used is chosen according to the host to which it will be transferred.

The thermostable DNA polymerase of the invention is useful in particular in the methods of enzymatic amplification of nucleic acid sequences. Consequently, the subject of the invention is such methods using the thermostable DNA polymerase described above, as well as the amplification kits comprising, in addition to the reagents generally used, an adequate quantity of this DNA polymerase.

The invention also relates to a purified thermostable intein of archaebacteria of the species Thermococcus fumicolans. As indicated previously, inteins are defined as protein introns which are spliced not at the level of the messenger RNA but at the level of protein maturation. They therefore belong to a single gene that is transcribed and translated in a single step, and constitute by-products of the maturation of the protein encoded by this gene (Xu, M.Q., Comb, D.G., Paulus, H., Noren, C.J., Shao, Y., Perler, F., 1994, Protein splicing: an analysis of the branched intermediate and its resolution by succinimide formation. EMBO J. 13, 5517-5522).

The inteins are restriction endonucleases which have the property of cutting DNA at the very place where their gene is inserted, and therefore they can be considered as selfish sequences.

The inteins have in their sequence all the information necessary for their own splicing since they splice in E. coli.

It is possible to distinguish four main stages of protein maturation:

- The first step is the formation of a linear intermediate which has an ester function. This reaction depends on the pH and on the local environment of this bond (nature of the amino acids). This principle is used in cloning, expression or purification kits based on inteins, because a change in the environment determines whether or not splicing occurs: it is reported to be inhibited at pH 11 and activated at pH 7.5.

- The second step is a transesterification which transforms the previous intermediate and shifts the equilibrium of the first step.

- The third step is a cyclization of the asparagine, which releases the intein.

- The fourth step is the stabilization of the mature protein and the formation of a true peptide bond.

It is therefore possible to construct thermosensitive mutants in which protein splicing is blocked at expression temperatures (30 °C) and induced by heating.

This possibility of controlling protein splicing by temperature can be used in cloning vectors with a sequence coding for the intein and around cloning sites. If the protein to be cloned and expressed is toxic to the host, it can be cloned into two fragments around the intein sequence. Thus, overall, the gene to be cloned is complete but it is interrupted by the sequence of the intein. During expression, the intein is found in the expressed protein, thus making it inactive. It is then possible by heating, at the end of the induction, to release the intein by autocatalytic splicing and thus find the cloned active protein.

Inteins also make it possible to carry out purifications and are used in kits according to the principle described below. Certain residues around the intein splicing site are modified. The recombinant protein is expressed at low temperature to prevent premature splicing. A site with a strong affinity for chitin is attached to the C-terminal end of the intein. During induction, the cloned protein is expressed together with the intein and the chitin-binding site. Purification is then carried out on chitin affinity columns, which retain the chitin-binding site and with it the intein and the cloned protein, the whole forming part of the same pre-protein. The bond at the N-terminal end of the intein is then cleaved with DTT or β-mercaptoethanol to release the cloned protein.

The inteins are also thermostable restriction endonucleases whose recognition site is the very place where their gene is inserted in the "host" sequence. They carry a sequence motif (LAGLIDADG) repeated twice in the protein, a more or less conserved sequence which corresponds to the active site for DNA recognition and cleavage. These enzymes also seem to need Mg++ for their activity. It should be noted that the two inteins of the invention are co-expressed in E. coli and are self-splicing. This means that they are not toxic to the host, unlike one of the inteins of Thermococcus litoralis (9), so that their use in expression or purification kits is straightforward.
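Because the LAGLIDADG motif is described as only more or less conserved, a search for it has to tolerate mismatches. The sketch below is one simple way to do this, by sliding a window along a protein sequence and counting differences from the consensus. The function name, the mismatch threshold and the toy sequence are hypothetical and are not taken from SEQ ID NO: 3 or SEQ ID NO: 4.

```python
MOTIF = "LAGLIDADG"


def find_motif_like(protein: str, motif: str = MOTIF, max_mismatches: int = 3):
    """Return (start, window, mismatches) for every window of the protein
    that differs from the motif at no more than `max_mismatches` positions.

    Start positions are 0-based. The threshold is an illustrative choice.
    """
    protein = protein.upper()
    hits = []
    for start in range(len(protein) - len(motif) + 1):
        window = protein[start:start + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Hypothetical toy sequence with one exact and one degenerate copy of the motif.
TOY_PROTEIN = "MKKLAGLIDADGSVRRTTPEELVGFIDGDGYAQK"
toy_hits = find_motif_like(TOY_PROTEIN)
```

Dedicated profile-based tools would be more sensitive for real sequences; the mismatch count is used here only because it keeps the example self-contained.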

A first intein sequence according to the invention is represented in the annexed sequence list under the number SEQ ID NO: 3. This intein, called I-Tfu-1, has a molecular weight of 41,409 Da and a pI of 9.13, deduced from the 360 amino acid sequence SEQ ID NO: 3.

A second intein sequence according to the invention is represented in the annexed sequence list under the number SEQ ID NO: 4. This intein, called I-Tfu-2, has a molecular weight of 44,765 Da and a pI of 9.6, deduced from the 389 amino acid sequence SEQ ID NO: 4.

As recalled previously, the thermostable inteins of the invention are useful in particular in methods of restriction of nucleic acids and in the development of expression vectors making it possible to reduce the toxicity of the protein to be expressed by inserting one of the two sequences of inteins in the sequence of the protein to be expressed. This can be done without manipulation of the sequence of inteins if the cloning is carried out in E. coli, the expression techniques used having demonstrated their harmlessness for this host organism. Consequently, the subject of the invention is such processes using one or both of the two thermostable inteins described above, as well as the expression or purification kits containing one or the two sequences coding for said thermostable inteins.

The invention also relates to a DNA sequence consisting of or comprising the sequence coding for an intein of the invention.

A DNA sequence coding for the intein I-Tfu-1 lies between nucleotides 1675 and 2754 of the sequence SEQ ID NO: 1 in the appendix. This DNA sequence codes for the intein whose amino acid sequence is represented in the annexed sequence list under the number SEQ ID NO: 3.

A DNA sequence coding for the intein I-Tfu-2 lies between nucleotides 3157 and 4323 of the sequence SEQ ID NO: 1 in the appendix. This DNA sequence codes for the intein whose amino acid sequence is represented in the annexed sequence list under the number SEQ ID NO: 4.
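The intein coordinates given above can be checked against the stated intein lengths with simple arithmetic: an inclusive nucleotide range of length L encodes L/3 residues. The sketch below performs this check; the dictionary and function names are introduced here for illustration only, while the coordinates and residue counts are those stated in the text.

```python
# Inclusive nucleotide coordinates in SEQ ID NO: 1, as stated in the text.
INTEIN_COORDS = {"I-Tfu-1": (1675, 2754), "I-Tfu-2": (3157, 4323)}
# Intein lengths in amino acids, as stated for SEQ ID NO: 3 and SEQ ID NO: 4.
STATED_RESIDUES = {"I-Tfu-1": 360, "I-Tfu-2": 389}


def segment_length(start: int, end: int) -> int:
    """Number of nucleotides in the inclusive range start..end."""
    return end - start + 1


def residues_encoded(start: int, end: int) -> int:
    """Number of codons in the range; raises if it is not a whole number."""
    length = segment_length(start, end)
    if length % 3 != 0:
        raise ValueError(f"{length} nucleotides is not a multiple of 3")
    return length // 3


# Map each intein to True when its coordinates agree with its stated length.
consistency = {
    name: residues_encoded(*coords) == STATED_RESIDUES[name]
    for name, coords in INTEIN_COORDS.items()
}
```

Both ranges are multiples of three and agree with the stated intein lengths, which is consistent with in-frame insertion of the inteins in the polymerase gene.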

The invention relates as much to these thermostable inteins isolated and purified from the Thermococcus fumicolans strain as to inteins prepared by chemical synthesis, for example by ligation of polypeptide fragments, or by genetic engineering methods. Within the framework of these genetic engineering methods, the invention also relates to a vector comprising a DNA sequence defined above, as well as a process for the production or expression in a cellular host of DNA coding for the inteins of the invention. Such methods are identical to those described previously for T. fumicolans DNA polymerase.

Other advantages and characteristics of the invention will become apparent on reading the examples which follow, given without limitation, relating to the cloning, expression, characterization and activity of the thermostable DNA polymerase of the invention, and referring to the accompanying drawings in which:

- Figure 1 shows the DNA-DNA hybridization of the genomic DNA of T. fumicolans digested with various restriction enzymes and hybridized with the probes GE23ClaI-HindIII and GE23XhoI-SalI.

- Figure 2 shows the cloning strategy, the gene structure of the DNA polymerase of T. fumicolans and the gene products.

- Figure 3 shows the results of purification of the recombinant polymerase of T. fumicolans after a heparin sepharose column, visualized by SDS-PAGE.

- Figure 4 shows the results of purification of the recombinant polymerase of T. fumicolans after HiTrap Blue column No. 2, visualized by SDS-PAGE.

- Figure 5 shows the results of purification of the recombinant polymerase of T. fumicolans after a phosphocellulose column, visualized by SDS-PAGE.

- Figure 6 represents the results of purification of the recombinant polymerase of T. fumicolans after a MonoQ column, visualized by SDS-PAGE.

- Figure 7 shows the PCR results with DNA polymerase from T. fumicolans with the exclusion fractions from the MonoQ column.

I- Materials and methods.

1) Culture conditions, plasmids and strains used.

The strains Thermococcus litoralis (DSM 5474 T) and Pyrococcus furiosus (DSM 3638 T) were obtained from the collection of the Deutsche Sammlung von Mikroorganismen (DSM), Braunschweig-Stöckheim, Germany. The strain Pyrococcus sp. GE 23 was isolated from chimneys of deep hydrothermal vents and was supplied by G. Erauso (CNRS, Station Biologique de Roscoff, France). The Thermococcus fumicolans strain was obtained from the Marine Microbiology laboratory of G. Barbier (IFREMER-DRV-VP-CMM) in Brest, France. This strain was obtained by purification from fragments of hydrothermal vents collected at a depth of 2000 meters in the North Fiji Basin during the Franco-Japanese STARMER campaign carried out in 1989.

Pyrococcus sp.GE23 was cultured at 85 ° C in 2216S medium (DIFCO) at a pH of 6.5.

Thermococcus fumicolans, described in 1996 (Godfroy et al.), is cultivated under anaerobic conditions in a medium containing the following components: peptone 2 g/l; yeast extract 0.5 g/l; sea salt (Sigma) 30 g/l; PIPES buffer 6.05 g/l; elemental sulfur 10 g/l; resazurin 1 mg/l. The pH is adjusted to 8.5 with 5 N sodium hydroxide at 20 °C.

The Escherichia coli SURE strain (Stratagene, La Jolla, Calif.) was cultured in LB medium with the appropriate antibiotic(s), at 37 °C with stirring. This strain was used as a host to receive the primary constructs in the pUC 18 or pBluescript vectors. The NovaBlue, BL21 (DE3) and BL21 (DE3) pLysS strains (Novagen, Madison, Wis.) were cultured in 2xYT medium with the appropriate antibiotics at 37 °C or 30 °C. These strains were used as hosts for the expression plasmids.

2) Isolation of DNA, hybridization and recombinant DNA.

The high molecular weight DNA of Thermococcus fumicolans was isolated by the modified Charbonnier method (3). The cells were resuspended in TE-Na 1X buffer, then lysed at 40 °C for three hours with a mixture of 1% N-lauryl-sarcosine, 1% sodium dodecyl sulfate and 0.4 mg/ml of proteinase K. After lysis, centrifugation at 5000 g for 10 minutes removes cellular debris. The DNA is extracted with phenol-chloroform-isoamyl alcohol (PCI, 25:24:1), then treated with RNase at 5 µg/ml at 60 °C for one hour. These steps are followed by two additional PCI extractions and a chloroform extraction. The DNA is then precipitated with absolute ethanol at -20 °C, centrifuged, air dried and taken up in TE-1X buffer. The concentration and purity of this DNA are measured by spectrophotometry at 230, 260 and 280 nm with a GeneQuant II device (Pharmacia, Uppsala, Sweden).

For the construction of the genomic mini-library in pUC 18 (17), the DNA was completely digested overnight at 37 °C by a series of restriction enzymes (BamHI, HindIII, EcoRI, EcoRV, PvuII, SalI, XbaI and XhoI) in single and double digestions. The DNA fragments are then fractionated on 0.8% agarose gel in TBE-1X and transferred under vacuum to a Hybond-N+ nylon membrane (Amersham, UK). These membranes were hybridized with DNA probes prepared by PCR with specific primers selected from the DNA polymerase genes of P. furiosus, T. litoralis and Pyrococcus sp. GE23. These probes are first labeled with 32P by random priming in accordance with the manufacturer's recommendations (Megaprime, Amersham, UK).
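As a hedged illustration of how absorbance readings at 230, 260 and 280 nm are commonly turned into a concentration and a purity estimate, the sketch below assumes the usual convention that an A260 of 1.0 corresponds to about 50 µg/ml of double-stranded DNA in a 1 cm light path. This conversion factor, the function names and the example readings are assumptions; they are not stated in the text and do not describe the GeneQuant II software.

```python
# Assumed rule of thumb: A260 = 1.0 corresponds to ~50 µg/ml double-stranded DNA (1 cm path).
DSDNA_UG_PER_ML_PER_A260 = 50.0


def dna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate the dsDNA concentration of the undiluted sample in µg/ml."""
    return a260 * DSDNA_UG_PER_ML_PER_A260 * dilution_factor


def purity_ratios(a230: float, a260: float, a280: float) -> dict:
    """Return the A260/A280 and A260/A230 ratios used as purity indicators."""
    if a230 <= 0 or a280 <= 0:
        raise ValueError("absorbance readings must be positive")
    return {"A260/A280": a260 / a280, "A260/A230": a260 / a230}


# Hypothetical readings for a sample measured after a 10-fold dilution.
example_concentration = dna_concentration_ug_per_ml(0.5, dilution_factor=10)
example_ratios = purity_ratios(a230=0.25, a260=0.5, a280=0.25)
```

A260/A280 ratios near 1.8 are generally taken to indicate DNA with little protein carry-over, which is relevant here because the preparation involves proteinase K and phenol extraction.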

Two P. furiosus probes were used, Pfu and Pfu F, covering respectively the regions delimited by base pairs 8 to 2316 and 819 to 1915 of the coding section of the polymerase gene as defined by Uemori et al. (18). Two T. litoralis probes were used, Tli I and Tli T, covering respectively the regions delimited by base pairs 297 to 1768 and 4631 to 5378, as defined by Hodges et al. (9). Two Pyrococcus sp. GE23 probes were used, one containing the 5' part of the gene (ClaI-HindIII fragment corresponding to sites bp 8 and bp 1353 of the coding section) and the other containing the terminal part of this same gene, obtained by PCR (so-called expression primers NdeI and SalI, corresponding to sites bp -1 and bp 2318, followed by digestion with XhoI and SalI, covering bases 1879 to 2318).

Positive fragments were identified by DNA-DNA hybridization (14). Only hybridizations with the Pyrococcus sp. GE 23 probes gave positive signals at 55 °C in less than 24 h of exposure. The probes from T. litoralis and P. furiosus gave no results, even at 50 °C in a standard buffer without formamide. From the hybridizations with the Pyrococcus sp. GE 23 probes, HindIII fragments of 1.9 kb were selected, then prepared by appropriate digestion of 100 µg of genomic DNA, purified in dialysis bags from agarose gels, and precipitated with absolute ethanol after PCI extraction. The fragments were ligated into dephosphorylated pUC 18 / HindIII. The host strains were transformed by electroporation (Gene Pulser, Biorad). The recombinant clones were screened by selection with ampicillin, alpha-complementation on X-Gal-IPTG substrate and then colony hybridization according to standard techniques (12). The temperature of the colony hybridizations was 55 °C with the Pyrococcus sp. GE23 probes, in a standard buffer without formamide. The plasmid DNA was isolated according to the method described by Birnboim and Doly (1), then purified by solid-phase anion exchange chromatography (Qiagen, Chatsworth, Calif.). The restriction fragments of the plasmids were purified on agarose gel by the GeneClean method (Bio 101, La Jolla, Calif.) for later cloning.

Thermococcus fumicolans 16S and 23S rRNA was amplified by PCR using the following primers and program:

- direct primer Aa: 5'-TCCGGTTGATCCTGCCGGAA-3'

- 23Sa reverse primer: 5'-CTTTCGGTCGCCCCTACT-3'

- initial step of 3 min at 94 °C, followed by 30 cycles (94 °C, 1 min / 49 °C, 1 min / 72 °C, 2 min) and a final elongation of 5 min at 72 °C.

The PCR product was cloned into the vector pUC18 for subsequent sequencing.
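The rRNA amplification program given above (3 min at 94 °C, 30 cycles of 1 min at 94 °C, 1 min at 49 °C and 2 min at 72 °C, then 5 min at 72 °C) can be written down as data and its total hold time computed. This is a minimal sketch; the `Step` class and function name are introduced here for illustration, and ramp times between temperatures, which depend on the instrument, are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One temperature hold of a thermocycling program."""
    temp_c: float
    minutes: float


# Program values as stated in the text.
INITIAL = Step(94, 3)
CYCLE = (Step(94, 1), Step(49, 1), Step(72, 2))
N_CYCLES = 30
FINAL = Step(72, 5)


def total_hold_minutes(initial: Step, cycle, n_cycles: int, final: Step) -> float:
    """Sum of all programmed hold times, excluding temperature ramps."""
    per_cycle = sum(step.minutes for step in cycle)
    return initial.minutes + n_cycles * per_cycle + final.minutes


rrna_program_minutes = total_hold_minutes(INITIAL, CYCLE, N_CYCLES, FINAL)
```

The real run time is longer than this figure, since heating and cooling between steps are not counted.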

3) DNA sequencing.

The DNA sequences were obtained by the chain termination method (13) using an Applied Biosystems automatic DNA analysis system. Both strands of the genes coding for the DNA polymerase and the two inteins were sequenced using universal primers located on the vectors as well as internal primers. The 16S rDNA was sequenced on both strands, after cloning (SureClone, Pharmacia, Uppsala, Sweden), using the Hot-Tub kit (Amersham, UK) in order to remove compressions.

Sequence analysis was performed with DNASTAR software (Madison, Wis., USA) and the program of the Genetics Computer Group (University of Wisconsin Biotechnology Center, Wis., USA), available online on INFOBIOGEN. Computerized similarity searches were performed with the BLAST program, multiple alignments with CLUSTAL V, and phylogenetic trees were established using the Neighbor-Joining method (11).

4) Construction of the recombinant expressing the DNA polymerase of Thermococcus fumicolans.

The DNA polymerase of Thermococcus fumicolans and its two inteins were expressed together in E. coli with the expression vector PARHS2, which belongs to the family of T7 expression systems (16) and was acquired from Eurogentec.

PCR was used to prepare the complete fragment containing the DNA polymerase and the two inteins, using primers containing the NdeI and BamHI restriction sites:

- primer Tfu Dir: 5'-TGG GGA TCC ATA TGA TCC TCG ATA CAG ACT ACA TC-3'

- primer Tfu Rev: 5'-AAG CTT GGA TCC TCA TTT CTT CCC CAT TTT GAG CC-3'
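The two primers above can be inspected programmatically to locate the restriction sites they carry. The sketch below uses the standard recognition sequences of NdeI (CATATG) and BamHI (GGATCC); the helper names are introduced here for illustration. It also shows that the ATG embedded in the NdeI site of Tfu Dir can serve as the initiation codon, and that the triplet following the BamHI site of Tfu Rev is the reverse complement of a stop codon.

```python
# Recognition sequences of the two enzymes named in the text.
SITES = {"NdeI": "CATATG", "BamHI": "GGATCC"}

# Primer sequences as given in the text, with the spacing removed.
TFU_DIR = "TGG GGA TCC ATA TGA TCC TCG ATA CAG ACT ACA TC".replace(" ", "")
TFU_REV = "AAG CTT GGA TCC TCA TTT CTT CCC CAT TTT GAG CC".replace(" ", "")


def site_positions(primer: str, sites=SITES) -> dict:
    """0-based position of the first occurrence of each site, or -1 if absent."""
    return {name: primer.find(seq) for name, seq in sites.items()}


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


dir_sites = site_positions(TFU_DIR)
rev_sites = site_positions(TFU_REV)
# The last three bases of the NdeI site (CATATG) form the ATG start codon.
start_codon = TFU_DIR[dir_sites["NdeI"] + 3:dir_sites["NdeI"] + 6]
# On the coding strand, the triplet after the BamHI site in Tfu Rev reads as a stop codon.
stop_codon = reverse_complement(TFU_REV[rev_sites["BamHI"] + 6:rev_sites["BamHI"] + 9])
```

In this reading, Tfu Dir carries both a BamHI and an NdeI site, while Tfu Rev carries only the BamHI site, which matches the NdeI/BamHI digestion used for cloning.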

The reaction mixture contained GOLDSTAR DNA polymerase (Eurogentec, B), the Taq Extender enzyme (containing Pfag from Stratagene), the extension buffer with the four dNTPs (each at 0.2 mM) and 50 pmol each of the primers Tfu Dir and Tfu Rev in a final volume of 50 µl. The amplification was carried out over 20 cycles (1 min at 94 °C, 1 min at 54 °C and 6 min at 72 °C) using a Stratagene 96-gradient thermocycler. The PCR fragments were then digested with the enzymes NdeI and BamHI and ligated into the same sites of the vector, thus restoring the initiation codon. The construct thus obtained was named PARHS2TFU1. This construct was sequenced at the junction sites to verify its integrity with respect to the genomic DNA sequence. The expression tests were carried out according to the following protocol: selection of the recombinant clones in the E. coli NovaBlue strain, expression in the BL21 (DE3) pLysS strain in 2xYT medium and induction for four hours with 1 mM IPTG.

The first induction tests were carried out on 5 ml cultures, induced or not. Overnight precultures are grown without inducer and diluted tenfold into fresh medium in the morning; they are grown until the optical density (measured at 600 nm) reaches 0.6, then either induced for 4 hours, or left uninduced and stopped after 2, 4 or 14 hours. Four ml of culture are then centrifuged at 4 °C and 5000 rpm for 10 minutes. The pellet is taken up in a lysis buffer (10 mM Tris-HCl pH 7.5; 10 mM NaCl; 2 mM MgCl2). The resuspended cells are then lysed, either with Triton X-100 at 1% v/v or with lysozyme at 1 mg/ml of lysis buffer, and left on ice for about 5 to 10 min. The cells are then thermodenatured by exposure to 72 °C for 20 min. This largely destroys the cells of the mesophilic host without destroying the recombinant proteins. The lysis product is then centrifuged for 20 min at 10,000 rpm at 4 °C. The supernatant is recovered in order to test it by incorporation, by PCR or on gel. The incorporations are carried out according to two techniques:

- With tritiated thymidine as tracer. The incorporation is carried out on activated calf thymus DNA (SIGMA Aldrich, F) in the following reaction medium: 50 mM Tris-HCl pH 8.8; 1 mM DTT; 10 mM MgCl2; 10 mM KCl; BSA 0.4 mg/ml; each dNTP at 0.4 mM.

- With 32P-dATP (Amersham) as tracer. The incorporation is carried out on activated calf thymus DNA (Appligene) in the following reaction medium: 50 mM Tris-HCl pH 9; 50 mM KCl; 7 mM MgCl2; BSA 0.2 mg/ml and 16 mM (NH4)2SO4 (filtered), with a mixture of the 4 dNTPs at a final concentration of 500 µM each. This second method makes it possible to estimate accurately the number of units of enzyme.

5) Purification.

a) Culture.

After tests in small volumes, the cultures intended for the expression of the recombinant enzyme were carried out as follows: production of a 700 ml inoculum (2xYT medium supplemented with ampicillin and chloramphenicol) grown at 30 °C until OD = 0.8; inoculation of a fermenter containing 16 l of the same medium; culture for 4 h until OD = 0.6, then induction with 1 mM IPTG and culture for a further 4 h. The resulting biomass is centrifuged for 20 min at 6000 rpm at 4 °C (JOUAN centrifuge).

b) Cell lysis and first purification step.

20 g of biomass are taken up in 80 ml of buffer (20 mM Tris-Cl pH 7.5; 10 mM NaCl; 2 mM MgCl2; 1 mM EGTA; 1% Triton X-100; 2.2 mM PMSF). The resulting mixture, kept at 4 °C maximum, is sonicated 12 successive times (15 s cycles) until a fluid solution is obtained. The resulting solution is then centrifuged for 20 min at 4 °C at 20,000 rpm (SORVALL Ti45). The supernatant is recovered and heat-treated (70 °C for 10 min) in order to thermodenature most of the native proteins of E. coli, then centrifuged again.

c) Chromatography.

The 70 ml of supernatant from the previous step are loaded onto a Pharmacia Heparin Sepharose column (30 ml of resin), after equilibration with buffer A (10 mM Tris-Cl pH 7.5; 0.5 mM EGTA; 5 mM MgCl2; 10 mM β-mercaptoethanol; 0.2% Triton X-100 and 10% glycerol). Washing is carried out with buffer B (buffer A + 2 M NaCl) at a flow rate of 0.3 ml/min on a Pharmacia FPLC system. The different fractions are recovered in an NaCl gradient.

The active fractions thus recovered are dialyzed for 5 h against buffer C: 10 mM Tris-Cl pH 7.5; 0.5 mM EGTA; 5 mM MgCl2; 10 mM β-mercaptoethanol; 0.2% Triton X-100; 10% glycerol; 50 mM NaCl. The dialyzed products are then loaded onto an affinity column for DNA-binding proteins (Pharmacia, HiTrap Blue) and eluted in an NaCl gradient with the same buffers as above.

30 ml of the active fractions obtained previously are loaded onto a phosphocellulose column packed with P11 resin from Whatman (volume: 20 ml; diameter: 2.5 cm). These fractions were first dialyzed for 5 hours against the following buffer: 20 mM KPO4 pH 7, 0.1 mM EDTA, 1 mM DTT, 5% glycerol, 0.1% Triton X-100 and 0.1 M KCl. The flow rate of the column is adjusted to 0.2 ml/min; the loading buffer A is composed of 20 mM KPO4 pH 7, 0.1 mM EDTA, 1 mM DTT, 5% glycerol and 0.1% Triton X-100, and the gradient (between 0% and 50% of B) is produced by the KCl present at 2 M in buffer B.
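The KCl concentration reached at any point of the phosphocellulose gradient follows from linear mixing of buffers A and B. The sketch below computes it for a given percentage of buffer B, taking buffer B at 2 M KCl as stated and assuming that loading buffer A contains no KCl, since none is listed in its composition. The function name and the default for buffer A are assumptions made for illustration.

```python
def salt_molarity(percent_b: float, conc_a_molar: float = 0.0, conc_b_molar: float = 2.0) -> float:
    """Salt concentration (M) of a mixture containing `percent_b` % of buffer B.

    conc_b_molar defaults to the 2 M KCl stated for buffer B; conc_a_molar
    defaults to 0 M, an assumption based on buffer A listing no KCl.
    """
    if not 0.0 <= percent_b <= 100.0:
        raise ValueError("percent_b must lie between 0 and 100")
    fraction_b = percent_b / 100.0
    return (1.0 - fraction_b) * conc_a_molar + fraction_b * conc_b_molar


# End points of the 0% to 50% B gradient described in the text.
gradient_start_molar = salt_molarity(0)
gradient_end_molar = salt_molarity(50)
```

Under these assumptions the gradient runs from 0 M to 1 M KCl; the same function applies to the NaCl gradients of the other columns by changing the buffer concentrations.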

Having already shown that this polymerase does not bind to a MonoQ or Resource Q column, whatever the pH used, we tried to recover it in the exclusion fractions, on the hypothesis that the contaminating DNA would be bound by the column.

First of all, a test was carried out after the second HiTrap Blue column with a 5 ml aliquot, dialyzed by the same method as for a passage on HiTrap Blue. A second attempt was made after the phosphocellulose column and after two dialyses of the most active fractions 45, 46 and 49. The fractions are first heated for 40 min at 85 °C. A first dialysis is then carried out for 3 hours against the 0.1 M KCl, 1 M K2HPO4 pH 7.5 buffer. The second dialysis is carried out for 1 hour against the following buffer: 10 mM K2HPO4 pH 7.5, 10 mM K2PO4, 25 mM KCl, 1 mM DTT, 0.1% Triton X-100, 10% glycerol. The solution is then loaded onto a MonoQ column at 0.5 ml/min with a NaCl gradient of 0 to 20% (buffers used for heparin).

6) Exonuclease activities.

The 3'-5' exonuclease activity is quantified by the release of 32P-labeled nucleotides. A first step carries out the labeling: lambda DNA is digested with HindIII, and the Klenow fragment then copies the DNA from the free 3'-OH ends, in a medium containing, in addition to the buffer, the lambda DNA and the enzyme, 32P dATP and 32P dTTP, the dGTP and dCTP being cold. After one hour at 37 °C, the four cold dNTPs are added in excess for half an hour. Klenow and dNTPs are then removed by phenol extraction and ethanol precipitation.

The exonuclease tests are carried out in solutions containing the enzyme buffers and 0.02 mg/ml of labeled DNA, incubated overnight at 72 °C, 80 °C and 95 °C. Different buffers containing MgCl2 or MnSO4 are tested. The same test is carried out with Vent as a positive control. 10 μl of reaction solution are then deposited on DE81 paper (Whatman), dried and then counted before and after washing (3 times 5 min with Na2HPO4, once with water and then with 95% ethanol) using the Cerenkov technique.
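The before/after-wash counting described above lends itself to a simple calculation. The following is a minimal sketch, assuming that the counts before washing represent the total label on the filter and that the counts after washing represent label still carried by DNA retained on the DE81 paper. The function name and the example counts are hypothetical and are not measurements from this work.

```python
def fraction_released(counts_before_wash: float, counts_after_wash: float) -> float:
    """Fraction of the label released from the DNA by the exonuclease.

    Assumption: washing removes free nucleotides, while DNA-bound label
    stays on the DE81 paper.
    """
    if counts_before_wash <= 0:
        raise ValueError("counts_before_wash must be positive")
    if not 0 <= counts_after_wash <= counts_before_wash:
        raise ValueError("counts_after_wash must lie between 0 and counts_before_wash")
    return 1.0 - counts_after_wash / counts_before_wash


# Hypothetical counts (cpm), for illustration only.
released = fraction_released(10_000, 2_500)
print(f"{released:.0%} of the label released")
```

Comparing this fraction between the test enzyme and the positive control, at each temperature and in each buffer, gives a relative measure of exonuclease activity.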

For the 5'-3' exonuclease activity test, the labeling reaction uses polynucleotide kinase to label the 5' end of the substrate.

II - Results.

1) Isolation of the DNA polymerase gene from Thermococcus fumicolans as well as from the two intein genes.

The DNA of Thermococcus fumicolans, digested with a series of restriction enzymes, was hybridized to probes of P. furiosus and T. litoralis prepared by PCR, and to probes of Pyrococcus sp. GE23 obtained during the cloning of that other DNA polymerase (patent filing No. 96 08631 with the INPI). As shown in FIGS. 1a and 1b, the Southern blot hybridization revealed fragments of two types: a HindIII-HindIII fragment of 1.9 kb and an XhoI-XhoI fragment of 5 kb. These two fragments were revealed only with the ClaI-HindIII probe of Pyrococcus sp. GE23, labeled with 32P, with a two-hour exposure. The two identified fragments were then recovered and purified as described previously in Materials and Methods, then cloned into the dephosphorylated vector pUC18 digested with HindIII, or into the dephosphorylated pBluescript vector digested with XhoI. About 400 recombinants (E. coli SURE) were screened with the ClaI-HindIII probe. Of these 400 colonies, two gave a positive signal during hybridization (No. 9.26 and 12.79). The two clones were cultured in LB-Amp medium and their restriction profiles were identical, with an insert of 1.8 kb. The subsequent sequencing of one of these clones (designated 557MACa) and the sequence comparison (Megalign program, DNASTAR) showed that it corresponds to the promoter region and to the first 1404 base pairs of a DNA polymerase gene belonging to family B (2).

With regard to the 5 kb XhoI-XhoI fragment, 700 recombinants were screened without any positive signal during hybridization. This lack of success could be due to the presence of an intein within this fragment, rendering it unstable in a high-copy-number vector of the pBluescript type.

After 12 hours of exposure, a second HindIII-HindIII fragment of 2 kb was identified by Southern hybridization on the same membrane as above, using as a probe the end of the Pyrococcus sp. GE23 gene between the XhoI and SalI sites (fragment obtained by digestion of a PCR product). This fragment was cloned as before. About 200 recombinant clones were screened and four of them gave a positive signal. The four clones have an identical profile after digestion with the restriction enzyme HindIII. One of them, 557MACc, was sequenced, and the sequence comparison showed that it was the end of the DNA polymerase gene previously identified.

Assuming that the gene fragments obtained belong to the same DNA polymerase, oligonucleotides were used to amplify the missing region. This PCR fragment was then purified on gel as described in Materials and Methods, and used as a radioactively labeled probe to locate the missing fragment on the membrane. After two hours of exposure, a HindIII-HindIII fragment of approximately 2 kb was revealed. It was cloned as before; 600 colonies were then screened, giving four positive clones. The four clones had the same profile and one of them, named 557MACb, was sequenced. The sequence comparisons showed that it is the intermediate part of the DNA polymerase gene and that this fragment, delimited by two HindIII sites, fits exactly between 557MACa and 557MACc. Together, these three clones give the complete DNA polymerase sequence of T. fumicolans.

2) Phylogenetic position of Thermococcus fumicolans

Thermococcus fumicolans is a new species of Thermococcales described by Godfroy et al (8).

3) Nucleotide and polypeptide sequences of the DNA polymerase of Thermococcus fumicolans.

a) Nucleotide sequences

The three fragments delimited by HindIII sites were assembled (Figures 2 and 3). Together they form a fragment of 5039 base pairs. The first fragment, from clone 557MACa, contains the ATG start codon at position 457. The open reading frame is then uninterrupted over 4572 base pairs until it reaches a STOP codon on the fragment derived from clone 557MACc, at position 5028, six base pairs before the last HindIII site. Alignment with the other DNA polymerase genes available in the databases (Pfu, Tli, GB-D, KOD, ...) by the CLUSTAL V method, supplemented by a final manual alignment needed to restore the self-splicing sites of the inteins, revealed two insertion sequences.

- The first is inserted at base pair No. 1675 and ends at 2754, and is thus distributed between the fragment originating from clone 557MACa and that of clone 557MACb.

- The second is inserted at base No. 3157 and ends at 4323. It is likewise distributed between clones 557MACb and 557MACc.

These two insertion sequences form, with the rest of the sequence coding for DNA polymerase, a single reading frame.
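The coordinates quoted above can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only the positions stated in the text (1-based, inclusive); the variable and function names are illustrative, and the residue counts are derived values, not figures quoted by the authors.

```python
# Coordinates (1-based, inclusive) as given in the text for the 5039 bp fragment.
ORF = (457, 5028)  # ATG start codon through the last base of the STOP codon
INTEINS = {
    "I-Tfu-1": (1675, 2754),
    "I-Tfu-2": (3157, 4323),
}


def span_length(span):
    """Number of bases covered by an inclusive (start, end) span."""
    start, end = span
    return end - start + 1


def in_frame(span, orf_start=ORF[0]):
    """True if the span starts on a codon boundary and is a whole number of codons."""
    start, _ = span
    return (start - orf_start) % 3 == 0 and span_length(span) % 3 == 0


orf_nt = span_length(ORF)
intein_nt = {name: span_length(span) for name, span in INTEINS.items()}
extein_nt = orf_nt - sum(intein_nt.values())

precursor_aa = orf_nt // 3 - 1   # the STOP codon is not translated
mature_aa = extein_nt // 3 - 1   # polymerase after removal of both inteins

print("ORF length (bp):", orf_nt)
for name, span in INTEINS.items():
    print(name, intein_nt[name], "bp,", intein_nt[name] // 3, "codons, in frame:", in_frame(span))
print("Precursor residues:", precursor_aa)
print("Mature polymerase residues:", mature_aa)
```

Both insertions begin on a codon boundary and span a whole number of codons, which is what the statement of a single reading frame requires.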

b) Polypeptide sequences.

The coding section of the DNA polymerase gene therefore consists of three disjoint parts. The first part of the gene, carried by 557MACa, comprises the region coding for the 3'-5' exonuclease, in which the motif FDIET is recognized after translation. The second part, carried by 557MACb, and the third part, carried by 557MACc, include the conserved sites SLYPSI and YG.DTD. The two insertion sequences are located in conserved regions of the DNA polymerase, the first at the DFR/SLYPSII site, like I-KOD-1 of Pyrococcus sp. KOD1, and the second at the D/TDG site, like I-Tli-1 of T. litoralis. These two proteins are released by protein self-splicing. They include LAGLIDADG-type motifs, repeated twice approximately 100 base pairs apart. The alignments with the other inteins deposited in the sequence databases make it possible to liken them to restriction endonucleases of the archaebacterial type, endonucleases which cut DNA at the precise place where their gene is inserted.
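As an illustration of how such a motif is recognized after translation, the following sketch translates the 48 nucleotides at positions 859 to 906 of SEQ ID NO: 1 (residues 135 to 150 in the sequence listing) with the standard genetic code and locates FDIET. The helper names are hypothetical; the codon table is the standard one, built in the usual T, C, A, G order.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in T, C, A, G order at each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string (spaces allowed) into one-letter amino acids."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Nucleotides 859-906 of SEQ ID NO: 1, copied from the sequence listing.
SEGMENT = "CTC AAG ATG CTC GCC TTC GAC ATC GAG ACG CTC TAC CAC GAG GGC GAG"
FIRST_RESIDUE = 135  # residue number of the first codon in SEGMENT

protein = translate(SEGMENT)
offset = protein.find("FDIET")
print(protein)
print("FDIET starts at residue", FIRST_RESIDUE + offset)
```

The same approach applies to the other conserved motifs, provided the two intein-coding regions are removed from the nucleotide sequence before translation.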

c) Comparison with other DNA polymerase sequences. The alignment of the polypeptide sequences of the Thermococcales DNA polymerases available in the databases (P. abyssi strain GE5, Pyrococcus sp. GE23, Pyrococcus sp. KOD1, Pyrococcus sp. GB-D, P. furiosus, Pyrococcus sp. 9°N, T. litoralis) with the sequence of T. fumicolans (without inteins), carried out with the CLUSTAL program using the PAM250 matrix, gives the levels of similarity between these various polymerases listed in Table 1 below.

Table 1

The closest similarity level is that observed with Pyrococcus sp. 9 ° N (90.6%), which clearly indicates that the DNA polymerase of T. fumicolans is original in terms of its sequence and therefore constitutes a new DNA polymerase.

d) Comparison of the inteins with the other inteins available in banks.

The sequences available in the databases were aligned using the same methods as above. The comparisons of all the inteins show that they are divided into three groups corresponding to insertion at the sites of motifs A (R/SLYPSI), B (KILAN/S) and C (D/TDG). The analysis of the levels of similarity and the search for phylogenetic relationships only make sense for inteins belonging to the same class, that is to say, inserted at a given motif. The similarity levels for I-Tfu-1 (class A) and I-Tfu-2 (class C) with their homologous inteins already described are given in Tables 2 and 3 respectively below.

Table 2


Table 3

I-Tfu-1 and I-Tfu-2 seem to represent two new "alleles" of "homing" archaebacterial endonucleases of classes A and C.

I-Tfu-1 is the third known allele of intein inserting into the A site of Archaea DNA polymerase, while I-Tfu-2 is the second of its class.

5) Expression, characterization and activity of T. fumicolans DNA polymerase.

a) Cloning and expression. A 4595 bp insert, obtained by long-distance PCR with the primers TfuDir and TfuRev and covering the entire DNA polymerase gene of Thermococcus fumicolans with its two inteins, was cloned at the NdeI and BamHI sites of a vector used to transform the E. coli Novablue strain. Plasmid DNA minipreparations were carried out on ten transformant clones and all gave a positive hybridization signal. Two clones were selected on the basis of their profile after digestion with NdeI and SalI or with HindIII. These two plasmids were then transformed into the E. coli BL21 (DE3) pLysS strain.

Expression tests were carried out in 5 ml cultures in order to determine the optimal culture and induction conditions. The culture tests were first carried out at 37 °C, where there is too much cell lysis, then at 30 °C, where the culture hardly lyses at all. Culture is carried out in 2xYT medium, in which the production yield is better and lysis is reduced, supplemented with ampicillin (100 μg/ml) and chloramphenicol (15 μg/ml). Expression is then induced in the exponential growth phase (OD 600 nm = 0.6 to 0.7) with 1 mM IPTG, a concentration found to be optimal. Samples are taken before induction, then 2 hours, 4 hours and 6 hours after induction, as well as after overnight incubation. Samples are also taken from non-induced cultures after 6, 8, 12 and 24 hours of culture.

The samples are then treated as indicated in the Materials and Methods chapter, in order to test the level of activity of the recombinant DNA polymerase by incorporation of tritiated thymidine or 32P dCTP. This first technique, which allows a qualitative approach, reveals two types of behavior. One of the clones is non-inducible and its best activity is obtained after a night of culture (clone rTfu1-1). A second clone is inducible and exhibits maximum activity after four hours of induction, an activity which subsequently decreases (clone rTfu100-2). Two other clones, also tested, are weakly inducible and express very little DNA polymerase. The rTfu1-1 and rTfu100-2 clones are then tested in a 50 ml Erlenmeyer flask under the conditions described above. Only the inducible rTfu100-2 clone maintains its expression at the 50 ml scale. The rest of the work was therefore carried out on this clone.

b) Fermentation and extraction of cells.

The culture of clone Tfu100-2 was carried out in a 16 liter fermenter, in 2xYT medium supplemented with ampicillin and chloramphenicol. The 750 ml preculture was prepared the day before, stopped in the exponential phase (OD 600 nm = 0.7-0.8) and left at 4 °C overnight. The fermenter was prepared and brought to 30 °C with the culture medium. The preculture was returned to 30 °C one hour before its transfer to the fermenter. The final culture volume is 15 liters. The conditions are as follows: temperature = 30 °C; agitation = 300 rpm. Induction with 1 mM IPTG was carried out at OD 600 = 0.58. The pH of the culture was adjusted to 7 during the acidification phase and then left free during the alkaline phase. The bacteria were harvested after four hours of induction, when the pH was 8.3. The culture was then centrifuged and the cells were divided into three batches. One of them, 20 g of paste, was taken up in 80 ml of lysis buffer for the further processing indicated in the Materials and Methods chapter.

c) Purification of T. fumicolans DNA polymerase.

The purification was carried out as indicated previously (Materials and Methods). For the Heparin-Sepharose column, a gradient of 3 to 50% of buffer B, corresponding to the volume 365 ml to 1363 ml, makes it possible to recover 73 fractions of 6 ml. The peak of polymerase activity is obtained for a gradient value of approximately 0.5 M, and corresponds to fractions 55/56 (assayed at 10 and 12 units respectively) as indicated in FIG. 7. These fractions, incubated at 37 ° C overnight in the presence of pBR322 DNA, degrade the DNA and consequently show traces of host nuclease, not visible on gel. Their elimination, or at least a substantial reduction in their concentration, was achieved by passing through an affinity column (Blue Hitrap).
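The relation between the percentage of buffer B and the NaCl concentration in this gradient is a simple linear mix. The sketch below assumes that buffer A contains no NaCl (none is listed in its composition) and that buffer B contains 2 M NaCl, as stated in Materials and Methods; the function names are illustrative.

```python
BUFFER_A_NACL_M = 0.0  # assumption: buffer A as listed contains no NaCl
BUFFER_B_NACL_M = 2.0  # buffer B = buffer A + 2 M NaCl


def nacl_molar(percent_b: float) -> float:
    """NaCl concentration (M) of a mix containing percent_b % of buffer B."""
    fraction_b = percent_b / 100
    return BUFFER_A_NACL_M * (1 - fraction_b) + BUFFER_B_NACL_M * fraction_b


def percent_b_for(nacl_m: float) -> float:
    """Percentage of buffer B needed to reach a given NaCl concentration (M)."""
    return 100 * (nacl_m - BUFFER_A_NACL_M) / (BUFFER_B_NACL_M - BUFFER_A_NACL_M)


print("Gradient start, 3% B:", nacl_molar(3), "M NaCl")
print("Gradient end, 50% B:", nacl_molar(50), "M NaCl")
print("Activity peak, 0.5 M NaCl:", percent_b_for(0.5), "% B")
```

Under these assumptions the 3 to 50% gradient spans roughly 0.06 to 1 M NaCl, and the peak of polymerase activity at about 0.5 M corresponds to about 25% of buffer B.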

Fractions 54 to 60, pooled and dialyzed, are loaded at a rate of 0.25 ml/min. The elution makes it possible to recover 65 fractions of 5 ml, with peaks of activity for fractions 36 to 56. The activity assay shows a concentration of 3 to 5 units for fractions 36 to 40. These fractions, when put in the presence of DNA at 37 °C, show a weak nuclease activity. Nevertheless, the activity on plasmid pBR322 at 37 °C overnight shows a marked improvement in the purity of the enzyme. A second HiTrap Blue column was then run, taking fractions 36 to 44, dialyzed as before. 25 ml of the 30 ml pool were injected onto the column at a flow rate of 0.25 ml/min (5 ml being kept to try a MonoQ). After this second HiTrap Blue column, the activity on closed plasmid pBR322 incubated overnight with fractions from the column is zero. A one-hour incubation at 72 °C of the fractions with lambda DNA digested with HindIII shows a very clear degradation, highlighting the 3'-5' exonuclease activity associated with our DNA polymerase. The peak of activity is between fractions 27 and 32. The purer DNA polymerase therefore eluted earlier in the NaCl gradient. On the most active fractions, a count was made, giving an activity greater than 5 U/μl for fractions 29, 30 and 31. FIG. 4 shows the result on an SDS-PAGE gel.

Following these three columns, the purity of the polymerase is significantly improved. Nevertheless, traces of E. coli DNA attached to the polymerase, demonstrated by PCR, remain. Two additional columns were therefore used. 30 ml of the active fractions obtained previously are loaded onto a phosphocellulose column (which should bind the DNA and the polymerase differently). The activity of the polymerase is identified by its high 3'-5' exonuclease activity at 72 °C on lambda DNA digested with HindIII. The strongest activity is located between fractions 40 and 49, with a sharp peak at 46 and 47. With fractions 40 to 49, the pBR322 DNA is intact after overnight incubation at 37 °C. Figure 5 shows the results on gel, with activities measured at 6 U/μl for fraction 46, 4.5 U/μl for 47 and 3 U/μl for 48. However, traces of E. coli DNA still remain.

Having already shown that this polymerase does not bind to a MonoQ or Resource Q column, regardless of the pH used, we tried to recover it in the exclusion fractions, hoping that the contaminating DNA would be bound by the column.

First of all, a test was carried out after the second HiTrap Blue column with a dialyzed 5 ml aliquot. The results were disappointing, the exclusion fractions showing little incorporation activity.

A second attempt was made after the phosphocellulose column and after two dialyses of the most active fractions 45, 46 and 49. The fractions are first heated for 40 min at 85 °C, because a contaminant degrading pBR322 at 72 °C overnight had been detected. No flocculation is then visible and the extract is put on ice. A new test shows that the contamination seems reduced. After dialysis and passage through the column, the exonuclease activity is demonstrated in exclusion fractions 3 to 7. The activity is assayed at 2 U/μl and the enzyme is shown in FIG. 10. Following this last purification step, we obtained positive PCR results comparable to those obtained with Vent.

d) Characterization of the activities of the purified fractions. The DNA polymerase activity of the different fractions was assayed by incorporation of 32P dATP according to the protocol described in Materials and Methods.

- Amplification of genes in vitro. A 459 bp fragment was amplified from genomic DNA of an archaebacterium (Thermococcus sp. GE8) with specific primers. Different buffers were used:

- 10× R buffer: Tris-HCl pH 8.8: 300 mM; KCl: 500 mM; MgCl2: 30 mM; Tween 20: 0.1%

- 10× H buffer: Tris-HCl pH 8.8: 300 mM; KCl: 500 mM; MgCl2: 15 mM; Tween 20: 0.1%

- 10× T buffer: Tris-HCl pH 8.8: 600 mM; KCl: 500 mM; MgCl2: 15 mM; Tween 20: 0.1%

- 10× S buffer: Tris-HCl pH 8.8: 200 mM; KCl: 250 mM; MgCl2: 20 mM; Tween 20: 0.1%

Thirty cycles were carried out, each comprising a denaturation step at 94 °C for 30 s, a primer hybridization step at 51 °C for 1 min, then an elongation step at 72 °C for 2 min. FIG. 8 presents the results obtained with a reaction volume of 50 μl and 2.7 units of Thermococcus fumicolans DNA polymerase. As it stands, the best results with Tfu are obtained with buffer R. FIG. 9 shows the results of the amplification of a 1.6 kb fragment with Tfu purified on heparin and then sephacryl-blue columns, and a 10× reaction buffer having the following composition: Tris-HCl pH 8.8: 200 mM; KCl: 100 mM; (NH4)2SO4: 100 mM; MgSO4: 20 mM; Triton X-100: 1%.
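The working (1×) concentrations of the four test buffers and the programmed duration of the cycling can be derived from the figures above. The following is a minimal sketch: the dictionary layout and function names are illustrative, the compositions are those of the 10× R, H, T and S buffers listed in the text, and the duration ignores temperature ramping between steps, which is an assumption.

```python
# 10x buffer compositions as listed in the text (mM, except Tween 20 in %).
TEN_X_BUFFERS = {
    "R": {"Tris-HCl pH 8.8 (mM)": 300, "KCl (mM)": 500, "MgCl2 (mM)": 30, "Tween 20 (%)": 0.1},
    "H": {"Tris-HCl pH 8.8 (mM)": 300, "KCl (mM)": 500, "MgCl2 (mM)": 15, "Tween 20 (%)": 0.1},
    "T": {"Tris-HCl pH 8.8 (mM)": 600, "KCl (mM)": 500, "MgCl2 (mM)": 15, "Tween 20 (%)": 0.1},
    "S": {"Tris-HCl pH 8.8 (mM)": 200, "KCl (mM)": 250, "MgCl2 (mM)": 20, "Tween 20 (%)": 0.1},
}

CYCLES = 30
STEP_SECONDS = {"denaturation, 94 C": 30, "hybridization, 51 C": 60, "elongation, 72 C": 120}


def one_x(buffer_name: str) -> dict:
    """Final concentrations once the 10x buffer is diluted tenfold in the reaction."""
    return {component: value / 10 for component, value in TEN_X_BUFFERS[buffer_name].items()}


def buffer_volume_ul(reaction_volume_ul: float = 50) -> float:
    """Volume of 10x buffer to add for a given reaction volume."""
    return reaction_volume_ul / 10


programmed_minutes = CYCLES * sum(STEP_SECONDS.values()) / 60

print("1x R buffer:", one_x("R"))
print("10x buffer per 50 ul reaction:", buffer_volume_ul(), "ul")
print("Programmed hold time:", programmed_minutes, "min (ramping not included)")
```

Seen at 1×, the R buffer differs from the H buffer only in its MgCl2 concentration (3 mM against 1.5 mM).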

- Exonuclease activity.

The activity tests, carried out according to the protocol detailed in Materials and Methods, do not reveal any 5'-3' exonuclease activity in Tfu. This is in accordance with the structure of the enzyme deduced from the analysis of the polypeptide sequence, which does not show a functional 5'-3' exonuclease domain, contrary to what is observed for DNA pol I of E. coli and for Taq.

According to the activity tests, carried out by the protocol detailed in the Materials and Methods chapter, a 3'-5' exonuclease activity (proofreading or error-correction activity) is present in Tfu, at a level substantially equal to that of Vent, as shown in Table 4 below. Table 4 reports the measurement of the 3'-5' exonuclease activity of Tfu as a function of the dNTP concentration, in comparison with Vent.

Table 4

These results are in accordance with the structure of the enzyme deduced from the analysis of the polypeptide sequence, which reveals the presence of a 3'-5' exonuclease domain in the N-terminal position, as well as the catalytic motifs characteristic of this domain. Tfu is sensitive to a dNTP concentration of the order of 0.8 mM, while Vent shows a sensitivity from 0.5 mM. This activity is known to improve the in vitro fidelity of the polymerases used in PCR. In addition, this exonuclease activity is confirmed by a simpler test. The purified fractions, devoid of nuclease activity at 37 °C, placed in the presence of DNA digested with HindIII and then exposed to 72 °C overnight, completely degrade this DNA, thus demonstrating the presence of the 3'-5' exonuclease activity of Tfu and its thermostability.

- Thermostability.


The thermostability measured according to the protocol described above (Materials and Methods), or more precisely the residual incorporation activity at 72 °C after exposure of the enzyme to high temperatures for variable times, is given in Table 5 below.

Table 5

This thermostability, which is lower than that of polymerases from more hyperthermophilic organisms such as Pyrococcus, is nonetheless very much greater than that of polymerases from Thermus, and in particular that of Taq. The thermostability of the purified recombinant enzyme, both for the polymerase domain and for the exonuclease domain, is in any case very much higher than all known requirements of PCR.

BIBLIOGRAPHICAL REFERENCES.

1. Birnboim, H. C., and J. Doly. 1979. A rapid alkaline extraction procedure for screening recombinant plasmid DNA. Nucleic Acids Research. 7: 1503.

2. Braithwaite, D. K., and J. Ito. 1993. Compilation, alignment, and phylogenetic relationships of DNA polymerases. Nucleic Acids Research. 21 (4): 787-802.

3. Charbonnier, F. 1993. Paris Sud.

4. Chong, S. R., Y. Shao, H. Paulus, J. Benner, F. B. Perler, and M. Q. Xu. 1996. Protein splicing involving the Saccharomyces cerevisiae VMA intein - The steps in the splicing pathway, side reactions leading to protein cleavage, and establishment of an in vitro splicing system. Journal of Biological Chemistry. 271 (36): 22159-22168.

5. Davis, E. O., S. G. Sedgwick, and J. Colston. 1991. Novel structure of Mycobacterium tuberculosis implies processing of the gene product. Journal of Bacteriology. 173 (18): 5653-5662.

6. Davis, E. O., S. G. Sedgwick, and M. J. Colston. 1991. Novel structure of the recA locus of Mycobacterium tuberculosis implies processing of the gene product. Journal of Bacteriology. 173 (18): 5653-5662.

7. Fsihi, H., V. Vincent, and S. T. Cole. 1996. Homing events in the gyrA gene of some mycobacteria. Proceedings of the National Academy of Sciences of the United States of America. 93 (8): 3410-3415.

8. Godfroy, A., J. R. Meunier, J. Guézennec, F. Lesongeur, G. Raguénès, A. Rimbault, and G. Barbier. 1996. Thermococcus fumicolans sp. nov., a new hyperthermophilic archaeon isolated from a deep-sea hydrothermal vent in the North Fiji Basin. International Journal of Systematic Bacteriology. 46 (4): 1113-1119.

9. Hodges, R. A., F. B. Perler, C. J. Noren, and W. E. Jack. 1992. Protein Splicing Removes Intervening Sequences in an Archaea DNA Polymerase. Nucleic Acids Res. 20 (23): 6153-6157.

10. Kong, H. M., R. B. Kucera, and W. E. Jack. 1993. Characterization of a DNA Polymerase from the Hyperthermophile Archaea Thermococcus litoralis - Vent DNA Polymerase, Steady State Kinetics, Thermal Stability, Processivity, Strand Displacement, and Exonuclease Activities. J Biol Chem. 268 (3): 1965-1975.

11. Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4 (4): 406-425.

12. Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular Cloning. A Laboratory Manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor.

13. Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA sequencing with chain terminating inhibitors. Proc. Natl. Acad. Sci. USA. 74: 5467-5473.

14. Southern, E. M. 1975. Detection of specific sequences among DNA fragments separated by gel electrophoresis. Journal of Molecular Biology. 98: 503.

15. Southworth, M. W., H. Kong, R. B. Kucera, J. Ware, H. W. Jannasch, and F. B. Perler. 1996. Cloning of thermostable DNA polymerases from hyperthermophilic marine Archaea with emphasis on Thermococcus sp. 9°N-7 and mutations affecting 3'-5' exonuclease activity. Proc. Natl. Acad. Sci. USA. 93: 5281-5285.

16. Studier, F. W., A. H. Rosenberg, F. J. Dunn, and J. W. Dubendorff. 1990. Use of T7 RNA polymerase to direct expression of cloned genes. Methods Enzymol. 185: 60-89.

17. Sutherland, K. J., C. M. Henneke, P. Towner, and D. W. Hough. 1990. Citrate synthase from the thermophilic archaebacterium Thermoplasma acidophilum. Cloning and sequencing of the gene. European Journal of Biochemistry. 194: 839-844.

18. Uemori, T., Y. Ishino, H. Toh, K. Asada, and I. Kato. 1993. Organization and nucleotide sequence of the DNA polymerase gene from the archaeon Pyrococcus furiosus. Nucleic Acids Research. 21 (2): 259-265.

LIST OF SEQUENCES

(1) GENERAL INFORMATION:

(i) DEPOSITOR: APPLIGENE - ONCOR

(ii) TITLE OF THE INVENTION: THERMOSTABLE DNA POLYMERASE OF ARCHAEBACTERIA OF THE THERMOCOCCUS fumicolans SPECIES

(iii) NUMBER OF SEQUENCES: 4

(2) INFORMATION FOR SEQ ID NO: 1:

(i) CHARACTERISTICS OF THE SEQUENCE:

(A) LENGTH: 5039 base pairs

(B) TYPE: nucleotide

(C) NUMBER OF STRANDS: double

(D) CONFIGURATION: linear

(ii) TYPE OF MOLECULE: DNA

(ix) CHARACTERISTICS

(A) NAME / KEY: DNA polymerase sequence of THERMOCOCCUS fumicolans Tfu

(B) LOCATION: from 457 to 5028

(ix) CHARACTERISTICS

(A) NAME / KEY: coding sequence of the I-Tfu-1 intein

(B) LOCATION: from 1675 to 2754

(ix) CHARACTERISTICS

(A) NAME / KEY: coding sequence of the I-Tfu-2 intein

(B) LOCATION: from 3157 to 4323

(ix) CHARACTERISTICS

(A) NAME / KEY: stop codon

(B) LOCATION: 5026 to 5028

(xi) DESCRIPTION OF THE SEQUENCES: SEQ ID NO: 1:

AGCTTAAAGC GTCCGCCACT ACTTCCTGAA AGCTCACGCG GTAAAACAGC TCCATGCTCG 60

GCTCTTCGAT GGGAGGTTTA AAAAGGTGGT GGTGAGGTTT ATTAGGAAGA AGGCTCAACT 120

AGAGACGGTG GGAGTATGGA AGAGGTCGAC AGGCTCGTGT TCAACTTTCC CCTCTTCAAA 180

GATTACTGGG AAAAGGAGCG GTTCCTCAAG GTCGTTGGGC TTCTGGTGAG CCACCAGATA 240

ACGTTTGAGA AAGCTGCCGA GCTTCTGGAC ATGAGGCTCG AAGAGCTGGC GTTCCTCCTT 300

GACAAGCTCG GCGTTGAGTA CTCGCTTCTT GATGATGAAG AGGCCAGACT TGAGAGAGAA 360

GAGGCCAATA AGCTCATGGG GGAAATGAAG GGTGGAGCGT TTGTCTGATT CTTCTGAGCT 420

GTTATTGGTG TTTCACAGGC TGGGAGGTGG TGGATT ATG ATC CTC GAT ACA GAC 474

Met Ile Leu Asp Thr Asp 1 5

TAC ATC ACC GAA GAC GGA AGG CCC GTC ATC AGG GTG TTC AAG AAG GAG 522

Tyr Ile Thr Glu Asp Gly Arg Pro Val Ile Arg Val Phe Lys Lys Glu 10 15 20

AAC GGC GAG TTC AAA ATC GAG TAC GAC AGG GAC TTC GAG CCT TAC ATC 570 Asn Gly Glu Phe Lys Ile Glu Tyr Asp Arg Asp Phe Glu Pro Tyr Ile 25 30 35

TAC GCT CTC CTG AAG GAC GAT TCC GCG ATC GAG GAC GTC AAG AAG ATA 618 Tyr Ala Leu Leu Lys Asp Asp Ser Ala Ile Glu Asp Val Lys Lys Ile 40 45 50

ACT GCA AGC CGG CAC GGT ACC ACC GTC AGG GTC GTC AGG GCC GGG AAG 666 Thr Ala Ser Arg His Gly Thr Thr Val Arg Val Val Arg Ala Gly Lys 55 60 65 70

GTG AAG AAG AAG TTC CTC GGC AGG CCG ATA GAG GTC TGG AAG CTC TAC 714 Val Lys Lys Lys Phe Leu Gly Arg Pro Ile Glu Val Trp Lys Leu Tyr 75 80 85

TTC ACC CAT CCC CAG GAC GTT CCG GCA ATC AGG GAC AAA ATC AGG GAG 762 Phe Thr His Pro Gln Asp Val Pro Ala Ile Arg Asp Lys Ile Arg Glu 90 95 100

CAC CCT GCC GTG GTC GAC ATA TAT GAG TAC GAC ATA CCC TTT GCC AAG 810 His Pro Ala Val Val Asp Ile Tyr Glu Tyr Asp Ile Pro Phe Ala Lys 105 110 115

CGC TAC CTC ATC GAT AAG GGC CTC ATC CCG ATG GAG GGC GAC GAG GAG 858 Arg Tyr Leu Ile Asp Lys Gly Leu Ile Pro Met Glu Gly Asp Glu Glu 120 125 130

CTC AAG ATG CTC GCC TTC GAC ATC GAG ACG CTC TAC CAC GAG GGC GAG 906 Leu Lys Met Leu Ala Phe Asp Ile Glu Thr Leu Tyr His Glu Gly Glu 135 140 145 150

GAG TTC GCC GAG GGG CCT ATT CTT ATG ATA AGC TAT GCC GAC GAG GAA 954 Glu Phe Ala Glu Gly Pro Ile Leu Met Ile Ser Tyr Ala Asp Glu Glu 155 160 165

GGG GCG AGG GTA ATA ACC TGG AAG AAG ATC GAC CTT CCC TAC GTT GAC 1002 Gly Ala Arg Val Ile Thr Trp Lys Lys Ile Asp Leu Pro Tyr Val Asp 170 175 180

GTC GTT TCA ACG GAG AAG GAG ATG ATA AAG CGC TTC CTG AAG GTT GTC 1050 Val Val Ser Thr Glu Lys Glu Met Ile Lys Arg Phe Leu Lys Val Val 185 190 195

AAG GAG AAG GAC CCC GAT GTC CTC ATA ACC TAC AAC GGC GAC AAC TTC 1098 Lys Glu Lys Asp Pro Asp Val Leu Ile Thr Tyr Asn Gly Asp Asn Phe 200 205 210

GAC TTC GCT TAC CTC AAG AAG CGC TCC GAG AAG CTC GGC GTT AAG TTC 1146 Asp Phe Ala Tyr Leu Lys Lys Arg Ser Glu Lys Leu Gly Val Lys Phe 215 220 225 230

ATC CTC GGA AGG GAC GGC AGC GAG CCG AAG ATA CAG AGG ATG GGC GAC 1194 Ile Leu Gly Arg Asp Gly Ser Glu Pro Lys Ile Gln Arg Met Gly Asp 235 240 245

CGC TTC GCC GTC GAG GTG AAG GGA AGA ATA CAC TTC GAC CTC TAC CCC 1242 Arg Phe Ala Val Glu Val Lys Gly Arg Ile His Phe Asp Leu Tyr Pro 250 255 260

GTC ATA AGA CAC ACC ATC AAC CTG CCC ACC TAC ACG CTG GAG GCC GTC 1290 Val Ile Arg His Thr Ile Asn Leu Pro Thr Tyr Thr Leu Glu Ala Val 265 270 275

TAC GAG GCG ATT TTT GGG CAG CCA AAG GAG AAG GTC TAC GCT GAG GAG 1338 Tyr Glu Ala Ile Phe Gly Gln Pro Lys Glu Lys Val Tyr Ala Glu Glu 280 285 290

ATA GCG CAG GCC TGG GAA ACG GGC GAG GGG CTT GAG CGC GTC GCG CGC 1386 Ile Ala Gln Ala Trp Glu Thr Gly Glu Gly Leu Glu Arg Val Ala Arg 295 300 305 310

TAC TCG ATG GAG GAC GCC AAG GTA ACC TAC GAG CTG GGA AGG GAG TTC 1434 Tyr Ser Met Glu Asp Ala Lys Val Thr Tyr Glu Leu Gly Arg Glu Phe 315 320 325

TTC CCG ATG GAG GCC CAA CTT TCT CGG CTG GTC GGT CAG AGC TTC TGG 1482 Phe Pro Met Glu Ala Gln Leu Ser Arg Leu Val Gly Gln Ser Phe Trp 330 335 340

GAC GTC TCG CGC TCC AGC ACC GGC AAC CTC GTC GAG TGG TAC CTC CTC 1530 Asp Val Ser Arg Ser Ser Thr Gly Asn Leu Val Glu Trp Tyr Leu Leu 345 350 355

AGG AAG GCC TAC GAG AGG AAC GAG CTG GCA CCG AAC AAG CCC TCC GGC 1578 Arg Lys Ala Tyr Glu Arg Asn Glu Leu Ala Pro Asn Lys Pro Ser Gly 360 365 370

AGA GAA CTT GAG AGG CGC CGC GGG GGC TAC GCC GGC GGC TAC GTC AAG 1626 Arg Glu Leu Glu Arg Arg Arg Gly Gly Tyr Ala Gly Gly Tyr Val Lys 375 400 405 410

GAG CCG GAG AGG GGA CTT TGG GAG AAC ATA GCT TAT TTA GAT TTT AGG 1674 Glu Pro Glu Arg Gly Leu Trp Glu Asn Ile Ala Tyr Leu Asp Phe Arg 415 420 425

TGT CAT CCT GCC GAC ACT AAA GTC ATT GTC AAA GGG AAG GGC GTT GTA 1722 Cys His Pro Ala Asp Thr Lys Val Ile Val Lys Gly Lys Gly Val Val 430 435 440

AAC ATC AGC GAA GTT AGG GAG GGG GAC TAC GTT CTC GGC ATA GAC GGC 1770 Asn Ile Ser Glu Val Arg Glu Gly Asp Tyr Val Leu Gly Ile Asp Gly 445 450 455

TGG CAG AAG GTT CAA AGG GTC TGG GAG TAT GAT TAC GAG GGA GAA CTC 1818 Trp Gln Lys Val Gln Arg Val Trp Glu Tyr Asp Tyr Glu Gly Glu Leu 460 465 470

GTA AAT ATA AAC GGC CTT AAG TGC ACA CCG AAC CAT AAG CTT CCG GTC 1866 Val Asn Ile Asn Gly Leu Lys Cys Thr Pro Asn His Lys Leu Pro Val 475 480 485 490

GTT AGG AGG ACT GAG AGG CAG ACT GCG ATA AGG GAC AGC CTT GCA AAG 1914 Val Arg Arg Thr Glu Arg Gln Thr Ala Ile Arg Asp Ser Leu Ala Lys 495 500 505

TCT TTT CTC ACG AAA AAA GTT AAA GGT AAG CTG ATA ACC ACG CCT CTC 1962 Ser Phe Leu Thr Lys Lys Val Lys Gly Lys Leu Ile Thr Thr Pro Leu 510 515 520

TTT GAA AAA ATC GGG AAG ATC GAG CGA GAG GAC GTG CCA GAA GAG GAG 2010 Phe Glu Lys Ile Gly Lys Ile Glu Arg Glu Asp Val Pro Glu Glu Glu 525 530 535

ATA CTC AAA GGA GAA CTC GCC GGA ATA ATC CTG GCT GAG GGC ACA CTC 2058 Ile Leu Lys Gly Glu Leu Ala Gly Ile Ile Leu Ala Glu Gly Thr Leu 540 545 550

CTG AGA AAG GAT GTC GAG TAC TTT GAC TCT TCC AGA GGG AAG AAG AGA 2106 Leu Arg Lys Asp Val Glu Tyr Phe Asp Ser Ser Arg Gly Lys Lys Arg 555 560 565 570

GTA TCA CAC CAG TAC AGG GTT GAA ATA ACC GTT GGG GCG CAG GAG GAG 2154 Val Ser His Gln Tyr Arg Val Glu Ile Thr Val Gly Ala Gln Glu Glu 575 580 585

GAC TTC CAG AGG AGG ATC GTT TAC ATT TTC GAA CGC CTC TTT GGG GTA 2202 Asp Phe Gln Arg Arg Ile Val Tyr Ile Phe Glu Arg Leu Phe Gly Val 590 595 600

ACT CCC AGT GTT TAC CGG AAA AAG AAC ACA AAC GCA ATA ACG TTC AAA 2250 Thr Pro Ser Val Tyr Arg Lys Lys Asn Thr Asn Ala Ile Thr Phe Lys 605 610 615

GTT GCC AAA AAA GAG GTT TAT CTT AGG GTT AGG GAA ATT ATG GAT GGC 2298 Val Ala Lys Lys Glu Val Tyr Leu Arg Val Arg Glu Ile Met Asp Gly 620 625 630

ATT GAG AAC CTC CAC GCT CCT TCT GTG TTA AGG GGC TTT TTT GAA GGA 2346 Ile Glu Asn Leu His Ala Pro Ser Val Leu Arg Gly Phe Phe Glu Gly 635 640 645 650

GAC GGA AGC GTC AAC AAG GTC CGG AAG ACA GTG GTA GTG AAT CAG GGC 2394 Asp Gly Ser Val Asn Lys Val Arg Lys Thr Val Val Val Asn Gln Gly 655 660 665

ACC AAT AAT GAA TGG AAA ATT GAA GTG GTG TCA AAA CTC CTC AAC AAG 2442 Thr Asn Asn Glu Trp Lys Ile Glu Val Val Ser Lys Leu Leu Asn Lys 670 675 680

TTG GGG ATT CCG CAT AGA AGG TAC ACA TAC GAT TAC ACC GAA AGA GAA 2490 Leu Gly Ile Pro His Arg Arg Tyr Thr Tyr Asp Tyr Thr Glu Arg Glu 685 690 695

AAA ACC ATG ACA ACG CAT ATA CTT GAG ATA GCC GGC AGG GAT GGG TTA 2538 Lys Thr Met Thr Thr His Ile Leu Glu Ile Ala Gly Arg Asp Gly Leu 700 705 710

ATC CTT TTC CAG ACC ATT GTG GGA TTC ATA AGC ACT GAG AAG AAC ATG 2586 Ile Leu Phe Gln Thr Ile Val Gly Phe Ile Ser Thr Glu Lys Asn Met 715 720 725 730

GCG CTG GAG GAG GCA ATC AGG AAC AGG GAA GTG AAC CGC CTA GAA AAC 2634 Ala Leu Glu Glu Ala Ile Arg Asn Arg Glu Val Asn Arg Leu Glu Asn 735 740 745

AAT GCC TTC TAT ACC CTA GCC GAC TTT ACG GCG AAG ACA GAG TAC TAC 2682 Asn Ala Phe Tyr Thr Leu Ala Asp Phe Thr Ala Lys Thr Glu Tyr Tyr 750 755 780

AAG GGC AAA GTT TAC GAC TTA ACC CTT GAG GGA ACG CCC TAT TAC TTC 2730 Lys Gly Lys Val Tyr Asp Leu Thr Leu Glu Gly Thr Pro Tyr Tyr Phe 785 790 795

GCC AAT GGC ATA CTG ACC CAC AAT TCG CTA TAT CCT TCG ATT ATA ATT 2778 Ala Asn Gly Ile Leu Thr His Asn Ser Leu Tyr Pro Ser Ile Ile Ile 800 805 810

TCC CAC AAC GTC TCC CCC GAT ACG CTC AAC CGC GAG GGC TGC GGG GAG 2826 Ser His Asn Val Ser Pro Asp Thr Leu Asn Arg Glu Gly Cys Gly Glu 815 820 825 830

TAC GAC GAG GCT CCG CAG GTA GGG CAT CGC TTT TGT AAG GAC TTC CCC 2874 Tyr Asp Glu Ala Pro Gln Val Gly His Arg Phe Cys Lys Asp Phe Pro 835 840 845

GGC TTC ATC CCC AGC CTC CTC GGT GAC CTG CTC GAC GAG AGG CAG AAG 2922 Gly Phe Ile Pro Ser Leu Leu Gly Asp Leu Leu Asp Glu Arg Gln Lys 855 860 865

GTA AAG AAG CAC ATG AAG GCC ACG GTG GAC CCG ATA GAG AAG AAG CTC 2970 Val Lys Lys His Met Lys Ala Thr Val Asp Pro Ile Glu Lys Lys Leu 870 875 880

CTC GAT TAC AGG CAG CGC GCA ATT AAA ATC CTC GCC AAC AGC TTC TAC 3018 Leu Asp Tyr Arg Gln Arg Ala Ile Lys Ile Leu Ala Asn Ser Phe Tyr 885 890 895

GGC TAC TAT GGC TAC GCA AAG GCC CGC TGG TAC TGC AAG GAG TGC GCC 3066 Gly Tyr Tyr Gly Tyr Ala Lys Ala Arg Trp Tyr Cys Lys Glu Cys Ala 900 905 910 915

GAG AGC GTT ACC GCC TGG GGC AGG CAG TAC ATT GAG ACC ACC ATG AGG 3114 Glu Ser Val Thr Ala Trp Gly Arg Gln Tyr Ile Glu Thr Thr Met Arg 920 925 930

GAA ATA GAG GAA AAA TTT GGC TTT AAA GTG CTG TAC GCG GAT AGT GTT 3162 Glu Ile Glu Glu Lys Phe Gly Phe Lys Val Leu Tyr Ala Asp Ser Val 935 940 945

ACA GGG GAC ACA GAG GTA ACC ATC AGA AGA AAC GGC AGG ATT GAG TTC 3210 Thr Gly Asp Thr Glu Val Thr Ile Arg Arg Asn Gly Arg Ile Glu Phe 950 955 960

GTT CCA ATC GAG AAA CTC TTT GAG CGC GTT GAT CAC CGT GTT GGT GAG 3258 Val Pro Ile Glu Lys Leu Phe Glu Arg Val Asp His Arg Val Gly Glu 965 970 975

AAG GAG TAC TGC GTT CTT GGA GGG GTT GAG GCA CTG ACA CTC GAC AAC 3306 Lys Glu Tyr Cys Val Leu Gly Gly Val Glu Ala Leu Thr Leu Asp Asn 980 985 990 995

AGG GGC AGG CTC GTG TGG AAG AAG GTT CCG TAC GTC ATG AGA CAT AAA 3354 Arg Gly Arg Leu Val Trp Lys Lys Val Pro Tyr Val Met Arg His Lys 1000 1005 1010

ACG GAC AAA AGA ATC TAT AGG GTA TGG TTC ACC AAC TCT TGG TAC CTT 3402 Thr Asp Lys Arg Ile Tyr Arg Val Trp Phe Thr Asn Ser Trp Tyr Leu 1015 1020 1025

GAC GTG ACG GAG GAT CAC TCG CTA ATA GGC TAC CTG AAC ACA AGC AAA 3450 Asp Val Thr Glu Asp His Ser Leu Ile Gly Tyr Leu Asn Thr Ser Lys 1030 1035 1040

GTC AAA CCC GGA AAG CCC TTG AAA GAG CGT CTC GTC GAG GTC AAG CCA 3498 Val Lys Pro Gly Lys Pro Leu Lys Glu Arg Leu Val Glu Val Lys Pro 1045 1050 1055

GAA GAA TTG GGG GGT AAG GTC AAG TCT CTC ATT ACG CCC AAT CGG CCA 3546 Glu Glu Leu Gly Gly Lys Val Lys Ser Leu Ile Thr Pro Asn Arg Pro 1060 1065 1070 1075

ATT GCC CGT ACC ATC AAG GCC AAC CCC ATT GCC GTC AAG CTC TGG GAG 3594 Ile Ala Arg Thr Ile Lys Ala Asn Pro Ile Ala Val Lys Leu Trp Glu 1080 1085 1090

TTA ATT GGC CTG CTG GTG GGA GAT GGC AAC TGG GGT GGA CAA TCG AAC 3642 Leu Ile Gly Leu Leu Val Gly Asp Gly Asn Trp Gly Gly Gln Ser Asn 1095 1100 1105

TGG GCC AAA TAC TAC GTT GGC CTC TCC TGT GGG CTG GAT AAA GCC GAA 3690 Trp Ala Lys Tyr Tyr Val Gly Leu Ser Cys Gly Leu Asp Lys Ala Glu 1110 1115 1120

ATA GAG AGA AAA GTC CTG AAC CCT TTA AGA GAG GCA AGC GTC ATC TCC 3738 Ile Glu Arg Lys Val Leu Asn Pro Leu Arg Glu Ala Ser Val Ile Ser 1125 1130 1135

AAC TAC TAC GAC AAG AGC AAG AAG GGC GAC GTT TCC ATA CTC TCC AAG 3786 Asn Tyr Tyr Asp Lys Ser Lys Lys Gly Asp Val Ser Ile Leu Ser Lys 1140 1145 1150 1155

TGG CTC GCC GGA TTC ATG GTC AAA TAC TTC AAA GAT GAA AAT GGG AAC 3834 Trp Leu Ala Gly Phe Met Val Lys Tyr Phe Lys Asp Glu Asn Gly Asn 1160 1165 1170

AAG GCC ATT CCC AGC TTC ATG TTC AAC CTT CCA AGG GAA TAC ATA GAG 3882 Lys Ala Ile Pro Ser Phe Met Phe Asn Leu Pro Arg Glu Tyr Ile Glu 1175 1180 1185

GCC TTT CTA CGG GGG CTG TTT TCA GCG GAC GGA ACG GTA AGC TTG CGT 3930 Ala Phe Leu Arg Gly Leu Phe Ser Ala Asp Gly Thr Val Ser Leu Arg 1190 1195 1200

AGA GGA ATC CCA GAA ATT AGA CTG ACA AGC GTT AAC AGA GAG CTT AGT 3978 Arg Gly Ile Pro Glu Ile Arg Leu Thr Ser Val Asn Arg Glu Leu Ser 1205 1210 1215

GAT GCC GTG AGA AAG TTG CTG TGG CTG GTT GGG GTC TCC AAC TCA CTA 4026 Asp Ala Val Arg Lys Leu Leu Trp Leu Val Gly Val Ser Asn Ser Leu 1220 1225 1230 1235

TTC ACC GAA ACC AAG CCA AAC CGG TAC CTG GAG AAA GAA AGT GGA ACG 4074 Phe Thr Glu Thr Lys Pro Asn Arg Tyr Leu Glu Lys Glu Ser Gly Thr 1240 1245 1250

CAT TCG ATT CAC GTG AGG ATA AAG AAC AAG CAT CGC TTT GCC GAT AGA 4122 His Ser Ile His Val Arg Ile Lys Asn Lys His Arg Phe Ala Asp Arg 1255 1260 1265

ATA GGC TTT CTC ATA GAC AGA AAA TCC ACC AAA CTC TCC GAG AAC CTG 4170 Ile Gly Phe Leu Ile Asp Arg Lys Ser Thr Lys Leu Ser Glu Asn Leu 1270 1275 1280

GGG GGA CAT ACA AAC AAG AAG AGG GCT TAC AAA TAT GAT TTT GAC TTG 4218 Gly Gly His Thr Asn Lys Lys Arg Ala Tyr Lys Tyr Asp Phe Asp Leu 1285 1290 1295

GTA TAC CCC AGA AAA ATC GAA GAG ATA ACC TAC GAC GGC TAC GTC TAT 4266 Val Tyr Pro Arg Lys Ile Glu Glu Ile Thr Tyr Asp Gly Tyr Val Tyr 1300 1305 1310 1315

GAC ATC GAG GTT GAG GGA ACC CAC AGG TTC TTC GCC AAC GGA ATA CTC 4314 Asp Ile Glu Val Glu Gly Thr His Arg Phe Phe Ala Asn Gly Ile Leu 1320 1325 1330

GTT CAC AAC ACA GAC GGC TTT TTC GCA ACA ATC CCC GGA GCG GAC GCC 4362 Val His Asn Thr Asp Gly Phe Phe Ala Thr Ile Pro Gly Ala Asp Ala 1335 1340 1345

GAG ACG GTC AAA AAG AAG GCC AGG GAG TTC CTT AAC TAC ATT AAC CCC 4410 Glu Thr Val Lys Lys Lys Ala Arg Glu Phe Leu Asn Tyr Ile Asn Pro 1350 1355 1360

AAG CTG CCC GGT CTC CTC GAA CTC GAG TAC GAG GGC TTC TAC AGG CGC 4458 Lys Leu Pro Gly Leu Leu Glu Leu Glu Tyr Glu Gly Phe Tyr Arg Arg 1365 1370 1375

GGT TTC TTC GTA ACC AAG AAG AAG TAC GCG GTG ATA GAC GAG GAG GGC 4506 Gly Phe Phe Val Thr Lys Lys Lys Tyr Ala Val Ile Asp Glu Glu Gly 1380 1385 1390 1395

AAG ATA ACG ACG CGC GGG CTT GAG ATC GTC CGG CGC GAC TGG AGT GAG 4554 Lys Ile Thr Thr Arg Gly Leu Glu Ile Val Arg Arg Asp Trp Ser Glu 1400 1405 1410

GTG GCT AAG GAG ACG CAG GCG AGG GTC TTG GAG GCC ATA CTG CGG CAC 4602 Val Ala Lys Glu Thr Gln Ala Arg Val Leu Glu Ala Ile Leu Arg His 1415 1420 1425

GGT GAC GTC GAG GAG GCC GTG AGG ATT GTC AAG GAA GTG ACG GAA AAG 4650 Gly Asp Val Glu Glu Ala Val Arg Ile Val Lys Glu Val Thr Glu Lys 1430 1435 1440

CTG AGC AAG TAC GAG GTT CCG CCA GAG AAG CTC GTC ATC CAC GAG CAG 4698 Leu Ser Lys Tyr Glu Val Pro Pro Glu Lys Leu Val Ile His Glu Gln 1445 1450 1455

ATT ACC AGG GAG CTG AAG GAC TAC AAG GCC ACC GGC CCG CAC GTG GCC 4746 Ile Thr Arg Glu Leu Lys Asp Tyr Lys Ala Thr Gly Pro His Val Ala 1460 1465 1470 1475

ATA GCG AAG CGC CTC GCC GCG AGG GGG ATT AAG GTT CGC CCT GGG ACA 4794 Ile Ala Lys Arg Leu Ala Ala Arg Gly Ile Lys Val Arg Pro Gly Thr 1480 1485 1490

GTC ATC AGC TAC ATC GTC CTG AAA GGT TCC GGC AGG ATA GGG GAC AGG 4842 Val Ile Ser Tyr Ile Val Leu Lys Gly Ser Gly Arg Ile Gly Asp Arg 1495 1500 1505

ACG ATA CCC TTC GAC GAG TTC GAC CCC ACG AAG CAC AGG TAC GAT GCG 4890 Thr Ile Pro Phe Asp Glu Phe Asp Pro Thr Lys His Arg Tyr Asp Ala 1510 1515 1520

GAG TAC TAC ATC GAG AAC CAG GTT CTG CCG GCG GTG GAG AGA ATC CTC 4938 Glu Tyr Tyr Ile Glu Asn Gln Val Leu Pro Ala Val Glu Arg Ile Leu 1525 1530 1535

AAG GCC TTC GGC TAC AAG AAG GAG GAT TTG CGC TAC CAG AAG ACG CGG 4986 Lys Ala Phe Gly Tyr Lys Lys Glu Asp Leu Arg Tyr Gln Lys Thr Arg 1540 1545 1550 1555

CAG GTT GGG CTG GGG GCG TGG CTC AAA ATG GGG AAG AAA TGA 5028 Gln Val Gly Leu Gly Ala Trp Leu Lys Met Gly Lys Lys 1560 1565 1568

AGGCCAAGCT T 5039
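Each coding row of SEQ ID NO: 1 pairs sixteen codons with their three-letter amino acid translation, so any row can be checked against the standard genetic code. The following is a minimal sketch of such a check in Python. The names `CODON_TABLE`, `THREE_LETTER` and `translate_row` are illustrative and not part of the listing, and `Ter` is used here as an assumed label for a stop codon. The example row is the final coding row, which ends at nucleotide 5028.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One-letter to three-letter amino acid codes; "Ter" marks a stop codon (assumed label).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate_row(codons: str) -> list[str]:
    """Translate a whitespace-separated row of DNA codons into three-letter residues."""
    residues = []
    for codon in codons.upper().split():
        if codon not in CODON_TABLE:
            raise ValueError(f"not a valid DNA codon: {codon!r}")
        residues.append(THREE_LETTER[CODON_TABLE[codon]])
    return residues


# Final coding row of SEQ ID NO: 1 (ends at nucleotide 5028, TGA is the stop codon).
last_row = "CAG GTT GGG CTG GGG GCG TGG CTC AAA ATG GGG AAG AAA TGA"
print(" ".join(translate_row(last_row)))
```

Comparing the printed residues with the amino acid line of a row is a quick way to spot transcription errors in the listing, such as a dropped residue or a misread three-letter code.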

(2) INFORMATION FOR SEQ ID NO: 2:

(i) CHARACTERISTICS OF THE SEQUENCE:

(A) LENGTH: 774 amino acids

(ii) TYPE OF MOLECULE: protein

(ix) CHARACTERISTICS

(A) NAME / KEY: DNA polymerase of Thermococcus fumicolans Tfu

(xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 2:

Met Ile Leu Asp Thr Asp Tyr Ile Thr Glu Asp Gly Arg Pro Val Ile 1 5 10 15

Arg Val Phe Lys Lys Glu Asn Gly Glu Phe Lys Ile Glu Tyr Asp Arg 20 25 30

Asp Phe Glu Pro Tyr Ile Tyr Ala Leu Leu Lys Asp Asp Ser Ala Ile 35 40 45

Glu Asp Val Lys Lys Ile Thr Ala Ser Arg His Gly Thr Thr Val Arg 50 55 60

Val Val Arg Ala Gly Lys Val Lys Lys Lys Phe Leu Gly Arg Pro Ile 65 70 75 80

Glu Val Trp Lys Leu Tyr Phe Thr His Pro Gln Asp Val Pro Ala Ile 85 90 95

Arg Asp Lys Ile Arg Glu His Pro Ala Val Val Asp Ile Tyr Glu Tyr 100 105 110

Asp Ile Pro Phe Ala Lys Arg Tyr Leu Ile Asp Lys Gly Leu Ile Pro 115 120 125

Met Glu Gly Asp Glu Glu Leu Lys Met Leu Ala Phe Asp Ile Glu Thr 130 135 140

Leu Tyr His Glu Gly Glu Glu Phe Ala Glu Gly Pro Ile Leu Met Ile 145 150 155 160

Ser Tyr Ala Asp Glu Glu Gly Ala Arg Val Ile Thr Trp Lys Lys Ile 165 170 175

Asp Leu Pro Tyr Val Asp Val Val Ser Thr Glu Lys Glu Met Ile Lys 180 185 190

Arg Phe Leu Lys Val Val Lys Glu Lys Asp Pro Asp Val Leu Ile Thr 195 200 205

Tyr Asn Gly Asp Asn Phe Asp Phe Ala Tyr Leu Lys Lys Arg Ser Glu 210 215 220

Lys Leu Gly Val Lys Phe Ile Leu Gly Arg Asp Gly Ser Glu Pro Lys 225 230 235 240

Ile Gln Arg Met Gly Asp Arg Phe Ala Val Glu Val Lys Gly Arg Ile 245 250 255

His Phe Asp Leu Tyr Pro Val Ile Arg His Thr Ile Asn Leu Pro Thr 260 265 270

Tyr Thr Leu Glu Ala Val Tyr Glu Ala Ile Phe Gly Gln Pro Lys Glu 275 280 285

Lys Val Tyr Ala Glu Glu Ile Ala Gln Ala Trp Glu Thr Gly Glu Gly 290 295 300

Leu Glu Arg Val Ala Arg Tyr Ser Met Glu Asp Ala Lys Val Thr Tyr 305 310 315 320

Glu Leu Gly Arg Glu Phe Phe Pro Met Glu Ala Gln Leu Ser Arg Leu 325 330 335

Val Gly Gln Ser Phe Trp Asp Val Ser Arg Ser Ser Thr Gly Asn Leu 340 345 350

Val Glu Trp Tyr Leu Leu Arg Lys Ala Tyr Glu Arg Asn Glu Leu Ala 355 360 365

Pro Asn Lys Pro Ser Gly Arg Glu Leu Glu Arg Arg Arg Gly Gly Tyr 370 375 380

Ala Gly Gly Tyr Val Lys Glu Pro Glu Arg Gly Leu Trp Glu Asn Ile 385 390 395 400

Ala Tyr Leu Asp Phe Arg Ser Leu Tyr Pro Ser Ile Ile Ile Ser His 405 405 410

Asn Val Ser Pro Asp Thr Leu Asn Arg Glu Gly Cys Gly Glu Tyr Asp 415 420 425

Glu Ala Pro Gln Val Gly His Arg Phe Cys Lys Asp Phe Pro Gly Phe 430 435 440

Ile Pro Ser Leu Leu Gly Asp Leu Leu Asp Glu Arg Gln Lys Val Lys 445 450 455

Lys His Met Lys Ala Thr Val Asp Pro Ile Glu Lys Lys Leu Leu Asp 460 465 470 475

Tyr Arg Gln Arg Ala Ile Lys Ile Leu Ala Asn Ser Phe Tyr Gly Tyr 480 485 490

Tyr Gly Tyr Ala Lys Ala Arg Trp Tyr Cys Lys Glu Cys Ala Glu Ser 495 500 505

Val Thr Ala Trp Gly Arg Gln Tyr Ile Glu Thr Thr Met Arg Glu Ile 510 515 520

Glu Glu Lys Phe Gly Phe Lys Val Leu Tyr Ala Asp Thr Asp Gly Phe 525 530 535

Phe Ala Thr Ile Pro Gly Ala Asp Ala Glu Thr Val Lys Lys Lys Ala 540 545 555 560

Arg Glu Phe Leu Asn Tyr Ile Asn Pro Lys Leu Pro Gly Leu Leu Glu 565 570 575

Leu Glu Tyr Glu Gly Phe Tyr Arg Arg Gly Phe Phe Val Thr Lys Lys 580 585 590

Lys Tyr Ala Val Ile Asp Glu Glu Gly Lys Ile Thr Thr Arg Gly Leu 595 600 605

Glu Ile Val Arg Arg Asp Trp Ser Glu Val Ala Lys Glu Thr Gln Ala 610 615 620

Arg Val Leu Glu Ala Ile Leu Arg His Gly Asp Val Glu Glu Ala Val 625 630 635 640

Arg Ile Val Lys Glu Val Thr Glu Lys Leu Ser Lys Tyr Glu Val Pro 645 650 655

Pro Glu Lys Leu Val Ile His Glu Gln Ile Thr Arg Glu Leu Lys Asp 660 665 670

Tyr Lys Ala Thr Gly Pro His Val Ala Ile Ala Lys Arg Leu Ala Ala 675 680 685

Arg Gly Ile Lys Val Arg Pro Gly Thr Val Ile Ser Tyr Ile Val Leu 690 695 700

Lys Gly Ser Gly Arg Ile Gly Asp Arg Thr Ile Pro Phe Asp Glu Phe 705 710 715 720

Asp Pro Thr Lys His Arg Tyr Asp Ala Glu Tyr Tyr Ile Glu Asn Gln 725 730 735

Val Leu Pro Ala Val Glu Arg Ile Leu Lys Ala Phe Gly Tyr Lys Lys 740 745 750

Glu Asp Leu Arg Tyr Gln Lys Thr Arg Gln Val Gly Leu Gly Ala Trp 755 760 765

Leu Lys Met Gly Lys Lys 770 774

(2) INFORMATION FOR SEQ ID NO: 3:

(i) CHARACTERISTICS OF THE SEQUENCE:

(A) LENGTH: 360 amino acids

(ii) TYPE OF MOLECULE: protein

(ix) CHARACTERISTICS

(A) NAME / KEY: intein I-Tfu-1

(xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 3:

Cys His Pro Ala Asp Thr Lys Val Ile Val Lys Gly Lys Gly Val Val 1 5 10 15

Asn Ile Ser Glu Val Arg Glu Gly Asp Tyr Val Leu Gly Ile Asp Gly 20 25 30

Trp Gln Lys Val Gln Arg Val Trp Glu Tyr Asp Tyr Glu Gly Glu Leu 35 40 45

Val Asn Ile Asn Gly Leu Lys Cys Thr Pro Asn His Lys Leu Pro Val 50 55 60

Val Arg Arg Thr Glu Arg Gln Thr Ala Ile Arg Asp Ser Leu Ala Lys 65 70 75 80

Ser Phe Leu Thr Lys Lys Val Lys Gly Lys Leu Ile Thr Thr Pro Leu 85 90 95

Phe Glu Lys Ile Gly Lys Ile Glu Arg Glu Asp Val Pro Glu Glu Glu 100 105 110

Ile Leu Lys Gly Glu Leu Ala Gly Ile Ile Leu Ala Glu Gly Thr Leu 115 120 125

Leu Arg Lys Asp Val Glu Tyr Phe Asp Ser Ser Arg Gly Lys Lys Arg 130 135 140

Val Ser His Gln Tyr Arg Val Glu Ile Thr Val Gly Ala Gln Glu Glu 145 150 155 160

Asp Phe X Arg Arg Ile Val Tyr Ile Phe Glu Arg Leu Phe Gly Val 165 170 175

Thr Pro Ser Val Tyr Arg Lys Lys Asn Thr Asn Ala Ile Thr Phe Lys 180 185 190

Val Ala Lys Lys Glu Val Tyr Leu Arg Val Arg Glu Ile Met Asp Gly 195 200 205

Ile Glu Asn Leu His Ala Pro Ser Val Leu Arg Gly Phe Phe Glu Gly 210 215 220

Asp Gly Ser Val Asn Lys Val Arg Lys Thr Val Val Val Asn Gln Gly 225 230 235 240

Thr Asn Asn Glu Trp Lys Ile Glu Val Val Ser Lys Leu Leu Asn Lys 245 250 255

Leu Gly Ile Pro His Arg Arg Tyr Thr Tyr Asp Tyr Thr Glu Arg Glu 260 265 270

Lys Thr Met Thr Thr His Ile Leu Glu Ile Ala Gly Arg Asp Gly Leu 275 280 285

Ile Leu Phe Gln Thr Ile Val Gly Phe Ile Ser Thr Glu Lys Asn Met 290 295 300

Ala Leu Glu Glu Ala Ile Arg Asn Arg Glu Val Asn Arg Leu Glu Asn 305 310 315 320

Asn Ala Phe Tyr Thr Leu Ala Asp Phe Thr Ala Lys Thr Glu Tyr Tyr 325 330 335

Lys Gly Lys Val Tyr Asp Leu Thr Leu Glu Gly Thr Pro Tyr Tyr Phe 340 345 350

Ala Asn Gly Ile Leu Thr His Asn 355 360

(2) INFORMATION FOR SEQ ID NO: 4:

(i) CHARACTERISTICS OF THE SEQUENCE:

(A) LENGTH: 389 amino acids

(ii) TYPE OF MOLECULE: protein

(ix) CHARACTERISTICS

(A) NAME / KEY: intein I-Tfu-2

(xi) DESCRIPTION OF THE SEQUENCE: SEQ ID NO: 4:

Ser Val Thr Gly Asp Thr Glu Val Thr Ile Arg Arg Asn Gly Arg Ile 1 5 10 15

Glu Phe Val Pro Ile Glu Lys Leu Phe Glu Arg Val Asp His Arg Val 20 25 30

Gly Glu Lys Glu Tyr Cys Val Leu Gly Gly Val Glu Ala Leu Thr Leu 35 40 45

Asp Asn Arg Gly Arg Leu Val Trp Lys Lys Val Pro Tyr Val Met Arg 50 55 60

His Lys Thr Asp Lys Arg Ile Tyr Arg Val Trp Phe Thr Asn Ser Trp 65 70 75 80

Tyr Leu Asp Val Thr Glu Asp His Ser Leu Ile Gly Tyr Leu Asn Thr 85 90 95

Ser Lys Val Lys Pro Gly Lys Pro Leu Lys Glu Arg Leu Val Glu Val 100 105 110

Lys Pro Glu Glu Leu Gly Gly Lys Val Lys Ser Leu Ile Thr Pro Asn 115 120 125

Arg Pro Ile Ala Arg Thr Ile Lys Ala Asn Pro Ile Ala Val Lys Leu 130 135 140

Trp Glu Leu Ile Gly Leu Leu Val Gly Asp Gly Asn Trp Gly Gly Gln 145 150 155 160

Ser Asn Trp Ala Lys Tyr Tyr Val Gly Leu Ser Cys Gly Leu Asp Lys 165 170 175

Ala Glu Ile Glu Arg Lys Val Leu Asn Pro Leu Arg Glu Ala Ser Val 180 185 190

Ile Ser Asn Tyr Tyr Asp Lys Ser Lys Lys Gly Asp Val Ser Ile Leu 195 200 205

Ser Lys Trp Leu Ala Gly Phe Met Val Lys Tyr Phe Lys Asp Glu Asn 210 215 220

Gly Asn Lys Ala Ile Pro Ser Phe Met Phe Asn Leu Pro Arg Glu Tyr 225 230 235 240

Ile Glu Ala Phe Leu Arg Gly Leu Phe Ser Ala Asp Gly Thr Val Ser 245 250 255

Leu Arg Arg Gly Ile Pro Glu Ile Arg Leu Thr Ser Val Asn Arg Glu 260 265 270

Leu Ser Asp Ala Val Arg Lys Leu Leu Trp Leu Val Gly Val Ser Asn 275 280 285

Ser Leu Phe Thr Glu Thr Lys Pro Asn Arg Tyr Leu Glu Lys Glu Ser 290 295 300

Gly Thr His Ser Ile His Val Arg Ile Lys Asn Lys His Arg Phe Ala 305 310 315 320

Asp Arg Ile Gly Phe Leu Ile Asp Arg Lys Ser Thr Lys Leu Ser Glu 325 330 335

Asn Leu Gly Gly His Thr Asn Lys Lys Arg Ala Tyr Lys Tyr Asp Phe 340 345 350

Asp Leu Val Tyr Pro Arg Lys Ile Glu Glu Ile Thr Tyr Asp Gly Tyr 355 360 365

Val Tyr Asp Ile Glu Val Glu Gly Thr His Arg Phe Phe Ala Asn Gly 370 375 380

Ile Leu Val His Asn 385 389

Claims

1) Purified thermostable DNA polymerase of archaebacteria of the species Thermococcus fumicolans, having a molecular weight of the order of 89,000 daltons, and its enzymatically equivalent derivatives.

2) DNA polymerase according to claim 1, the amino acid sequence of which is represented in the annexed sequence list under the number SEQ ID NO: 1 or a fragment thereof or an assembly of such fragments.

3) DNA polymerase according to claim 2, the amino acid sequence of which is represented in the annexed sequence list under the number SEQ ID NO: 2.

4) A DNA sequence constituted by or comprising the sequence coding for a DNA polymerase according to any one of claims 1 to 3.

5) A DNA sequence according to claim 4 consisting of or comprising the sequence between nucleotides 357 and 5028 of the sequence SEQ ID NO: 1, or a fragment thereof or an assembly of such fragments.

6) A DNA sequence according to claim 4 or 5 consisting of or comprising nucleotides 357 to 1674, 2755 to 3156 and 4324 to 5028 of the DNA sequence represented in the annexed sequence list under the number SEQ ID NO: 1.

7) A vector containing the DNA sequence of any one of claims 4 to 6.

8) A host transformed by a vector according to claim 7.

9) Process for the preparation of a thermostable DNA polymerase of archaebacteria of the species Thermococcus fumicolans, characterized in that a host according to claim 8 is cultivated under conditions allowing the expression of said DNA polymerase and in that said DNA polymerase is extracted and recovered by any suitable means.

10) Method of enzymatic amplification of a nucleic acid sequence characterized in that a thermostable DNA polymerase according to any one of claims 1 to 3 is used.

11) Purified thermostable intein from archaebacteria of the species Thermococcus fumicolans.
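The three nucleotide ranges recited in claim 6 leave two gaps in SEQ ID NO: 1, and the sizes of those gaps can be compared with the intein lengths stated for SEQ ID NO: 3 (360 amino acids) and SEQ ID NO: 4 (389 amino acids). The following is a minimal arithmetic sketch in Python, assuming the claimed positions are 1-based and inclusive. The helper names `gaps_between` and `codon_count` are illustrative only.

```python
# Ranges recited in claim 6 (assumed 1-based, inclusive positions in SEQ ID NO: 1).
CLAIMED_SEGMENTS = [(357, 1674), (2755, 3156), (4324, 5028)]


def gaps_between(segments):
    """Return the (start, end) ranges lying between consecutive segments."""
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(segments, segments[1:])
    ]


def codon_count(start, end):
    """Number of whole codons in an inclusive nucleotide range."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"range {start}-{end} is not a whole number of codons")
    return length // 3


gaps = gaps_between(CLAIMED_SEGMENTS)
intein_codons = [codon_count(start, end) for start, end in gaps]

print(gaps)           # [(1675, 2754), (3157, 4323)]
print(intein_codons)  # [360, 389]
```

Under that assumption the two gaps span 360 and 389 codons, which agrees with the lengths given for inteins I-Tfu-1 and I-Tfu-2 in the sequence listing.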
