AU2005211587A1

AU2005211587A1 - Method for obtaining nucleic acids from an environment sample, resulting nucleic acids and use in synthesis of novel compounds

Info

Publication number: AU2005211587A1
Application number: AU2005211587A
Authority: AU
Inventors: Maria Ball; Carmela Cappellano; Sophie Courtois; Francois Francou; Asa Frostegard; Michel Guerineau; Pascale Jeannin; Jean-Luc Pernodet; Alain Raynal; Guennadi Sezonov; Pascal Simonet; Karine Tuphile
Original assignee: Aventis Pharma SA
Current assignee: Aventis Pharma SA
Priority date: 1999-11-29
Filing date: 2005-09-19
Publication date: 2005-10-13
Also published as: JP2003520578A; CA2393041A1; WO2001040497A3; WO2001040497A2; AU2179101A; EP1268764A2; BR0015993A; IL149846A0; KR20020060242A; AU781961B2; NO20022532D0; NO20022532L

Description

P/00/011 Regulation 3.2

AUSTRALIA

Patents Act 1990 COMPLETE SPECIFICATION STANDARD PATENT

(ORIGINAL)

Name of Applicant: Aventis Pharma of 20 avenue Raymond Aron, F-92160 Antony, FRANCE

Actual Inventors: JEANNIN, Pascale; PERNODET, Jean-Luc; GUERINEAU, Michel; SIMONET, Pascal; COURTOIS, Sophie; CAPPELLANO, Carmela; FRANCOU, Francois; RAYNAL, Alain; BALL, Maria; SEZONOV, Guennadi; TUPHILE, Karine; FROSTEGARD, Asa

Address for Service: DAVIES COLLISON CAVE, Patent Trademark Attorneys, of 1 Nicholson Street, Melbourne, 3000, Victoria, Australia. Ph: 03 9254 2777 Fax: 03 9254 2770 Attorney Code: DM

Invention Title: "Method for obtaining nucleic acids from an environment sample, resulting nucleic acids and use in synthesis of novel compounds"

The following statement is a full description of this invention, including the best method of performing it known to us:-

Method for obtaining nucleic acids from an environment sample, resulting nucleic acids and use in synthesis of novel compounds

This application is a divisional of Australian Patent Application No. 21791/01, the entire contents of which are incorporated herein by reference.

The present invention relates to a process for preparing nucleic acids from an environmental sample, more particularly a process for obtaining a collection of nucleic acids from a sample. The invention also relates to the nucleic acids or collections of nucleic acids obtained by this process and to their use in the synthesis of novel compounds, in particular novel compounds of therapeutic interest.

The invention also relates to the novel means used in the above process for obtaining nucleic acids, such as novel vectors and novel processes for preparing such vectors, or recombinant host cells comprising a nucleic acid of the invention.

The invention also relates to processes for detecting a nucleic acid of interest in a collection of nucleic acids obtained according to the above process, as well as to the nucleic acids detected by such a process and to the polypeptides encoded by such nucleic acids.

The invention also relates to nucleic acids obtained and detected according to the above processes, in particular nucleic acids encoding an enzyme which participates in the biosynthetic pathway of antibiotics such as β-lactams, aminoglycosides, heterocyclic nucleotides or polyketides, as well as to the enzymes encoded by these nucleic acids, the polyketides produced by expression of these nucleic acids and, finally, pharmaceutical compositions comprising a pharmacologically active amount of a polyketide produced by expression of such nucleic acids.

Since the discovery of streptomycin production by actinomycetes, the search for novel compounds of therapeutic interest, and most particularly for novel antibiotics, has made increasing use of methods for screening the metabolites produced by soil microorganisms.

Such methods consist mainly in isolating the organisms of the telluric (soil) microflora, culturing them on specially adapted nutrient media and then detecting a pharmacological activity in the products found in the culture supernatants or in the cell lysates, which have, where appropriate, undergone one or more prior separation and/or purification steps.

Thus, to date, the methods for the in vitro isolation and culturing of the organisms constituting the telluric microflora have enabled the characterization of about 40,000 molecules, about half of which show biological activity.

Major products have been characterized by such in vitro culture methods, such as antibiotics (penicillin, erythromycin, actinomycin, tetracycline, cephalosporin), anticancer agents, cholesterol-lowering agents and pesticides.

About 70% of the known products of therapeutic interest of microbial origin come from actinomycetes, more particularly from the genus Streptomyces. However, other therapeutic compounds, such as teicoplanins, gentamycin and spinosyns, have been isolated from microorganisms of genera that are more difficult to culture, such as Micromonospora, Actinomadura, Actinoplanes, Nocardia, Streptosporangium, Kitasatospora or Saccharomonospora.

However, experience shows that the characterization of novel natural products synthesized by soil microorganisms remains limited, partly because the in vitro culturing step usually selects organisms that are already known.

The methods for in vitro separation and culturing of telluric organisms in order to identify novel compounds of interest thus have many limitations.

For example, in actinomycetes, the rate of rediscovery of already known antibiotics is about 99%. Indeed, fluorescence microscopy techniques have made it possible to count more than 10¹⁰ bacterial cells in 1 g of soil, whereas only 0.1 to 1% of these bacteria can be isolated after inoculation on culture media.
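The gap between microscopic counts and culturable counts can be made concrete with the figures quoted above (more than 10¹⁰ cells per gram of soil, of which only 0.1 to 1% can be isolated). The short sketch below simply does that arithmetic; the function name is illustrative and the lower-bound cell count is taken as exactly 10¹⁰.

```python
def culturable_range(total_cells: float, min_fraction: float,
                     max_fraction: float) -> tuple[float, float]:
    """Return the (low, high) number of cells expected to grow on culture media."""
    return total_cells * min_fraction, total_cells * max_fraction


# Figures quoted in the text: more than 1e10 cells per gram of soil (microscopy count),
# of which only 0.1% to 1% can be isolated on culture media.
cells_per_gram = 1e10
low, high = culturable_range(cells_per_gram, 0.001, 0.01)
never_cultured = cells_per_gram - high  # cells missed even at the optimistic 1% bound

print(f"culturable: {low:.0e} to {high:.0e} cells per gram")
print(f"missed by culturing: at least {never_cultured:.2e} cells per gram")
```

Even at the optimistic 1% bound, roughly 99 out of every 100 cells counted under the microscope never reach a culture plate.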

Using DNA reassociation kinetics techniques, it has been shown that 1 g of soil can contain between 12,000 and 18,000 bacterial species, whereas, to date, only 5,000 non-eukaryotic microorganisms have been described across all habitats.

Molecular ecology studies have made it possible to amplify and clone many novel sequences of 16S rDNA from environmental DNA.

The results of these studies have led to a trebling of the number of bacterial divisions previously characterized.

At the present time, bacteria are subdivided into 40 divisions, some of which consist only of bacteria which cannot be cultured. These latest results bear witness to the breadth of microbial biodiversity which remains unexploited to date.

Recent studies have attempted to overcome the many obstacles to accessing the biodiversity of the soil microflora, in particular the in vitro culturing step that precedes the isolation and characterization of compounds of industrial, and especially therapeutic, interest.

Methods have thus been developed which include a step of extracting the DNA from telluric organisms, where appropriate after a prior isolation of the organisms contained in the soil samples.

The DNA thus extracted, after lysis of the bacterial cells without prior in vitro culturing, is cloned into vectors used to transfect host organisms, in order to constitute libraries of DNA originating from soil bacteria.

These libraries of recombinant clones are used to detect the presence of genes encoding compounds of therapeutic interest or, alternatively, to detect the production of compounds of therapeutic interest by these recombinant clones.

However, the prior-art methods for gaining direct access to the DNA of the soil microflora have drawbacks at each of the steps described above, and these drawbacks can considerably reduce the quantity and quality of the genetic material obtained and usable.

The prior art regarding each of the steps for constructing libraries of DNA originating from soil samples is detailed below, along with the technical drawbacks identified by the Applicant and which have been overcome according to the present invention.

1. Step of extracting DNA from a soil sample

1.1. Direct extraction of environmental DNA

This is essentially a process using DNA extraction techniques performed directly on the environmental sample, usually after a prior in situ lysis of the organisms in the sample.

Such techniques have been used on samples originating from aquatic media, both freshwater and marine. They comprise a first step of preconcentrating the cells present in free form or in the form of particles, which generally consists of filtering large volumes of water on different filtration devices, for example conventional membrane filtration, tangential or rotational filtration, or ultrafiltration.

The filters have a pore size of between 0.22 and 0.45 µm, and a prefiltration is often required to avoid clogging when large volumes are treated.

In a second stage, the harvested cells are lysed directly on the filters in small volumes of solution, by enzymatic and/or chemical treatment.

This technique is illustrated, for example, by the study of Stein et al. (1996, Journal of Bacteriology, Vol. 178: 591-599), who describe the cloning of ribosomal RNA genes and of a gene encoding a translation elongation factor (EF2) from marine planktonic Archaebacteria.

Techniques of direct extraction of DNA from samples of soil or sediment have also been described, which are based on protocols of physical, chemical or enzymatic lysis performed in situ.

For example, US patent No. 5 824 485 (Chromaxome Corporation) describes chemical lysis of bacteria directly in the collected sample by addition of a hot lysis buffer based on guanidinium isothiocyanate.

International patent application No. WO 99/20799 (Wisconsin Alumni Research Foundation) describes a step of in situ lysis of bacteria using an extraction buffer containing a protease and SDS.

Other techniques have also been used, such as carrying out several cycles of freezing-thawing on the sample followed by high-pressure pressing of the thawed sample. Techniques of bacterial lysis using a succession of steps of sonication, heating with microwaves and heat shocks have also been used (Picard et al. 1992).

However, the techniques of the prior art described above for the direct extraction of DNA have very variable efficacy in quantitative and qualitative terms.

Thus, in situ chemical or enzymatic treatments of the sample have the drawback of lysing only certain categories of microorganisms, because the various indigenous microorganisms, owing to their heterogeneous morphology, differ in their resistance to the lysis step.

For example, Gram-positive bacteria withstand treatment with hot SDS detergent, whereas virtually all Gram-negative cells are lysed.

In addition, some of the direct extraction protocols described above promote adsorption of the extracted nucleic acids onto the mineral particles of the sample, which significantly reduces the amount of available DNA.

Moreover, although some of the protocols of the prior art disclose a mechanical treatment step to lyse the microorganisms in the sample taken, such a mechanical lysis step is systematically carried out in liquid medium in an extraction buffer, which does not allow good homogenization of the starting sample in the form of fine particles enabling maximum accessibility to the diversity of organisms present in the sample. Grinding tests have also been carried out on crude soil samples using glass beads, but the amount of DNA extracted was low.

It has been observed according to the invention that a first step of in situ mechanical lysis in liquid medium has negative effects on the amount of DNA which can be extracted.

The amount of DNA which can be used directly for cloning in recombinant vectors is also dependent on the purification steps subsequent to its extraction.

In the prior art, the extracted DNA is then purified, for example using polyvinylpolypyrrolidone, by precipitation in the presence of ammonium acetate or potassium acetate, by centrifugation on a caesium chloride gradient, by chromatographic techniques, in particular on a hydroxyapatite support, on an ion-exchange column or by molecular sieving, or by agarose gel electrophoresis.

The DNA purification techniques described above, especially when combined with the abovementioned techniques for extracting environmental DNA, tend to co-purify the DNA with inhibitory compounds from the initial sample that are difficult to remove.

The co-extraction of inhibitory compounds with the DNA makes additional purification steps necessary, which leads to considerable losses of the DNA initially extracted and reduces both the quantity and the diversity of the genetic material initially contained in the sample.
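Why extra purification steps are so costly can be seen from the fact that losses compound multiplicatively: if each step recovers only a fraction of its input, the overall yield is the product of those fractions. The sketch below assumes a hypothetical recovery of 70% per step (a figure not given in this specification) and treats losses as uniform across the DNA; the function name is illustrative.

```python
def cumulative_recovery(per_step_recovery: list[float]) -> float:
    """Fraction of the initially extracted DNA left after a series of purification steps.

    Each entry is the (hypothetical) fraction of DNA recovered by one step;
    losses compound multiplicatively.
    """
    remaining = 1.0
    for r in per_step_recovery:
        if not 0.0 <= r <= 1.0:
            raise ValueError("recovery must be between 0 and 1")
        remaining *= r
    return remaining


# Hypothetical assumption: every purification step recovers 70% of its input.
for n_steps in (1, 2, 4):
    print(n_steps, round(cumulative_recovery([0.7] * n_steps), 3))
```

Under this assumption, four steps leave only about a quarter of the starting DNA.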

Another aim of the invention was to overcome the drawbacks of the prior purification protocols and to develop a DNA purification step which, on the one hand, maintains the diversity of the DNA of the initial sample at an optimum level and, on the other hand, maximizes the DNA yield.

Most particularly, the qualitative and quantitative improvements to the purification of DNA are at a maximum when they make use of a combination of a direct DNA extraction process according to the invention and a subsequent purification process, as will be described hereinbelow.

1.2. Indirect extraction of environmental DNA.

Such techniques involve a first step of separation of the various organisms in the telluric microflora from the other constituents of the starting sample, prior to the actual DNA extraction step.

In the state of the art, the prior separation of a microbial fraction from a soil sample usually comprises a physical dispersion of the sample by grinding it in liquid medium, for example using devices such as a Waring Blender or a mortar.

Chemical dispersions have also been described, for example dispersions on ion-exchange resins or dispersions using non-specific detergents such as sodium deoxycholate or polyethylene glycol. Whatever the mode of dispersion, the solid sample must be suspended in water, phosphate buffer or a saline solution.

The physical or chemical dispersion step can be followed by centrifugation on a density gradient, which separates the cells contained in the sample from the sample particles, since bacteria have lower densities than most soil particles.

Alternatively, the physical dispersion step can be followed by a step of low-speed centrifugation or a step of cell elutriation.

The DNA can then be extracted from the separated cells by any available method of lysis and can be purified by many methods, including the purification methods described in paragraph 1.1 above. In particular, the inclusion of the cells in low-melting agarose can be carried out in order to control the lysis.

However, the prior-art methods known to the Applicant are unsatisfactory because the fractions containing the extracted DNA also contain unwanted constituents of the starting sample, which significantly affect the final quality and quantity of DNA.

The present invention proposes to solve the technical difficulties encountered in the processes of the prior art, as will be described hereinbelow.

2. Molecular characterization of the extracted DNA.

When it is desired to construct a DNA library from an environmental sample, in particular from a soil sample, it is advantageous to check the quality and diversity of the source of DNA extracted and purified before it is inserted into suitable vectors.

The object of such a molecular characterization of the extracted and purified DNA is to obtain profiles representing the proportions of the various bacterial taxa present in the DNA extract. This characterization makes it possible to determine whether artefacts have been introduced during the various extraction and purification steps and, where appropriate, whether the diversity of the extracted and purified DNA is representative of the microbial diversity initially present in the sample, in particular in the soil sample.
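The specification does not say at this point how two taxon-proportion profiles should be compared. As one possible sketch, the code below converts per-taxon counts into relative proportions and compares the profile of a DNA extract with a reference profile using Bray-Curtis dissimilarity, a common ecological measure swapped in here as an assumption. The taxon names, counts and function names are hypothetical.

```python
def proportions(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-taxon counts (e.g. probe or clone counts) into relative proportions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("profile is empty")
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(p: dict[str, float], q: dict[str, float]) -> float:
    """Bray-Curtis dissimilarity between two proportion profiles.

    0 means identical profiles, 1 means no taxon in common. Because both
    profiles sum to 1, the general formula reduces to 1 - sum of minima.
    """
    taxa = set(p) | set(q)
    shared = sum(min(p.get(t, 0.0), q.get(t, 0.0)) for t in taxa)
    return 1.0 - shared


# Hypothetical counts: a reference profile and the profile of a DNA extract
# in which one taxon appears under-represented after extraction.
reference = proportions({"Actinobacteria": 30, "Proteobacteria": 50, "Firmicutes": 20})
extract = proportions({"Actinobacteria": 10, "Proteobacteria": 70, "Firmicutes": 20})
print(f"Bray-Curtis dissimilarity: {bray_curtis(reference, extract):.2f}")
```

A value close to 0 would suggest the extract preserves the reference proportions; a larger value flags a possible extraction or purification bias.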

To the Applicant's knowledge, the prior art makes use of quantitative hybridization processes using oligonucleotide probes that are specific for different bacterial groups, applied directly to the DNA extracted from the environment.

Unfortunately, such an approach is relatively insensitive and does not make it possible to detect taxonomic groups or genera that are present in low abundance.

The prior art also describes quantitative PCR processes, such as MPN-PCR or competitive quantitative PCR. However, these techniques have major drawbacks.

Thus, MPN-PCR is complicated to carry out because of the large number of dilutions and replicates it requires, which makes it unsuitable for large numbers of samples or of primer pairs.
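MPN-PCR infers target abundance from which replicate reactions in a dilution series amplify. The sketch below shows the standard maximum-likelihood most-probable-number calculation, assuming that target copies are Poisson-distributed among reactions and that any reaction containing at least one copy scores positive; function and parameter names are illustrative. Each dilution level and replicate in the input corresponds to a separate PCR, which is the workload referred to above.

```python
import math


def mpn_estimate(volumes: list[float], tubes: list[int], positives: list[int],
                 rel_tol: float = 1e-10) -> float:
    """Maximum-likelihood most-probable-number estimate of target copies per unit volume.

    volumes[i]   : template amount per reaction at dilution level i (must be > 0)
    tubes[i]     : number of replicate reactions at that level
    positives[i] : number of those reactions scored positive
    """
    if not (len(volumes) == len(tubes) == len(positives)):
        raise ValueError("inputs must have the same length")
    if any(v <= 0 for v in volumes):
        raise ValueError("volumes must be positive")
    if all(p == 0 for p in positives):
        return 0.0  # nothing amplified: the ML estimate is zero
    # Right-hand side of the score equation: total volume of the negative reactions.
    negative_volume = sum((n - p) * v for v, n, p in zip(volumes, tubes, positives))
    if negative_volume == 0:
        raise ValueError("all reactions positive: concentration above the countable range")

    def score(lam: float) -> float:
        # Derivative of the log-likelihood in lam; strictly decreasing, so one root.
        total = 0.0
        for v, p in zip(volumes, positives):
            x = lam * v
            if p and x < 700:  # beyond this the term is negligible and exp would overflow
                total += p * v / math.expm1(x)
        return total - negative_volume

    lo, hi = 1e-12, 1.0
    while score(hi) > 0:  # widen the bracket until the root lies inside it
        hi *= 2.0
    while hi - lo > rel_tol * hi:  # bisection on the monotone score function
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical series: 3 replicate PCRs at each of three template amounts.
print(mpn_estimate([0.1, 0.01, 0.001], [3, 3, 3], [3, 1, 0]))
```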

Moreover, competitive quantitative PCR is difficult to carry out because it requires the construction of a competitor which is specific to the target DNA and which, in addition, does not introduce any bias or artefacts into the competition itself.

According to the invention, a process is thus proposed for prescreening a library of DNA originating from an environmental sample, which is quick, simple and reliable, and which makes it possible to test the quality of the previously extracted and purified DNA and thus to determine whether it is worth constructing a library of clones from this purified starting DNA.

3. Vectors for cloning DNA extracted and purified from an environmental sample.

Many vectors have already been described in the prior art for cloning DNA previously extracted from an environmental sample.

Thus, according to the description of international patent application No. WO 99/20799, viral vectors, phages, plasmids, phagemids, cosmids, phosmids, vectors of the BAC (bacterial artificial chromosome) type or bacteriophage P1, vectors of PAC type (artificial chromosome based on bacteriophage P1), vectors of the YAC (yeast artificial chromosome) type, yeast plasmids or any other vector capable of maintaining and expressing a genomic DNA in a stable manner can be used.

Example 1 of PCT patent application No. WO 99/20799 describes the construction of a genomic DNA library by cloning into a vector of the BAC type.

To the Applicant's knowledge, no DNA library originating from an environmental sample has yet actually been produced with conjugative vectors; the teaching of the present invention makes such a technique available to, and reproducible by, those skilled in the art for the first time.

4. Host cells

In the prior art, many host cells have been described as suitable for accommodating vectors containing inserts of DNA originating from DNA extracted and purified from an environmental sample.

Thus, PCT patent application No. WO 99/20799 cites many suitable host cells, such as Escherichia coli, in particular the strain DH or the strain 294 (ATCC 31446), the strain E. coli B, E. coli χ1776 (ATCC No. 31537), E. coli DH5α and E. coli W3110 (ATCC No. 27325).

This PCT patent application also cites other suitable host cells such as Enterobacter, Erwinia, Klebsiella, Proteus, Salmonella, Serratia, Shigella, Bacillus strains such as B. subtilis and B. licheniformis, and bacteria of the genera Pseudomonas, Streptomyces or Actinomyces.

US patent No. 5 824 485 in particular cites the Streptomyces lividans TK66 strain, as well as yeast cells such as those of Schizosaccharomyces pombe.

5. Characterization of genes of interest in DNA libraries originating from an environmental sample.

PCT patent application No. WO 99/20799 describes the identification of the phenotypes of different clones of a B. cereus DNA library, namely a clone producing haemolysin, a clone hydrolysing esculin and a clone producing an orange pigment.

Mutagenesis techniques based on the use of a transposon encoding the PhoA enzyme (alkaline phosphatase) subsequently made it possible to isolate mutated clones and to characterize the sequences responsible for the observed phenotypes.

The abovementioned article by Stein et al. (1996) describes the use of primers specific for ribosomal DNA to amplify the DNA inserted into the vectors harboured by certain clones of a genomic DNA library of marine planktonic Archaebacteria, and the identification of several coding sequences in the DNA thus amplified.

The article by Borschert S. et al. (1992) describes the screening of a genomic DNA library of Bacillus subtilis using pairs of primers which hybridize with conserved regions of known peptide synthetases in order to identify one or more corresponding genes in the genome of Bacillus subtilis.

This technique made it possible to detect a chromosomal DNA fragment of about 26 kb carrying a portion of the surfactin biosynthesis operon.

The article by Kah-Tong S. et al. (1997) describes the screening of a library of DNA originating from the soil with the aid of primers which hybridize with conserved sequences of the operon responsible for the biosynthetic pathway of type II polyketides and shows the identification, in this DNA library, of sequences belonging to the PKS-P gene. This article also describes the construction of hybrid expression cassettes in which the sequence of the PKS-P subunit, found naturally in the operon responsible for polyketide biosynthesis, has been replaced with various similar sequences found in the DNA library.

Similarly, the article by Hong-Fu et al. (1995) describes the construction of expression cassettes containing the various open reading frames of the operon responsible for polyketide biosynthesis, the expression cassettes having been constructed artificially by combining open reading frames which are not found together naturally in the genome of Streptomyces coelicolor. This article shows that combining, in the artificial expression cassettes, open reading frames originating from different bacterial strains allows the production of polyketides that have different structural characteristics and relatively strong antibiotic activities against Bacillus subtilis and Bacillus cereus.

Polyketides form a large family of natural products of variable structure and highly diverse biological activities. Among the polyketides are, for example, tetracyclines and erythromycin (antibiotics), FK506 (an immunosuppressant), doxorubicin (an anticancer agent), monensin (a coccidiostatic agent) and avermectin (an antiparasitic agent).

These molecules are synthesized by multifunctional enzymes known as polyketide synthases, which catalyse repeated cycles of condensation between acyl thioesters (in general acetyl, propionyl, malonyl or methylmalonyl thioesters). Each condensation cycle results in the formation, on a growing carbon chain, of a β-keto group which can then, where appropriate, undergo one or more reductive steps.
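The condensation cycle described above can be pictured as a small bookkeeping exercise: a starter unit is loaded whole, each extender unit adds two backbone carbons (malonyl and methylmalonyl lose CO2 on condensation, methylmalonyl also adding a methyl branch), and the β-keto group formed in each cycle is either kept or taken through ketoreduction (KR), dehydration (DH) and enoyl reduction (ER). The sketch below is a deliberately simplified model under those assumptions; it ignores cyclization, tailoring and stereochemistry, and its names are illustrative. The 6-deoxyerythronolide B (erythromycin aglycone) example follows the commonly described organisation of the erythromycin polyketide synthase.

```python
# Carbons contributed by each unit: starters are incorporated whole; extenders add
# two backbone carbons after decarboxylation (+1 methyl branch for methylmalonyl).
STARTER_CARBONS = {"acetyl": 2, "propionyl": 3}
EXTENDER_CARBONS = {"malonyl": 2, "methylmalonyl": 3}

# Fate of the beta-keto group formed in one cycle, by the reductive steps applied.
BETA_GROUP = {
    (): "keto",
    ("KR",): "hydroxyl",
    ("KR", "DH"): "double bond (enoyl)",
    ("KR", "DH", "ER"): "methylene",
}


def build_chain(starter: str,
                cycles: list[tuple[str, tuple[str, ...]]]) -> tuple[int, list[str]]:
    """Return (total carbons, beta-position group for each cycle) of a simplified chain."""
    carbons = STARTER_CARBONS[starter]
    groups = []
    for extender, reductions in cycles:
        carbons += EXTENDER_CARBONS[extender]  # one condensation = one extension cycle
        groups.append(BETA_GROUP[reductions])  # KeyError for an unsupported step order
    return carbons, groups


# Type II aromatic octaketide backbone: acetyl starter + 7 malonyl, no reduction.
print(build_chain("acetyl", [("malonyl", ())] * 7))

# 6-deoxyerythronolide B: propionyl starter + 6 methylmalonyl extensions.
debs = [("methylmalonyl", r) for r in
        [("KR",), ("KR",), (), ("KR", "DH", "ER"), ("KR",), ("KR",)]]
print(build_chain("propionyl", debs))
```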

Given the major clinical interest of polyketides, their common biosynthetic mechanism and the high degree of conservation observed between the gene clusters encoding polyketide synthases, there has been growing interest in developing novel polyketides by genetic engineering.

Novel artificial polyketides have thus been produced by genetic engineering, such as mederrhodin A or dihydrogranatirhodin. The vast majority of the novel polyketide molecules obtained by genetic engineering are very different, in structural terms, from the corresponding natural polyketides.

From the prior art, there is thus a need to obtain novel polyketides of interest, most particularly polyketides of therapeutic interest which, relative to their natural homologues, have an increased level of antibiotic activity or a different spectrum of antibiotic activity, either broader than that of the known polyketides or, on the contrary, more selective.

As will be described below, this need is partly fulfilled according to the present invention.

DESCRIPTION OF THE INVENTION

The invention relates firstly to a process for constructing libraries of DNA originating from an environmental sample, such a sample possibly being, without discrimination, an aquatic medium (fresh water or marine water), a soil sample (surface layer of soil, subsoil or sediments), or a sample of eukaryotic organisms containing an associated microflora, for example a sample originating from plants, insects or marine organisms and having an associated microflora.

The development of a process for constructing a library of DNA from an environmental sample, and most particularly from a soil sample, comprises critical steps whose implementation must necessarily be optimized in order to obtain a library of DNA whose content of nucleic acids of interest satisfies the objectives initially set.

A first critical step consists in extracting and subsequently purifying the nucleic acids initially contained in the sample, i.e. mainly the nucleic acids contained in the various organisms of which the microflora of this sample is composed.

The quality of purification of the extracted DNA is a key determinant of the result obtained.

A second important step of a process for constructing a library of nucleic acids originating from an environmental sample is the evaluation of the genetic diversity of the extracted and purified nucleic acids. A simple and reliable pre-screening step, which checks that the extracted and purified DNA reflects, at least partially, the phylogenetic diversity of the organisms initially present in the starting sample, makes it possible to decide whether the initial source of extracted and purified DNA is worth using to construct the nucleic acid library itself or, on the contrary, whether construction of the library should be abandoned because of excessive artefacts introduced during the extraction and purification of the nucleic acids.

It has also been found, according to the invention, that the quality of the inserts introduced into the vectors to construct the library is a determining factor. In particular, it has been determined that using restriction enzymes to cleave the DNA extracted and purified from the environmental sample can introduce artefacts or "bias" into the structure of the inserts obtained. This is because the DNA extracted from soil or from other environments, which in the vast majority of cases originates from unculturable organisms, consists of molecules whose G+C content is unknown a priori and furthermore varies with the origin of these organisms; since the frequency of a given restriction site depends on base composition, DNA of different G+C contents is not cleaved uniformly.
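The restriction-enzyme bias described above can be illustrated with a simple probabilistic model. Assuming bases occur independently with P(G) = P(C) = GC/2 and P(A) = P(T) = (1 − GC)/2 (a simplification, since real genomes show dinucleotide biases), the expected spacing between sites of a given sequence is the inverse of the probability of that sequence. The sites used below, EcoRI (GAATTC) and NotI (GCGGCCGC), are illustrative choices not taken from this specification; 72% G+C is roughly typical of Streptomyces genomes.

```python
def site_probability(site: str, gc_fraction: float) -> float:
    """Probability that a random position starts `site`, for DNA of the given G+C fraction.

    Assumes independent bases with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2.
    """
    if not 0.0 < gc_fraction < 1.0:
        raise ValueError("gc_fraction must be strictly between 0 and 1")
    p = 1.0
    for base in site.upper():
        if base in "GC":
            p *= gc_fraction / 2
        elif base in "AT":
            p *= (1 - gc_fraction) / 2
        else:
            raise ValueError(f"unsupported base: {base}")
    return p


def mean_fragment_length(site: str, gc_fraction: float) -> float:
    """Expected distance (bp) between successive occurrences of `site`."""
    return 1.0 / site_probability(site, gc_fraction)


for name, site in [("EcoRI", "GAATTC"), ("NotI", "GCGGCCGC")]:
    for gc in (0.35, 0.50, 0.72):
        print(f"{name} at {gc:.0%} G+C: one site every "
              f"~{mean_fragment_length(site, gc):,.0f} bp")
```

Under this model, the same enzyme cuts DNA from G+C-rich and A+T-rich organisms at very different rates, so a single digestion protocol produces inserts whose size distribution depends on the organism of origin.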

A third critical step is the insertion of the extracted and purified nucleic acids into vectors capable, on the one hand, of accepting nucleic acids of a chosen length and, on the other hand, of allowing their transfection into given host cells or their integration into the genome of such cells and, where appropriate, their expression in such host cells.

Vectors capable of accepting large nucleic acids, i.e. larger than 100 kb, are of particular interest when the objective is to clone and identify a complete operon capable of directing an entire biosynthetic pathway for a compound of industrial interest, in particular a compound of pharmaceutical or agronomic interest.

DEFINITIONS

For the purposes of the present invention, the terms "nucleic acids", "polynucleotides" and "oligonucleotides" mean not only DNA and RNA sequences but also hybrid RNA/DNA sequences of more than 2 nucleotides, in either single-stranded or double-stranded form.

The term "library" or "collection" is used in the present description with reference either to a set of extracted, and where appropriate purified, nucleic acids originating from an environmental sample, to a set of recombinant vectors, each of the recombinant vectors of the set comprising a nucleic acid originating from the set of abovementioned extracted, and where appropriate purified, nucleic acids, or to a set of recombinant host cells comprising one or more nucleic acids originating from the set of abovementioned extracted, and where appropriate purified, nucleic acids, the said nucleic acids being either carried by one or more recombinant vectors or integrated into the genome of the said recombinant host cells.

The expression "environmental sample" denotes, without discrimination, a sample of aquatic origin, for example from fresh or salt water, or a telluric sample originating from the surface layer of a soil, from sediments or from lower layers of the soil (subsoil), as well as samples of eukaryotic organisms, which may be multicellular, of plant origin, originating from marine organisms or from insects and having an associated microflora, this associated microflora constituting organisms of interest.

According to the invention, the term "operon" means a set of open reading frames whose transcription and/or translation is co-regulated by a single set of signals regulating transcription and/or translation.

According to the invention, an operon can also comprise the said signals for regulating the transcription and/or translation.

For the purposes of the invention, the expression "metabolic pathway" or "biosynthetic pathway" means a set of anabolic or catabolic biochemical reactions which results in the conversion of a first chemical species into a second chemical species.

For example, a biosynthetic pathway for an antibiotic consists of the set of biochemical reactions converting primary metabolites into antibiotic intermediates and then into the antibiotic itself.

The expression "regulation sequence which is operably linked relative to a nucleotide sequence whose expression is desired" means that the transcription regulation sequence(s) is (are) located, relative to the nucleotide sequence of interest whose expression is desired, so as to allow the expression of the said sequence of interest, the regulation of the said expression being dependent on factors which interact with the regulatory nucleotide sequences.

According to another terminology, it may also be said that the nucleotide sequence of interest whose expression is desired is placed "under the control" of the transcription-regulating nucleotide sequences.

For the purposes of the present invention, the term "isolated" denotes a biological material which has been removed from its original environment (the environment in which it naturally occurs).

For example, a polynucleotide or a polypeptide present in the natural state in an organism (virus, bacterium, fungus, yeast, plant or animal) is not isolated. The same polypeptide separated from its natural environment or the same polynucleotide separated from the adjacent nucleic acids within which it is naturally inserted in the genome of the organism, is isolated.

Such a polynucleotide can be included in a vector and/or in a composition and nevertheless remain in isolated form, because the vector or composition does not constitute its natural environment.

The term "purified" does not require the material to be present in a form of absolute purity, exclusive of the presence of other compounds.

Rather, this is a relative definition.

A polypeptide or polynucleotide is in purified form after purification of the starting material by at least one order of magnitude, preferably two or three and preferentially four or five orders of magnitude.
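One way to read this definition, assuming purity is measured as the mass fraction f of the polypeptide or polynucleotide in the material (the specification does not state which measure is meant), is as a fold-enrichment expressed on a log scale. The numbers in the example are hypothetical.

```latex
% f = mass fraction of the polypeptide or polynucleotide (assumed measure of purity)
P = \frac{f_{\mathrm{after}}}{f_{\mathrm{before}}}, \qquad
n_{\mathrm{orders}} = \log_{10} P

% Hypothetical example: f_before = 10^{-4} (0.01%), f_after = 10^{-1} (10%)
P = \frac{10^{-1}}{10^{-4}} = 10^{3}
\quad\Rightarrow\quad n_{\mathrm{orders}} = 3
```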

For the purposes of the present invention, the "percentage of identity" between two sequences of nucleotides or of amino acids can be determined by comparing two optimally aligned sequences across a comparison window.

The portion of the nucleotide or polypeptide sequence in the comparison window can thus comprise additions or deletions (for example "gaps") relative to the reference sequence (which does not comprise these additions or deletions) so as to obtain an optimum alignment of the two sequences.

The percentage is calculated by counting the positions at which an identical nucleic base or amino acid residue is found in both compared sequences (nucleic acid or peptide), dividing this number of identical positions by the total number of positions in the comparison window, and multiplying the result by 100 to obtain the percentage of sequence identity.
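The counting rule above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been optimally aligned (the alignment itself is left to tools such as those cited below), that gaps are written as "-", and that gap positions count towards the length of the comparison window but never as identities. The function name is illustrative only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical positions over an aligned comparison window.

    Both inputs must already be aligned to the same length; '-' marks a gap.
    Gap positions are counted in the window but never as identical positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty comparison window")
    identical = sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )
    # identical positions / total positions in the window, times 100
    return 100.0 * identical / len(aligned_a)


print(percent_identity("ACGT-ACGTA", "ACGTTACCTA"))  # 8 of 10 positions -> 80.0
```

With this convention, adding gaps to improve an alignment lengthens the window, so an alignment is only "optimal" if the extra matches outweigh the gap positions.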

The optimum alignment of the sequences for the comparison can be computed with known algorithms, such as those in the Wisconsin Genetics Software Package, Genetics Computer Group (GCG), 575 Science Drive, Madison, Wisconsin.

By way of illustration, the percentage of sequence identity may be determined using the BLAST software (BLAST version 1.4.9 of March 1996, BLAST 2.0.4 of February 1998 and BLAST 2.0.6 of September 1998), using only the default parameters (Altschul et al., J. Mol. Biol. 1990, 215: 403-410; S. F. Altschul et al., Nucleic Acids Res. 1997, 3389-3402). BLAST searches for sequences similar/homologous to a reference "query" sequence using the algorithm of Altschul et al. The query sequence and the databases used can be of peptide or nucleic nature, in any combination.

EXTRACTION AND PURIFICATION OF NUCLEIC ACIDS ORIGINATING FROM AN ENVIRONMENTAL SAMPLE.

1. Direct extraction of nucleic acids

It has been shown according to the present invention that, in order to obtain a library of nucleic acids originating from the organisms contained in a soil sample, it is important to create conditions under which, on the one hand, the various organisms in the sample are made accessible to the subsequent nucleic acid extraction steps and, on the other hand, the initial treatment of the soil sample achieves maximum mechanical lysis of the organisms in the sample, so as to make their nucleic acids, mainly genomic and plasmid DNA, directly accessible to the buffers used in the subsequent extraction steps.

It has thus been demonstrated according to the invention that maximum accessibility of the nucleic acids of microorganisms in a soil sample is achieved by thorough dry-grinding of the pre-dried soil sample into microparticles. The Applicant has determined that drying the soil sample before any further treatment significantly reduces the cohesion of the crude soil sample and consequently promotes its subsequent disintegration into microparticles when a suitable grinding treatment is carried out.

Surprisingly, the Applicant has shown that microparticles of dry soil samples combine physicochemical properties favourable to the extraction of an optimum quantity of nucleic acids whose nature can be representative of the genetic diversity of the organisms initially present in the starting soil sample. In particular, it has been shown that the process of direct extraction of nucleic acids according to the invention allows the extraction of DNA from rare microorganisms, such as certain rare Streptomyces, or from sporulated microorganisms.

For the purposes of the present invention, the term "microparticles" of the soil sample means particles derived from the sample which have an average size of about 50 µm, i.e. on average between … and 55 µm.

According to the invention, the microparticles are obtained from soil samples that are pre-dried or pre-desiccated and then ground until microparticles with an average size of between 2 µm and 50 µm are obtained, before the microparticles are resuspended in a liquid buffer medium.

Such a liquid buffer medium can consist of a nucleic acid extraction buffer, in particular a conventional DNA extraction buffer well known to those skilled in the art.

The grinding of the soil sample into microparticles has the twin function of mechanically lysing most of the organisms present in the initial soil sample and of making the organisms not lysed by this mechanical treatment accessible to optional subsequent steps of chemical and/or enzymatic lysis.

Thus, a first subject of the invention consists of a process for preparing a collection of nucleic acids from a soil sample containing organisms, the said process comprising a first step of obtaining microparticles by grinding the pre-dried or pre-desiccated soil sample, followed by suspending the microparticles in a liquid buffer medium.

In an entirely preferred manner, the grinding step is carried out using a device with agate or tungsten beads or alternatively using a device with tungsten rings. These devices are preferred since the hardness of materials such as agate or tungsten significantly facilitates the production of microparticles of the size specified above. For this reason, use of a grinding device with glass beads, which is found to be much less efficient, will preferably not be chosen, or will be avoided.

The drying or desiccation of the soil sample can be carried out by any method known to those skilled in the art. For example, the crude soil sample can be dried at room temperature for 24 to 48 hours.

As indicated previously, the liquid buffer medium can consist of a medium for extracting the DNA present in the microparticles. An extraction buffer known as TENP, containing 50 mM Tris, 20 mM EDTA, 100 mM NaCl and 1% (weight/volume) polyvinylpolypyrrolidone at pH …, will most preferably be used.
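Preparing TENP from the composition above is a simple molarity calculation. The sketch below assumes Tris base (121.14 g/mol), disodium EDTA dihydrate (372.24 g/mol) and NaCl (58.44 g/mol); the text does not state which salt forms are used, so these choices are assumptions. It does not handle the pH adjustment, whose value is not legible in the text above.

```python
# Molar masses in g/mol. ASSUMPTION: Tris base and disodium EDTA dihydrate;
# the source gives only the final concentrations, not the salt forms.
MOLAR_MASS = {"Tris": 121.14, "EDTA (Na2, 2H2O)": 372.24, "NaCl": 58.44}
TENP_MM = {"Tris": 50, "EDTA (Na2, 2H2O)": 20, "NaCl": 100}  # millimolar
PVPP_PERCENT_WV = 1.0  # 1 % w/v = 1 g per 100 ml


def tenp_masses(volume_ml: float) -> dict[str, float]:
    """Grams of each solid needed for volume_ml of TENP (before pH adjustment)."""
    litres = volume_ml / 1000.0
    grams = {
        # mol/l * g/mol * l = g
        name: mm / 1000.0 * MOLAR_MASS[name] * litres
        for name, mm in TENP_MM.items()
    }
    grams["PVPP"] = PVPP_PERCENT_WV / 100.0 * volume_ml
    return grams


for name, g in tenp_masses(500).items():
    print(f"{name}: {g:.2f} g")
```

For 500 ml this gives roughly 3.03 g Tris, 3.72 g EDTA, 2.92 g NaCl and 5.00 g PVPP.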

The process for preparing a collection of nucleic acids from a soil sample is also characterized in that the step of obtaining microparticles by grinding the pre-dried or pre-desiccated soil sample is followed by a step of extracting the nucleic acids present in the microparticles.

It is common ground that the extraction of nucleic acids is accompanied by co-extraction of unwanted soil constituents and/or compounds, which makes subsequent purification of the extracted nucleic acids necessary. Such a purification step must be both selective enough to remove the unwanted soil constituents and/or compounds and efficient enough that only a small amount of the pre-extracted DNA is lost.

It has been shown according to the invention that a step of purifying the DNA extracted from the microparticles of the soil sample which satisfies the selectivity and yield criteria defined above comprises a treatment of the extracted DNA with a combination of two successive chromatography steps, a chromatography on molecular sieves and an anion-exchange chromatography, respectively.

According to another characteristic of the above process, the step of extracting the nucleic acids is followed by a step of purifying the extracted nucleic acids by means of the following two chromatography steps: passing the solution containing the nucleic acids over a molecular sieve and recovering the elution fractions enriched in nucleic acids; then passing these enriched fractions over an anion-exchange chromatography support and recovering the elution fractions containing the nucleic acids.

The nature and order of the above chromatography steps are essential for good selectivity and an excellent yield in the step of purifying the DNA pre-extracted from the microparticles of the pre-dried or pre-desiccated soil sample.

In a very advantageous manner, the chromatographic support of the "molecular sieve" type in the above nucleic acid purification step consists of a chromatographic support of Sephacryl® S400 HR type or a chromatographic support of equivalent characteristics.

In an entirely preferred manner, the anion-exchange chromatographic support used in the second step for purifying the extracted DNA is a support of Elutip® d type, or a chromatographic support of equivalent characteristics.

By combining the steps of obtaining microparticles of the dry soil sample, of extracting the nucleic acids present in the microparticles and of purification by the chromatography steps described above, it is possible according to the invention to extract the DNA from the soil directly without prior purification of the cells of the organisms initially contained in the sample, while at the same time avoiding the co-extraction of soil contaminants, such as, for example, humic acids, which is observed with the processes of the prior art.

The contaminants, such as humic acids, severely impair the analyses and the subsequent uses of the nucleic acids whose purification is desired.

According to the above process, it is also possible to gain access to the nucleic acids contained in the organisms which have not been lysed mechanically during the step of obtaining microparticles of the soil sample, with the aim of obtaining a virtually exhaustive collection of the genetic diversity of the nucleic acids initially present in the soil sample. Thus, the microparticles of the soil sample can undergo subsequent chemical, enzymatic or physical lysis treatments, or a combination of such treatments.

According to a first aspect, the process for preparing a collection of nucleic acids from a soil sample according to the invention can also be characterized in that the step of obtaining and suspending the microparticles is followed by the following steps: treatment of the soil suspension in a liquid buffer medium by sonication; extraction and recovery of the nucleic acids.

In a preferred manner, for a treatment by sonication, use will be made of a device of titanium micro-point type, such as the 600 W Vibracell Ultrasonicator device sold by the company Bioblock or a sonicator of Cup Horn type.

In an entirely preferred manner, the sonication step is carried out at a power of 15 W for a total of 7 to 10 minutes and comprises successive sonication cycles, sonication itself being applied for … of the duration of each cycle.
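Because the share of each cycle during which the probe is active is not legible in the text above, the following sketch treats it as a parameter and computes only the nominal energy input (power × active time). The `on_fraction` value of 0.5 used in the example is purely hypothetical, and the nominal figure ignores losses in the device and sample, so it is a rough comparison aid rather than a measured dose.

```python
def sonication_energy_j(power_w: float, total_min: float, on_fraction: float) -> float:
    """Nominal energy (joules) delivered by pulsed sonication.

    on_fraction is the share of each cycle during which sonication is active;
    its value is not given legibly in the source, so it is left as a parameter.
    """
    if not 0.0 < on_fraction <= 1.0:
        raise ValueError("on_fraction must be in (0, 1]")
    # power (W = J/s) * total duration in seconds * active share of each cycle
    return power_w * total_min * 60.0 * on_fraction


# Hypothetical 50 % duty cycle over the 7 to 10 minute range at 15 W
for minutes in (7, 10):
    print(minutes, "min:", sonication_energy_j(15, minutes, 0.5), "J")
```

Under that hypothetical duty cycle, the 7 to 10 minute range corresponds to about 3150 to 4500 J of nominal input.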

According to a second aspect, the above process can also be characterized in that the step of obtaining and suspending the microparticles is followed by the following steps: treatment of the soil suspension in a liquid buffer medium by sonication; incubation of the suspension at 37°C after sonication in the presence of lysozyme and achromopeptidase; addition of SDS before centrifugation and precipitation of the nucleic acids; recovery of the precipitated nucleic acids.

Preferably, the step of incubation in the presence of lysozyme and achromopeptidase will be carried out at a final concentration of 0.3 mg/ml of each of the two enzymes, preferably for 30 minutes at 37°C.

Preferably, SDS is used at a final concentration of 1%, with incubation for 1 hour at 60°C before centrifugation and precipitation.
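Reaching the final concentrations stated above (0.3 mg/ml of each enzyme, 1% SDS) is a C1·V1 = C2·V2 calculation. The stock concentrations below (10 mg/ml enzyme stocks, 20% w/v SDS) are hypothetical, since the text gives only final values. As a simplification, each reagent is computed against the same final volume; in practice the SDS is added after the enzyme incubation, which dilutes the earlier additions slightly.

```python
def stock_volume_ul(final_conc: float, stock_conc: float, final_volume_ul: float) -> float:
    """Volume of stock to add so the mixture reaches final_conc (C1*V1 = C2*V2).

    final_conc and stock_conc must use the same units (mg/ml, % w/v, ...).
    """
    if stock_conc <= final_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * final_volume_ul / stock_conc


# HYPOTHETICAL stock concentrations; only the final values come from the text.
reagents = {
    "lysozyme (stock 10 mg/ml)": (0.3, 10.0),
    "achromopeptidase (stock 10 mg/ml)": (0.3, 10.0),
    "SDS (stock 20 % w/v)": (1.0, 20.0),
}
for name, (final, stock) in reagents.items():
    print(f"{name}: {stock_volume_ul(final, stock, 1000.0):.0f} µl per 1 ml")
```

With these assumed stocks, each enzyme requires about 30 µl and the SDS about 50 µl per millilitre of final suspension.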

According to a third aspect, the above process for preparing a collection of nucleic acids from a soil sample is also characterized in that the step of obtaining and suspending the microparticles is followed by the following steps: homogenization of the soil suspension by vigorous mixing (vortex) followed by simple stirring; freezing of the homogeneous suspension followed by thawing; treatment of the suspension by sonication after thawing; incubation of the suspension at 37°C after sonication in the presence of lysozyme and achromopeptidase; addition of SDS before centrifugation and precipitation of the nucleic acids; recovery of the nucleic acids.

Preferably, the suspensions of soil microparticles are mixed on a vortex machine and then homogenized by gentle stirring on a stirrer with circular rotation for two hours, after which they are frozen at … . Preferably, after thawing and before the sonication step, the suspensions are again vigorously mixed on a vortex machine for 10 minutes.

It goes without saying that the nucleic acids extracted by the embodiments of the direct-extraction process described above are preferably purified by the purification step consisting of a first passage over molecular sieves followed by a passage of the elution fractions obtained from the molecular-sieve chromatography over an anion-exchange chromatographic support.

2. Indirect extraction of nucleic acids

According to a second embodiment of the process for preparing a collection of nucleic acids from an environmental sample according to the invention, the said environmental sample undergoes a first treatment which is of a nature to allow separation of the organisms contained in this sample from the other macro-constituents of the sample.

This second embodiment of the process for preparing a collection of nucleic acids according to the invention promotes the production of large nucleic acids, which are virtually impossible to obtain with the first embodiment described above, because the mechanical lysis step performed to obtain the microparticles also physically breaks the nucleic acids present in the soil sample or contained in its organisms.

The production of large nucleic acids has been sought by the Applicant for the purpose of isolating and characterizing nucleic acids comprising, at least partially, all of the coding sequences belonging to the same operon capable of directing the biosynthesis of a compound of industrial interest.

Preferably, by carrying out the second embodiment of the process for preparing a collection of nucleic acids from a soil sample according to the invention, nucleic acids greater than 100 kb in size are obtained, preferably greater than 200, 250 or 300 kb, and most preferably greater than 400, 500 or even 600 kb.

This second embodiment of a process for preparing a collection of nucleic acids from an environmental sample according to the invention consists of a combination of four successive steps intended to obtain nucleic acids having the characteristics described above.

When the environmental sample is a soil sample, it has been shown according to the invention that a first step of obtaining a suspension by dispersing the soil sample in a liquid medium promotes the accessibility of the organisms contained in the sample without bringing about any significant mechanical lysis of the cells.

The first step of obtaining a dispersion of the above soil sample makes the organisms in the sample accessible to the external medium and also allows a partial dissociation of the organisms in the sample and of the macro-constituents. It thus makes possible a subsequent separation of the organisms initially contained in the sample from the other constituents of this sample.

When the environmental sample originates, for example, from plants, from marine organisms or from insects, a pretreatment by grinding is necessary in order to make the organisms of the associated microflora accessible to the subsequent steps of the process.

Thus, the present process comprises a step of separating the organisms from the other inorganic and/or organic constituents obtained above by means of centrifugation on a density gradient. The organisms thus separated are then subjected to a step of lysis and then of extraction of the nucleic acids.

The step of centrifugation on a density gradient surprisingly makes it possible to separate the cells of the organisms from the soil particles contained in the sample suspension. It might indeed have been expected that a proportion of the cells would be entrained with the macroparticles into the gradient phase. In addition, it had never previously been demonstrated that centrifuging a soil sample on a density gradient made it possible to recover, at the aqueous phase/gradient interface, a population of organisms representative of the diversity of the organisms present in the starting sample, since these organisms vary enormously in volume, density and shape. It could reasonably have been assumed that they would be distributed between the aqueous phase, the aqueous phase/density gradient interface and the density gradient itself.

Thus, a person skilled in the art could expect that organisms with densities less than or greater than the density of the density gradient used (density of the density gradient of between 1.2 and 1.5 g/ml, preferably 1.3 g/ml) could not be recovered, the effect of which would have been to introduce a bias into the representativeness of the organisms effectively separated and, consequently, also into the diversity of the nucleic acids extracted.
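The expectation described in this paragraph can be made concrete with a deliberately naive model of a single density cushion: a particle less dense than the aqueous phase stays in it, a particle denser than the cushion sediments through it, and only particles in between collect at the interface. The aqueous-phase density of 1.0 g/ml is an assumption; the 1.3 g/ml cushion comes from the text. The model ignores sedimentation kinetics, which is precisely why the representative interface population reported above was not obvious.

```python
from enum import Enum


class Fate(Enum):
    AQUEOUS = "stays in the aqueous phase"
    INTERFACE = "collects at the aqueous/cushion interface"
    PELLET = "sediments through the cushion"


def expected_fate(cell_density: float,
                  aqueous_density: float = 1.0,   # ASSUMED aqueous-phase density (g/ml)
                  cushion_density: float = 1.3    # preferred value from the text (g/ml)
                  ) -> Fate:
    """Naive equilibrium expectation for a particle of the given buoyant density."""
    if cell_density < aqueous_density:
        return Fate.AQUEOUS
    if cell_density <= cushion_density:
        return Fate.INTERFACE
    return Fate.PELLET


for density in (0.98, 1.10, 1.40):
    print(density, "g/ml ->", expected_fate(density).value)
```

Under this model any organism outside the 1.0 to 1.3 g/ml window would be lost from the interface, which is the bias the skilled person would have anticipated.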

Also, in one specific embodiment of the process, a step of germination of spores, in particular of actinomycetes, is carried out, the effect of which is to significantly increase the amount of actinomycete DNA recovered.

The final step consists of a step of purifying the nucleic acids thus extracted on a caesium chloride gradient.

Surprisingly, the purification of the nucleic acids on the caesium chloride gradient allows substantial or even complete removal of the substances of which the density gradient is composed. This characteristic is decisive for the subsequent use of the purified nucleic acids, since the density-gradient medium is known to be a powerful enzyme inhibitor, capable where appropriate of inhibiting the catalytic activity of the enzymes used to prepare the extracted nucleic acids for insertion into vectors.

According to this second embodiment, the process for preparing a collection of nucleic acids from an environmental sample containing organisms according to the invention comprises the following succession of steps: (i) production of a suspension by dispersing the environmental sample in a liquid medium and then homogenizing the suspension obtained by gentle stirring; (ii) separation of the organisms from the other inorganic and/or organic constituents of the homogeneous suspension obtained in step (i) by centrifugation on a density gradient; (iii) lysis of the microorganisms separated in step (ii) and extraction of the nucleic acids; (iv) purification of the nucleic acids on a caesium chloride gradient.

Preferably, the suspension of the soil sample is obtained by dispersing this sample by grinding with the aid of a device such as a Waring Blender or a device of equivalent characteristics. In an entirely preferred manner, the sample suspension is obtained after three successive grinding operations each lasting one minute in a device such as a Waring Blender. Preferably, the ground sample will be cooled in ice between each of the grinding operations.

Preferably, the organisms are then separated from the soil particles by centrifugation on a density cushion of the "Nycodenz" type, sold by the company Nycomed Pharma AS (Oslo, Norway). The preferred centrifugation conditions are 10,000 × g for 40 minutes at 4°C, advantageously in a swing-out bucket rotor of the "TST 28.38" type sold by the company Kontron.
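Centrifugation conditions in this section are quoted either as a relative centrifugal force (10,000 × g here) or as a rotor speed (35,000 rpm for the caesium chloride step below). The standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in centimetres, converts between the two. The 15 cm radius in the example is hypothetical; the real value must be taken from the rotor manufacturer's data for the TST 28.38 rotor.

```python
import math

# Standard relation: RCF = 1.118e-5 * r_cm * rpm**2 (r_cm = radius in centimetres)
K = 1.118e-5


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at radius_cm."""
    return K * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) that gives the requested RCF at radius_cm."""
    return math.sqrt(rcf / (K * radius_cm))


RADIUS_CM = 15.0  # HYPOTHETICAL radius; check the rotor specification
print(f"10,000 x g at {RADIUS_CM} cm needs about {rpm_for_rcf(10_000, RADIUS_CM):.0f} rpm")
```

Because RCF grows with the radius, the same rpm gives a higher force at the bottom of the tube than at the top, so the radius used (minimum, average or maximum) should be stated alongside any conversion.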

The ring of organisms located, after centrifugation, at the interphase of the upper aqueous phase and the lower Nycodenz phase is then removed and washed by centrifugation before taking up the cell pellet in a suitable buffer.

Step (iii) of lysis of the organisms separated out in step (ii) described above can be carried out in any manner known to those skilled in the art.

Advantageously, the cells are lysed in a 10 mM Tris-100 mM EDTA solution at pH 8.0 in the presence of lysozyme and achromopeptidase, advantageously for one hour at 37°C.

The actual extraction of the DNA can advantageously be carried out by adding a lauryl sarcosyl solution (… of the final weight of the solution) in the presence of proteinase K and incubating the final solution at 37°C for 30 minutes.

The nucleic acids extracted in step (iii) are then purified on a caesium chloride gradient. Preferably, the step of purifying the nucleic acids on a caesium chloride gradient is carried out by centrifugation at 35,000 rpm for 36 hours, for example on a rotor of the Kontron 65.13 type.

According to one specific aspect of the process for preparing a collection of nucleic acids from a soil sample containing organisms according to the invention, the said nucleic acids consist predominantly, if not exclusively, of DNA molecules.

According to another aspect, the nucleic acids can be recovered after the organisms separated on a density gradient have been embedded in an agarose block and lysed there, for example by chemical and/or enzymatic lysis.

Another subject of the invention consists of a collection of nucleic acids made up of the nucleic acids obtained in step II-(iv) of the process for preparing a collection of nucleic acids according to the invention, or alternatively obtained in step … or a subsequent step of that process.

The invention also relates to a nucleic acid which is characterized in that it is contained in a collection of nucleic acids as defined above.

According to a first aspect, such a nucleic acid constituting a collection of nucleic acids according to the invention is characterized in that it comprises a nucleotide sequence encoding at least one operon, or part of an operon.

Most preferably, such an operon encodes all or part of a metabolic pathway.

Example 9 describes the construction of a genomic DNA library from a strain of Streptomyces alboniger and its cloning into the shuttle cosmids pOS700I and pOS700R, respectively. It has been shown according to the invention that, in the DNA library prepared in the integrative vector pOS700I, new clones contain nucleotide sequences belonging to the operon responsible for the puromycin biosynthetic pathway. Similarly, twelve clones containing nucleotide sequences of this operon have been identified in the DNA library prepared in the replicative vector pOS700R.

In particular, after digestion with the restriction endonucleases ClaI and EcoRV, certain integrative and replicative cosmids of the libraries produced yield a 12-kb fragment capable of containing all of the sequences of the operon responsible for the puromycin biosynthetic pathway.
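To show how a ClaI/EcoRV fragment size such as the 12-kb fragment mentioned here can be predicted from a sequenced insert, the sketch below computes the fragment lengths of a complete double digest. It assumes the sequence is known and both enzymes cut to completeness, and it ignores methylation sensitivity. Both recognition sites (ClaI AT^CGAT, EcoRV GAT^ATC) are palindromic, so scanning one strand finds every site. For circular molecules such as cosmids, a site spanning the numbering origin is not detected in this sketch.

```python
import re

# Recognition site and top-strand cut offset from the start of the site:
# ClaI AT^CGAT (offset 2), EcoRV GAT^ATC (offset 3, blunt).
ENZYMES: dict[str, tuple[str, int]] = {"ClaI": ("ATCGAT", 2), "EcoRV": ("GATATC", 3)}


def cut_positions(seq: str, enzymes: dict[str, tuple[str, int]] = ENZYMES) -> list[int]:
    """Sorted top-strand cut coordinates for all listed enzymes."""
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        # zero-width lookahead so that overlapping site occurrences are all found
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def fragment_lengths(seq: str, circular: bool = False,
                     enzymes: dict[str, tuple[str, int]] = ENZYMES) -> list[int]:
    """Fragment lengths after a complete digest of a linear or circular molecule."""
    cuts = cut_positions(seq, enzymes)
    if circular and cuts:
        inner = [b - a for a, b in zip(cuts, cuts[1:])]
        return inner + [len(seq) - cuts[-1] + cuts[0]]  # fragment spanning the origin
    bounds = [0, *cuts, len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


toy = "A" * 10 + "ATCGAT" + "C" * 20 + "GATATC" + "T" * 5  # 47 bp toy sequence
print(fragment_lengths(toy))                 # linear: [12, 27, 8]
print(fragment_lengths(toy, circular=True))  # circular: [27, 20]
```

Applied to the full sequence of a cosmid insert, the same functions would list the fragment sizes expected on an agarose gel after ClaI/EcoRV digestion.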

Thus, according to another aspect, a nucleic acid according to the invention contains, at least partially, nucleotide sequences of the operon responsible for the puromycin biosynthetic pathway.

Example 2 below describes the construction of a DNA library, according to a process in accordance with the present invention, in a pBluescript SK vector starting from a soil contaminated with lindane.

The recombinant vectors were introduced into Escherichia coli DH10B cells and the transformed cells were then cultured in a suitable culture medium in the presence of lindane. Screening of the library showed that 35 of the 10,000 clones screened had a lindane-degradation phenotype.

The presence of the linA gene in these clones was confirmed by PCR amplification with primers specific for this gene.
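The screening figures above correspond to a hit frequency of 35/10,000 = 0.35%. The sketch below uses that frequency to estimate how many clones would need to be screened to find at least one positive with a given probability, using N = ln(1 − P) / ln(1 − f). This assumes that positive clones are distributed independently at random in the library, which is a modelling assumption and not a result stated in the text.

```python
import math


def hit_frequency(hits: int, screened: int) -> float:
    """Fraction of screened clones showing the phenotype."""
    return hits / screened


def clones_needed(freq: float, confidence: float) -> int:
    """Clones to screen so that P(at least one positive) >= confidence,
    ASSUMING each clone is independently positive with probability freq."""
    if not (0.0 < freq < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("freq and confidence must lie in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))


f = hit_frequency(35, 10_000)
print(f"{f:.2%} of clones degrade lindane")
print(clones_needed(f, 0.99), "clones for a 99 % chance of at least one positive")
```

Under this assumption, about 1,300 clones would suffice for a 99% chance of recovering at least one lindane-degrading clone.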

Thus, according to another aspect, the invention also relates to a nucleic acid containing a nucleotide sequence for the metabolic pathway which brings about the biodegradation of lindane.

It has thus been clearly demonstrated, as described above, that a process for preparing a collection of nucleic acids from a soil sample containing organisms according to the invention, and a process for preparing a collection of recombinant vectors containing the constituent nucleic acids of that collection, are entirely suitable for the isolation and characterization of nucleotide sequences included in an operon.

An additional demonstration of the ability of a process according to the invention to identify coding nucleotide sequences involved in a biosynthetic pathway organized as an operon is also described later: it concerns the cloning and characterization of sequences encoding polyketide synthases involved in the biosynthesis of polyketides, a family of molecules some of whose members are of major therapeutic interest, in particular as antibiotics.

A subject of the present invention is thus also a constituent nucleic acid of a collection of nucleic acids according to the invention, characterized in that it comprises all of a nucleotide sequence encoding a polypeptide.

According to a first aspect, a constituent nucleic acid of a collection of nucleic acids according to the invention is of prokaryotic origin.

According to a second aspect, a constituent nucleic acid of a collection of nucleic acids according to the invention originates from a bacterium or from a virus.

According to a third aspect, a constituent nucleic acid of a collection of nucleic acids according to the invention is of eukaryotic origin.

In particular, such a nucleic acid is characterized in that it originates from a fungus, a yeast, a plant or an animal.

MOLECULAR CHARACTERIZATION OF THE COLLECTION OF NUCLEIC ACIDS EXTRACTED FROM THE SOIL.

In order to overcome the various technical drawbacks of the methods for characterizing libraries of DNA extracted and purified from an environmental sample which have been described in the section of the description relating to the prior art, the Applicant has developed a simple and reliable process for qualitatively and semi-quantitatively characterizing the nucleic acids obtained from the process described above.

The process according to the invention thus consists in universally amplifying a 700 bp fragment located within a 16S-type ribosomal DNA sequence, then hybridizing the amplified DNA with an oligonucleotide probe of variable specificity, and finally comparing the hybridization intensity of the sample with an external calibration range of DNA of known sequence or origin.

Amplification prior to hybridization with the oligonucleotide probe makes it possible to quantify relatively scarce microorganism genera or species. Furthermore, amplification with universal primers makes it possible to use a broad series of oligonucleotide probes during hybridization.
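Why prior amplification helps with scarce genera can be seen from the ideal exponential model of PCR, N = N₀(1 + E)ⁿ, where E is the per-cycle efficiency (1.0 for perfect doubling). The sketch assumes constant efficiency; real reactions plateau and can amplify different templates with different efficiencies, so it only indicates orders of magnitude. The starting copy number and detection threshold in the example are hypothetical.

```python
import math


def copies_after(n0: float, cycles: int, efficiency: float = 1.0) -> float:
    """Template copies after `cycles` rounds; efficiency 1.0 means perfect doubling."""
    return n0 * (1.0 + efficiency) ** cycles


def cycles_to_reach(n0: float, threshold: float, efficiency: float = 1.0) -> int:
    """Smallest number of cycles for n0 starting copies to reach `threshold` copies."""
    if n0 >= threshold:
        return 0
    return math.ceil(math.log(threshold / n0) / math.log(1.0 + efficiency))


# HYPOTHETICAL figures: 100 starting copies of a rare genus, detection at 1e9 copies
print(cycles_to_reach(100, 1e9), "cycles at perfect doubling")
print(cycles_to_reach(100, 1e9, efficiency=0.9), "cycles at 90 % efficiency")
```

Since universal primers amplify every 16S template, the relative proportions are ideally preserved while the absolute amount of a rare template is raised above the detection limit of the probe.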

Thus, a subject of the invention is also a process for determining the diversity of the nucleic acids contained in a collection of nucleic acids, most particularly a collection of nucleic acids originating from an environmental sample, preferably a soil sample, the said process comprising the following steps: placing the nucleic acids of the collection to be tested in contact with a pair of oligonucleotide primers that hybridize with any bacterial 16S ribosomal DNA sequence; carrying out at least three amplification cycles; detecting the amplified nucleic acids using one oligonucleotide probe or a plurality of oligonucleotide probes, each probe hybridizing specifically with a 16S ribosomal DNA sequence common to a bacterial kingdom, order, subclass or genus; where appropriate, comparing the results of the preceding detection step with the detection results obtained, using the same probe or probes, for nucleic acids of known sequence constituting a calibration range.

Preferably, a first pair of primers hybridizing with universally conserved regions of the gene for the 16S ribosomal RNA consists, respectively, of the primers FGPS 612 (SEQ ID No 12) and FGPS 669 (SEQ ID No 13).

A second embodiment of a preferred pair of primers according to the invention consists of the pair of universal primers 63 f (SEQ ID No 22) and 1387 r (SEQ ID No 23).

According to one specific embodiment of a process for determining the diversity of nucleic acids in a collection of nucleic acids, the amplification step using a pair of universal primers can be carried out on a collection of recombinant vectors, into each of which a nucleic acid from the collection under consideration has been inserted, before the step of hybridization with the oligonucleotide probes specific for a particular bacterial kingdom, order, subclass or genus.

Such a process for determining the diversity of the nucleic acids contained in a collection is most particularly applicable to the collections of nucleic acids obtained in accordance with the teaching of the present description.

Thus, Example 3 details a process for preparing a collection of nucleic acids from a soil sample containing organisms, comprising a step of indirect extraction of DNA by dispersion of a soil sample prior to the separation of the cells on a Nycodenz gradient, lysis of the cells and then purification of the DNA on a caesium chloride gradient.

The collection of nucleic acids thus obtained was used as obtained or in the form of inserts into vectors of cosmid type in an amplification process using the abovementioned universal primers for 16S rDNA, and the amplified DNA was then subjected to a step of detection using oligonucleotide probes of sequences SEQ ID No 14 to SEQ ID No 21 which are presented in Table 4.

The results show that a process for preparing a collection of nucleic acids starting from a soil sample containing organisms according to the invention makes it possible to gain access to the DNA of more than 1V% of the total telluric microflora, i.e. 2 × 10^8 cells per gram of soil, whereas the culturable microflora represents barely 2% of the total microbial population.

In order to determine the phylogenetic diversity of a collection of nucleic acids prepared in accordance with the invention, 47 sequences of the 16S rRNA gene were isolated and sequenced. These sequences correspond, respectively, to the nucleotide sequences SEQ ID No 60 to SEQ ID No 106.

The nucleic acids comprising the sequences SEQ ID No 60 to SEQ ID No 106 also form part of the invention, as do nucleic acids having at least 99%, preferably 99.5% or 99.8%, nucleic acid identity with the nucleic acids comprising the sequences SEQ ID No 60 to SEQ ID No 106. Such sequences can be used in particular as probes for screening the clones of a DNA library and thus identifying those clones which contain such sequences, since these sequences may lie close to coding sequences of interest, such as sequences encoding enzymes involved in the biosynthetic pathway of antibiotic metabolites, for example polyketides.

Comparison of the 16S rRNA sequences from a DNA library prepared in accordance with the invention with the sequences listed in the RDP database (Maidak, Cole, Parker, Garrity, Larsen, Li, Lilburn, McCaughey, Olsen, Overbeek, Pramanik, Schmidt, Tiedje and Woese (1999) "A new version of the RDP (Ribosomal Database Project)", Nucleic Acids Research Vol. 27: 171-173) made it possible to determine that the nucleic acids contained in a collection of nucleic acids according to the invention originate from α-proteobacteria, β-proteobacteria, δ-proteobacteria, γ-proteobacteria, actinomycetes and a genus related to Acidobacterium. These results, presented in Table 7 and in the phylogenetic tree in Figure 7, reflect the considerable phylogenetic diversity of the nucleic acids contained in a DNA library prepared by the process according to the invention.

CLONING AND/OR EXPRESSION VECTORS

Each of the nucleic acids contained in a collection of nucleic acids prepared in accordance with the invention can be inserted into a cloning and/or expression vector.

For this purpose, any type of vector known in the prior art can be used, such as viral vectors, phages, plasmids, phagemids, cosmids, fosmids, vectors of BAC type, P1 bacteriophages, vectors of YAC type, yeast plasmids or any other vector known to a person skilled in the art.

Use will advantageously be made according to the invention of vectors which allow stable expression of the nucleic acids of a DNA library. To this end, such vectors preferably include transcription-regulation sequences operably linked to the genomic insert so as to allow the initiation and/or regulation of the expression of at least a portion of the said DNA insert.

It follows from the foregoing that the invention also relates to a process for preparing a collection of recombinant vectors, characterized in that the nucleic acids obtained in step II-(iv) or in step I-(c), or in any subsequent step, of a process according to the invention for preparing a collection of nucleic acids from a soil sample containing organisms are inserted into a cloning and/or expression vector.

Prior to their insertion into a cloning and/or expression vector, the constituent nucleic acids of a collection of nucleic acids according to the invention can be separated as a function of their size, for example by electrophoresis on an agarose gel, where appropriate after digestion with a restriction endonuclease.

According to another aspect, the constituent nucleic acids of a collection of nucleic acids according to the invention can be brought to a substantially uniform average size by carrying out a step of physical rupture prior to their insertion into the cloning and/or expression vector.

Such a step of physical or mechanical rupture can consist of successive passages of the nucleic acids, in solution, through a metal channel about 0.4 mm in diameter, for example a syringe needle of that diameter.

The average size of the nucleic acids can be, in this case, between 30 and 40 kb in length.

The construction of the vectors that are preferred according to the invention is represented schematically in Figures 25 (conjugative integrative cosmid) and 26 (integrative BAC).

Cloning and/or expression vectors which can be used advantageously for the purposes of inserting nucleic acids contained in a DNA library or collection according to the invention are, in particular, the vectors described in European patent No EP 0 350 341 and in US patent No 5 688 689, such vectors being especially suitable for the transformation of actinomycete strains. Such vectors contain, besides an insert DNA sequence, an attachment sequence att and a DNA sequence encoding an integrase (int sequence) which is functional in actinomycete strains.

However, it has been observed according to the invention that certain cloning and/or expression vectors had drawbacks and that their theoretical functional capacity was not achieved in practice.

Thus, it was found that the integration system contained in vectors of the prior art, in particular in the vectors described in European patent No EP 0 350 341, does not in fact allow efficient integration of the DNA insert from the library into the bacterial chromosome.

Starting from the hypothesis that the defective integration of such vectors into the bacterial chromosome was due to insufficient expression of the integrase gene present in these vectors, the Applicant first attempted to increase the expression of the integrase gene by replacing the original transcription promoter with a promoter capable of significantly increasing the number of integrase transcripts.

The results were disappointing and the function of integration of these vectors into the chromosome was not improved.

Surprisingly, it has been shown according to the invention that the difficulties of integrase expression in this family of integrative vectors lay not in the amount of transcript produced but in the stability of the transcripts.

Pursuing a second hypothesis, the Applicant showed that the instability of the integrase transcripts was caused by defective termination of the transcription of the corresponding messenger RNA.

The Applicant therefore inserted a transcription stop site downstream of the sequence encoding the integrase of the vector so as to obtain a messenger RNA of defined size. Inserting this additional termination signal downstream of the nucleotide sequence encoding the integrase yielded a family of integrative vectors of the cosmid type and of the BAC type.

Preferentially, the stop site is placed downstream of the attachment site att.

In addition, the Applicant has developed novel conjugative vectors and novel replicative vectors of cosmid type and novel conjugative vectors of BAC type which can be used advantageously to insert constituent nucleic acids of a collection of nucleic acids prepared according to the process of the invention.

When the insertion of DNA fragments of average size is desired, vectors of the cosmid type, capable of receiving inserts having a maximum size of about 50 kb, are preferably used.

Such cosmid vectors are most particularly suitable for inserting constituent nucleic acids of a collection of nucleic acids obtained according to the process of the invention comprising a first step of direct DNA extraction by mechanical lysis of the organisms contained in the initial soil sample.

When the insertion of large nucleic acids, in particular of nucleic acids greater than 100 kb in size, or even greater than 200, 300, 400, 500 or 600 kb, is desired, use will then preferentially be made of vectors of the BAC type which are capable of receiving DNA inserts of such a size.

Such vectors of BAC type are most particularly suitable for inserting constituent nucleic acids of a collection of nucleic acids obtained in accordance with the process according to the invention, in which the first step consists of an indirect extraction of the DNA by prior separation of the organisms contained in the initial soil sample and removal of the macroconstituents from the said soil sample.

In particular, vectors of the BAC type are advantageously used to insert large nucleic acids containing, at least partially, the nucleotide sequence of an operon.

Thus, the process for preparing a collection of recombinant cloning and/or expression vectors according to the invention is also characterized in that the cloning and/or expression vector is of the plasmid type.

According to another aspect, such a process is characterized in that the cloning and/or expression vector is of the cosmid type.

According to a first aspect, it can be a cosmid which is replicative in E. coli and integrative in Streptomyces. An entirely preferred cosmid corresponding to such a definition is the cosmid pOS7001 described in Example 3.

According to yet another aspect, the cosmid vector is conjugative and integrative in Streptomyces.

In general, conjugative vectors of cosmid type or of BAC type, which comprise in their nucleotide sequences a unit recognized by the cellular enzymatic machinery known as a "conjugation origin", are used whenever it is desired to avoid resorting to laborious transformation techniques that are difficult to automate.

For example, the transfection of vectors initially harboured by E. coli cells into Streptomyces cells conventionally requires a step of recovering the recombinant vector contained in the Escherichia coli cells, and purifying it prior to the step of transforming Streptomyces protoplasts.

It is commonly accepted that a transfection of an assembly of 1000 Escherichia coli clones into Streptomyces requires the production of about 8000 clones in order for each E. coli clone to have a chance of being represented.
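The 8:1 ratio can be related to the standard library-coverage estimate of Clarke and Carbon. The sketch below assumes, which the text does not state, that each Streptomyces transformant carries one vector drawn independently and uniformly at random from the 1000 E. coli clones.

```python
import math


def p_clone_represented(n_transformants: int, n_source_clones: int) -> float:
    """Probability that a given source clone appears at least once among
    n_transformants, assuming independent, uniformly random draws."""
    return 1.0 - (1.0 - 1.0 / n_source_clones) ** n_transformants


def transformants_needed(n_source_clones: int, p_target: float) -> int:
    """Smallest number of transformants giving each clone a probability
    >= p_target of being represented (Clarke-Carbon formula)."""
    return math.ceil(math.log(1.0 - p_target) / math.log(1.0 - 1.0 / n_source_clones))


print(round(p_clone_represented(8000, 1000), 4))  # 0.9997
print(round(p_clone_represented(1000, 1000), 3))  # 0.632
print(transformants_needed(1000, 0.99))           # 4603
```

Under this assumption, 8000 transformants give each of the 1000 clones a probability of about 99.97% of being represented, whereas a one-to-one transfer of 1000 transformants would leave roughly a third of the clones missing.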

Conversely, transfer by conjugation of a vector harboured by E. coli into Streptomyces cells requires only as many Streptomyces clones as there are E. coli clones, since the conjugation step takes place "clone to clone"; moreover, it avoids the technical difficulties associated with transferring genetic material by transformation of protoplasts, for example in the presence of polyethylene glycol.

In order to optimize the construction of a DNA library in Streptomyces, novel conjugative vectors of cosmid type and of BAC type, designed to maximize the efficiency of the conjugation step, have been developed according to the invention.

In particular, the novel conjugative vectors according to the invention have been constructed by placing a selection marker gene at the end of the vector DNA that is transferred last into the recipient bacterium. This improvement over the conjugative vectors of the prior art makes it possible to select positively only those recipient bacteria which have received all of the vector DNA and, consequently, all of the insert DNA of interest.
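The benefit of placing the marker at the last-transferred end can be illustrated with a toy simulation. Its assumptions are hypothetical, not measured: transfer starts at the conjugation origin and proceeds linearly along the vector, it completes with probability `p_complete` (an arbitrary 30% here), and otherwise it breaks at a uniformly random point.

```python
import random


def fraction_complete(marker_position: float, n: int = 10_000,
                      p_complete: float = 0.3, seed: int = 1) -> float:
    """Fraction of marker-selected recipients that received the whole vector.

    marker_position: where the selection marker lies along the transferred
    strand (0.0 = transferred first, 1.0 = transferred last). Toy model only.
    """
    rng = random.Random(seed)
    selected = complete = 0
    for _ in range(n):
        # Portion of the vector that reached the recipient (1.0 = all of it).
        transferred = 1.0 if rng.random() < p_complete else rng.random()
        if transferred >= marker_position:   # marker received: survives selection
            selected += 1
            if transferred == 1.0:
                complete += 1
    return complete / selected


print(fraction_complete(marker_position=1.0))  # 1.0: only full transfers carry the marker
print(fraction_complete(marker_position=0.0))  # about 0.3: most survivors are truncated
```

With the marker transferred last, every selected recipient carries the complete vector; with the marker transferred first, selection also retains recipients whose insert DNA was only partly transferred.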

Cosmids which are conjugative and integrative in Streptomyces and which are preferred according to the invention are the cosmids pOSV303, pOSV306 and pOSV307 described in Example.

According to another aspect, a process for preparing a collection of recombinant vectors according to the invention is carried out using a cosmid which is replicative both in E. coli and in Streptomyces. Such a cosmid is advantageously the cosmid pOS700R described in Example 6.

According to yet another aspect, the above process can be carried out with a cosmid which is replicative in E. coli and Streptomyces and conjugative in Streptomyces.

Such a replicative and conjugative cosmid can be obtained from a replicative cosmid in accordance with the invention, by inserting a suitable transfer origin, such as RK2, as described in Example 5 for the construction of the vector pOSV303.

According to another advantageous embodiment of the process for preparing a collection of recombinant vectors according to the invention, use is made of a cloning and/or expression vector of BAC type.

According to a first aspect, the vector of the BAC type is integrative and conjugative in Streptomyces.

Most preferably, such a BAC vector which is integrative and conjugative in Streptomyces is the vector BAC pOSV403 described in Example 8 or else the vectors BAC pMBD-1, pMBD-2, pMBD-3, pMBD-4, pMBD-5 and pMBD-6 described in Example.

A subject of the invention is also a recombinant vector, characterized in that it is chosen from the following recombinant vectors:
a) a vector comprising a constituent nucleic acid of a collection of nucleic acids according to the invention;
b) a vector as obtained according to a process which does not involve the action of a restriction endonuclease on the DNA fragment to be inserted, as described previously.

Most preferably, the invention also relates to a vector chosen from the following vectors: the cosmid pOS7001; the cosmid pOSV303; the cosmid pOSV306; the cosmid pOSV307; the cosmid pOS700R; the vector BAC pOSV403; the vector BAC pMBD-1; the vector BAC pMBD-2; the vector BAC pMBD-3; the vector BAC pMBD-4; the vector BAC pMBD-5; the vector BAC pMBD-6.

The invention also relates to a collection of recombinant vectors as obtained according to any one of the processes according to the invention.

Process for preparing a recombinant cloning and/or expression vector according to the invention.

The conventional techniques for inserting DNA into a vector in order to prepare a recombinant cloning and/or expression vector involve a first step in which both the DNA to be inserted and the recipient vector are incubated with a restriction endonuclease, creating compatible ends between the DNA to be inserted and the vector DNA; the two DNAs are then assembled and joined in a final ligation step, producing the recombinant vector.

However, such a conventional technique has notable drawbacks, most particularly when it is desired to insert large nucleic acids into a cloning and/or expression vector.

Specifically, the prior action of a restriction enzyme on the DNA fragments intended to be inserted into a vector is liable to appreciably reduce the size of this DNA prior to its insertion into the vector. It goes without saying that a significant reduction in the size of the DNA prior to its insertion into a vector is a situation that is particularly unfavourable when it is desired to clone large fragments of DNA liable to contain all of the coding sequences and, where appropriate, also the regulatory sequences, of an operon whose expression constitutes a complete biosynthetic pathway of a metabolite of industrial interest, and most particularly of a compound of therapeutic interest.
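The size argument can be made concrete: in sequence with random, equal base composition, a 6-bp recognition site occurs on average every 4^6 = 4096 bp, so complete digestion reduces a 300 kb fragment to pieces of a few kilobases. The sketch below assumes random DNA, complete digestion and EcoRI as an example enzyme; real soil DNA (notably high-GC actinomycete DNA) deviates from equal base frequencies, and partial digestion, not modelled here, leaves longer fragments.

```python
import random

ECO_RI = "GAATTC"  # EcoRI recognition site (6 bp), used here only as an example


def expected_fragment_length(site_length: int) -> int:
    """Mean distance between sites in random DNA with equal base frequencies: 4**n bp."""
    return 4 ** site_length


def digest(seq: str, site: str) -> list:
    """Fragment lengths after cutting linear DNA at every occurrence of site.
    Simplification: each cut is placed at the start of the site."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(1_000_000))  # 1 Mb of random DNA
frags = digest(genome, ECO_RI)
print(expected_fragment_length(6))            # 4096
print(len(frags), sum(frags) // len(frags))   # a few hundred fragments averaging roughly 4 kb
```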

To overcome the drawbacks of the prior art, two processes have been developed according to the invention, for preparing a recombinant cloning and/or expression vector which do not use a restriction endonuclease on the DNA to be inserted prior to its introduction into the vector. Such processes are consequently entirely suitable for cloning long DNA fragments liable to contain, at least partially, all of the coding sequences and, where appropriate, also the regulatory sequences, of a complete operon responsible for a biosynthetic pathway.

According to a first aspect, one process for preparing a recombinant cloning and/or expression vector according to the invention is characterized in that the insertion of a nucleic acid into the cloning and/or expression vector comprises the following steps:
- opening the cloning and/or expression vector at a chosen cloning site, using a suitable restriction endonuclease;
- adding a first homopolymeric nucleic acid at the free 3' end of the open vector;
- adding a second homopolymeric nucleic acid, whose sequence is complementary to the first homopolymeric nucleic acid, at the free 3' end of the nucleic acid to be inserted into the vector;
- assembling the nucleic acid of the vector and the nucleic acid to be inserted by hybridizing the first and second homopolymeric nucleic acids of mutually complementary sequence;
- closing the vector by ligation.

Such a process is described in Examples 10 and 13 below.

Advantageously, the above process can comprise the following characteristics, separately or in combination:
- the first homopolymeric nucleic acid is of poly(A) or poly(T) sequence;
- the second homopolymeric nucleic acid is of poly(T) or poly(A) sequence.

Most preferably, the homopolymeric nucleic acids have a length of between 25 and 100 nucleotides, preferably between 25 and 70 nucleotides.
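The tailing strategy can be sketched as a simple string model, which shows why complementary homopolymer tails let vector and insert pair while preventing vector-vector and insert-insert pairing. Assumptions: the tailing enzyme (typically terminal deoxynucleotidyl transferase) is not named in the text, the 40-nt tail is an arbitrary length within the 25-100 nt range given above, the end sequences are hypothetical, and pairing is reduced to exact reverse complementarity of equal-length tails.

```python
COMP = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMP)[::-1]


def add_tail(seq: str, base: str, length: int) -> str:
    """Append a homopolymeric tail to the 3' end of a strand written 5'->3'."""
    if not 25 <= length <= 100:
        raise ValueError("tail length outside the 25-100 nt range given in the text")
    return seq.upper() + base * length


def tails_anneal(tail_a: str, tail_b: str) -> bool:
    """Two single-stranded tails pair if one is the reverse complement of the
    other (simplified: equal lengths, no mismatches)."""
    return tail_a == reverse_complement(tail_b)


TAIL = 40                                   # arbitrary length within 25-100 nt
vector_end = add_tail("GGATCC", "A", TAIL)  # hypothetical vector end, poly(A)-tailed
insert_end = add_tail("CTGCAG", "T", TAIL)  # hypothetical insert end, poly(T)-tailed
vector_tail, insert_tail = vector_end[-TAIL:], insert_end[-TAIL:]

print(tails_anneal(vector_tail, insert_tail))   # True: vector pairs with insert
print(tails_anneal(vector_tail, vector_tail))   # False: vector cannot re-close on itself
print(tails_anneal(insert_tail, insert_tail))   # False: inserts cannot join each other
```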

The process for preparing a recombinant cloning and/or expression vector described above is particularly suitable for the construction of DNA libraries in vectors of BAC type. Thus, according to one advantageous embodiment of the process for preparing a recombinant vector described above, the said process is also characterized in that the size of the nucleic acid to be inserted is at least 100 kb and preferably at least 200, 300, 400, 500 or 600 kb.

Such a preparation process is thus particularly suited to the insertion of nucleic acids contained in a collection of nucleic acids obtained according to the process of the invention.

In order to allow the insertion of large DNA fragments into cloning and/or expression vectors, a second process has been developed according to the invention, which makes it possible to dispense with any use of a restriction endonuclease on the DNA intended to be inserted into the vector.

Such a process for preparing a recombinant cloning and/or expression vector according to the invention is characterized in that the step of inserting a nucleic acid into the said cloning and/or expression vector comprises the following steps:
- creating blunt ends on the nucleic acid of the collection by removing the protruding 3' sequences and filling in the protruding 5' sequences;
- opening the cloning and/or expression vector at a chosen cloning site using a suitable restriction endonuclease;
- adding complementary oligonucleotide adapters;
- creating blunt ends on the vector nucleic acid by removing the protruding 3' sequences and filling in the protruding 5' sequences, then dephosphorylating the 5' ends in order to prevent recircularization of the vector;
- inserting the nucleic acid of the collection into the vector by ligation.

Preferably, the removal of the protruding 3' sequences is carried out using an enzyme having 3'→5' exonuclease activity, such as the Klenow enzyme.

Preferably, the filling in of the protruding 5' sequences is carried out using a polymerase, most preferably T4 DNA polymerase, in the presence of the four deoxynucleoside triphosphates (dNTPs).
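The end-repair rule (remove protruding 3' single strands, fill in protruding 5' single strands) can be sketched on a toy duplex drawn as two aligned strings. This representation is an assumption made for illustration: the enzymes are reduced to their net effect, and adapter addition and dephosphorylation are not modelled.

```python
COMP = str.maketrans("ACGT-", "TGCA-")


def blunt_ends(top: str, bottom: str) -> tuple:
    """Blunt both ends of a duplex drawn as two aligned strings.

    top is written 5'->3', bottom 3'->5', one column per base pair,
    '-' = no base. Protruding 3' strands are removed; protruding 5'
    strands are filled in with complementary bases.
    """
    if len(top) != len(bottom):
        raise ValueError("strands must be drawn with equal length")
    t, b = list(top.upper()), list(bottom.upper())
    # Left end: the top strand carries its 5' end, the bottom strand its 3' end.
    i = 0
    while i < len(t) and "-" in (t[i], b[i]):
        if b[i] == "-":        # top 5' overhang: fill in the bottom strand
            b[i] = t[i].translate(COMP)
        else:                  # bottom 3' overhang: remove it
            b[i] = "-"
        i += 1
    # Right end: the top strand carries its 3' end, the bottom strand its 5' end.
    j = len(t) - 1
    while j >= 0 and "-" in (t[j], b[j]):
        if b[j] == "-":        # top 3' overhang: remove it
            t[j] = "-"
        else:                  # bottom 5' overhang: fill in the top strand
            t[j] = b[j].translate(COMP)
        j -= 1
    paired = [(x, y) for x, y in zip(t, b) if x != "-" and y != "-"]
    return "".join(x for x, _ in paired), "".join(y for _, y in paired)


# EcoRI-style 5' AATT overhangs on both ends: both are filled in.
print(blunt_ends("AATTCGGAG----", "----GCCTCTTAA"))  # ('AATTCGGAGAATT', 'TTAAGCCTCTTAA')
# 4-nt 3' overhangs (TGCA) on both ends: both are trimmed back.
print(blunt_ends("----GCCATGCA", "ACGTCGGT----"))    # ('GCCA', 'CGGT')
```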

A process for preparing a recombinant cloning and/or expression vector by removing the protruding 3' sequences and filling in the protruding 5' sequences as described above is particularly suitable for the construction of DNA libraries from vectors of cosmid type.

Such a process for obtaining recombinant vectors is described in Example 12.

In one specific method for preparing a recombinant vector according to the invention, oligonucleotides comprising one or more rare restriction sites are added to the vector at the cloning site of the DNA to be inserted, in accordance with the teaching of Example 10. The addition of these oligonucleotides facilitates the subsequent recovery of the inserts without cleaving them.
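A minimal sketch of why flanking rare sites ease insert recovery: if the vector places a rare site on each side of the insert and the insert itself lacks that site, a single digestion releases the insert intact. NotI (8-bp site GCGGCCGC) is used only as an example of a rare cutter; the sites actually used are those of Example 10, which is not reproduced here, and all sequences below are hypothetical.

```python
NOT_I = "GCGGCCGC"  # NotI recognition site (8 bp); palindromic, so one strand suffices


def find_sites(seq: str, site: str = NOT_I) -> list:
    """0-based start positions of every occurrence of site in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def recover_insert(construct: str, site: str = NOT_I):
    """Return the DNA lying between two flanking rare sites, or None if the
    insert itself contains the site (it would then be cleaved).
    Simplification: ignores the exact cut position inside the site."""
    hits = find_sites(construct, site)
    if len(hits) != 2:
        return None
    return construct[hits[0] + len(site):hits[1]].upper()


insert = "ATGACCGGTTTAACC"                                # hypothetical insert
construct = "GGATCC" + NOT_I + insert + NOT_I + "GGATCC"  # hypothetical vector context
print(recover_insert(construct))                          # ATGACCGGTTTAACC
print(recover_insert("GGATCC" + NOT_I + "ATG" + NOT_I + "TTT" + NOT_I))  # None
```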

HOST CELLS

Although any type of host cell can be used for transfection or transformation with a nucleic acid or a recombinant vector according to the invention, in particular a prokaryotic or eukaryotic host cell, preference will be given to host cells whose physiological, biochemical and genetic properties are well characterized, which can easily be cultured on a large scale and whose culture conditions for the production of metabolites are well known.

Preferably, the host cell receiving a nucleic acid or a recombinant vector according to the invention is phylogenetically close to the donor organisms initially contained in the environmental sample from which the nucleic acids originate.

Most preferably, a host cell according to the invention should have a codon usage similar, or at least close, to that of the donor organisms initially present in the environmental sample, most particularly in the soil sample.
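The text does not define how codon usage should be compared. One simple possibility, sketched here as an assumption, is to compute the frequency of each of the 64 codons in coding sequences from the donor and from the host and take the total-variation distance between the two profiles; more refined indices, such as relative synonymous codon usage or the codon adaptation index, exist. The sequences are hypothetical and far too short for a real comparison.

```python
from collections import Counter
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_frequencies(cds: str) -> dict:
    """Relative frequency of each of the 64 codons in an in-frame coding sequence."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    total = sum(counts[c] for c in CODONS)
    return {c: counts[c] / total for c in CODONS}


def codon_usage_distance(cds_a: str, cds_b: str) -> float:
    """Total-variation distance between codon-usage profiles (0 = identical, 1 = disjoint)."""
    fa, fb = codon_frequencies(cds_a), codon_frequencies(cds_b)
    return 0.5 * sum(abs(fa[c] - fb[c]) for c in CODONS)


donor = "ATGGCCGCGCTGGGC"   # hypothetical GC-rich coding sequence
host = "ATGGCAGCTTTAGGT"    # hypothetical AT-rich coding sequence
print(codon_usage_distance(donor, donor))            # 0.0
print(round(codon_usage_distance(donor, host), 3))   # 0.8
```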

The size of the DNA fragments liable to carry the desired nucleotide sequences of interest can be variable. Thus, enzymes encoded by genes with an average size of 1 kb may be expressed using inserts of small size, while the expression of secondary metabolites will require the maintenance in the host organism of much larger fragments, for example from 40 kb to more than 100 kb, 200 kb, 300 kb, 400 kb or 600 kb.

Thus, Escherichia coli host cells constitute a preferred choice for cloning large DNA fragments.

Most preferably, use will be made of the Escherichia coli strain known as DH10B, described by Shizuya et al. (1992), for which protocols for cloning into BAC vectors have been optimized.

However, other strains of Escherichia coli can advantageously be used to construct a DNA library according to the invention, such as the strains E. coli Sure, E. coli DH5α or E. coli 294 (ATCC No. 31446).

In addition, the construction of a DNA library by transfecting E. coli cells with recombinant vectors according to the invention is also possible, the expression of genes of various prokaryotes such as Bacillus, Thermotoga, Corynebacterium, Lactobacillus or Clostridium having been described in PCT patent application No WO 99/20799.

In general, E. coli host cells can in all cases serve as transient hosts in which recombinant vectors according to the invention are maintained very efficiently and in which the genetic material can be easily handled and stably archived.

For the purposes of expressing the widest possible molecular diversity, other host cells may also advantageously be used, such as Bacillus, Pseudomonas, Streptomyces, Myxococcus, Aspergillus nidulans or Neurospora crassa cells.

It has also been shown according to the present invention that Streptomyces lividans cells can be used successfully and constitute expression systems complementary to Escherichia coli.

Streptomyces lividans constitutes a model for studying the genetics of Streptomyces and has also been used as a host for the heterologous expression of many secondary metabolites. Streptomyces lividans has, in common with other actinomycetes such as Streptomyces coelicolor, Streptomyces griseus, Streptomyces fradiae and Streptomyces griseochromogenes, the precursor molecules and the regulatory systems required for the expression of all or part of complex biosynthetic pathways, such as, for example, the polyketide biosynthetic pathway or the pathway for the biosynthesis of non-ribosomal polypeptides, which represent classes of molecules of very diverse structure.

Streptomyces lividans also has the advantage of accepting foreign DNA with high transformation efficiencies.

Thus, the invention also relates to a recombinant host cell comprising a nucleic acid according to the invention, which is a constituent of a collection of nucleic acids prepared according to a process in accordance with the invention, or alternatively a recombinant host cell comprising a recombinant vector as defined above.

According to a first aspect, it may be a recombinant host cell of prokaryotic or eukaryotic origin.

Advantageously, a recombinant cell according to the invention is a bacterium, and most preferably a bacterium chosen from E. coli and Streptomyces.

According to another aspect, a recombinant host cell according to the invention is characterized in that it is a yeast or a filamentous fungus.

The invention also relates to a collection of recombinant host cells, each of the constituent host cells of the collection comprising a nucleic acid originating from a collection of nucleic acids prepared in accordance with a process for preparing a collection of nucleic acids from a soil sample containing organisms as described above.

The invention also relates to a collection of recombinant host cells, each of the constituent host cells of the collection comprising a recombinant vector according to the invention.

On account of the large size of the inserts, maximum transformation efficiency is necessary. To this end, a recipient strain of Streptomyces lividans constitutively expressing the pSAM2 integrase, in order to promote site-specific integration of the vector, is preferred. For this purpose, the int gene under the control of a strong promoter is integrated into the chromosome. Overproduction of the integrase does not induce any excision (Raynal et al., 1998).

The production of a novel metabolite from the insert might be toxic to Streptomyces if the insert does not contain a gene conferring resistance to the antibiotic produced, or if this gene is not expressed or only weakly expressed. The capacity of various genes to allow Streptomyces ambofaciens to resist the antibiotic that it produces has been studied (Gourmelen et al., 1998; Pernodet et al., 1999). Some of these genes encode ABC-type transporters which may confer a broad spectrum of resistance. These genes can be introduced into, and overexpressed in, the Streptomyces lividans host strain.

Conversely, a strain that is hypersensitive to antibiotics can be used (Pernodet et al., 1996) in order to detect the presence of resistance genes in the library. In antibiotic-producing microorganisms, resistance genes are often associated with the genes of the antibiotic biosynthetic pathway. Selecting resistant clones therefore provides an easy first screen before the more complex tests for detecting a novel metabolite produced by a clone.

ISOLATION AND CHARACTERIZATION OF NOVEL NUCLEOTIDE SEQUENCES ENCODING POLYKETIDE SYNTHASES.

According to the invention, a collection of recombinant host cells was obtained after transfecting host cells with a collection of recombinant vectors each containing a nucleic acid insert originating from a collection of nucleic acids prepared in accordance with the process according to the invention.

More specifically, the DNA fragments obtained according to the process of the invention, in which a step of indirect extraction of DNA from the organisms contained in the soil sample is carried out, were first cloned into the integrative cosmid pOS7001.

The step of inserting DNA fragments into the integrative cosmid pOS7001 was carried out according to the process of the invention in which homopolymeric tails, poly(A) and poly(T) respectively, were added to the 3' ends of the vector nucleic acid and of the DNA fragments to be inserted.

The recombinant vectors thus constructed were packaged into lambda phage heads, and the resulting phages were used to infect E. coli cells according to techniques well known to those skilled in the art.

A library of about 5000 Escherichia coli clones was obtained.

This library of clones was screened with pairs of primers specific for a nucleotide sequence encoding an enzyme involved in the polyketide biosynthetic pathway, the type I PKS enzyme, also known as β-ketoacyl synthase.
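In silico, PCR screening with a primer pair amounts to finding a forward-primer site followed downstream by the reverse complement of the reverse primer. The sketch below supports IUPAC degenerate bases, as often used for primers targeting conserved enzyme domains. The primer sequences are hypothetical and are not the KS primers used in the invention; matching is exact and only the given strand is searched, whereas real primers tolerate some mismatches.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement of each IUPAC code (R<->Y, K<->M, B<->V, D<->H; S, W, N unchanged).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def matches_at(template: str, primer: str, pos: int) -> bool:
    """True if every template base from pos onward is allowed by the degenerate primer."""
    window = template[pos:pos + len(primer)]
    return len(window) == len(primer) and all(b in IUPAC[p] for b, p in zip(window, primer))


def find_amplicon(template: str, fwd: str, rev: str):
    """First predicted PCR product on the given strand, or None.

    fwd has the same sequence as the template strand; rev is written 5'->3'
    on the opposite strand, so its reverse complement must appear downstream.
    """
    template, fwd = template.upper(), fwd.upper()
    site = revcomp(rev)
    for i in range(len(template) - len(fwd) + 1):
        if matches_at(template, fwd, i):
            for j in range(i + len(fwd), len(template) - len(site) + 1):
                if matches_at(template, site, j):
                    return template[i:j + len(site)]
    return None


fwd, rev = "ATGGCNGA", "GGYTCAA"   # hypothetical degenerate primers
template = "CCC" + "ATGGCAGA" + "TTTTTT" + "TTGAGCC" + "GGG"
print(find_amplicon(template, fwd, rev))   # ATGGCAGATTTTTTTTGAGCC
```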

It is recalled here that polyketides constitute a chemical category of wide structural diversity comprising a large number of molecules of pharmaceutical interest such as tylosin, monensin, avermectin, erythromycin, doxorubicin or FK506.

Polyketides are synthesized by condensation of acetate molecules under the action of enzymes known as polyketide synthases (PKSs). Two types of polyketide synthase exist. The type II polyketide synthases are generally involved in the synthesis of polycyclic aromatic antibiotics and catalyze the iterative condensation of acetate units.

The type I polyketide synthases are involved in the synthesis of macrocyclic or macrolide polyketides and constitute modular multifunctional enzymes.

Given their therapeutic interest, there is a need in the state of the art to isolate and characterize novel polyketide synthases which can be used for the production of novel pharmaceutical compounds, in particular novel pharmaceutical compounds with antibiotic activity.

The screening of the library of recombinant clones described above using PCR primers which selectively amplify nucleotide sequences encoding type I polyketide synthases has made it possible to identify recombinant clones containing DNA inserts comprising a nucleotide sequence encoding novel polyketide synthases. The nucleotide sequences encoding these novel polyketide synthases are referenced as the sequences SEQ ID No 33 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120.

Another subject of the invention consists of a nucleic acid encoding a novel polyketide synthase I, characterized in that it comprises one of the nucleotide sequences SEQ ID No 34 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120.

Preferably, such a nucleic acid is in isolated and/or purified form.

The invention also relates to a recombinant vector comprising a polynucleotide comprising one of the sequences SEQ ID No 34 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120.

The invention also relates to a recombinant host cell comprising a nucleic acid chosen from polynucleotides comprising one of the nucleotide sequences SEQ ID No 34 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120, as well as to a recombinant host cell comprising a recombinant vector into which is inserted a polynucleotide comprising one of the nucleotide sequences SEQ ID No 34 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120.

Advantageously, the recombinant vectors containing a DNA insert encoding a novel type I polyketide synthase according to the invention are cloning and expression vectors.

Preferably, a recombinant host cell as described above is a bacterium, a yeast or a filamentous fungus.

The amino acid sequences of novel polyketide synthases originating from organisms contained in a soil sample were deduced from the nucleotide sequences SEQ ID No 34 to SEQ ID No 44 and SEQ ID No. 115 to SEQ ID No. 120 above. They are polypeptides comprising one of the amino acid sequences SEQ ID No 48 to SEQ ID No 59 and SEQ ID No. 121 to 126.
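Deducing an amino acid sequence from a nucleotide sequence is a translation with the genetic code. A minimal sketch using the standard code (NCBI translation table 1) follows; it translates from the first base in frame and does not special-case alternative start codons such as GTG, common in Streptomyces, which it would render as V rather than M. The example sequence is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate an in-frame DNA sequence; '*' marks a stop codon, 'X' an ambiguous codon."""
    cds = cds.upper()
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[pos:pos + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTGGTAAGGG"))   # MAW
```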

The invention also relates to novel polyketide synthases comprising an amino acid sequence chosen from the sequences SEQ ID No 48 to SEQ ID No 59 and SEQ ID No. 121 to SEQ ID No. 126.

The nucleotide sequence SEQ ID No. 114 which comprises six open reading frames respectively encoding the polypeptides of sequences SEQ ID No. 121 to SEQ ID No. 126 also forms part of the invention.

The nucleotide sequence SEQ ID No. 113 of the a26G1 cosmid, which contains the sequence complementary to the sequence SEQ ID No. 114 also forms part of the invention.
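Locating open reading frames such as the six in SEQ ID No. 114 requires scanning all three reading frames of both strands, the second strand being the reverse complement (as SEQ ID No. 113 contains the complement of SEQ ID No. 114). A minimal ATG-to-stop scanner is sketched below; the default minimum length of 100 codons is arbitrary, and the text does not say how the ORFs were actually identified.

```python
from typing import List, Tuple

STOPS = {"TAA", "TAG", "TGA"}
COMP = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMP)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int]]:
    """ATG-to-stop ORFs in the three frames of both strands.

    Returns (strand, start, end) with 0-based, end-exclusive coordinates on the
    strand that was scanned ('+' = seq, '-' = its reverse complement).
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos                      # first ATG opens the ORF
                elif start is not None and codon in STOPS:
                    if (pos - start) // 3 >= min_codons:
                        orfs.append((strand, start, pos + 3))
                    start = None
    return orfs


print(find_orfs("ATGGCCGCCGCCTAA", min_codons=3))   # [('+', 0, 15)]
```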

Genomic DNA originating from pure bacterial strains, such as Streptomyces coelicolor (ATCC No. 101.478), Streptomyces ambofaciens (NRRL No. 2.420), Streptomyces lactamandurans (ATCC No. 27.382), Streptomyces rimosus (ATCC No. 109.610), Bacillus subtilis (ATCC No. 6633), Bacillus licheniformis and Saccharopolyspora erythraea, was also extracted and amplified according to the invention.

A PCR amplification of DNA from each of the bacterial strains described above was carried out using pairs of primers specific for the nucleic acid sequences of type I polyketide synthase.

Novel bacterial type I polyketide synthase genes were thus isolated and characterized. These are the nucleic acid sequences SEQ ID No 30 to SEQ ID No 32.

A subject of the invention is therefore also nucleotide sequences encoding novel type I polyketide synthases chosen from the polynucleotides comprising one of the nucleotide sequences SEQ ID No 30 to SEQ ID No 32.

Recombinant vectors comprising the nucleotide sequences encoding novel type I polyketide synthases defined above also form part of the invention.

The invention also relates to recombinant host cells, characterized in that they contain a nucleic acid encoding a novel type I polyketide synthase comprising a nucleotide sequence chosen from the sequences SEQ ID No 30 to SEQ ID No 32 and recombinant host cells comprising a recombinant vector as defined above.

A subject of the invention is also polypeptides encoded by sequences comprising the nucleic acids SEQ ID No 30 to 32, and more specifically polypeptides comprising the amino acid sequences SEQ ID No 47 to SEQ ID No.

A subject of the invention is also a process for producing a type I polyketide synthase according to the invention, the said production process comprising the following steps:
- production of a recombinant host cell comprising a nucleic acid encoding a type I polyketide synthase comprising a nucleotide sequence chosen from the sequences SEQ ID No 33 to SEQ ID No 44, SEQ ID No 30 to SEQ ID No 32 and SEQ ID No. 115 to SEQ ID No. 120;
- culturing of the recombinant host cells in a suitable culture medium;
- recovery and, where appropriate, purification of the type I polyketide synthase from the culture supernatant or from the cell lysate.

The novel type I polyketide synthases obtained according to the process described above can be purified by binding to an immunoaffinity chromatography column on which antibodies recognizing these polyketide synthases have been pre-immobilized.

The type I polyketide synthases according to the invention, and more particularly the recombinant polyketide synthases described above, can also be purified by high performance liquid chromatography (HPLC) techniques such as, for example, reverse-phase chromatography techniques or anion-exchange or cation-exchange chromatography techniques, that are well known to those skilled in the art.

The recombinant or non-recombinant polyketide synthases according to the invention can be used for the preparation of antibodies.

According to another aspect, a subject of the invention is also an antibody which specifically recognizes a type I polyketide synthase according to the invention or a peptide fragment of such a polyketide synthase.

The antibodies according to the invention may be monoclonal or polyclonal. The monoclonal antibodies can be prepared from hybridoma cells according to the technique described by Kohler and Milstein C. (1975), Nature, Vol. 256:495.

The polyclonal antibodies can be prepared by immunizing a mammal, in particular mice, rats or rabbits, with a type I polyketide synthase according to the invention, where appropriate in the presence of an immunity-adjuvant compound, such as complete Freund's adjuvant, incomplete Freund's adjuvant, aluminium hydroxide or a compound from the muramyl peptide family.

For the purposes of the present invention, antibody fragments such as the Fab, Fab', F(ab')2 or single-chain antibody fragments containing the variable portion (ScFv) described by Martineau et al. (1998) J. Mol. Biol., Vol. 280(1):117-127 or in US patent 4 946 778, and the humanized antibodies described by Reinmann KA et al. (1997), AIDS Res. Hum. Retroviruses, Vol. 13(11):933-943 or by Leger O.J. et al. (1997), Hum. Antibodies, Vol. 8:3-16, also constitute "antibodies".

The antibody preparations according to the invention are useful in particular in qualitative or quantitative immunological tests intended either simply to detect the presence of a type I polyketide synthase according to the invention or to quantify the amount of this polyketide synthase, for example in the culture supernatant or the cell lysate of a bacterial strain capable of producing such an enzyme.

Another subject of the invention consists of a process for detecting a type I polyketide synthase according to the invention or a peptide fragment of this enzyme, in a sample, the said process comprising the steps of: a) placing an antibody according to the invention in contact with the sample to be tested; b) detecting the antigen/antibody complex possibly formed.

The invention also relates to a kit or equipment for detecting a type I polyketide synthase according to the invention in a sample, comprising: a) an antibody according to the invention; b) where appropriate, reagents required for detecting the antigen/antibody complex possibly formed.

An antibody directed against a type I polyketide synthase according to the invention can be labelled using an isotopic or non-isotopic detectable label, according to processes that are well known to those skilled in the art.

Screening of a DNA library according to the invention using a pair of primers which hybridize with target sequences whose presence is sought, such as sequences of the puromycin biosynthetic pathway, sequences of the linA gene involved in the biodegradation of lindane or sequences encoding type I polyketide synthases, has been detailed hereinabove.

A subject of the invention is thus a process for detecting a nucleic acid of given nucleotide sequence, or whose nucleotide sequence is structurally similar to a given nucleotide sequence, in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps: placing the collection of recombinant host cells in contact with a pair of primers which hybridize with the given nucleotide sequence or which hybridize with the nucleotide sequence that is structurally similar to a given nucleotide sequence; carrying out at least three amplification cycles; detecting any nucleic acid amplified.
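Why a handful of cycles can already make a rare target detectable is easiest to see with the idealized exponential model of PCR, sketched below. This is a textbook approximation, not part of the claimed process: the per-cycle `efficiency` parameter is an assumption (1.0 means perfect doubling), and the model ignores the plateau that real reactions reach.

```python
def amplicon_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the template by (1 + efficiency).

    `efficiency` is a hypothetical constant between 0 (no copying) and
    1 (perfect doubling); saturation of real reactions is not modelled.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


# Three cycles at perfect efficiency turn one target molecule into eight copies.
print(amplicon_copies(1, 3))
```

Because growth is geometric, each additional cycle multiplies rather than adds to the signal, which is why the process only needs to specify a minimum number of cycles.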

For the amplification conditions that are appropriate as a function of the desired target sequences, a person skilled in the art may advantageously refer to the examples below.

According to another aspect, the invention also relates to a process for detecting a nucleic acid of given nucleotide sequence, or whose nucleotide sequence is structurally similar to a given nucleotide sequence, in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps: placing the collection of recombinant host cells in contact with a probe which hybridizes with the given nucleotide sequence or with a nucleotide sequence that is structurally similar to the given nucleotide sequence; detecting the hybrid possibly formed between the probe and the nucleic acids included in the vectors of the collection.

To carry out the screening of a DNA library according to the invention in order to detect the presence of a nucleotide sequence encoding a polypeptide capable of degrading lindane, the recombinant clones of interest were detected on the basis of their phenotype corresponding to their capacity to degrade lindane. With this aim, the clones isolated and/or sets of clones of the DNA library prepared were cultured in a culture medium in the presence of lindane and the lindane degradation was observed by the formation of a cloudy halo in the immediate environment of the cells.

The invention also relates to a process for identifying the production of a compound of interest by one or more recombinant host cells in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps: culturing the recombinant host cells of the collection in a suitable culture medium; detecting the compound of interest in the culture supernatant or in the cell lysate of one or more of the recombinant cells cultured.

A subject of the invention is also a process for selecting a recombinant host cell which produces a compound of interest in a collection of recombinant host cells according to the invention, characterized in that it comprises the following steps: culturing recombinant host cells of the collection in a suitable culture medium; detecting the compound of interest in the culture supernatant or in the cell lysate of one or more of the recombinant host cells cultured; selecting recombinant host cells which produce the compound of interest.

The invention also relates to a process for producing a compound of interest, characterized in that it comprises the following steps: culturing a recombinant host cell selected according to the process described above; recovering and, where appropriate, purifying the compound produced by the said recombinant host cell.

The invention also relates to a compound of interest, characterized in that it is obtained according to the process described above.

A compound of interest according to the invention can consist of a polyketide produced by means of expressing at least one nucleotide sequence comprising a sequence chosen from the sequences SEQ ID No 33 to 44, SEQ ID No 30 to 32 and SEQ ID No. 115 to SEQ ID No. 120.

The invention also relates to a composition comprising a polyketide produced by means of expressing at least one nucleotide sequence comprising a sequence chosen from the sequences SEQ ID No 33 to SEQ ID No 44 and SEQ ID No 30 to SEQ ID No 32 and SEQ ID No. 115 to SEQ ID No. 120.

A polyketide produced by means of expressing at least one nucleotide sequence above is preferably the product of the activity of several coding sequences included in a functional operon whose translation products are the various enzymes required for the synthesis of a polyketide, one of the above sequences being included and expressed in the said operon. Such an operon comprising a nucleic acid sequence according to the invention encoding a polyketide synthase can be constructed, for example, according to the teaching of Borchert et al. (1992).

The invention also relates to a pharmaceutical composition comprising a pharmacologically active amount of a polyketide according to the invention, where appropriate in combination with a pharmaceutically compatible vehicle.

Such pharmaceutical compositions will advantageously be adapted for the administration, for example parenteral administration, of an amount of a polyketide synthesized by a type I polyketide synthase according to the invention ranging from 1 µg/kg per day to 10 mg/kg per day, preferably at least 0.01 mg/kg per day and most preferably between 0.01 and 1 mg/kg per day.
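The dosage ranges above reduce to per-kilogram arithmetic (1 µg/kg = 0.001 mg/kg). The hypothetical helpers below, offered purely as an illustration of that arithmetic and not as dosing guidance, convert a dose in mg/kg per day into a daily amount for a given body weight and report which of the stated ranges it falls into. The function names and the 70 kg body weight are assumptions.

```python
def daily_amount_mg(dose_mg_per_kg_day: float, body_weight_kg: float) -> float:
    """Total daily amount (mg) for a hypothetical body weight."""
    return dose_mg_per_kg_day * body_weight_kg


def classify_dose(dose_mg_per_kg_day: float) -> str:
    """Place a daily dose (mg/kg/day) within the ranges stated in the text."""
    if not 0.001 <= dose_mg_per_kg_day <= 10.0:  # 1 ug/kg to 10 mg/kg per day
        return "outside stated range"
    if 0.01 <= dose_mg_per_kg_day <= 1.0:        # most preferred: 0.01 to 1 mg/kg
        return "most preferred range"
    if dose_mg_per_kg_day >= 0.01:               # preferred: at least 0.01 mg/kg
        return "preferred range"
    return "broad range only"


print(classify_dose(0.5), daily_amount_mg(0.5, 70.0))
```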

The pharmaceutical compositions according to the invention can be administered orally, rectally, parenterally, intravenously, subcutaneously or intradermally.

The invention also relates to the use of a polyketide obtained by means of expressing a type I polyketide synthase according to the invention, for the manufacture of a medicinal product, in particular a medicinal product with antibiotic activity.

The invention will also be illustrated, without however being limited, by the figures and examples below.

Figure 1 illustrates the scheme of the various lysis steps carried out according to protocols 1, 2, 3, 4a, 4b, 5a and 5b described in Example 1.

Figure 2 illustrates an electrophoresis on 0.8% agarose gel of the DNAs extracted from 300 mg of soil No 3 (St André coast) after various lysis treatments (protocols 1 to 5, cf. Fig. 1). M: lambda phage molecular weight marker.

Figure 3 illustrates the proportion of various genera of actinomycetes cultured after treatments 1 to 5 (cf. Fig. 1). The cfu (colony-forming unit) number was determined on a medium which is selective for this group of bacteria. A total of about 400 colonies was analysed.

Figure 4 illustrates the recovery of HindIII-digested lambda phage DNA added to the soils at different concentrations before or after grinding. The treatments T (heat shocks) and S (sonication) are additional lysis treatments. The quantification was carried out by analysis with a phospho-imager after dot-blot hybridization. A sample of each soil was used for each concentration of lambda phage DNA added. The characteristics of the soils are given in Table 1. The samples corresponding to 10 and 15 µg of DNA added were not treated.

Figure 5 illustrates the PCR amplification of the DNAs extracted from soil No 3 according to protocols 1, 2, 3, 5a and 5b. The primers FGPS 122 and FGPS 350 (Table 2) were used to target indigenous Streptosporangium spp. The DNAs extracted were used undiluted or at and 100-fold dilutions. M: 123 bp molecular weight marker (Gibco BRL), C: DNA-free amplification control.

Figure 6 illustrates the amounts of DNA extracted after inoculating spores or mycelium of S. lividans OS48.3 into the soils at different concentrations. The amounts of mycelium added to the soil correspond to the number of spores inoculated into the germination medium.

About 50% of the spores germinated, and the number of cells or genomes contained in the hyphae of the germinated spores was not determined. The amounts of spores and of mycelium inoculated are therefore not directly comparable. The extraction was carried out according to protocol 6 (cf. materials and methods section). A symbol in the figure indicates that RNA was included in the extraction buffer. The target DNA was amplified by PCR with the primers FGPS 516 and FGPS 517, and the quantification was carried out with a phospho-imager after dot-blot hybridization using the probe FGPS 518. A sample of each soil was used for each concentration of hyphae or of spores. The characteristics of the soils are described in Table 1.

Figure 7 represents the phylogenetic tree obtained with the Neighbour Joining algorithm, positioning the 16S rDNA sequences contained in the soil DNA library, relative to cultured reference bacteria.

In grey: the sequences obtained from the pools of clones of the library.

The bootstrap values, based on 100 resamplings, are indicated at the nodes. The scale bar indicates the number of substitutions per site. The GenBank accession numbers of the sequences are indicated in parentheses.

Figure 8 represents a scheme of the vector pOSint 1.

Figure 9 represents a scheme of the vector pWED 1.

Figure 10 represents a scheme of the vector pWE15 (ATCC No 37503).

Figure 11 represents a scheme of the vector pOS7001.

Figure 12 represents a scheme of the vector pOSV010.

Figure 13 represents the fragment containing a "cos" site inserted into the plasmid pOSV010 during construction of the vector pOSV303.

Figure 14 represents a scheme of the vector pOSV303.

Figure 15 represents a scheme of the vector pE116.

Figure 16 represents a scheme of the vector pOS700R.

Figure 17 represents a scheme of the vector pOSV001.

Figure 18 represents a scheme of the vector pOSV002.

Figure 19 represents a scheme of the vector pOSV014.

Figure 20 represents a scheme of the vector pBAC11.

Figure 21 represents a scheme of the vector pOSV403.

Figure 22 represents the electrophoresis gels of DNA from the positive clones of the library screened with the PKS-I oligonucleotides, after digestion with the enzymes BamHI and DraI.

Figure 23 illustrates the production of puromycin by the S. lividans recombinants compared with the production of the S. alboniger wild-type strain.

Figure 24 illustrates the alignment of soil PKSs with the conserved active sites of other PKSs. The references for each peptide are indicated.

The beta-ketoacyl synthase domains were aligned using the GCG PILEUP program (Wisconsin Package Version 9.1, Genetics Computer Group, Madison, Wisc).

Figure 25 illustrates the construction of an integrative conjugative cosmid.

Figure 26 illustrates the construction of an integrative conjugative BAC.

Figure 27 illustrates the scheme for constructing the vector pOSV308.

Figure 28 illustrates the scheme for constructing the vector pOSV306.

Figure 29 illustrates the scheme for constructing the vector pOSV307.

Figure 30 illustrates the scheme for constructing the vector pMBD-1.

Figure 31 shows a detailed map of the plasmid pMBD-2 and also a scheme for constructing the vector pMBD-3.

Figure 32 illustrates a detailed map of the plasmid pMBD-4.

Figure 33 illustrates the scheme for constructing the plasmid from the plasmid pMBD-1.

Figure 34 illustrates the detailed map of the vector pBTP-3.

Figure 35 illustrates the scheme for constructing the vector pMBD-6 from the vector pMBD-1.

Figure 36 illustrates the map of the cosmid a26G1 whose DNA insertion contains open reading frames encoding several polyketide synthases.

Figure 37 is a scheme representing the DNA insertion strand) of the cosmid a26G1, on which are positioned the various reading frames encoding several polyketide synthases.

EXAMPLES:

EXAMPLE 1: Process for preparing a collection of nucleic acids from a soil sample containing organisms, comprising a step of direct extraction of DNA from the soil sample.

1. MATERIAL AND METHODS

1.1 SOILS: The characteristics of the six soils used in this study are listed in Table 1.

The clay content and organic matter content range, respectively, from 9 to 47% and from 1.7 to [illegible]%, and the pH ranges from 4.3 to 5.8.

Soil samples were collected from the surface layer, 5 to 10 cm in depth. All visible roots were removed and the soils were stored at 4°C for a few days if necessary, after which they were dried for 24 hours at room temperature, sieved (average mesh size: 2 mm) and then stored for up to several months at [illegible].

1.2 BACTERIAL STRAIN AND CULTURE CONDITIONS: The extracellular DNA and the bacterial strains supplying vegetative cells, spores or hyphae used to inoculate the soil samples were chosen so that their presence could be specifically monitored.

In order to obtain large amounts of extracellular DNA, the lysogenic E. coli strain 1192 Hfr P4X (metB), containing the lambda phage cI857 Sam7, was cultured in Luria-Bertani (LB) medium for two hours at 30°C, then for 30 minutes at 40°C, and then for 3 hours at 37°C.

The lambda phage DNA was extracted according to the technique described by Sambrook J. et al. (1989) Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.

The avirulent strain of Bacillus anthracis (STERNE 7700) was used as the bacterial cell inoculum. Bacillus anthracis was grown in trypticase soy broth (TSB) (Biomerieux, Lyons, France) for about 6 hours, checking that the OD600 remained below 0.6. These conditions allow the growth of vegetative cells without spore formation (Patra et al., (1996), FEMS Immunol. Medical Microbiology, vol. 15:223-231).

The spores of Streptomyces lividans OS48.3 (Clerc-Bardin et al., unpublished) were removed mechanically from cultures grown on R2YE medium (Hopwood et al., (1985), Genetic Manipulation of Streptomyces: A Laboratory Manual. The John Innes Foundation, Norwich, United Kingdom). The hyphae of S. lividans OS48.3 were obtained from pre-germinated spores, since it was expected that the use of short hyphae would minimize the rupture and subsequent loss of DNA. The spores were suspended in TES buffer (N-tris[hydroxymethyl]methyl-2-aminoethanesulphonic acid; Sigma-Aldrich Chimie, France) (0.05 M; pH 8) (Holben WE et al., (1988), Appl. Environ. Microbiol. vol. 54:703-711), subjected to a heat shock (50°C for 10 minutes), cooled under cold running water and then added to an equal volume of pre-germination medium ([illegible] yeast extract, 1% casamino acids, 0.01 M CaCl2). The suspension was incubated at 37°C with shaking. The proportion of germinated spores was estimated at about 50%, in accordance with the results of Hopwood et al. (1985). After centrifugation, the pellets were resuspended in TES buffer, added to 3% TSB medium and incubated at 37°C until an OD450 of 0.15 was obtained (Hopwood et al., (1985)).

Streptomyces hygroscopicus SWN 736 and Streptosporangium fragile AC1296 (Institute Pushino, Moscow) were cultured according to techniques described by Hickey and Tresner (1952).

The DNA of the spores and hyphae of S. lividans was extracted from pure cultures according to lysis protocol 6 described below (except that no grinding was carried out), while the DNA of the spores of S. hygroscopicus and S. fragile was extracted by chemical/enzymatic lysis (Hintermann et al., 1981).

1.3 CHOICE OF THE EXTRACTION BUFFER: A TENP buffer (50 mM Tris, 20 mM EDTA, 100 mM NaCl, 1% wt/vol polyvinylpolypyrrolidone) developed by Picard (1992) was used. Similar buffers were subsequently used by other authors (Clegg et al., 1997; Kuske et al., 1998; Zhou et al., 1996).
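Preparing TENP from the stated concentrations is a mass = molarity × volume × molar mass calculation. The sketch below assumes Tris base (121.14 g/mol), disodium EDTA dihydrate (372.24 g/mol) and NaCl (58.44 g/mol); the text specifies only concentrations, so these reagent forms are assumptions, and the pH adjustment (the text tests pH 6.0 to 10.0) is left out. The function name `tenp_recipe` is hypothetical.

```python
# Assumed reagent forms and molar masses (g/mol); the text gives only concentrations.
MOLAR_MASS = {"Tris base": 121.14, "Na2EDTA.2H2O": 372.24, "NaCl": 58.44}
# Final molar concentrations of TENP as stated in the text.
TENP_MOLAR = {"Tris base": 0.050, "Na2EDTA.2H2O": 0.020, "NaCl": 0.100}
PVPP_PERCENT_W_V = 1.0  # 1% w/v means 1 g per 100 ml


def tenp_recipe(volume_ml: float) -> dict:
    """Grams of each component for `volume_ml` of TENP (pH adjustment not included)."""
    litres = volume_ml / 1000.0
    grams = {
        name: round(molarity * litres * MOLAR_MASS[name], 3)
        for name, molarity in TENP_MOLAR.items()
    }
    grams["PVPP"] = round(PVPP_PERCENT_W_V * volume_ml / 100.0, 3)
    return grams


print(tenp_recipe(100))
```

PVPP is insoluble, so at 1% w/v it remains a suspension that adsorbs phenolics rather than a dissolved solute.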

The Tris and the EDTA protect the DNA from nuclease activity, the NaCl acts as a dispersant and the PVPP adsorbs the humic acids and other phenolic compounds (Holben et al. (1988); Picard et al., (1992)).

In this study, the extraction efficacy of this buffer was evaluated at different pH values (6.0-10.0) using 20 different soils with a pH range from 5.8 to 8.3 and an organic matter content of between 0.2 and 6.3%.

These twenty soils (the other characteristics are not indicated) were used only in this experiment. The amount of DNA was determined by colorimetric means as described by Richard (1974), and detailed below.

1.4 PROTOCOL OF IN SITU LYSIS AND OF DNA EXTRACTION: Several protocols using an increasing number of steps were tested in order to evaluate the efficacy of various techniques for lysing the soil microbes in situ. For these experiments, the indigenous soil microflora of six soils was targeted. Additional experiments were carried out to study the effects of the lysis treatments on the DNA released, by analysing the quantity and quality of lambda phage DNA recovered after it had been added to the soils beforehand.

Once an optimized protocol (referred to as protocol 6) had been developed, it was used to quantify the DNA originating from indigenous actinomycetes and the DNA originating from gram-positive bacteria inoculated into the selected soils. In all cases, the soil samples were dried and sieved as described above.

After grinding, 0.5 ml of TENP buffer was added to 200 mg dry weight of soil, except for protocol 1, in which the buffer was added to unground soil.

For the various lysis treatments (see below), the soil suspensions were vortexed for 10 minutes and centrifuged (4000 x g for five minutes), after which an aliquot fraction (25 µl) of the supernatant was analysed by agarose gel electrophoresis.

Another aliquot fraction of the supernatant representing a known volume, generally 350 µl, was precipitated with isopropanol.

Five aliquot fractions (representing the DNA derived from 1 g of soil) were combined and resuspended in 100 µl of sterile TE buffer ([illegible] mM Tris, 1 mM EDTA, pH 8.0) before purification (protocol D, see below) and quantification, either by Dot-Blot hybridization of the total DNA or by Dot-Blot hybridization of the PCR amplification products (see below).

The hybridization signals were quantified by phosphorescence imaging ("phospho-imaging" technique, see below).

EVALUATION OF THE METHODS OF IN SITU CELL LYSIS: The quality and quantity of DNA extracted after an increasing number of lysis treatment steps (protocols 2 to 5b) were compared with those of the extracellular DNA obtained after washing the soil with extraction buffer (protocol 1; see also Figure 1).

Protocol 1: No lysis treatment.

The TENP buffer was added to an unground soil, and a DNA extraction step was carried out as described above.

Protocol 2: Grinding of the soil followed by a DNA extraction.

Two different types of device were used to grind the soil.

In order to compare their respective efficacy, 5 g of dry soil were ground for 30 seconds in a grinder containing tungsten rings, or for times varying up to 60 minutes in a soil grinder containing a mortar and agate beads (20 mm in diameter).

The TENP buffer was then added and the DNA was extracted as described above.

The gel electrophoresis results showed that grinding for 40 minutes with agate beads was necessary to obtain amounts of extracted DNA equivalent to those obtained after grinding for 30 seconds with tungsten rings.

The size distribution of the DNA fragments is similar whatever the method used.

These treatments were therefore considered equivalent, and which of them was used in the protocols described below is not specified.

In protocols 3 to 5, the efficacy of several other lysis treatments subsequent to the grinding of the soil was tested, either separately or in different combinations.

Protocol 3: This protocol is identical to protocol 2, except that it comprises a step of homogenization using an Ultra-turrax type mixer (Janker and Kunkel, IKA Labortechnik, Germany) set at half the maximum speed for minutes.

PROTOCOLS 4a and 4b: These protocols are identical to protocol 3, except for an additional sonication step.

Two types of sonicator device were compared: a titanium micropoint sonicator (600W Vibracell Ultrasonicator, Bioblock, Illkirch, France) (Protocol 4a) and a sonicator of Cup Horn type (protocol 4b).

The Vibracell micropoint producing the ultrasound is in direct contact with the soil solution.

With the Cup Horn type device, the soil solution is contained in tubes placed in a water bath through which the ultrasound passes.

Preliminary experiments were carried out in order to determine the optimum conditions for the two sonicators (results not presented).

The best compromise, in terms of amount of DNA extracted and fragment size, consisted of sonication for 7 minutes with the titanium micropoint or for 10 minutes with the Cup Horn type sonicator, with the power set to 15 W and 50% active cycles.
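The two optimized settings can be compared by the active sonication time and nominal energy they imply. The sketch assumes that 15 W is the output during the "on" part of each pulse and that 50% active cycles means the probe is on half of the time; neither interpretation is spelled out in the text, and with the Cup Horn device part of that energy is absorbed by the water bath before reaching the sample. `sonication_dose` is a hypothetical name.

```python
def sonication_dose(total_minutes: float, power_w: float, duty_fraction: float):
    """Return (seconds of active sonication, nominal energy in joules).

    Assumes `power_w` is delivered only during the active part of each cycle
    and that `duty_fraction` (0.5 for 50% active cycles) is the on-time fraction.
    """
    if not 0.0 < duty_fraction <= 1.0:
        raise ValueError("duty_fraction must be in (0, 1]")
    active_s = total_minutes * 60.0 * duty_fraction
    return active_s, active_s * power_w


for device, minutes in (("titanium micropoint", 7), ("Cup Horn", 10)):
    active_s, energy_j = sonication_dose(minutes, 15.0, 0.5)
    print(f"{device}: {active_s:.0f} s active, {energy_j:.0f} J nominal")
```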

Protocols 5a and 5b: After sonication with a titanium micropoint or a Cup Horn type device (protocols 4a and 4b respectively), lysozyme and achromopeptidase were added, each at a final concentration of 0.3 mg/ml.

The soil suspensions were incubated for 30 minutes at 37°C, after which lauryl sulphate was added to a final concentration of 1%, and the suspensions were then incubated for 1 hour at 60°C before centrifugation and precipitation as described above.

In addition to the protocols described above, the effects of sonication (Cup Horn, see protocol 4b) and of heat shocks (30 seconds in liquid nitrogen followed by three minutes in boiling water, repeated three times) on HindIII-digested lambda phage DNA added beforehand to the soil were examined (see below).

Heat shocks were suggested in the prior art as a means of in situ cell lysis (Picard et al. (1992)). However, because such a treatment has a harmful effect on free DNA (see the results section), it was not included in the protocols described above.

OPTIMIZED PROTOCOL: After evaluation of the various lysis treatments, an optimized protocol, referred to as protocol 6, was defined. Protocol 6 is identical to protocol 5b except that, before sonication, the soil suspensions are vortexed and then agitated by rotation on a wheel for two hours before being frozen at [illegible]. After thawing, the soil suspensions were vortexed for 10 minutes before sonication. Protocol 6 was used in the experiments in which the soils were inoculated with bacterial cells, as well as in the experiments in which the indigenous actinomycetes were quantified (see below).

1.6 COUNTING BY MICROSCOPE: The efficacy of grinding of the soil as a method for lysing bacterial cells was examined by microscope.

[illegible] g of dried crude soil were mixed in a Waring Blender device with [illegible] ml of ultrapure sterilized water for 1.5 minutes; in parallel, 1 g (dry weight) of ground soil (protocol 2) was suspended in 10 ml by agitation for 10 minutes. The soil suspensions were serially diluted and acridine orange was added to a final concentration of 0.001%.

After 2 minutes, the suspensions were filtered through a 0.2 µm black Nuclepore membrane. Each filter was rinsed with sterile water, treated with 1 ml of isopropanol for 1 minute in order to fix the bacterial cells, and then rinsed again.

The bacterial cells were counted using a Zeiss Universal epifluorescence microscope with a 100x objective lens. For each type of soil, three filters were counted, with at least 200 cells counted on each filter.
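Turning field counts into cells per gram of soil follows the usual direct-count scaling: mean count per field, times the ratio of effective filter area to field area, divided by the volume filtered, times the dilution factor and the suspension volume per gram (10 ml per g for the ground soil above). The field area, filter area, volume filtered, dilution and counts below are hypothetical values, not data from this study.

```python
from statistics import mean


def cells_per_gram(counts_per_field, field_area_mm2, filter_area_mm2,
                   volume_filtered_ml, dilution_factor, suspension_ml_per_g):
    """Scale epifluorescence field counts up to cells per gram of dry soil."""
    cells_on_filter = mean(counts_per_field) * filter_area_mm2 / field_area_mm2
    cells_per_ml_diluted = cells_on_filter / volume_filtered_ml
    cells_per_ml_suspension = cells_per_ml_diluted * dilution_factor
    return cells_per_ml_suspension * suspension_ml_per_g


# Three filters per soil, as in the text; the counts themselves are hypothetical.
filters = [[22, 25, 28], [20, 24, 26], [23, 27, 25]]
estimates = [cells_per_gram(f, 0.01, 200.0, 1.0, 100, 10.0) for f in filters]
print(f"{mean(estimates):.2e} cells per g dry soil")
```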

1.7 COUNTING OF THE CULTURABLE ACTINOMYCETES AND TOTAL NUMBER OF COLONY-FORMING UNITS (CFU): The actinomycetes which survived the lysis treatments (protocols 1-5) were examined specifically in soil No. 3 (Saint André coast, see Table 1).

After a 10-fold dilution in a solution of yeast extract (6% weight/volume) and SDS in order to induce germination (Hayakawa et al. (1988)), the soil suspensions were serially diluted in sterile water, incubated at 40°C for 20 minutes and plated on HV medium (Hayakawa et al., 1987).

The HV medium was supplemented with actidione (50 mg/l) and nystatin (50 mg/l).

The actinomycete colonies were counted after incubation for 15 days at 28°C.

In total, about 400 colonies were examined. The identification was carried out on the basis of macroscopic and microscopic morphological characteristics and of analysis of the diaminopimelic acid content of the isolates (Shirling et al., 1966; Staneck et al., 1974; Williams et al., 1993).

The total amount of culturable bacteria (total CFU) was also determined for each of the lysis protocols 1 to 5. The soil suspensions were serially diluted and plated in triplicate on Bennett agar medium (Waksman et al., 1961) supplemented with nystatin and actidione (each at 50 mg/l).

Each Petri dish was covered with a cellulose nitrate filter (Millipore) and incubated for three days at 28°C. After the colonies on the membranes had been counted, the filters were removed and the Petri dishes were reincubated for 7 days at 28°C and then counted again.
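Triplicate plate counts are normally converted into CFU per gram by averaging the plates and multiplying by the dilution factor, dividing by the volume plated and multiplying by the suspension volume per gram of soil. The text does not state the volumes plated or the suspension volume, so the values in this sketch are hypothetical, as is the name `cfu_per_gram`.

```python
def cfu_per_gram(colony_counts, dilution_factor, volume_plated_ml, suspension_ml_per_g):
    """Mean colony count of replicate plates scaled to CFU per gram of dry soil."""
    mean_colonies = sum(colony_counts) / len(colony_counts)
    return mean_colonies * dilution_factor / volume_plated_ml * suspension_ml_per_g


# Hypothetical triplicate plates at a 1e4 dilution, 0.1 ml plated, 10 ml of suspension per g.
print(f"{cfu_per_gram([48, 55, 62], 1e4, 0.1, 10.0):.2e} CFU per g")
```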

1.8 RECOVERY OF THE LAMBDA PHAGE DNA ADDED TO THE SOILS: The lambda phage DNA was digested with HindIII, extracted with a phenol-chloroform mixture, precipitated and then resuspended in ultrapure sterile water according to standard protocols (Sambrook et al., 1989).

Dilutions corresponding, respectively, to 0, 2.5, 5, 7.5, 10 and [illegible] µg of DNA/g dry weight of soil were prepared in 60 µl volumes. These DNA dilutions were added to 5 g batches of dry soil, which were then vortexed vigorously for 5 minutes before grinding.
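Each 60 µl dilution has to carry the whole spike for a 5 g batch, so its concentration is the target dose in µg/g times 5 g, divided by 60 µl. The sketch covers only the legible levels; the constant and function names are illustrative.

```python
SOIL_BATCH_G = 5.0      # soil batch size stated in the text
SPIKE_VOLUME_UL = 60.0  # volume of each DNA dilution stated in the text


def stock_concentration_ug_per_ul(target_ug_per_g: float) -> float:
    """Concentration a 60 ul dilution needs to deliver `target_ug_per_g` to 5 g of soil."""
    return target_ug_per_g * SOIL_BATCH_G / SPIKE_VOLUME_UL


for target in (0, 2.5, 5, 7.5, 10):  # legible levels from the text
    total_ug = target * SOIL_BATCH_G
    print(f"{target:>4} ug/g -> {total_ug:5.1f} ug in 60 ul = "
          f"{stock_concentration_ug_per_ul(target):.3f} ug/ul")
```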

The lambda phage DNA was also added to a soil before grinding at concentrations corresponding to 0, 10 and 15 µg of DNA/g dry weight of soil.

After grinding, the extraction buffer was added and the DNA was extracted according to protocol 2 (see above).

1.9 SATURATION OF THE ADSORPTION SITES WITH RNA: In order to determine whether saturating the nucleic acid adsorption sites of the soil colloids could increase the recovery of DNA, the sandy compost (soil No. 4) and the clayey soil (soil No. 5) were incubated with an RNA solution before any other treatment.

Commercial Saccharomyces cerevisiae RNA (Boehringer Mannheim, Meylan, France) was diluted in phosphate buffer (pH 7.1) and added to the dry, sieved soil samples (2 ml/g of soil) at final concentrations of 20, 50 and 100 mg of RNA/g dry weight of soil.
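Because the RNA solution is added at 2 ml per gram of soil, the concentration of the solution is half the target dose in mg/g. A minimal sketch of that conversion, with an illustrative function name:

```python
SOLUTION_ML_PER_G = 2.0  # RNA solution added per gram of soil, as stated in the text


def rna_solution_mg_per_ml(target_mg_per_g: float) -> float:
    """RNA concentration needed so that 2 ml/g delivers `target_mg_per_g` of RNA."""
    return target_mg_per_g / SOLUTION_ML_PER_G


for target in (20, 50, 100):
    print(f"{target} mg RNA/g soil -> {rna_solution_mg_per_ml(target):.0f} mg/ml RNA solution")
```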

The tubes containing the soil suspensions were agitated by rotation for two hours at room temperature. After centrifugation, the soil pellets were dried in an oven (50°C) overnight. The lambda phage DNA was then added to the soils (20 or 50 µg/g dry weight of soil) in order to simulate the fate of the DNA released after cell lysis.

The DNA was extracted according to protocol 2. It was determined thereafter that an identical effect of addition of RNA on the recovery of DNA could be achieved by adding the RNA directly to the extraction buffer.

This simplified procedure was used for the clayey soil No. 5 in the experiments in which the microorganisms were inoculated in the soils.

The RNA was then added at a concentration corresponding to 50 mg of RNA/g of dry weight of soil.

1.10 QUALITATIVE AND QUANTITATIVE DETERMINATION OF THE EFFICACY OF THE EXTRACTION PROTOCOLS: The quality of the DNA (absence of degradation) was estimated on the basis of the size of the DNA fragments or the relative position of the DNA migration bands after electrophoresis of an aliquot fraction of a DNA solution on a 0.8% agarose gel.

The fluorescence intensity allowed a semi-quantitative estimation of the extraction yields.

Another aliquot fraction was used for quantitative determinations of the DNA content by hybridization (Dot-Blot) and analysis with a phospho-imager. The Dot Blot hybridization protocol has been described by Simonet et al. (1990).

The hybridization membranes (GeneScreen Plus, Life Science Products, Boston, USA) were prehybridized for at least 2 hours in 20 ml of a solution containing 6 ml of 20x SSC, 1 ml of Denhardt's solution, 1 ml of SDS and 5 mg of salmon sperm DNA.

The hybridization was carried out overnight in the same solution in the presence of a labelled probe, followed by two washes of the membranes in 2x SSC for 5 minutes at room temperature, a third wash in 2x SSC, 0.1% SDS and a fourth wash in 1x SSC, 0.1% SDS for 30 minutes at the hybridization temperature.

The hybridization signals were quantified with a Biorad radioanalytical imaging system (Molecular Analyst Software, BIORAD, Ivry-sur-Seine, France).

In order to quantify the total amount of DNA derived from the indigenous microflora, the various soils were extracted according to protocols 1 to 5. The non-amplified DNA was applied to the Dot-Blot membranes and hybridized using the universal probe FGPS431 (Table 2).

This probe, which hybridizes to positions 1392-1406 of the E. coli 16S rDNA gene (Amann et al. (1995)), was end-labelled with 32P-ATP using T4 polynucleotide kinase (Boehringer Mannheim, Meylan, France).

A calibration curve was prepared using E. coli DH5α DNA. Converting the results to soil bacteria required a simplifying assumption: that the average number of rrn operon copies per genome is 7, as in E. coli.
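The rrn simplification can be expressed as a scaling factor. The Python sketch below is illustrative only and is not the authors' calculation: it assumes that the probe signal is proportional to the number of rrn copies and that soil genomes are similar in size to the E. coli calibration genome. The function name `rrn_corrected_mass` is hypothetical.

```python
def rrn_corrected_mass(ecoli_equivalent_ng: float,
                       assumed_rrn: float = 7.0,
                       ecoli_rrn: float = 7.0) -> float:
    """Convert an E. coli-equivalent DNA mass into a soil-community DNA mass.

    Illustrative assumptions: the 16S probe signal is proportional to the
    number of rrn copies, and soil genomes are about the size of the E. coli
    genome. A community with fewer rrn copies per genome than E. coli then
    contains proportionally more DNA per unit of signal.
    """
    if assumed_rrn <= 0 or ecoli_rrn <= 0:
        raise ValueError("rrn copy numbers must be positive")
    return ecoli_equivalent_ng * ecoli_rrn / assumed_rrn


if __name__ == "__main__":
    # With the rrn = 7 hypothesis the correction factor is exactly 1.
    print(rrn_corrected_mass(30.0))                   # 30.0
    # If the community averaged 3.5 copies, the estimate would double.
    print(rrn_corrected_mass(30.0, assumed_rrn=3.5))  # 60.0
```

With rrn = 7 the factor is 1, which is why the E. coli curve can be applied directly; the sketch only shows how sensitive the estimate is to that assumption.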

Lambda phage DNA digested with HindIII was used to quantify the recovery of extracellular DNA. Non-amplified extracts from soils to which lambda phage DNA had been added were hybridized with HindIII-digested lambda phage DNA, random-primer labelled using the Klenow fragment (Boehringer Mannheim, Meylan, France).

The amounts of DNA were calculated by interpolation using a calibration curve prepared with the purified DNA.
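Interpolation on a calibration curve can be sketched as follows. The source does not state the curve model; this minimal Python example assumes a straight-line least-squares fit of signal against DNA amount and refuses to extrapolate beyond the standards. The standard amounts, counts and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate_amount(signal, standard_ng, standard_signal):
    """Read a sample amount off a linear calibration curve (no extrapolation)."""
    if not min(standard_signal) <= signal <= max(standard_signal):
        raise ValueError("signal outside the calibration range")
    slope, intercept = fit_line(standard_ng, standard_signal)
    return (signal - intercept) / slope


if __name__ == "__main__":
    # Hypothetical standards: ng of purified DNA vs. phospho-imager counts.
    standards_ng = [0.0, 10.0, 20.0, 40.0]
    counts = [0.0, 100.0, 200.0, 400.0]
    print(interpolate_amount(150.0, standards_ng, counts))  # 15.0
```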

The total amount of DNA extracted from soils 1, 2, 3, 4 and 6 according to protocol 2 (grinding) was also quantified colorimetrically according to the technique described by Richard (1974).

Briefly, the DNA was mixed with concentrated HClO4 (the final concentration of HClO4 was 1.5 ...). 2.5 volumes of this solution were mixed with 1.5 volumes of DPA (diphenylamine, Sigma-Aldrich, France), and the mixture was incubated at room temperature for 18 hours before the OD at 600 nm was determined. The soil DNA extracts were quantified against a standard curve prepared with DNA extracted from E. coli according to standard protocols (Sambrook et al., 1989).

1.11 DEVELOPMENT OF A DNA QUANTIFICATION TECHNIQUE USING PCR AMPLIFICATION AND HYBRIDIZATION:

For the PCR amplifications, Taq DNA polymerase (Appligene Oncor, France) was used according to the manufacturer's instructions.

The PCR programme used for all the amplifications was as follows: initial denaturation for 3 minutes at 95°C, followed by 35 cycles of 1 minute at 95°C, 1 minute at 55°C and 1 minute at 72°C, and then a final extension at 72°C for 3 minutes.
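The thermocycling programme can be written out as data, which makes the cycle structure and total hold time explicit. This is a hypothetical representation (the `Step` class and helper names are not from the source), and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def expand_program(initial, cycle, n_cycles, final):
    """Flatten a thermocycler programme into the ordered list of holds."""
    return list(initial) + list(cycle) * n_cycles + list(final)


def hold_minutes(steps):
    """Total time spent at set temperatures (ramping time ignored)."""
    return sum(step.seconds for step in steps) / 60


PROGRAMME = expand_program(
    initial=[Step("initial denaturation", 95, 180)],
    cycle=[Step("denaturation", 95, 60),
           Step("annealing", 55, 60),
           Step("extension", 72, 60)],
    n_cycles=35,
    final=[Step("final extension", 72, 180)],
)

if __name__ == "__main__":
    print(len(PROGRAMME), hold_minutes(PROGRAMME))  # 107 111.0
```

The 111 minutes of hold time is simply 3 + 35 x 3 + 3 minutes.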

The DNA isolated and purified from Streptosporangium fragile was used as control at concentrations ranging from 100 fg to 100 ng.

To amplify the DNA of this bacterial genus specifically, primers FGPS122 and FGPS350 (Table 2), which are complementary to a portion of the 16S rDNA, were selected after alignment of actinomycete 16S rDNA sequences. Their specificity was tested on a collection of actinomycete strains (Streptomyces, Streptosporangium and other closely related genera).

The PCR products were hybridized with the oligonucleotide probe FGPS643 (Table 2). To simulate the level of purity routinely obtained with DNA extracted from soil, controls of pure S. fragile DNA were mixed with soil extracts obtained after lysis protocols 4b and 5b and then purified according to protocol D.

Before use, the soil extracts were treated with DNase (one unit of DNase/ml, Gibco BRL) for 30 minutes at room temperature. The DNase was then inactivated by heating at 65°C for 10 minutes, and the inactivation was verified by PCR. Humic acid concentrations were measured by spectrophotometry (OD at 280 nm) against a standard curve of commercial humic acids (Sigma).

DNase-treated soil solutions, undiluted or diluted 10-fold or 100-fold, were mixed with 100 fg to 100 ng of S. fragile DNA before PCR amplification. In another series of experiments, increasing amounts of Streptomyces hygroscopicus DNA (from 100 pg to 1 µg) were added to the S. fragile DNA to simulate the presence of non-target DNA and assess its influence on the PCR.
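The spiking design can be enumerated to check its size. This sketch assumes a full-factorial layout (every S. fragile amount in every soil-solution dilution) with 10-fold steps between 100 fg and 100 ng; the source does not state the step size or the layout, so both are assumptions, as are the names used.

```python
import math


def tenfold_series(start, stop):
    """Amounts from start to stop (inclusive) in 10-fold steps."""
    n_steps = round(math.log10(stop / start))
    return [start * 10 ** k for k in range(n_steps + 1)]


FG = 1e-15  # grams per femtogram
NG = 1e-9   # grams per nanogram

target_g = tenfold_series(100 * FG, 100 * NG)  # 100 fg ... 100 ng
soil_dilutions = [1, 10, 100]                  # undiluted, 10-fold, 100-fold

# Hypothetical full-factorial layout: every target amount in every dilution.
design = [(d, amount) for d in soil_dilutions for amount in target_g]

if __name__ == "__main__":
    print(len(target_g), len(design))  # 7 21
```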

1.12 PURIFICATION OF THE CRUDE DNA EXTRACTS:

Four DNA purification methods were compared. The DNA was extracted from 1 g (dry weight) of soil according to protocol 4a and resuspended in 100 µl of TE8 buffer (50 mM Tris, 20 mM EDTA, pH 8.0).

Protocol A: Elution through two successive Elutip d columns (Schleicher and Schuell, Dassel, Germany) (Picard et al., 1992).

Protocol B: Elution through a Sephacryl S200 column (Pharmacia Biotech, Uppsala, Sweden) followed by an elution through an Elutip d column (Nesme et al. (1995)).

Protocol C: Separation using an aqueous two-phase system with 17.9% (weight/weight) PEG 8000 (Merck, Darmstadt, Germany) and 14.3% (weight/weight) (NH4)2SO4 (Zaslavsky, 1995).

After vigorous vortex mixing, the two phases were left at room temperature to separate.

1 ml of each phase was transferred into another tube, mixed with 100 µl of the sample and left at 4°C overnight to allow the phases to separate.

The lower phase was dialysed for one hour through a Millipore membrane against an excess of TE 7.5 buffer (10 mM Tris, 1 mM EDTA, pH 7.5, and 1 M MgCl2) to remove excess salts.

Protocol D: Elution through a Microspin Sephacryl S400 HR column (Pharmacia Biotech, Uppsala, Sweden), followed by elution through an Elutip d column.

Each protocol is completed by an ethanol precipitation step, and the DNA is resuspended in 10 µl of TE 7.5 buffer. The efficacy of the purification protocols was checked by PCR amplification of undiluted aliquots of the DNA solutions and of 10-fold and 100-fold diluted aliquots, using standard protocols (see below).

1.13 RECOVERY OF THE DNA FROM INOCULATED MICROORGANISMS:

The cells, spores and hyphae were washed twice and counted by plate counting or by direct microscopic counting. 5 g batches of dry, sieved soil (soils 2, 3 and 5) were inoculated with 100 µl of a suspension of S. lividans spores or hyphae at concentrations corresponding to 0, 10³, 10⁵, 10⁷ and 10⁹ spores/g of dry weight of soil, or with B. anthracis vegetative cells at concentrations corresponding to 0, 10⁷ and 10⁹ cells per gram of dry weight of soil.
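Because 100 µl of suspension is added to each 5 g batch, the suspension must contain 50 times more cells per ml than the target number per gram of soil. The helper below is a hypothetical illustration of that arithmetic, assuming the whole 100 µl is delivered to the soil.

```python
def suspension_per_ml(target_per_g, soil_g=5.0, inoculum_ml=0.1):
    """Cells (or spores) per ml needed so that `inoculum_ml` of suspension
    delivers `target_per_g` into `soil_g` grams of dry soil."""
    return target_per_g * soil_g / inoculum_ml


if __name__ == "__main__":
    for level in (0, 1e3, 1e5, 1e7, 1e9):
        print(f"{level:.0e} per g -> {suspension_per_ml(level):.1e} per ml")
```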

The amounts of S. lividans hyphae were calculated on the basis of the number of spores from which they originate. After addition of the bacterial suspensions, the soil samples were vortexed vigorously for minutes before grinding. The DNA was extracted according to protocol 6 (see below).

PCR amplification followed by Dot-Blot hybridization and phosphorescence imaging (phospho-imaging) was used in order to quantify the amounts of DNA recovered from the cells and spores and from the bacterial mycelium inoculated in the soils.

The DNA extraction was carried out according to lysis protocol 6.

The PCR amplification and the hybridization were carried out as described above. The primers and probes target chromosomal regions outside the 16S region and are highly specific for the respective organisms, so as to avoid background signals.

For the soils inoculated with B. anthracis, the primers R499 and R500 were used (Patra et al. (1996)) and the amplification products were hybridized with the oligonucleotide probe C501 (Table 2).

For the soils inoculated with S. lividans, the PCR reactions were carried out using the primers FGPS516 and FGPS517, and the amplification products were hybridized with the oligonucleotide probe FGPS518 (Table 2).

The amplified region is a portion of the cassette constructed specifically to obtain strain OS48.3 (Clerc-Bardin et al., unpublished).

In all cases, calibration curves were obtained using purified DNA from the target organism.

2. RESULTS

2.1 CHOICE OF THE EXTRACTION BUFFER:

Twenty different soils were used to determine the optimum pH of the DNA extraction buffer. For all the soils, the DNA yield increased with buffer pH. The yield at each pH (mean ± sd), calculated as a percentage of the highest value for each soil, was as follows: pH 6.0: 31 ± 13; pH 7.0: 43 ± 16; pH 8.0: 60 ± 14; pH 9.0: 82 ± 12; pH 10.0: 98 ± 3.

For 16 of the 20 soils, the highest yield was obtained at pH 10.0, whereas for the other four soils it was obtained at pH 9.0. However, larger amounts of humic material were released at pH 10.0 than at pH 9.0 (results not presented). Consequently, pH 9.0 was chosen for all the experiments presented below.
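The normalisation described in section 2.1 (each soil's yields expressed as a percentage of its own maximum, then averaged across soils) can be sketched as below. The yield values are invented for illustration, and the use of the sample standard deviation is an assumption, since the source does not say which SD was reported.

```python
from statistics import mean, stdev


def relative_yields(yield_by_ph):
    """Express each yield as a percentage of the soil's highest yield."""
    best = max(yield_by_ph.values())
    return {ph: 100.0 * y / best for ph, y in yield_by_ph.items()}


def summarise(soils):
    """Mean and sample SD of the relative yield at each pH across soils."""
    per_soil = [relative_yields(s) for s in soils]
    summary = {}
    for ph in sorted(per_soil[0]):
        values = [r[ph] for r in per_soil]
        summary[ph] = (mean(values), stdev(values))
    return summary


if __name__ == "__main__":
    # Hypothetical DNA yields (ug/g) for two soils.
    soils = [{6.0: 30.0, 9.0: 80.0, 10.0: 100.0},
             {6.0: 20.0, 9.0: 50.0, 10.0: 40.0}]
    for ph, (m, sd) in summarise(soils).items():
        print(f"pH {ph}: {m:.0f} +/- {sd:.0f}")
```

Normalising per soil before averaging keeps soils with very different absolute yields from dominating the mean.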

2.2 EFFICACY OF THE DNA EXTRACTION PROTOCOLS:

The total DNA from the indigenous soil organisms was extracted and quantified so as to evaluate the efficacy of several in situ cell lysis protocols. Soil samples 1-6 (Table 1) were treated according to protocols 1 to 5 described in the Materials and Methods section (Figure 1).

After DNA extraction, the soil suspensions were precipitated with isopropanol, and aliquots of the resuspended pellets were first analysed by gel electrophoresis to estimate the quality and quantity of the DNA released.

However, the DNA extracts became darker as the number of lysis steps increased, owing to the co-extraction of compounds such as humic acids with the DNA.

Some of these dark-coloured crude extracts did not migrate as expected in the agarose gels.

Consequently, the crude DNA solutions were purified (protocol B) before quantification. The gel electrophoreses of the purified solutions obtained after the various lysis treatments are given as examples on soil 3 (Figure 2).

Visual comparison of the intensities of the stained DNA under ultraviolet light allowed a semi-quantitative estimation of the efficacy of the treatments. Furthermore, migration profiles showing DNA fragments of multiple sizes (discrete bands), together with the disappearance of the long fragments, indicate that the DNA has been degraded.

No DNA could be extracted from the clayey soil No. 5. A more precise quantification of the DNA extracted from all the soils according to protocols 1 to 5 was carried out by Dot-Blot hybridization, without a prior PCR amplification step, using an oligonucleotide probe complementary to a highly conserved sequence of the 16S rDNA (probe FGPS431, Table 2).

DNA was detected in the extracts of all the soils after each of the various lysis steps, except for the clayey soil No. 5. These results agree with the estimates made after gel electrophoresis.

For comparison with an independent quantification method, the DNA extracted according to protocol 2 (from all the soils except soil No. 5) was also quantified using a colorimetric DNA detection method (Richard, 1974).

A good correlation (r = 0.88) was found between the DNA quantified using this colorimetric technique and the results obtained by Dot-Blot hybridization/radio-imaging, supporting the hypothesis that the average rrn copy number of the soil bacteria is 7.
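The correlation coefficient reported above is presumably Pearson's r; the source does not name the statistic, so that is an assumption. A minimal Python implementation, with hypothetical paired values and an illustrative function name, is:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired measurement series."""
    n = len(xs)
    if n < 3 or n != len(ys):
        raise ValueError("need at least three paired values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical paired estimates (ug DNA/g soil) from the two methods.
    colorimetric = [24.0, 31.0, 59.0, 40.0, 26.0]
    dot_blot = [20.0, 35.0, 55.0, 38.0, 30.0]
    print(round(pearson_r(colorimetric, dot_blot), 2))
```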

Dot-Blot hybridization showed that the amounts of extracellular DNA, as determined by extraction without a lysis treatment (protocol 1), ranged from 4 µg/g for the acidic soil (No. 6) to 36 µg/g for soil No. 3 (Table 3).

Grinding the soil (protocol 2) increased the amounts of DNA extracted from all the soils (26 µg/g of soil for soil No. 6 and 59 µg/g of soil for soil No. 3) (Table 3; Figure 2).

For both grinding treatments (see the Materials and Methods section), discrete DNA bands were detected on the agarose gels, indicating that the DNA molecules were partially degraded (Figure 2).

The size of the DNA fragments is between 20 and 0.2 kb. The band intensity of the smallest fragments is very low, indicating that most of the fragments are much bigger than 1 kb.

Protocol 3 comprises a step of homogenization in an Ultra-turrax mixing device after addition of the extraction buffer to the soil samples.

This step increased the amounts of DNA extracted, as determined by Dot-Blot hybridization, for two of the soils (the sandy soil No. 3 and the acidic soil No. 6), whereas the two soils rich in organic matter (soils No. 1 and No. 2) yielded smaller amounts of DNA.

Protocols 4a and 4b made it possible to evaluate the effect of two types of sonication on the DNA yields from pre-ground and pre-homogenized soils.

Sonication had no positive effect on the DNA yield compared with protocol 3, except for soil No. 6. However, the lysis efficacy of the two types of sonicator differed. For soils 2, 3 and 4, the largest amounts of DNA were extracted using the titanium micropoint (Table 3; Figure 2), whereas for soils Nos. 1 and 6, the DNA yield was higher using the Cup Horn device.

Contradictory results were also obtained when a step of enzymatic/chemical lysis was added (protocols 5a and 5b) after the sonication step; in certain cases, the amounts of DNA extracted were larger than those recovered according to protocols 4a and 4b, whereas in other cases the yields were lower (Table 3).

2.3 DIRECT COUNTING OF THE MICROORGANISMS:

Total bacterial cells were counted by microscopy after acridine orange staining for all the soils, before and after grinding.

Before grinding, the number of bacteria per gram of dry weight of soil ranged from (1.4 ± 0.4) x 10⁹ in the tropical soil No. 5 to (10 ± 0.7) x 10⁹ in the soil obtained from the Saint-Andre coast (soil No. 3) (Table 1).

After grinding, the cell numbers were, respectively, 45, 74, 54, 34 and 75% of the initial values for soils Nos. 1 to 6.

2.4 COUNTING OF THE CULTURABLE ACTINOMYCETES BELONGING TO DIFFERENT GENERA:

A change in the actinomycete populations of soil No. 3 was observed after the various lysis treatments (Figure 3).

For example, Streptomyces sp. colonies dominated the viable actinomycete flora when no lysis treatment was applied (protocol 1), representing 65% of the total number of colonies identified. After grinding, the percentage of Streptomyces colonies fell to 51%, whereas the proportion of colonies belonging to the genus Micromonospora increased by 14% to 41%.

The chemical/enzymatic lysis (protocols 5a and 5b) appeared to be particularly effective for lysing Streptomycetes. When all the lysis treatments were applied, including a chemical/enzymatic lysis (protocols 5a and 5b), the actinomycete microflora, which still comprised more than 10⁶ cfu/g of soil, was dominated by species of the genus Micromonospora, while few or no Streptomyces colonies were recovered.

Organisms belonging to genera such as Streptosporangium, Actinomadura, Microbispora, Dactylosporangium and Actinoplanes appeared in small numbers on the plates (... of the total number of colonies identified) after grinding, homogenization with the Ultra-turrax device and sonication, but were generally absent when these treatments were combined with a chemical/enzymatic lysis.

The total number of culturable bacteria remaining after each lysis treatment (protocols 2 to 5) was also determined for soil No. 4. The results indicate that the number of culturable bacteria does not decrease with the intensity of the lysis treatment (about 2 x 10⁶ cfu/g of soil in all cases, including when no treatment was applied, as in protocol 1).

These low cfu values are probably due to the use of dry soil, so that only the most resistant bacteria multiplied on the plates. The number of colony-forming actinomycetes was generally greater than the total cfu (all bacteria) because the spore-germination step included in the actinomycete detection protocol was not part of the total bacterial count.

2.5 RECOVERY OF THE LAMBDA PHAGE DNA ADDED:

The aim of these experiments was to estimate how successive lysis treatments might affect the recovery of naked DNA, and whether these treatments contributed to its degradation.

Such DNA may be either extracellular DNA released from dead organisms, which can persist in the soil for months (Ward et al., 1990), or DNA released from organisms readily lysed during the first treatment steps. To simulate this situation, HindIII-digested lambda phage DNA was added at various concentrations to the soils before and after grinding. In addition to grinding, a combination of the other lysis treatments was tested, including sonication (Cup Horn device, see protocol 4b) and heat shocks (see the Materials and Methods section).

After extraction, aliquots that theoretically contained 25 to 150 ng of lambda phage DNA were analysed by gel electrophoresis. No lambda-specific DNA fragment could be observed when the DNA was added to the soil samples before grinding, regardless of the dose or the soil type.

When the DNA was added after grinding, and extracted without an additional lysis treatment step, the specific lambda phage DNA profiles were detected in the extracts of four out of the five soils tested.

In all these cases, a direct relationship was observed between the amount of DNA added and the intensity of the signals on the agarose gels. However, the signal intensities were lower than expected from comparison with the molecular weight standards.

Furthermore, the 23 kb band was absent in several cases, indicating that the long fragments were preferentially adsorbed onto the soil particles or were more sensitive to degradation than the short fragments.

No band was detected in the samples of tropical soil No. 5 which is characterized by a very high clay content (Table 1).

For a more precise quantification, the recovery of DNA was determined on a phosphorescence imaging device (phospho-imager) after Dot-Blot hybridization. According to this technique, the DNA was detected in all the samples, including those which had been inoculated before grinding, except for soil No. 5 in which no DNA could be detected.

In all the other soils, the amount of DNA extracted increases as the size of the inoculum increases (Figures 4a-d).

However, the recoveries of lambda phage DNA were low. When grinding was the only lysis treatment applied, the recoveries were between 0.6 and 5.9% of the DNA added when this DNA was added before grinding, and from 3.6 to 24% of the DNA added when the latter was added after grinding. The highest levels of recovery were obtained from soil No. 2.
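The recoveries above are percentages of the amount of lambda DNA added. A minimal Python sketch of that calculation follows; the (recovered, added) pairs are hypothetical and the function names are illustrative.

```python
def percent_recovery(recovered_ng, added_ng):
    """Recovered DNA as a percentage of the DNA spiked into the soil."""
    if added_ng <= 0:
        raise ValueError("added amount must be positive")
    return 100.0 * recovered_ng / added_ng


def recovery_range(pairs):
    """(min, max) percentage recovery over (recovered, added) pairs."""
    values = [percent_recovery(r, a) for r, a in pairs]
    return min(values), max(values)


if __name__ == "__main__":
    # Hypothetical phospho-imager estimates for two spiking doses (ng).
    print(recovery_range([(2.0, 400.0), (10.0, 200.0)]))
```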

Gel electrophoresis of aliquot fractions of samples treated by heat shock and sonication did not allow any DNA bands to be observed in any of the samples, including the tests in which the DNA had been added after grinding. The Dot-Blot hybridization experiments confirmed these results.

The hybridization signals obtained from soil suspensions treated with heat shocks and sonication were, at best, low.

The sample showing the largest amount of DNA (15 µg of DNA/g of dry weight of soil) was the only one for which the signal obtained was substantially different from the background level.

No difference (or only small differences) was observed between the samples treated with heat shocks and those treated with heat shocks and sonication, indicating that the heat shocks have a harmful effect on the DNA. The best recoveries were observed for soil No. 2, which has the highest organic matter content (Table 1), whereas no DNA was recovered from the clayey soil No. 5. Additional experiments were carried out with non-ground samples of soils No. 4 and No. 5, which were inoculated with 20 and 50 µg of lambda phage DNA per gram of soil.

The samples were extracted immediately or after an incubation period of one hour at 28°C, and the DNA extracts were then purified and analysed by gel electrophoresis.

The incubation of soil No. 4 for one hour after the inoculation did not give profiles that were qualitatively or quantitatively different from those obtained without incubation or from those observed previously when the DNA was added after grinding.

These results suggest that enzymatic degradation by soil nucleases is not responsible for the low level of DNA recovery.

Furthermore, omitting the grinding step did not increase the recovery of DNA from soil No. 5, indicating that the changes to soil structure caused by grinding do not significantly increase the adsorption of nucleic acids onto the colloids.

2.6 SATURATION OF THE ADSORPTION SITES WITH RNA:

Most of the profiles obtained on the agarose gels did not differ significantly from the previous profiles obtained without the RNA treatment.

For example, no band was detected from the clay-rich soil No. 5, regardless of the RNA and lambda phage DNA concentrations used.

Furthermore, the specific bands of HindIII-digested lambda phage DNA remained undetectable in the RNA-treated sandy compost (soil No. 4) when the RNA was added before grinding.

The intensity of the bands obtained from samples inoculated with DNA after grinding increases as the RNA concentration increases, indicating that the treatment might have a positive effect.

However, the results after hybridization and analysis by phosphorescence imaging did not confirm the electrophoresis results. For example, the positive effect of the RNA treatment on the recovery of DNA from the clayey compost, when DNA was added after grinding, did not appear clearly.

On the other hand, a positive effect of the RNA was found for the clay-rich soil (No. 5) when the DNA was added after grinding.

Although the hybridization signals for the control samples do not differ from the background noise levels, significant amounts of DNA were released from the samples treated with RNA, and the signals increased as the amount of DNA added increased and as the RNA concentration increased.

However, even at the highest RNA concentration (100 mg/g of dry weight of soil), the recovery never exceeded 3%.

2.7 PURIFICATION OF THE CRUDE DNA EXTRACTS:

Of the four protocols tested, the best amplification of the undiluted DNA extracts (1 µl of extract in 50 µl of PCR mixture) was observed after elution through Microspin S400 columns followed by elution through an Elutip d column (protocol D), as shown by gel electrophoresis of the PCR products.

The DNA purified by the aqueous two-phase system (protocol C) gave smaller amounts of PCR product when undiluted DNA extract was amplified.

No amplification product could be obtained from the undiluted extracts purified using protocol A or B.

Consequently, protocol D (see Materials and Methods section) was used for all the experiments in which PCR amplification and/or Dot-Blot hybridization was performed.

2.8 QUANTIFICATION BY PCR AND HYBRIDIZATION:

The first step was to determine whether the amounts of PCR product were proportional to the number of target DNA molecules initially present in the reaction tube. DNA from Streptosporangium fragile was used as the target (see Materials and Methods section).

The primers used were FGPS122 and FGPS350 (Table 2). Gel electrophoresis of the PCR products showed that band intensity increased with target concentration. The PCR products were hybridized with the oligonucleotide probe FGPS643 (Table 2), and the signals were quantified by phosphorescence imaging (phospho-imaging).

A good correlation (r² = 0.98) was found between log[number of targets] and log[intensity of the hybridization signal].
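This calibration check compares log-transformed target numbers with log-transformed signals. The sketch below fits a least-squares line on the log10 scale and reports the slope and r²; a slope near 1 would indicate strict proportionality. The data and function name are hypothetical, and the source does not specify its regression procedure.

```python
from math import log10


def loglog_fit(targets, signals):
    """Least-squares fit of log10(signal) on log10(target); returns (slope, r^2)."""
    xs = [log10(t) for t in targets]
    ys = [log10(s) for s in signals]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical data: target copies and phospho-imager signals.
    copies = [1e1, 1e2, 1e3, 1e4, 1e5, 1e6]
    signal = [120.0, 900.0, 8.1e3, 6.5e4, 7.2e5, 5.9e6]
    slope, r2 = loglog_fit(copies, signal)
    print(f"slope={slope:.2f}  r^2={r2:.3f}")
```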

An investigation was then carried out to determine whether the efficacy of the PCR amplification was affected by humic acids and non-target DNA. On gel electrophoresis, the increase in PCR product band intensity with increasing amounts of target DNA was preserved when amplification was carried out with DNA solutions to which DNase-treated soil extracts had been added, at humic acid concentrations of up to 8 ng in 50 µl of PCR mixture.

With 20 ng of humic acid in the PCR mixture, the bands corresponding to the lowest amounts of target DNA disappeared, and at humic acid concentrations of 80 ng and above, no band was visible.

The various amounts of S. fragile target DNA still yielded the expected amounts of PCR product when, before amplification, the S. fragile DNA was mixed with Streptomyces hygroscopicus DNA (100 pg to 1 µg in 50 µl of PCR mixture) to simulate the non-target DNA released from the soil microflora.

2.9 QUANTIFICATION OF THE INDIGENOUS SOIL ACTINOMYCETES AFTER DIFFERENT LYSIS TREATMENTS:

Purification protocol D was applied, followed by PCR amplification as described above, to quantify the actinomycetes belonging to the genus Streptosporangium in soil No. 3 after extraction according to protocols 1, 2, 3, 5a and 5b (Figure 5). After grinding (protocol 2), the amount of target DNA originating from this actinomycete was estimated by Dot-Blot hybridization and radio-imaging to be 2.5 ± 1.3 ng/g of dry soil.

Assuming a DNA content of 10 fg per cell, as for Streptomyces (Gladek et al., 1984), this value corresponds to approximately 2.5 x 10⁵ genomes. Similar values were obtained after the other lysis treatments (2.6 ± 1.1 and 1.8 ± 1.3 ng of DNA/g of dry soil using protocols 3 and 4b, respectively).
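The genome-equivalent conversion is a unit change followed by a division: 2.5 ng = 2.5 x 10⁶ fg, and 2.5 x 10⁶ fg / 10 fg per genome = 2.5 x 10⁵ genomes. The Python helper below reproduces that arithmetic under the same 10 fg/genome assumption; the function name is hypothetical.

```python
FG_PER_NG = 1e6  # 1 ng = 10^6 fg


def genome_equivalents(dna_ng_per_g, fg_per_genome=10.0):
    """Genomes per gram of soil, assuming a fixed DNA content per cell."""
    return dna_ng_per_g * FG_PER_NG / fg_per_genome


if __name__ == "__main__":
    print(f"{genome_equivalents(2.5):.1e}")  # 2.5e+05
    # Propagating the reported spread (2.5 +/- 1.3 ng/g):
    print(f"{genome_equivalents(1.2):.1e} to {genome_equivalents(3.8):.1e}")
```

Because the conversion is linear, the relative uncertainty of the DNA estimate carries over unchanged to the genome count.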

2.10 EFFICACY OF THE RECOVERY OF DNA FROM SOILS PRE-INOCULATED WITH BACTERIA:

Three soils (Nos. 2, 3 and 5) were inoculated with different concentrations of Streptomyces lividans spores or hyphae (see Materials and Methods section). The amounts of mycelium added to the soil (Figure 6b) correspond to the number of spores inoculated into the germination medium. Approximately 50% of these spores germinated, but the exact number of cells in the hyphae of the germinated spores was not determined. Consequently, the amounts of spores and mycelium inoculated into the soils are not directly comparable.

For each soil sample, extraction protocol No. 6, purification method D and PCR amplification combined with Dot-Blot hybridization and phosphorescence imaging (phospho-imaging) were used to quantify the specific target DNA released. The extracted DNA could be clearly distinguished from the background noise only when the number of spores added exceeded 10⁵ for soils No. 3 and No. 5, and 10⁷ for soil No. 2 (Figure 6a).

When mycelium was added, the extracted DNA could be detected at and above an amount corresponding to 10³ spores/g of soil for soils No. 2 and No. 3, and at and above 10⁷ spores/g of soil for soil No. 5 (Figure 6b).

Above the detection limit, the hybridization signal increases as the amount of inoculated cells increases.

For the spore inoculum, a 100-fold increase in the number of cells inoculated led to a nearly 100-fold increase in DNA yield. This increase was clearly smaller than when hyphae were inoculated, particularly into soils No. 2 and No. 3 (Figure 6).

In contrast to the results obtained with lambda phage DNA as the inoculum, DNA was recovered from the clay-rich soil (No. 5) when bacterial cells were used as the inoculum. Here too, however, the RNA treatment increased the recovery of Streptomyces DNA from this soil, for both spores and mycelium (Figure 6).

Inoculating the soils with vegetative Bacillus anthracis cells gave recovery levels similar to those obtained for Streptomyces.

Furthermore, the levels of DNA recovery from soil No. 5 increased after treatment with RNA for this inoculum also.

Example 2: Construction of a library of low molecular weight DNA (... kb) using a soil contaminated with lindane, and cloning and expression of the linA gene

This example describes the construction of a soil DNA library in E. coli. It demonstrates the cloning and expression of small genes obtained from a non-culturable microflora.

Lindane is an organochlorine pesticide, which is recalcitrant to degradation and persistent in the environment. Under aerobic conditions, biodegradation is catalyzed by a dehydrochlorinase, encoded by the linA gene, allowing lindane to be converted into 1,2,4-trichlorobenzene. The linA gene has been identified only from two strains isolated from soil: Sphingomonas paucimobilis, isolated in Japan (Seeno and Wada 1989; Imai et al., 1991; Nagata et al., 1993) and Rhodanobacter lindaniclasticus isolated in France (Thomas et al., 1996, Nalin et al., 1999).

However, the lindane degradation potential, demonstrated by assaying released chloride ions and by PCR amplification of the linA gene from soils with or without prior exposure to lindane, appears to be more widespread in the environment (Biesiekierska-Galguen, 1997).

1. Direct extraction of soil DNA

The dry soils are ground for 10 minutes in a Retsch centrifugal grinder equipped with 6 tungsten beads. 10 grams of ground soil are suspended in 50 ml of pH 9 TENP buffer (50 mM Tris, 20 mM EDTA, 100 mM NaCl, 1% w/v polyvinylpolypyrrolidone) and homogenized by vortexing for 10 min.

After centrifugation for 5 min at 4000 x g and 4°C, the supernatant is precipitated with sodium acetate (3 M, pH 5.2) and isopropanol, then taken up in sterile TE buffer (10 mM Tris, 1 mM EDTA, pH ...). The extracted DNA is then purified on an S400 molecular sieve column (Pharmacia) and on an Elutip d ion-exchange column (Schleicher and Schuell) according to the manufacturers' instructions, and stored in TE.

2. Construction of the library of DNA extracted from the soil in the vector pBluescript SK-

The vector pBluescript SK- and the DNA extracted from the soil are each digested with HindIII and BamHI (Roche), using 10 units of each enzyme per 1 µg of DNA (incubation for 2 hours at 37°C).

The DNAs are then ligated overnight at 15°C with T4 DNA ligase (Roche), using one enzyme unit per 300 ng of DNA (about 200 ng of DNA insert and 100 ng of digested vector). Electrocompetent Escherichia coli ElectroMAX DH10B™ cells (Gibco BRL) are transformed with the ligation mixture (2 µl) by electroporation (25 µF, 200 Ω, ... kV) (Bio-Rad Gene Pulser).
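The enzyme amounts are stated as ratios to DNA mass, so they scale with the amount of DNA handled. This hypothetical helper applies the two ratios given above (10 U of each restriction enzyme per µg of DNA; 1 U of ligase per 300 ng of total DNA); the 2 µg digest in the usage line is an invented example.

```python
def restriction_units(dna_ug, units_per_ug=10.0):
    """Units of each restriction enzyme for a digest (10 U per ug of DNA)."""
    return dna_ug * units_per_ug


def ligase_units(insert_ng, vector_ng, ng_per_unit=300.0):
    """T4 DNA ligase units at one unit per 300 ng of total DNA."""
    return (insert_ng + vector_ng) / ng_per_unit


if __name__ == "__main__":
    print(restriction_units(2.0))      # 20.0
    print(ligase_units(200.0, 100.0))  # 1.0
```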

After one hour of incubation in LB medium, the transformed cells are diluted so as to obtain about 100 colonies per dish, then plated out on LB medium (10 g/l tryptone, 5 g/l yeast extract, 5 g/l NaCl) supplemented with ampicillin (100 mg/l), γ-HCH (500 mg/l), X-gal (5-bromo-4-chloro-3-indolyl-β-D-galactoside, 60 mg/l) and IPTG (isopropylthio-β-D-galactoside, 40 mg/l), and incubated overnight at 37°C.
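Plating at about 100 colonies per dish means choosing a dilution from the transformant titre. The sketch below assumes that 100 µl is spread per plate and that the titre is known from a pilot plating; neither value is given in the source, and the function name is hypothetical.

```python
def plating_dilution(transformants_per_ml, plated_ml=0.1, target_colonies=100):
    """Dilution factor giving about `target_colonies` per plate (never < 1)."""
    expected = transformants_per_ml * plated_ml
    return max(1.0, expected / target_colonies)


if __name__ == "__main__":
    # Hypothetical titre from a pilot plating of the outgrowth culture.
    print(plating_dilution(1e5))
```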

Since γ-hexachlorocyclohexane (Merck-Schuchardt) is insoluble in water, a 50 g/l solution is prepared in DMSO (dimethyl sulphoxide) (Sigma).
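The γ-HCH supplementation follows C1V1 = C2V2: reaching 500 mg/l from a 50 g/l stock needs 10 ml of stock per litre of medium, which also introduces about 1% (v/v) DMSO. The helper below is an illustrative sketch with a hypothetical name; it ignores the small volume change on addition.

```python
def stock_volume_ml(final_mg_per_l, stock_g_per_l, medium_ml):
    """Volume of stock giving the final concentration in `medium_ml` of medium
    (C1 * V1 = C2 * V2, with the stock converted from g/l to mg/l)."""
    return final_mg_per_l * medium_ml / (stock_g_per_l * 1000.0)


if __name__ == "__main__":
    v = stock_volume_ml(500.0, 50.0, 1000.0)
    print(v, "ml of DMSO stock per litre;", 100 * v / 1000.0, "% v/v DMSO")
```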

A library of 10,000 clones was thus obtained.

3. Cloning and expression of the linA gene

The library was screened by looking for a lindane degradation halo around the colonies (the lindane precipitates in the culture medium). Of the 10,000 clones screened, 35 exhibited lindane-degrading activity. The presence of the linA gene in these clones was confirmed by PCR using the specific primers described by Thomas et al. (1996). Restriction digests of the inserts and of the amplification products showed identical profiles for all the clones screened and for the reference control, R. lindaniclasticus. The clones carrying the linA gene also had inserts of the same size (about 4 kb).

It was thus demonstrated that the soil DNA could be cloned and expressed in a heterologous host: E. coli, and that genes derived from a microflora that is difficult to culture could be expressed. Libraries prepared by partial digestion of DNA extracted from soil, with restriction enzymes such as Sau3AI, can thus be envisaged also.

EXAMPLE 3: Process for preparing a collection of nucleic acids from a soil sample, comprising a step of indirect DNA extraction.

1. MATERIALS AND METHODS

1.1 Extraction of the bacterial fraction of the soil

... g of soil are dispersed in 50 ml of sterile 0.8% NaCl by grinding in a Waring Blender for 3 x 1 minute, with cooling on ice between grinding steps. The bacterial cells are then separated from the soil particles by centrifugation on a Nycodenz density cushion (Nycomed Pharma AS, Oslo, Norway). In a centrifuge tube, 11.6 ml of a Nycodenz solution with a density of 1.3 g/ml (8 g of Nycodenz suspended in 10 ml of sterile water) are placed below 25 ml of the soil suspension obtained previously.

After centrifugation at 10,000 x g for 40 minutes at 4°C in a swing-out rotor (TST 28.38 rotor, Kontron), the cell band located at the interface between the aqueous phase and the Nycodenz phase is collected, washed in 25 ml of sterile water and centrifuged at 10,000 x g for [...] minutes. The cell pellet is then resuspended in a 10 mM Tris, 100 mM EDTA, pH 8.0 solution.

Before the soil is dispersed in the Waring Blender, it can be enriched in a yeast extract solution, in particular to allow germination of the soil bacterial spores. To this end, 5 g of soil are incubated in 50 ml of a sterile 0.8% NaCl/6% yeast extract solution for 30 minutes at 40°C. The yeast extract is then removed by centrifugation at 5000 rpm for 10 minutes to avoid foam formation during grinding.
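The protocol mixes rotor speeds (rpm) with relative centrifugal forces (x g). Converting between the two uses the standard relation RCF = 1.118 x 10^-5 x r x N^2, with r the rotor radius in cm and N the speed in rpm. The text does not give the rotor radii, so the 10 cm value below is purely hypothetical:

```python
import math


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at radius_cm for a given speed."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf: float, radius_cm: float) -> float:
    """Speed (rpm) needed to reach a given relative centrifugal force at radius_cm."""
    return math.sqrt(rcf / (1.118e-5 * radius_cm))


HYPOTHETICAL_RADIUS_CM = 10.0  # assumption: not stated in the source
print(round(rcf_from_rpm(5000, HYPOTHETICAL_RADIUS_CM)))  # 2795
print(round(rpm_from_rcf(10_000, HYPOTHETICAL_RADIUS_CM)))
```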

1.2 Lysis of the soil bacterial cells

Lysis of the cells in liquid medium and purification on a caesium chloride gradient

The cells are lysed for 1 hour at 37°C in a 10 mM Tris, 100 mM EDTA, pH [...] solution containing 5 mg.ml-1 of lysozyme and 0.5 mg.ml-1 of achromopeptidase. A solution of lauryl sarcosyl (1% final) and proteinase K (2 mg.ml-1) is then added and the mixture is incubated at 37°C for 30 minutes. The DNA solution is then purified on a caesium chloride density gradient by centrifugation at 35,000 rpm for 36 hours in a Kontron 65.13 rotor. The gradient contains 1 g/ml of CsCl, with a refractive index of 1.3860 (Sambrook et al., 1989).
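The refractive index quoted for the CsCl gradient corresponds to a buoyant density that can be estimated with the empirical linear relation commonly tabulated for CsCl solutions at 25°C, density = 10.8601 x n - 13.4974 (g/ml). A minimal sketch, assuming that relation and temperature apply to the solution described here:

```python
def cscl_density_g_per_ml(refractive_index: float) -> float:
    """Approximate density (g/ml) of a CsCl solution at 25 degrees C.

    Empirical linear relation; assumed to apply at the refractive index used here.
    """
    return 10.8601 * refractive_index - 13.4974


print(f"{cscl_density_g_per_ml(1.3860):.3f} g/ml")  # 1.555 g/ml
```

A density of about 1.55 g/ml is in the range usually targeted for banding DNA in CsCl.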

Lysis of the cells after inclusion in an agarose block

The cells are mixed with an equal volume of low-melting-point SeaPlaque agarose (FMC Products, TEBU, Le Perray en Yvelines, France) at [...]% (weight/volume) and poured into 100 µl blocks. The blocks are then incubated in a lysis solution (250 mM EDTA, 10.3% sucrose, 5 mg.ml-1 lysozyme and 0.5 mg.ml-1 achromopeptidase) at 37°C for 3 hours. The blocks are then washed in a [...] mM Tris-500 mM EDTA solution and incubated overnight at 37°C in 500 mM EDTA containing 1 mg.ml-1 of proteinase K and 1% lauryl sarcosyl. After several washes in Tris-EDTA, the blocks are stored in 500 mM EDTA.

The quality of the extracted DNA is checked by pulsed-field electrophoresis.

The amount of DNA extracted was estimated by gel electrophoresis against a calibration range of calf thymus DNA.

1.3 Molecular characterization of the DNA extracted from soil

The DNAs extracted from the soil are characterized by PCR-hybridization. This method consists first in amplifying the DNAs with primers located in universally conserved regions of the 16S rRNA gene, and then in hybridizing the amplified DNAs with various oligonucleotide probes of known specificity (Table [...]), in order to quantify the intensity of the hybridization signal relative to an external calibration range of genomic DNA.

The DNAs extracted from the soil and the genomic DNAs extracted from pure cultures are amplified with the primers FGPS 612-669 (Table 1) under standard PCR conditions. The amplification products are then denatured with an equal volume of 1N NaOH, applied to a nylon membrane (GeneScreen Plus, Life Science Products) and hybridized with an oligonucleotide probe end-labelled with [γ-32P]ATP by T4 polynucleotide kinase. After prehybridization of the membrane in 20 ml of a solution containing 6 ml of 20X SSC, 1 ml of Denhardt's solution, 1 ml of 10% SDS and 5 mg of heterologous salmon sperm DNA, hybridization is carried out overnight at the temperature appropriate to the probe. The membranes are washed twice in 2X SSC for 5 minutes at room temperature, then once in 2X SSC, 0.1% SDS and once in 1X SSC, 0.1% SDS for [...] minutes at the hybridization temperature. The hybridization signals are quantified with the Molecular Analyst software (Bio-Rad, Ivry sur Seine, France) and the amounts of DNA are estimated by interpolation on calibration curves obtained from the genomic DNAs.
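The final step, estimating DNA amounts by interpolation on calibration curves built from genomic DNAs, can be sketched as piecewise-linear interpolation. The linear model, the function name and the calibration values below are assumptions for illustration; the text does not state which interpolation the Molecular Analyst software applies:

```python
from bisect import bisect_left


def interpolate_amount(signal: float, calib_signals: list[float],
                       calib_amounts: list[float]) -> float:
    """Piecewise-linear estimate of DNA amount from a hybridization signal.

    calib_signals must be strictly increasing; signals outside the calibrated
    range are rejected rather than extrapolated.
    """
    if len(calib_signals) != len(calib_amounts) or len(calib_signals) < 2:
        raise ValueError("need at least two paired calibration points")
    if any(b <= a for a, b in zip(calib_signals, calib_signals[1:])):
        raise ValueError("calibration signals must be strictly increasing")
    if not calib_signals[0] <= signal <= calib_signals[-1]:
        raise ValueError("signal outside the calibration range")
    i = bisect_left(calib_signals, signal)
    if i == 0:  # signal equals the lowest calibration point
        return calib_amounts[0]
    x0, x1 = calib_signals[i - 1], calib_signals[i]
    y0, y1 = calib_amounts[i - 1], calib_amounts[i]
    return y0 + (signal - x0) * (y1 - y0) / (x1 - x0)


# Hypothetical calibration range: signal (arbitrary units) vs ng of genomic DNA
signals = [100.0, 400.0, 900.0, 1600.0]
amounts = [1.0, 2.0, 3.0, 4.0]
print(interpolate_amount(650.0, signals, amounts))  # 2.5
```

Refusing to extrapolate keeps estimates within the range actually covered by the genomic DNA standards.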

2. RESULTS AND DISCUSSION

2.1 Extraction and lysis of the bacterial fraction of the soil

Separating the microbial cells from the soil particles before extracting the DNA is an alternative with many advantages over methods of direct DNA extraction from soil. In particular, extracting the microbial fraction limits contamination of the DNA extract with extracellular DNA freely present in the soil or with DNA of eukaryotic origin. Above all, the DNA extracted from the microbial fraction of the soil consists of longer fragments, with better integrity, than DNA extracted by direct lysis (Jacobson and Rasmussen (1992)).

Furthermore, separation from the soil particles avoids contaminating the DNA extract with humic and phenolic compounds, which can seriously impair cloning efficiency.

One of the decisive steps in extracting cells from soil is dispersion of the sample, which detaches the cells adhering to the surface or to the inside of soil aggregates. Three successive one-minute grinding cycles give better cell extraction efficiency and a larger amount of recovered DNA than a single grinding cycle of 1 minute 30 seconds.

Table 5 reports the extraction efficiencies obtained after centrifugation on a Nycodenz gradient for the total viable microflora (counted by microscopy after acridine orange staining), the total culturable microflora (counted on solid 10% Trypticase-Soja medium) and the actinomycete microflora culturable on HV agar medium (after incubation at 40°C in a solution of 6% yeast extract-0.05% SDS to induce spore germination). In addition, the extracted DNA was quantified either after lysis of the cells in liquid medium (without purification on a caesium chloride gradient) or after lysis of the cells embedded in an agarose block (after digestion of the agarose with a β-agarase).

The results show that more than 14% of the total telluric microflora is recovered by this method (2 x 10^8 cells per gram of soil) and that the total culturable microflora represents barely 2% of the total microbial population.

Moreover, the amount of DNA extracted from the cells is 330 ng per gram of dry soil. Taking the DNA content of a soil microbial cell to be between 1.6 and 2.4 fg, and given the number of cells extracted (2 x 10^8 cells per gram of soil), it can be estimated that virtually all of the cells are lysed and that lysis does not introduce any major bias into this approach.
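The reasoning in the two preceding paragraphs rests on simple arithmetic, made explicit below with the figures given in the text. Note that the text expresses cells per gram of soil but DNA per gram of dry soil; this sketch assumes the two are directly comparable:

```python
CELLS_EXTRACTED_PER_G = 2e8       # cells recovered per gram of soil
MIN_RECOVERY = 0.14               # "more than 14%" of the total microflora
CULTURABLE_FRACTION = 0.02        # culturable cells, "barely 2%" of the total
DNA_FG_PER_CELL = (1.6, 2.4)      # estimated DNA content per soil cell (fg)
OBSERVED_DNA_NG_PER_G = 330.0     # DNA extracted per gram of dry soil
FG_PER_NG = 1e6

# Recovery is a lower bound, so the implied total population is an upper bound.
total_cells_max = CELLS_EXTRACTED_PER_G / MIN_RECOVERY
culturable_cells_max = CULTURABLE_FRACTION * total_cells_max

# DNA expected if every extracted cell were lysed, for each per-cell estimate.
expected_dna_ng = [CELLS_EXTRACTED_PER_G * fg / FG_PER_NG for fg in DNA_FG_PER_CELL]
apparent_lysis = [OBSERVED_DNA_NG_PER_G / ng for ng in expected_dna_ng]

print(f"total microflora <= {total_cells_max:.2e} cells/g")
print(f"culturable <= {culturable_cells_max:.2e} cells/g")
print(f"expected DNA: {expected_dna_ng[0]:.0f}-{expected_dna_ng[1]:.0f} ng/g")
print(f"apparent lysis: {apparent_lysis[1]:.0%}-{apparent_lysis[0]:.0%}")
```

The observed 330 ng lies between the 320 and 480 ng expected for 1.6 and 2.4 fg per cell, i.e. an apparent lysis of roughly 69% to 103% depending on the per-cell DNA content assumed.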

Pulsed-field electrophoresis shows that soil DNA extracted after Nycodenz and CsCl gradients can be up to 150 kb in size, and that lysis in agarose blocks allows fragments of more than 600 kb to be extracted.

These results confirm the value of this culture-independent approach for constructing environmental DNA libraries, as an alternative to direct DNA extraction methods.

2.2 Molecular characterization of the DNA extracted from the soil

The aim of the molecular characterization of the DNA extracted from the soil is to obtain profiles representing the proportions of the various bacterial taxa present in the DNA extract. It also serves to assess the extraction biases introduced by prior separation of the cellular fraction of the soil, compared with a direct extraction method, since the microbial diversity present in soils cannot be visualized directly. In particular, little information is available on how cells are extracted on a Nycodenz gradient as a function of their morphology (cell diameter, filamentous or sporulated forms).

The methods used hitherto were based on:
- quantitative hybridization with oligonucleotide probes specific for different bacterial groups, applied directly to DNA extracted from the environment. Unfortunately, this approach is not very sensitive and does not detect taxonomic groups or genera present at low abundance (Amann (1995));
- quantitative PCR, such as MPN-PCR (Most Probable Number) (Sykes et al. (1992)) or competitive quantitative PCR (Diviacco et al. (1993)). The respective drawbacks of these approaches are (i) their laborious nature, owing to the many dilutions and replicates required, which makes the technique unsuitable for large numbers of samples or primer pairs, and (ii) the need to construct a competitor that is specific for the target DNA and does not bias the competition.

The method introduced according to the present invention consists in universally amplifying a 700 bp fragment within the 16S rDNA sequence, hybridizing this amplicon with an oligonucleotide probe of variable specificity (kingdom, order, subclass or genus) and comparing the hybridization intensity of the sample with an external calibration range. Amplification before hybridization makes it possible to quantify relatively sparse genera or species of microorganisms. Furthermore, amplification with universal primers allows a wide series of oligonucleotide probes to be used in the hybridization step. The method thus allows different modes of lysis (direct or indirect extraction) to be compared on well-defined taxonomic groups.

The results are collated in Table 6.

They show similar profiles for the two extraction methods (direct and indirect). Thus, prior extraction of the telluric microbial fraction does not appear to introduce any genuine bias among the taxa tested. The only significant difference between the two approaches would appear to be the greater abundance of rDNA sequences belonging to γ-proteobacteria in the extract obtained by the indirect extraction method.

Furthermore, incubating the soil sample in a yeast extract solution has a significant effect on the sporulating soil populations (low-G+C Gram-positive bacteria and actinomycetes). This step induces spore germination and, first, clearly improves the recovery of cells of this type and, second, increases lysis efficiency on the germinating cells.

This approach allows a semi-quantitative analysis targeted on the main taxa defined from cultured microorganisms usually found in soils. Only molecular tools make it possible to estimate the relative size of the various taxa, since culture methods are too restrictive and depend on the specificity of the medium used.

The results show that a large proportion of the microbial population is not represented in the phylogenetic groups described, thus demonstrating the existence of novel groups made up of microorganisms which have not been cultured hitherto, or which are not culturable.

Novel probes can thus be designed from sequences obtained from DNA extracted from the soil (novel phyla composed of non-cultured microorganisms, Ludwig et al. (1997)) in order to obtain a more accurate picture of the composition of the DNA extract.

Example 4: CONSTRUCTION OF THE COSMID pOS7001

Characteristics of pOS7001:
- replicative in E. coli;
- integrative in Streptomyces;
- selectable in E. coli (AmpR, HygroR) and in Streptomyces (HygroR).

The properties of the cosmid make it possible to insert large DNA fragments of between 30 and 40 kb.

It comprises:
1. the inducible promoter tipA of Streptomyces lividans;
2. the integration system specific for the element pSAM2;
3. the hygromycin-resistance gene;
4. the cosmid pWED1, derived from the cosmid pWE15.

1) The inducible promoter of the tipA gene of S. lividans

The tipA gene encodes a 19 kDa protein whose transcription is induced by the antibiotic thiostrepton or nosiheptide. The tipA promoter is well regulated: it is induced in both exponential and stationary phase (200X) (Murakami T, Holt TG, Thompson CJ., J. Bacteriol. 1989; 171:1459-66).

2) The hygromycin-resistance gene

Hygromycin is an antibiotic produced by S. hygroscopicus. The resistance gene encodes a phosphotransferase (hph). The gene used originates from a cassette constructed by Blondelet-Rouault et al., in which the hyg gene is under the control of its own promoter and of the IPTG-inducible plac promoter (Blondelet-Rouault et al., Gene 1997; 190:315-7).

3) The site-specific integration system

The element pSAM2 integrates into the chromosome by a site-specific integration mechanism. Recombination takes place between two identical 58 bp sequences, one on the plasmid (attP) and one on the chromosome (attB).

The int gene, located close to the attP site, is involved in the site-specific integration of pSAM2, and its product is similar to the integrases of the temperate bacteriophages of enterobacteria. It has been shown that a pSAM2 fragment containing only the attP attachment site and the int gene integrates in the same manner as the entire element (see French patent No. 88 06638 of 18/05/1988 and Raynal A et al., Mol. Microbiol. 1998; 28:333-42).

4) Construction of the cosmid pOS7001

Step 1/ The TipA promoter was isolated from the plasmid pPM927 (Smokvina et al., Gene 1990; 94:53-9) on a 700-base pair HindIII-BamHI fragment and cloned into the vector pUC18 (Yanisch-Perron et al., 1985) digested with HindIII/BamHI.

Step 2/ This HindIII-BamHI fragment was then transferred from pUC18 to pUC19 (Yanisch-Perron et al., 1985).

Step 3/ A 1500-base pair BamHI-BamHI insert carrying the int gene and the attP site of pSAM2 was isolated from pOSint1, represented in Figure 8 (Raynal A et al., Mol. Microbiol. 1998; 28:333-42), and cloned into the BamHI site of the preceding vector (pUC19/TipA), in the orientation that places the int gene under the control of the TipA promoter.

Step 4/ The BamHI site located on the 5' side of the int gene was deleted by partial digestion with BamHI followed by treatment with the Klenow enzyme. A HindIII-BamHI fragment carrying TipA-int-attP was then isolated from pUC19 and transferred into pBR322 digested with HindIII/BamHI.

Step 5/ The hygromycin cassette (Blondelet-Rouault et al., 1997), isolated on a HindIII-HindIII fragment, was cloned into the HindIII site located upstream of the TipA promoter.

Step 6/ The HindIII site located between the Ωhyg cassette and the TipA promoter was deleted by Klenow treatment after partial HindIII digestion.

Step 7/ The plasmid obtained in the preceding step yields a single HindIII-BamHI fragment carrying all the Ωhyg/TipA/int/attP elements, which was cloned, after Klenow treatment, into the EcoRV site of the cosmid pWED1. The cosmid pWED1, represented in Figure 9, is derived from the cosmid pWE15, represented in Figure 10 (Wahl GM et al., Proc. Natl. Acad. Sci. USA 1987; 84:2160-4), by deletion of an HpaI-HpaI fragment carrying the neomycin-resistance gene and the SV40 origin.

A map of the vector pOS7001 is represented in Figure 11.

Example 5: Construction of cosmid vectors that are conjugative and integrative in Streptomyces: pOSV303, pOSV306 and pOSV307

5.1 Construction of the vector pOSV303

Given that packaging selects clones larger than 30 kb, only [...] to 15% of the clones contain no insert. A system for selecting recombinants is therefore not really necessary, which allows a smaller vector to be constructed.

Construction:

Step 1: the vector pOSV001
An 800-base pair PstI-PstI fragment carrying the transfer origin oriT of the replicon RK2 (Guiney et al., 1983) is cloned into the plasmid pUC19 opened with PstI. This cloning step gives a vector that can be transferred from E. coli to Streptomyces by conjugation.

The map of the vector pOSV001 is represented in Figure 17.

Step 2: the vector pOSV002
Insertion of the hygromycin marker (Ωhyg cassette), which is selectable in Streptomyces, in such a way that the hygromycin-resistance gene is transferred last, which ensures complete transfer of the BAC with the soil DNA insert.

The hygromycin cassette is isolated from pHP45Ωhyg on a HindIII-HindIII fragment carrying the hygromycin-resistance gene. This fragment is cloned into the PstI site (position 201) of the vector pOSV001.

This PstI site was chosen, given the direction of transfer, so that the hygromycin marker is transferred last during conjugation. The PstI and HindIII ends are made compatible by treatment with the Klenow fragment of DNA polymerase, which generates blunt ends. The orientation of the Ωhyg fragment is determined at the end of the construction.

The map of the vector pOSV002 is represented in Figure 18.

Step 3: the vector pOSV010
The XbaI-HindIII fragment isolated from the plasmid pOSV002, containing the hygromycin-resistance marker and the transfer origin, is cloned into the plasmid pOSint1 digested with XbaI and HindIII. The orientation of the sites is such that the hygromycin marker will always be transferred last.

The plasmid pOSint1, represented in Figure 8, was described by Raynal et al. (Raynal A et al., Mol. Microbiol. 1998; 28:333-42).

This construct allows the expression of the integrase in E. coli and Streptomyces.

Step 4: insertion of the "cos" site
The principle is to insert a "cos" site into the plasmid pOSV010, represented in Figure 12, so that the construct can be packaged into bacteriophage lambda heads.

The production of the "cos" fragment is represented in Figure 13.

This fragment is obtained by PCR. Starting from a fragment carrying the cohesive ends (cos) of λ (bacteriophage lambda or cosmid pHC79), PCR amplification is carried out with oligonucleotides corresponding to positions -50/+130 relative to the cos site. These oligonucleotides also contain cloning sites: NsiI (PstI-compatible), XhoI (SalI-compatible) and EcoRV (for blunt ends).

The addition of the rare SwaI and PacI sites makes it possible to excise and/or map the cloned insert.

The PCR fragment is delimited by a PstI site at the 5' end and a HincII site at the 3' end, allowing cloning into the vector pOSV010 (Figure 12) predigested with NsiI and EcoRV, which deletes the lacIq repressor gene.

The map of the vector pOSV303 is represented in Figure 14. The vector pOSV303 contains cloning sites such as the NsiI site (PstI-compatible), the XhoI site (SalI-compatible) and the EcoRV site (for blunt ends).

5.2 Construction of the vector pOSV306

Step 1: Construction of the vector pOSV308
The vector pOSV308 was constructed as illustrated in Figure 27. A 643-bp fragment containing the cos region was amplified with a pair of primers of sequences SEQ ID No. 107 and SEQ ID No. 108 from the cosmid vector pHC79 described by Hohn B and Collins (1980).

This amplified fragment was cloned directly into the pGEM-T Easy vector (Promega), as illustrated in Figure 27, to give the vector pOSV308.

Step 2: Construction of the vector pOSV306
The vector pOSV010 was constructed as described in step 3 of the construction of the vector pOSV303 (paragraph 5.1 of the present example).

The vector pOSV010 was digested with EcoRV and NsiI to excise a 7874-bp fragment, which was then purified, as illustrated in Figure 28.

Next, the vector pOSV308 obtained in step 1 above was digested with EcoRV and PstI to excise a 617-bp fragment, which was then purified.

The 617-bp cos fragment from pOSV308 was then ligated into the pOSV010 fragment to give the vector pOSV306, as illustrated in Figure 28.

5.3 Construction of the vector pOSV307

The cosmid pOSV307 retains the lacIq gene in order to improve the stability of the cosmid in E. coli, for example in the E. coli strain S17-1.

To construct the vector pOSV307, the vector pOSV010 was digested with PvuII to obtain an 8761-bp fragment, which was purified and then dephosphorylated.

Next, the vector pOSV308, obtained as described in step 1) of paragraph 5.2 above, was digested with the enzyme EcoRI so as to obtain a 663-bp fragment, which was then purified and treated with the Klenow enzyme.

The nucleotide fragment thus treated was integrated into the vector pOSV010 after ligation so as to obtain the vector pOSV307, as illustrated in Figure 29.
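Adding the fragment sizes given above gives approximate sizes for pOSV306 and pOSV307, and hence the range of insert sizes compatible with lambda packaging. The sketch assumes a packaging window of about 37 to 52 kb between cos sites (a commonly cited approximation for lambda heads, not a figure from this document) and ignores the few bases gained or lost at the ligated ends:

```python
PACKAGING_MIN_BP = 37_000  # assumed lower limit of lambda head packaging
PACKAGING_MAX_BP = 52_000  # assumed upper limit


def construct_size(*fragments_bp: int) -> int:
    """Approximate size of a ligation product from its fragment sizes."""
    return sum(fragments_bp)


def insert_window(vector_bp: int) -> tuple[int, int]:
    """Insert sizes (bp) that keep vector + insert within the packaging window."""
    return max(0, PACKAGING_MIN_BP - vector_bp), PACKAGING_MAX_BP - vector_bp


vectors = {
    "pOSV306": construct_size(7874, 617),  # pOSV010 EcoRV-NsiI + pOSV308 EcoRV-PstI
    "pOSV307": construct_size(8761, 663),  # pOSV010 PvuII + pOSV308 EcoRI (filled)
}
for name, size in vectors.items():
    low, high = insert_window(size)
    print(f"{name}: ~{size} bp, inserts of ~{low / 1000:.1f}-{high / 1000:.1f} kb")
```

On these assumptions both vectors accept inserts of roughly 28 to 43 kb, in line with the selection of clones larger than 30 kb mentioned for pOSV303.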

Example 6: Construction of the E. coli-Streptomyces replicative shuttle cosmid pOS700R

Two fragments of the plasmid pEI16 (Volff et al., 1996), represented in Figure 15, were isolated and treated with Klenow enzyme. These fragments contain the sequences required for replication and stability originating from the plasmid SCP2.

These two fragments are inserted separately into the EcoRV site of the cosmid pWED1, leading to 2 different clones.

The hygromycin cassette isolated from pHP45Ωhyg on a HindIII-HindIII fragment was cloned into the HindIII site of the pWED1 cosmids containing the SCP2 insert in the form of PstI-EcoRI or XbaI fragments. It confers hygromycin resistance, which can be selected both in E. coli and in Streptomyces.

Transformation of S. lividans and determination of the transformation efficiency

The cosmid containing the XbaI insert was found to be less stable than the one containing the PstI-EcoRI fragment. The latter cosmid was therefore selected, under the name pOS700R.

The map of the vector pOS700R is represented in Figure 16.

Example 7: Transformation efficiency of the integrative (pOS7001) and replicative (pOS700R) vectors

Possibilities:
- rendering the S. lividans strain resistant to thiostrepton by integrating the plasmid pTO1, which carries the thiostrepton-resistance marker;
- preparing protoplasts from S. lividans cultured in the presence of thiostrepton.

With the vector pOS7001, the transformation efficiency is about 3000 transformants per µg of DNA.

With the vector pOS700R, the transformation efficiency is about 30,000 transformants per µg of DNA.

Example 8: Construction of a BAC vector which is integrative in Streptomyces and conjugative

Characteristics:
- replicative in E. coli;
- transferable by conjugation from E. coli to Streptomyces;
- integrative in Streptomyces;
- selectable in E. coli and Streptomyces;
- capable of accepting large DNA fragments. This requires soil DNA between 100 and 300 kb in size that is not contaminated with small fragments, because small fragments are preferentially integrated;
- equipped with a screen for selecting plasmids carrying an insert. By eliminating vectors that have closed on themselves or were not digested, this screen allows a higher vector-to-insert ratio to be used, and hence better cloning efficiency when making libraries.

Construction:

Step 1: the vector pOSV001
An 800-base pair PstI-PstI fragment carrying the transfer origin oriT of the replicon RK2 (Guiney et al., 1983) is cloned into the plasmid pUC19 opened with PstI. This cloning step gives a vector that can be transferred from E. coli to Streptomyces by conjugation.

The map of the vector pOSV001 is represented in Figure 17.

Step 2: the vector pOSV002
Insertion of the hygromycin marker (Ωhyg cassette), which is selectable in Streptomyces, in such a way that the hygromycin-resistance gene is transferred last, which ensures complete transfer of the BAC with the soil DNA insert.

The hygromycin cassette is isolated from pHP45Ωhyg on a HindIII-HindIII fragment carrying the hygromycin-resistance gene. This fragment is cloned into the PstI site (position 201) of the vector pOSV001.

This PstI site was chosen, given the direction of transfer, so that the hygromycin marker is transferred last during conjugation. The PstI and HindIII ends are made compatible by treatment with the Klenow fragment of DNA polymerase, which generates blunt ends. The orientation of the Ωhyg fragment is determined at the end of the construction.

The map of the vector pOSV002 is represented in Figure 18.

Step 3: the vector pOSV010
The XbaI-HindIII fragment isolated from the plasmid pOSV002, containing the hygromycin-resistance marker and the transfer origin, is cloned into the plasmid pOSint1 digested with XbaI and HindIII. The orientation of the sites is such that the hygromycin marker will always be transferred last.

The plasmid pOSint1, represented in Figure 8, was described by Raynal et al. (Raynal A et al., Mol. Microbiol. 1998; 28:333-42).

Step 4: the vector pOSV014
Addition of a "cassette" that makes it possible, in the final construct, to select the plasmids carrying inserted foreign DNA.

This "cassette" carries the gene encoding the phage λ cI repressor and the tetracycline-resistance gene, which carries the target sequence of the repressor in its 5' non-coding region. Insertion of DNA into the HindIII site located in the cI coding sequence prevents production of the repressor and thus leads to expression of tetracycline resistance.

It is carried by the plasmid pUN99 described in the article: Nilsson et al. (Nucleic Acids Res. 1983, 11:8019-30).

A PvuII-HindIII fragment isolated from pOSV010 and containing the int, attP, hygromycin and oriT sequences is cloned into the MscI site of pUN99.

The map of the vector pOSV014 is represented in Figure 19.

Step 5: the vector pOSV403, an integrative and conjugative BAC vector
This last cloning step, into pBAC11 (represented in Figure [...]), gives the final plasmid the characteristics of a BAC (Bacterial Artificial Chromosome), in particular the ability to accept very large DNA inserts.

The PstI-PstI fragment of the vector pOSV014, carrying all the elements and functions described above, is cloned into the plasmid pBAC11 (pBeloBAC11) digested with NotI. The ends are made compatible by treatment with the Klenow enzyme.

The map of the vector pOSV403 is represented in Figure 21. The scheme of Figure 21 indicates the orientation selected.

Step 6: The vector pOSV403 contains HindIII and NsiI sites. The NsiI site is quite rare in Streptomyces and has the advantage of being compatible with PstI. The PstI site, by contrast, is common in Streptomyces DNA and can be used to carry out partial digestions.
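The contrast between NsiI and PstI follows from base composition: NsiI (ATGCAT) is AT-rich whereas PstI (CTGCAG) is GC-rich. Under a simple independent-base model (an assumption that ignores dinucleotide bias and codon usage) and an assumed G+C content of 72%, typical of Streptomyces genomes, the expected spacing between sites can be estimated as follows:

```python
def site_probability(site: str, gc_content: float) -> float:
    """Probability that a random position starts the given site (independent bases)."""
    p_gc = gc_content / 2          # probability of G, and of C
    p_at = (1.0 - gc_content) / 2  # probability of A, and of T
    prob = 1.0
    for base in site.upper():
        if base in "GC":
            prob *= p_gc
        elif base in "AT":
            prob *= p_at
        else:
            raise ValueError(f"unsupported base: {base!r}")
    return prob


def mean_spacing_bp(site: str, gc_content: float) -> float:
    """Expected distance between sites; scanning one strand counts palindromic sites once."""
    return 1.0 / site_probability(site, gc_content)


STREPTOMYCES_GC = 0.72  # assumption: approximate G+C content
for enzyme, site in [("NsiI", "ATGCAT"), ("PstI", "CTGCAG")]:
    print(f"{enzyme}: one site per ~{mean_spacing_bp(site, STREPTOMYCES_GC):,.0f} bp")
```

Under this model NsiI sites are expected several times less often than PstI sites in Streptomyces DNA, which fits the use of NsiI as a cloning site and PstI for partial digestion.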

Recombinant clones carrying an insert cloned into the cI repressor gene, which inactivates the repressor, become tetracycline-resistant. Since BACs are present at only one copy per cell, recombinant clones must be selected with a lower dose of tetracycline than the usual 20 µg/ml, for example [...] µg/ml. Under these conditions there is no background.

It is also possible to use the system developed and sold by Invitrogen, in which insertion of DNA into the vector inactivates a gyrase inhibitor whose expression is toxic to E. coli. The corresponding fragment is preferably isolated from the vector pZErO-2 (http://www.invitrogen.com/).

Example 9: Construction of an S. alboniger library in the integrative cosmid (pOS7001) and the replicative cosmid (pOS700R)

1) Construction of the library

To evaluate the efficiency of the cloning system, the puromycin biosynthetic pathway of Streptomyces alboniger was cloned into the two shuttle cosmids pOS7001 and pOS700R. The genes of the puromycin biosynthetic pathway are carried by a BamHI DNA fragment of about [...] kb.

The genomic DNA of Streptomyces alboniger was isolated. [...] of this DNA has a size of between 20 and 150 kb, as determined by pulsed-field electrophoresis.

The two cosmids were digested with the enzyme BamHI (single cloning site).

The conditions for partial BamHI digestion of the genomic DNA were determined (50 µg of DNA and 12 units of enzyme, digestion for [...] minutes). After the size had been checked by agarose gel electrophoresis, the partially digested DNA was ligated into the vectors. Each ligation used 15 µg of genomic DNA and either 2 µg of the integrative vector or 5 µg of the replicative vector.

Each ligation mixture was used for in vitro packaging (encapsidation) of the DNA into bacteriophage lambda heads. The packaging mixtures (0.5 ml) were titrated (integrative vector pOS7001: 7.5 x [...] cosmids/ml; replicative vector pOS700R: 5 x 10^4 cosmids/ml).

The cosmids were used to transfect E. coli, generating libraries of about 25,000 ampicillin-resistant clones. The DNA from all of these clones was isolated and quantified.

To test the libraries, several clones were chosen and their DNA was purified and digested with BamHI to check the presence and size of the inserts. The clones tested contain between 20 and 35 kb of S. alboniger insert.
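Whether about 25,000 clones are enough to represent a whole genome can be checked with the Clarke and Carbon formula, N = ln(1 - P) / ln(1 - f), where f is the insert size as a fraction of the genome. The library size itself matches the replicative-vector titre given above (0.5 ml at 5 x 10^4 cosmids/ml). The 8 Mb genome size used below is an assumption for a Streptomyces genome, not a figure from this document:

```python
import math


def clones_required(genome_bp: float, insert_bp: float, probability: float) -> int:
    """Clarke-Carbon estimate of clones needed to contain any given sequence."""
    f = insert_bp / genome_bp
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - f))


def coverage_probability(genome_bp: float, insert_bp: float, n_clones: int) -> float:
    """Probability that a given sequence is present in a library of n_clones."""
    f = insert_bp / genome_bp
    return 1.0 - (1.0 - f) ** n_clones


GENOME_BP = 8_000_000            # assumption: approximate Streptomyces genome size
INSERT_BP = 20_000               # lower end of the 20-35 kb inserts observed
library_size = round(0.5 * 5e4)  # 0.5 ml packaging mix at 5 x 10^4 cosmids/ml

print(library_size)  # 25000
print(clones_required(GENOME_BP, INSERT_BP, 0.99))
print(coverage_probability(GENOME_BP, INSERT_BP, library_size))
```

Even with the smallest observed insert size, a library of this size would, on these assumptions, exceed by far the number of clones needed for 99% coverage.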

2) Identification of the clones containing the puromycin biosynthetic pathway

The clones likely to contain the complete puromycin biosynthetic pathway were identified by hybridization with a probe corresponding to the puromycin-resistance gene, the 1.1 kb pac gene (Lacalle et al., Gene 1989; 79:375-80).

Library made in the integrative vector pOS7001: of 2000 clones analysed, 9 hybridized with the probe; they contain inserts of about 40 kb.

Library made in the replicative vector pOS700R: of 2000 clones analysed, 12 hybridized with the probe; they contain inserts of about 40 kb.
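These counts can be compared with a simple expectation: for randomly placed inserts of length L, a clone overlaps a probe of length p with a probability of about (L + p)/G, where G is the genome size. The 8 Mb genome size and the 35 kb insert length below are assumptions (the latter taken from the upper end of the insert sizes measured in the library), so the result only indicates an order of magnitude:

```python
def expected_positive_clones(n_screened: int, insert_bp: float,
                             probe_bp: float, genome_bp: float) -> float:
    """Approximate expected number of clones whose insert overlaps the probe."""
    return n_screened * (insert_bp + probe_bp) / genome_bp


expected = expected_positive_clones(
    n_screened=2000,
    insert_bp=35_000,     # assumption: upper end of the 20-35 kb inserts
    probe_bp=1_100,       # the 1.1 kb pac gene probe
    genome_bp=8_000_000,  # assumption: approximate genome size
)
print(f"expected ~{expected:.1f} positives per 2000 clones")
```

This is of the same order as the 9 and 12 positives observed, although the estimate scales directly with the assumed genome size.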

Using the data published by Tercero et al. (J. Biol. Chem. 1996; 271:1579-90), the clones containing the entire biosynthetic pathway were identified by hybridization with suitable probes. Certain integrative and replicative cosmids contain a 12,360-base pair fragment after ClaI-EcoRV digestion, which suggests an insert containing the entire puromycin biosynthetic pathway.

4) Checking the production of puromycin by the resistant clones

(Rhône-Poulenc).

a) Materials and methods

Strains and culture conditions: three resistant clones were selected to check puromycin production. They correspond to the S. lividans recombinants containing an insert in the integrative vector pOS7001 (G20) or in the replicative vector pOS700R (G21 and G22).

Reference strains were used to check that the culture media allowed this production: the S. alboniger wild-type strain ATCC 12461, which produces puromycin, and the S. lividans recombinant strain (G23) containing the complete puromycin cluster cloned into the plasmid pRCP11 (Lacalle et al., 1992, EMBO J. 11:785-792).

The strains were inoculated in a culture medium of the following composition:
- Organotechnie bacteriological peptone: 5 g/l of final medium
- Springer yeast extract: [...]
- Liebig meat extract: [...]
- Prolabo glucose: [...]
- Prolabo CaCO3: 3 g/l
- Prolabo NaCl: [...]
- Difco agar: 1 g/l

The 3 g of carbonate are mixed with 200 ml of distilled water and sterilized separately; they are added after sterilization. The agar is first melted in 100 ml of distilled water and then added to the other ingredients of the medium.

The pH is adjusted to 7.2 before sterilization (25 minutes at 121°C). [...] µg/l of hygromycin and 5 µg/l of thiostrepton are added to the medium after sterilization to maintain selection pressure for the clones containing an insert, via the marker gene present on the vector (the thiostrepton-resistance gene being carried by the plasmid pRCP11).

[...] ml of liquid culture medium, distributed in 250 ml conical flasks, are inoculated with 2 ml of an aqueous suspension of spores and mycelium of each strain. The cultures are incubated for 4 days at 28°C with shaking at 220 rpm. 50 ml of production medium, distributed in 250 ml conical flasks, are then inoculated with 2 ml of these precultures.

The production medium is an industrial medium optimized for the production of pristinamycin (medium RPR 201). The cultures are incubated at 28°C with shaking at 220 rpm. After various incubation times, one conical flask of each culture is brought to pH 11 and extracted twice with 1 volume of dichloromethane. The organic phase is concentrated to dryness under reduced pressure and the extract is taken up in 10 µl of methanol. 100 µl of the methanol solution are analysed for puromycin by HPLC on a C18 column, using a diode-array detector and a water/acetonitrile gradient containing 0.05% (v/v) TFA.

b) Results

Comparative HPLC analyses of the cultures of the various strains show puromycin production by the wild-type strain from 24 h of incubation onwards. Lower but clearly detectable production is also observed from 48 h onwards in the culture of the clone containing the cosmid pOS7001. Puromycin was also detected in trace amounts in clone G23, which carries the complete operon encoding the compound in the plasmid pRCP11. However, no production was observed in the cultures of clones G21 and G22, which contain the cosmid pOS700R. The results are given in Figure 23.

c) Conclusions

These results demonstrate the efficacy of the cloning system developed in the cosmid pOS7001 for expressing, in a heterologous host such as S. lividans, a complete biosynthetic pathway under the control of its own regulatory sequences. They also validate screening of the libraries on the basis of clone resistance to puromycin, since this screen identified, among a small number of clones, a recombinant capable of expressing the biosynthetic pathway associated with the resistance gene. The absence of puromycin production in the other clones can probably be explained by the cloning of only part of the operon: the part containing the resistance gene but lacking certain regulatory, translation or transcription sequences needed for synthesis of the compound.

EXAMPLE 10: CLONING OF SOIL DNA INTO VECTORS

1) Preparation of the soil DNA to be cloned

The DNA fragments must be purified according to the vector for which they are intended.

Cosmids

The size of the molecules should be between 30 and 40 kb. However, the DNA extracted from soil is heterogeneous in size and includes molecules of up to 200 or 300 kb. To homogenize the sizes, the DNA is sheared mechanically by passing the solution through a needle 0.4 mm in diameter. Fragments of about 30 kb are not affected by these repeated passages, so no size separation is needed, especially since packaging into phage particles automatically eliminates short inserts.

BACs

Preparation of the DNA: The soil DNA is separated by pulsed-field electrophoresis (CHEF type) under conditions in which fragments between 100 and 300 kb are concentrated in a band of about 5 mm. This is achieved by migration in a gel containing 0.7% normal agarose or 1% low-melting-point agarose, with a pulse time of 100 seconds, for 20 hours, at a temperature of … .

Recovery of the DNA: Two methods are used, the choice depending on the size of the molecules to be isolated: up to 150 kb, or larger.

Up to 150 kb: The porosity of a 0.7% agarose gel allows the DNA to be recovered by electroelution, provided that ethidium bromide is completely absent. This DNA is then handled with hydrophobic, wide-bore pipetting devices to avoid mechanical fragmentation of the molecules.


Between 100 and 300 kb: The band containing the fragments between 100 and 300 kb is excised. For this migration, a gel containing 1% low-melting-point agarose is used, which allows the gel to be melted at 65°C, a temperature the DNA can tolerate, and then digested with agarase (Boehringer) at 45°C according to the supplier's instructions.

2) Use of the integrative cosmid pOS7001 and the replicative cosmid pOS700R

Construction with polyA/polyT tails

Principle: A cosmid vector, opened at any cloning site, is modified at its 3' ends by adding a homopolymeric tail. The DNA to be cloned is likewise modified at its 3' ends by adding a complementary homopolymeric tail that can pair with the vector tail.

The vector and the fragment to be cloned are joined through these complementary tails, and the cos sequence of the vector allows in vitro packaging of the DNA into lambda phage capsids.

Preparation of the vector: The vector used replicates autonomously in E. coli and integrates in Streptomyces.

For E. coli, the selection is made on the ampicillin resistance, and for Streptomyces, it is made on the hygromycin resistance.

The cosmid is opened at one of its two possible sites (BamHI or HindIII), and its 3' ends are extended with polyA using terminal transferase, under the conditions for which the enzyme supplier specifies the addition of 50 to 100 nucleotides.

Preparation of the DNA to be inserted: The 3' ends of the DNA are extended with polyT using terminal transferase, under conditions giving an extension comparable to that of the vector. Under the experimental conditions described by the manufacturer, the polyA and polyT tails are 30 to 70 bases long.

Assembly of the molecules and in vitro encapsidation: For assembly, one vector molecule is mixed per molecule of DNA to be inserted. The DNA concentration by mass is 500 µg/ml. The mixture is encapsidated; the transfection efficiency depends on the recipient strain and on the DNA inserted. It is zero with the test DNA and the strain DH5α, and comparable for the SURE and … strains; on extraction, however, the DNA yield is higher with the … strain.

Construction by dephosphorylation

The soil DNA is given blunt ends by removing protruding 3' sequences and filling in protruding 5' sequences, using Klenow enzyme, T4 DNA polymerase and the four deoxynucleoside triphosphates. The cosmid vector is digested with BamHI, treated with Klenow enzyme to make the ends blunt, and then dephosphorylated to prevent self-ligation. After ligation, the mixture is encapsidated and transfected as described previously.

3) Use of pBACs

Principle: The conjugative and integrative plasmid pBAC contains HindIII and NsiI sites as cloning sites. Insertion of a DNA sequence into these sites inactivates the lambda phage cI repressor, which controls expression of the tetracycline-resistance gene. Inactivation of the repressor therefore makes the cell resistant to this antibiotic (5 µg/ml). Cloning at these sites is facilitated by modifying the vector and by preparing the DNA to be cloned.

Preparation of the vector (HindIII example): To prevent the vector from self-ligating, the HindIII site is modified: the first base is filled in, leaving a shorter protruding 5' sequence that cannot pair with the corresponding end of another vector molecule. This is done with the Klenow enzyme in the presence of dATP only.

The success of the operation is checked by self-ligating the vector before and after Klenow treatment: for an identical amount of test DNA, 3000 clones are obtained before treatment and 60 after treatment.

Preparation of the DNA (size between 100 and 300 kb)

Blunting the DNA: The DNA is given blunt ends by removing protruding 3' sequences and filling in protruding 5' sequences, using Klenow enzyme, T4 DNA polymerase and the four deoxynucleoside triphosphates.

Preparation of the ends (HindIII example): The DNA is joined to the vector by means of oligonucleotides that recognize the modified HindIII sequence of the vector. They contain rare restriction sites (SwaI, NotI) to allow subsequent subcloning. This technique is derived from that of Elledge SJ, Mulligan JT, Ramer SW, Spottswood M, Davis RW, Proc. Natl Acad. Sci. USA 1991 Mar 1;88(5):1731-5. Two complementary oligonucleotides are used:

Oligo 1: 5'-GCTTATTTAAATATTAATGCGGCCGCCCGGG-3' (SEQ ID No …)
Oligo 2: 5'-CCCGGGCGGCCGCATTAATATTTAAATA-3' (SEQ ID No 26)

After hybridization, they are phosphorylated at their 5' ends with T4 polynucleotide kinase in the presence of ATP. This phosphorylation step can be omitted by using pre-phosphorylated oligonucleotides.
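The logic of the partially filled HindIII end and of this adapter can be checked in silico. The Python sketch below is illustrative only: the helper names are hypothetical, it models base pairing of the overhangs (not ligase chemistry), and it assumes that Klenow fill-in with dATP alone stops at the first template base requiring another nucleotide.

```python
"""Sketch: check the partially filled HindIII end against the adapter oligos."""

COMPLEMENT = str.maketrans("ACGT", "TGCA")

OLIGO1 = "GCTTATTTAAATATTAATGCGGCCGCCCGGG"  # Oligo 1, written 5'->3'
OLIGO2 = "CCCGGGCGGCCGCATTAATATTTAAATA"     # Oligo 2, written 5'->3'

# Standard recognition sequences of the rare-cutter sites carried by the adapter.
SITES = {"SwaI": "ATTTAAAT", "NotI": "GCGGCCGC"}


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def partial_fill(overhang: str, dntps: set[str]) -> str:
    """Klenow fill-in of a 5' overhang (written 5'->3') with a limited dNTP set.

    The recessed 3' end is extended first opposite the 3'-most base of the
    overhang (the one next to the duplex); filling stops at the first template
    base whose complementary dNTP is absent.
    """
    while overhang and overhang[-1].translate(COMPLEMENT) in dntps:
        overhang = overhang[:-1]
    return overhang


def adapter_overhang(top: str, bottom: str) -> str:
    """5' overhang of `top` when annealed to `bottom`, assuming a flush other end."""
    rc = revcomp(bottom)
    if not top.endswith(rc):
        raise ValueError("oligonucleotides do not anneal end to end")
    return top[: len(top) - len(rc)]


def compatible(end_a: str, end_b: str) -> bool:
    """Two 5' overhangs can anneal if one is the reverse complement of the other."""
    return end_a == revcomp(end_b)


vector_end = partial_fill("AGCT", {"A"})       # HindIII overhang after Klenow + dATP
adapter_end = adapter_overhang(OLIGO1, OLIGO2)

print("vector end:", vector_end)
print("adapter end:", adapter_end)
print("vector-adapter:", compatible(vector_end, adapter_end))
print("vector-vector:", compatible(vector_end, vector_end))
print("adapter-adapter:", compatible(adapter_end, adapter_end))
print({name: site in OLIGO1 for name, site in SITES.items()})
```

Under these assumptions the filled vector end reads AGC and the adapter overhang reads GCT: they pair with each other but not with themselves, which is consistent with the drop in self-ligation reported above.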

The double-stranded adapter is ligated to the DNA to be inserted with T4 DNA ligase, in the presence of a very large excess of adapter (1000 adapter molecules per molecule of DNA to be inserted), for 15 hours at 14°C.

The excess adapter is removed by agarose gel electrophoresis and the molecules of interest are recovered from the gel by hydrolysing it with agarase or by electroelution.

Vector-DNA ligation: The ligation is carried out at 14°C for 15 hours with 10 vector molecules per insert molecule.
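The molecule ratios used above (1000 adapters per insert, 10 vectors per insert) translate into masses through fragment lengths. A minimal sketch, assuming an average of 650 g/mol per base pair of double-stranded DNA; the 1 µg of 150 kb insert, the 31 bp adapter length and the 10 kb vector size are hypothetical illustration values, not figures from this protocol.

```python
"""Sketch: convert molecule ratios into DNA masses for a ligation."""

BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one base pair of dsDNA


def pmol(ng: float, length_bp: int) -> float:
    """Picomoles of double-stranded DNA for a given mass (ng) and length (bp)."""
    return ng * 1e3 / (length_bp * BP_MASS_G_PER_MOL)


def partner_ng(insert_ng: float, insert_bp: int, partner_bp: int, ratio: float) -> float:
    """Mass of partner DNA giving `ratio` partner molecules per insert molecule.

    Molar amounts scale with mass / length, so the per-bp mass cancels out.
    """
    return ratio * insert_ng * partner_bp / insert_bp


insert_ng, insert_bp = 1000.0, 150_000  # hypothetical: 1 ug of a 150 kb fragment
adapter_bp = 31                         # hypothetical: approximate annealed adapter length
vector_bp = 10_000                      # hypothetical vector size

print(f"insert: {pmol(insert_ng, insert_bp):.4f} pmol")
print(f"adapter for 1000:1 -> {partner_ng(insert_ng, insert_bp, adapter_bp, 1000):.1f} ng")
print(f"vector for 10:1    -> {partner_ng(insert_ng, insert_bp, vector_bp, 10):.1f} ng")
```

Because the per-base-pair mass cancels, the required mass ratio depends only on the length ratio; the assumed 650 g/mol affects only the absolute picomole values.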

Transformation: The recipient strain is DH10B, and transformation is carried out by electroporation. To allow expression of tetracycline resistance, the transformants are incubated at 37°C for 1 hour in antibiotic-free medium. Clones are selected by overnight culture on LB agar supplemented with 5 µg/ml tetracycline.

Example 11: CLONE-TO-CLONE CONJUGATION BETWEEN E. COLI AND STREPTOMYCES

CONJUGATION BETWEEN E. COLI STRAIN S17.1 CONTAINING pPM803 AND STREPTOMYCES LIVIDANS TK21

Introduction: Conjugation can be carried out between E. coli and Streptomyces (Mazodier et al., 1989). This method has been adapted into a so-called drop technique, in which 10 µl of an E. coli culture containing a recombinant vector are mixed with one drop of recipient S. lividans. The aim is a clone-to-clone transfer that guarantees that, at the end of the operation, the entire library constructed in E. coli has been introduced into S. lividans. A bulk transformation would necessarily require multiplying the number of Streptomyces transformant clones to be sure in practice that the E. coli library is fully represented in S. lividans.

Furthermore, this method is easy to automate.

Preliminary tests: conjugation between E. coli strain S17.1 containing the vector pOSV303 and S. lividans TK21.

Under these conditions, 6 × 10^6 E. coli cells are mixed with 2 × 10^6 pregerminated S. lividans spores in a final volume of 20 µl.

Development of the method: The DNA of certain actinomycetes is known to be modified and, as a result, cannot be introduced into certain E. coli strains without being restricted. The E. coli strain DH10B, which accepts these DNAs, cannot transfer to Streptomyces a plasmid containing only oriT, so a suitable strain needs to be constructed: a derivative of RP4, able to supply in trans all the functions required for transfer of recombinant clones containing the transfer origin oriT, should be introduced into it by integration into the chromosome.

Example 12: Construction of a cosmid library in E. coli and Streptomyces lividans: cloning of the soil DNA

The aim is to construct a library of large environmental DNA fragments, without prior culturing of the microorganisms, in order to gain access to the metabolic genes of bacteria (or of any other organism) that cannot be cultured under standard laboratory conditions.

The procedure described was used to generate a DNA library in Escherichia coli using the E. coli-S. lividans shuttle cosmid pOS7001 and DNA extracted and purified from the bacterial fraction of a soil. This extraction method yields DNA of high purity with an average size of 40 kb. To avoid partial digestion of the extracted DNA before cloning, an alternative strategy was adopted, based on using terminal transferase to add homopolymeric tails to the 3' ends of the DNA and of the vector.

… µg of DNA were extracted from 60 mg of "Saint-André coast" soil according to the protocol described in Example 3, and treated with terminal transferase (Pharmacia) to extend the 3' ends with a homopolymer (polyT) (Example …). The integrative cosmid pOS7001 is prepared according to protocol B1, Orsay. After a standard phenol/chloroform purification step, the DNA and the vector are assembled by mixing one vector molecule with one molecule of DNA to be inserted. The mixture is then packaged into lambda bacteriophage heads (Amersham kit), which are used to transfect E. coli DH10B. The transfected cells are plated on LB agar containing ampicillin to select the recombinants resistant to this antibiotic.

A library of about 5000 ampicillin-resistant E. coli clones was obtained. Each clone was inoculated in LB or TB medium containing ampicillin in one well of a 96-well microplate and stored at -80°C.

The sequences at the sites where the soil fragments were inserted into the vector pOS7001 during construction of the library were analysed.

For this, 17 cosmids from the library were purified and sequenced with the primer 5'-CCGCGAATTCTCATGTTTGACCG-3', which hybridizes between the BamHI site and the HindIII cloning site of the vector.

The sequences obtained show that the length of the homopolymeric poly-dA/dT tails at the junction points is very variable, between 13 and … bases. Beyond the tails, the cloned soil fragments have a G+C content of between 53 and 70%. Such high percentages were unexpected, but similar results have already been reported for crude preparations of soil DNA (Chatzinotas A. et al., 1998).
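Estimating tail lengths from the junction reads amounts to finding the longest poly-dA or poly-dT run near the insertion site. A minimal sketch; the read below is invented for illustration, and in practice one would restrict the search to the region just downstream of the primer-binding site.

```python
"""Sketch: measure the homopolymeric tail at a vector/insert junction in a read."""
import re


def longest_at_run(read: str, min_len: int = 1) -> tuple[str, int, int]:
    """Return (base, start, length) of the longest poly-A or poly-T run in `read`.

    Returns ("", -1, 0) if no A or T run of at least `min_len` bases is found.
    """
    best = ("", -1, 0)
    for base in "AT":
        for match in re.finditer(f"{base}+", read.upper()):
            length = match.end() - match.start()
            if length >= min_len and length > best[2]:
                best = (base, match.start(), length)
    return best


# Hypothetical read: vector-side sequence, a poly-dA tail, then soil DNA.
read = "GGATCCGC" + "A" * 21 + "GCCGGCGTCG"
print(longest_at_run(read, min_len=10))
```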

A strategy of pooling 48 or 96 clones was used to analyse the microbial and metabolic richness of the library. The cosmid DNA extracted from these pools was then used for PCR or hybridization experiments.

Example 13: Diversity of the 16S ribosomal DNA in the cloned DNA

a) Materials and methods

The cosmids of the library are extracted from pools of clones by alkaline lysis and then purified on a caesium chloride gradient, in order to recover the band of supercoiled cosmid DNA and to eliminate any Escherichia coli chromosomal DNA that might interfere with the study.

After linearization of the cosmids with S1 nuclease (50 units, 30 minutes at 37°C), the 16S rDNA sequences contained in the pools of clones are amplified under standard conditions using the universal primers 63f (5'-CAGGCCTAACACATGCAAGTC-3') and 1387r (5'-GGGCGGWGTGTACAAGGC-3') defined by Marchesi et al. (1998). The amplification products of about 1.5 kb are purified with the QIAquick gel extraction kit (Qiagen) and cloned directly into the vector pCR II (Invitrogen) in Escherichia coli TOP10, according to the manufacturer's instructions. The insert is then amplified with the M13 forward and M13 reverse primers, which are specific for the cloning site of pCR II. Amplification products of the expected size (about 1.7 kb) are analysed by RFLP (restriction fragment length polymorphism) with the enzymes CfoI, MspI and BstUI (0.1 units) to select the clones to be sequenced. The restriction profiles are separated on MetaPhor agarose gel (FMC Products) containing 0.4 mg of ethidium bromide per ml.

The 16S rDNA sequences are then determined directly on the PCR products purified with the QIAquick gel extraction kit, using the sequencing primers defined by Normand (1995). Phylogenetic analyses are performed by comparing the sequences with the prokaryotic 16S rDNA sequences collated in the Ribosomal Database Project (RDP) database, version 7.0 (Maidak et al., 1999), using the SIMILARITY_MATCH program, which gives similarity values relative to the database sequences.

b) Results

To determine the phylogenetic diversity represented in the library, 47 sequences of the 16S rRNA gene were isolated from pools of 288 clones and sequenced almost completely. The results are given in Table 7.

Database searches show that most of the sequences are 95% or less similar to identified bacterial species (Table …). Of the 47 sequences analysed, 28 have as their closest neighbours uncultured bacteria whose sequences were obtained directly from DNA extracted from the environment. Most of these sequences also have very low similarity values: 17 of the 28 differ by more than 5% from their closest neighbours.
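The SIMILARITY_MATCH and BLAST scores used in the study are not reproduced here. As a simpler illustration of what a 95% cut-off means, the sketch below computes plain percent identity between two sequences that are assumed to be already aligned, counting only columns where neither sequence has a gap; the two sequences are invented.

```python
"""Sketch: percent identity between two pre-aligned 16S rDNA fragments."""


def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Identity over columns where neither sequence has a gap (pairwise deletion)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aln_a.upper(), aln_b.upper())
             if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable columns")
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


clone = "ACGTACGTAC-GTACGTACG"    # hypothetical cloned sequence (aligned)
nearest = "ACGTTCGTACAGTACGAACG"  # hypothetical closest database neighbour
identity = percent_identity(clone, nearest)
print(f"{identity:.1f}% identical; at or below 95%: {identity <= 95.0}")
```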

Among the sequences that can be assigned to a phyletic group, most belong to the alpha subclass of the Proteobacteria (18 sequences with similarities of between 89 and …%). A second group is formed by the gamma subclass of the Proteobacteria, comprising 9 sequences with similarities ranging from 84 to 99%. The beta- and delta-Proteobacteria and the low-G+C and high-G+C Firmicutes comprise 1, 4, 3 and 5 sequences, respectively. Only one sequence, a22.1(19), could not be assigned to any of the major bacterial taxonomic groups defined: its closest neighbour, Aerothermobacter marianas (89% similarity), is itself a strain isolated from the marine environment and not yet classified. Finally, 6 sequences can be assigned to the Acidobacterium/Holophaga group. This group is notable in being represented by only two cultured bacteria, Acidobacterium capsulatum and Holophaga foetida; the rest of the group consists of bacteria known only from 16S rRNA genes detected by amplification and cloning of DNA extracted from environmental samples, mainly soil (Ludwig et al., 1997). The low similarity values between the sequences of this group suggest great heterogeneity and diversity within it.

All the results are shown in Table 7.

These results suggest that the sequences contained in the cosmid library derive from microorganisms that are not only phylogenetically diverse but, above all, have never been isolated to date.

The sequencing of the amplified DNA allowed a phylogenetic tree to be constructed for the organisms of the soil sample whose characterized sequences are novel.

The phylogenetic tree shown in Figure 7 was built from the alignment of the sequences with the MASE software (Faulkner and Jurka, 1988), with distances corrected by the Kimura two-parameter method (1980), using the neighbour-joining algorithm (Saitou and Nei, 1987). The phylogenetic analysis compared the 16S rDNA sequences cloned in the soil DNA library with the prokaryotic 16S rDNA sequences collated in the Ribosomal Database Project (RDP) databases (SIMILARITY_MATCH program; Maidak et al., 1999) and in GenBank, using the BLAST 2.0 software (Altschul et al., 1997).
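The Kimura two-parameter correction cited above converts the observed proportions of transitions (P) and transversions (Q) into an estimated number of substitutions per site, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). A minimal Python sketch of that formula follows; the pairwise-deletion treatment of gaps and ambiguity codes is an assumption, and the sketch does not reproduce the MASE alignment or the neighbour-joining step.

```python
"""Sketch: Kimura two-parameter (K2P) distance between two aligned sequences."""
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q).

    P and Q are the proportions of transitions and transversions among the
    columns where both sequences carry an unambiguous base (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable columns")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))  # one transition in 10 sites
print(k2p_distance("ACGTACGTAC", "CCGTACGTAC"))  # one transversion in 10 sites
```

A matrix of such pairwise distances is the usual input to a neighbour-joining tree builder.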

Example 14: Genetic preselection of the library to evaluate its metabolic richness

To characterize the library in terms of metabolic diversity and to identify clones containing inserts that carry genes possibly involved in biosynthetic pathways, PCR-based genetic screening methods were developed according to the invention to detect and identify type I PKS genes.

1) Bacterial strains, plasmids and culture conditions

S. coelicolor ATCC101478, S. ambofaciens NRRL2420, S. lactamdurans ATCC27382, S. rimosus ATCC109610, B. subtilis ATCC6633 and B. licheniformis THE1856 (RPR collection) were used as DNA sources for the PCR experiments. S. lividans TK24 is the host strain used for the shuttle cosmid pOS7001.

For the preparation of genomic DNA and of spore and protoplast suspensions, and for the transformation of S. lividans, the standard protocols described by Hopwood et al. (1986) were followed.

Escherichia coli TOP10 (Invitrogen) was used as host for cloning the PCR products, and E. coli SURE (Stratagene) as host for the shuttle cosmid pOS7001. E. coli culture, plasmid preparation, DNA digestion and agarose gel electrophoresis were carried out according to standard procedures (Sambrook et al., 1996).

2) PCR primers: The primer pairs a1-a2 and b1-b2 were designed by the team of N. Bamas-Jacques, and their use was optimized for screening DNA from pure strains and from the soil library for genes encoding PKS I.

Table 8: PCR primers homologous to the PKS I genes, used for screening the library.

Primer  Sequence (5'-3'; S = C or G)
a1  CCSCAGSAGCGCSTSTTSCTSGA
a2  GTSCCSGTSCCGTGSGTSTCSA
b1  CCSCAGSAGCGCSTSCTSCTSGA
b2  GTSCCSGTSCCGTGSGCCTCSA

Amplification conditions: For PKS I screening of DNA from pure strains, the amplification mixture contained, in a final volume of 50 µl: 50 to 150 ng of genomic DNA, 200 µM dNTP, 5 mM MgCl2 (final), 7% DMSO, 1x Appligene buffer, 0.4 µM of each primer and 2.5 U of Appligene Taq polymerase. The cycling conditions were: denaturation at 95°C for 2 minutes, annealing at 65°C for 1 minute and elongation at 72°C for 1 minute for the first cycle, followed by 30 cycles in which the annealing temperature is reduced to 58°C, as described by Seow et al. (1997). The final extension step is carried out at 72°C for 10 minutes.
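The degeneracy of the Table 8 primers can be made explicit by expanding the IUPAC code S (C or G). The sketch below uses the standard IUPAC nucleotide ambiguity codes; the helper names are hypothetical. Because S always resolves to C or G, every variant of a given primer has the same G+C content, which is one way of seeing the GC bias of these primers.

```python
"""Sketch: expand the degenerate PKS I primers of Table 8 (S = C or G)."""
from itertools import product
from math import prod

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

PRIMERS = {
    "a1": "CCSCAGSAGCGCSTSTTSCTSGA",
    "a2": "GTSCCSGTSCCGTGSGTSTCSA",
    "b1": "CCSCAGSAGCGCSTSCTSCTSGA",
    "b2": "GTSCCSGTSCCGTGSGCCTCSA",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer)


def expand(primer: str) -> list[str]:
    """All concrete sequences encoded by a degenerate primer."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in primer))]


def gc_fraction(seq: str) -> float:
    """Fraction of G and C in a concrete (non-degenerate) sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


for name, primer in PRIMERS.items():
    gcs = [gc_fraction(v) for v in expand(primer)]
    print(f"{name}: {len(primer)} nt, {degeneracy(primer)} variants, "
          f"G+C {min(gcs):.0%}-{max(gcs):.0%}")
```

The same functions apply to the 16S primer 1387r, whose single W (A or T) gives two variants.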

For PKS I screening of the library DNA with the a1-a2 pair, the PCR conditions are the same as above, using 100 to 500 ng of cosmid DNA extracted from pools of 48 clones.

For the b1-b2 primer pair, 500 ng of cosmid DNA from pools of 96 clones were used. The amplification mixture contained 200 µM dNTP, … mM MgCl2 (final), 7% DMSO, 1x Qiagen buffer, 0.4 µM of each primer and 2.5 U of hot-start Taq polymerase (Qiagen). The cycling conditions were: denaturation for 15 minutes at 95°C, followed by … cycles of 1 minute of denaturation at 95°C, 1 minute of annealing (at … for the first cycle and at 62°C for the other cycles) and 1 minute of elongation at 72°C, with a final extension step of 10 minutes at 72°C.

The positive clones within the pools of 48 or 96 clones are identified using replicas of the corresponding parent microplates on solid medium, or any other standard replication method.
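Pooling followed by dereplication is a simple two-stage group test. The Python sketch below assumes one PCR per pool, then one PCR per clone of each positive pool (tested from the replica plates); the row-by-row well layout, the clone numbering and the positive clones are hypothetical.

```python
"""Sketch: pooled PCR screening of a microplate library, then dereplication."""
import math
from typing import Callable

ROWS = "ABCDEFGH"


def well(clone: int) -> str:
    """Hypothetical layout: clones numbered from 0, stored row by row in 96-well plates."""
    plate, pos = divmod(clone, 96)
    return f"plate {plate + 1}, {ROWS[pos // 12]}{pos % 12 + 1}"


def screen(n_clones: int, pool_size: int,
           pcr_positive: Callable[[int], bool]) -> tuple[list[int], int]:
    """One PCR per pool, then one PCR per clone of each positive pool.

    Returns the positive clones and the total number of PCR reactions used.
    """
    n_pools = math.ceil(n_clones / pool_size)
    hits: list[int] = []
    reactions = n_pools
    for pool in range(n_pools):
        members = range(pool * pool_size, min((pool + 1) * pool_size, n_clones))
        if any(pcr_positive(c) for c in members):  # pool-level PCR
            reactions += len(members)              # dereplication from the replicas
            hits.extend(c for c in members if pcr_positive(c))
    return hits, reactions


true_positives = {130, 4000}  # hypothetical positive clones
hits, reactions = screen(5000, 96, lambda c: c in true_positives)
print(hits, reactions, [well(c) for c in hits])
```

With about 5000 clones, pools of 96 need 53 pool reactions instead of 5000 individual ones, plus 96 reactions per positive pool.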

3) Subcloning and sequencing

The PCR products of the identified clones were sequenced as follows. The fragments are purified on agarose gel (gel extraction kit, Qiagen) and cloned into E. coli TOP10 (Invitrogen) using the TOPO TA cloning kit (Invitrogen). The plasmid DNA of the subclones is extracted by alkaline lysis on a BioRobot (Qiagen) and dialysed for 2 h on a 0.025 µm VS membrane (Millipore). The samples are sequenced with the M13 "universal" and "reverse" primers on an ABI 377 96 sequencer (Perkin Elmer).

4) Results

Definition and validation of the PCR primers: Two highly conserved regions of actinomycete type I PKSs, comprising the active site of the enzyme, were targeted for amplification of homologous genes with degenerate primers. These two regions correspond to the sequences PQQR(L)(L)LE and VE(A)HGTGT, respectively.

The primers (Table 8) were tested with DNA from strains producing or not producing macrolides: Streptomyces coelicolor; Streptomyces ambofaciens, which produces spiramycin; and Saccharopolyspora erythraea, which produces erythromycin. With all the primers and all the strains, bands of about 700 bp, corresponding to the length of the expected fragment, were obtained.

These results demonstrate the specificity of primers a and b for the PKS I genes of producing strains and for the silent genes of S. coelicolor.

Sequencing of the PCR products obtained with the a1-a2 primer pair identified, in S. ambofaciens, the sequence of a KS gene already described (European patent application No. EP 0 791 656) as belonging to the biosynthetic pathway of platenolide, a macrolide precursor of spiramycin, together with two previously undescribed sequences, Stramb9 and Stramb12 (see sequence listing).

In S. erythraea, the screening method identified a KS sequence (sacery17) identical to that of the KS of module 1 already published in GenBank (accession number M63677), which encodes 6-deoxyerythronolide B synthase 1 (DEBS1).

Another sequence, unrelated to the erythromycin biosynthetic pathway, was also identified: SEQ ID No 32.

Conclusion: A PCR method for detecting genes encoding type I PKSs in different microorganisms has been developed. The highly conserved structure of the type I ketosynthase domain made it possible to design a PCR method based on degenerate primers whose codon choices are biased towards G+C.

This approach shows that genes or clusters involved in type I polyketide biosynthesis can be identified. Cloning these genes allows the creation of a collection that may then be used to construct hybrid polyketides. The same principle can be applied to other classes of antibiotics.

The results obtained here also show the presence of genes that may belong to silent clusters (SEQ ID No 30 to 32).

Silent clusters have already been documented in S. lividans, where their expression is triggered by specific or pleiotropic regulators (Horinouchi et al.; Umeyama et al., 1996). These results suggest that genes belonging to so-called silent pathways in fact encode active enzymes capable of directing, together with the other specific enzymes of the pathway, the enzymatic steps required for secondary metabolite synthesis.

Screening of the library: The screening was carried out under the conditions described in the Materials and Methods section, using the primer pairs validated on producing strains.

With the a1-a2 primer pair, the PCR products obtained from cosmid DNA extracted from pools of 48 or 96 clones were about 700 bp long, in agreement with the expected size.

The intensity of the bands obtained was variable, but only one amplification band was present for each pool of target DNA.

Under these conditions, 8 groups of target DNA were detected, corresponding to 9 positive clones after dereplication.

The screening carried out with the second primer pair, b1-b2, gave less specific amplification, since many satellite bands were observed alongside the 700 bp band. Nevertheless, 9 groups of target DNA were detected, corresponding to 14 positive clones after dereplication. The DNA of these positive clones was extracted for sequencing and for transformation of S. lividans.

Analysis of the cosmids: Digestion of the cosmids identified by PCR with DraI, which recognizes an AT-rich site, releases a fragment larger than 23 kb (Figure 22). This suggests that the PCR method preferentially targets soil DNA with a high G+C content, a consequence of the GC-biased codon choice in the degenerate primers. As expected for cosmids, the inserts are larger than 23 kb, except in one case (clone a9B12), which may reflect some instability of the cosmids. Moreover, of all the clones selected, only two, GS.F1 and GS.G11, showed the same restriction profile, indicating a low level of redundancy in the library.
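Why DraI (TTTAAA) tends to leave large fragments of GC-rich inserts can be made concrete with a back-of-the-envelope calculation. The sketch assumes a random sequence with independent bases of a given G+C content, which real genomes only approximate; the 53-70% range is the one reported for the cloned soil fragments.

```python
"""Sketch: expected spacing of DraI sites (TTTAAA) as a function of G+C content."""


def expected_spacing_bp(site: str, gc: float) -> float:
    """Mean distance between occurrences of `site` in a random sequence.

    Assumes independent bases with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2.
    """
    p = 1.0
    for base in site:
        p *= gc / 2 if base in "GC" else (1 - gc) / 2
    return 1 / p


for gc in (0.50, 0.53, 0.70):
    spacing = expected_spacing_bp("TTTAAA", gc)
    print(f"G+C {gc:.0%}: one DraI site every ~{spacing:,.0f} bp")
```

Under this model a DraI site is expected about every 4 kb at 50% G+C but only about every 88 kb at 70% G+C, so a GC-rich cosmid insert is likely to contain few or no sites.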

The selected cosmids were transferred into Streptomyces lividans by PEG 1000-mediated protoplast transformation. The transformation efficiency ranges from 30 to 1000 transformants per µg of cosmid DNA.

Sequencing and phylogenetic analysis of the soil PKS I genes: The PCR method developed on pure strains was applied as described to the cosmids of the library, and 24 clones were thus identified.

The PCR products of about 700 bp obtained from the DNA of two pools (48 clones) and of 8 single clones were purified on agarose gel, cloned and sequenced. This identified 11 sequences.

Alignment of the deduced protein sequences of the soil type I PKSs with other type I PKSs from different microorganisms (Figure 24) shows a highly conserved region corresponding to the consensus region of the β-ketoacyl synthase active site.

Analysis of the sequences with the "codon preference" method (Gribskov et al., 1984; Bibb et al., 1984) revealed a strong bias towards G+C-rich codons in a single reading frame.
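The reading-frame signal exploited by such codon-usage methods can be illustrated with a much simpler statistic: in G+C-rich actinomycete genes the third codon position is usually the most G+C-rich, so the coding frame stands out. The sketch computes GC3 for the three forward frames only; it is a simplified stand-in, not the Gribskov codon preference algorithm or Bibb's FRAME analysis, and it ignores the reverse strand.

```python
"""Sketch: G+C content at the third codon position in each forward reading frame."""


def gc3_by_frame(seq: str) -> list[float]:
    """GC3 for frames 0, 1 and 2 of the forward strand (complete codons only)."""
    seq = seq.upper()
    result = []
    for frame in range(3):
        third_positions = seq[frame + 2::3]  # last base of each complete codon
        if not third_positions:
            result.append(0.0)
            continue
        gc = sum(base in "GC" for base in third_positions)
        result.append(gc / len(third_positions))
    return result


def best_frame(seq: str) -> int:
    """Frame with the highest GC3, i.e. the frame a G+C-biased gene would favour."""
    scores = gc3_by_frame(seq)
    return scores.index(max(scores))


toy = "T" + "AAC" * 10  # hypothetical toy sequence: codons AAC in frame 1
print(gc3_by_frame(toy), best_frame(toy))
```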

The proteins deduced from this reading frame show strong similarity to known type I KSs (BLAST program). In particular, the similarity between the soil KS sequences and the KSs of the erythromycin cluster is about 53%.

After dereplication of a pool and identification of the single positive clone, the sequence of the PCR product obtained from this clone was identical to that obtained from the pool, which confirms the reliability of the method.

Analysis of the sequence of the PCR product of one clone allowed the probable identification of 3 different KS I genes. One of these sequences (SEQ ID No 34) is 98.7% similar to a sequence from another pool, suggesting that they encode the same enzyme. The other two sequences are different but strongly homologous.

This is the first description of the cloning and identification, in a soil DNA library, of secondary-metabolite biosynthetic pathways containing genes encoding type I KSs.

The high G+C content of the soil sequences suggests that they may derive from genomes with a codon usage similar to that of actinomycetes.

Although the data available in the literature are limited, genes encoding type I PKSs are known to be highly diverse in their physical organization in the genome, their size and the number of modules each gene contains.

The presence of several domains in a single clone confirms that they belong to asymmetric polyketide clusters. In one case, two clones appear to form a contig, since they share the same KS-domain sequence.

The genetic regions involved in PKS I synthesis range in size from a few kb for penicillin to about 120 kb for rapamycin. The cosmid inserts may therefore be too small to express the most complex clusters.

Genes encoding type I PKSs that can work iteratively, like type II PKSs, and direct the synthesis of aromatic polyketides have been described (Jae-Hyuk et al., 1995). The study of soil PKS I clusters may provide further novelties in this field.

Identification of 6 genes encoding polyketide synthases

Continuing the screening of the cosmid library according to the protocols described in this example, the inventors identified a cosmid clone containing a 34,071-bp insert with several open reading frames encoding polyketide synthase-type polypeptides.

More specifically, this cosmid contains six open reading frames encoding polyketide synthase polypeptides or closely related polypeptides, the non-ribosomal peptide synthetases. A detailed map of this cosmid is shown in Figure 36.

The complete nucleotide sequence of the cosmid constitutes the sequence SEQ ID No. 113 of the sequence listing. The DNA insert contained in SEQ ID No. 113 is the complementary strand (- strand) of the nucleotide sequence encoding the various polyketide synthases.

The nucleotide sequence of the DNA insert contained in the cosmid of Figure 36 that comprises the open reading frames encoding the polyketide synthase polypeptides (+ strand) is represented schematically in Figure 37 and constitutes the sequence SEQ ID No. 114 of the sequence listing.
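Because SEQ ID No. 114 is described as the coding (+) strand of an insert that appears in SEQ ID No. 113 as the complementary strand, moving between the two listings amounts to taking a reverse complement. A minimal Python sketch, assuming plain A/C/G/T/N sequences; the function names and the toy fragment are illustrative only.

```python
# Translation table mapping each base to its Watson-Crick partner (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    unexpected = set(seq.upper()) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unsupported characters: {sorted(unexpected)}")
    # Complement every base, then reverse so that the result also reads 5'->3'.
    return seq.translate(_COMPLEMENT)[::-1]


def map_position(pos: int, length: int) -> int:
    """Map a 1-based position on one strand to the same base pair on the reverse complement."""
    if not 1 <= pos <= length:
        raise ValueError("position outside the sequence")
    return length - pos + 1


if __name__ == "__main__":
    minus_strand = "ATGCGTTAC"  # hypothetical fragment, not from SEQ ID No. 113
    plus_strand = reverse_complement(minus_strand)
    print(plus_strand, map_position(1, len(minus_strand)))
```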

A detailed map of the various open reading frames contained in the DNA insert of this cosmid is also represented in Figure 37.

The characteristics of the nucleotide sequences comprising the open reading frames contained in the DNA insert of this cosmid are detailed below.

ORF1 sequence

The orf1 sequence comprises a partial open reading frame 4615 nucleotides long. It constitutes the sequence SEQ ID No. 115, which starts at the nucleotide in position 1 and ends at the nucleotide in position 4615 of the sequence SEQ ID No. 114.

The sequence SEQ ID No. 115 encodes the 1537-amino acid ORF1 polypeptide, this polypeptide constituting the sequence SEQ ID No. 121.

The polypeptide of sequence SEQ ID No. 121 is related to the non-ribosomal peptide synthases. It has 37% amino acid identity with the peptide synthase of Anabaena sp. 90 referenced under the accession number "emb CAC01604.1" in the GenBank database.
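Identity figures such as the 37% quoted above depend on how they are computed. The Python sketch below takes two already-aligned sequences (gaps written as '-') and reports identity either over the full alignment length, as in the Identities line of BLAST output, or over gap-free columns only. The alignment method and the convention actually used for the figures in this document are not stated, and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, count_gap_columns: bool = True) -> float:
    """Percent identity of two sequences from a pairwise alignment ('-' marks a gap).

    count_gap_columns=True divides by the full alignment length;
    False divides only by columns where neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(aligned_a.upper(), aligned_b.upper()))
    identical = sum(1 for x, y in pairs if x == y and x != "-")
    if count_gap_columns:
        denominator = len(pairs)
    else:
        denominator = sum(1 for x, y in pairs if x != "-" and y != "-")
    if denominator == 0:
        raise ValueError("alignment has no usable columns")
    return 100.0 * identical / denominator


if __name__ == "__main__":
    # Hypothetical toy alignment, not the ORF1/Anabaena alignment.
    a, b = "MKT-LLVAG", "MKTALIVSG"
    print(f"{percent_identity(a, b):.1f}% vs {percent_identity(a, b, False):.1f}%")
```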

ORF2 sequence

The orf2 nucleotide sequence is 8301 nucleotides long and constitutes the sequence SEQ ID No. 116, which starts at the nucleotide in position 4633 and ends at the nucleotide in position 12933 of the sequence SEQ ID No. 114.

The ORF2 sequence encodes the 2766-amino acid ORF2 peptide, this polypeptide constituting the sequence SEQ ID No. 122.
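The lengths quoted for ORF2 are mutually consistent under the usual assumption that the end coordinate includes the stop codon: positions 4633 to 12933 span 8301 nt, i.e. 2767 codons, the last of which is the stop, leaving 2766 amino acids. The same arithmetic as a Python sketch (helper names are illustrative); it applies to the complete ORF2 to ORF5 described here, not to the partial ORF1 and ORF6.

```python
def feature_length(start: int, end: int) -> int:
    """Length in nucleotides of a feature given 1-based, inclusive coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


def protein_length(start: int, end: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a complete in-frame ORF."""
    n = feature_length(start, end)
    if n % 3:
        raise ValueError(f"{n} nt is not a whole number of codons")
    codons = n // 3
    # The stop codon is counted in the nucleotide span but is not translated.
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    # Coordinates on SEQ ID No. 114 as stated in the text for the complete ORFs.
    complete_orfs = {"ORF2": (4633, 12933), "ORF3": (12936, 18227),
                     "ORF4": (18224, 24685), "ORF5": (24682, 29769)}
    for name, (start, end) in complete_orfs.items():
        print(name, feature_length(start, end), "nt ->", protein_length(start, end), "aa")
```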

The polypeptide of sequence SEQ ID No. 122 has 41% amino acid sequence identity with the MtaD sequence of Stigmatella aurantiaca referenced under the accession number "gb AAF19812.1" in the GenBank database.

The ORF2 polypeptide constitutes a polyketide synthase.

ORF3 sequence

The orf3 nucleotide sequence is 5292 nucleotides long and constitutes the sequence SEQ ID No. 117, which starts at the nucleotide in position 12936 and ends at the nucleotide in position 18227 of the sequence SEQ ID No. 114.

The nucleotide sequence SEQ ID No. 117 encodes the 1763-amino acid ORF3 polyketide synthase polypeptide, this polypeptide constituting the sequence SEQ ID No. 123 according to the invention.

The ORF3 polypeptide of sequence SEQ ID No. 123 has 42% amino acid identity with the MtaB sequence of Stigmatella aurantiaca referenced under the accession number "gb AAF19810.1" in the GenBank database.

ORF4 sequence

The orf4 nucleotide sequence is 6462 nucleotides long and constitutes the sequence SEQ ID No. 118 according to the invention.

The nucleotide sequence SEQ ID No. 118 corresponds to the sequence starting at the nucleotide in position 18224 and ending at the nucleotide in position 24685 of the nucleotide sequence SEQ ID No. 114.

The nucleotide sequence SEQ ID No. 118 encodes the 2153-amino acid ORF4 polyketide synthase polypeptide, this polypeptide constituting the sequence SEQ ID No. 124 according to the invention.

The ORF4 polypeptide of sequence SEQ ID No. 124 has 46% amino acid sequence identity with the epoD sequence of Sorangium cellulosum referenced under the accession number "gb AAF62883.1" in the GenBank database.

ORF5 sequence

The orf5 nucleotide sequence is 5088 nucleotides long and constitutes the sequence SEQ ID No. 119 according to the invention.

The sequence SEQ ID No. 119 corresponds to the sequence starting at the nucleotide in position 24682 and ending at the nucleotide in position 29769 of the nucleotide sequence SEQ ID No. 114.

The nucleotide sequence SEQ ID No. 119 encodes the 1695-amino acid ORF5 polyketide synthase polypeptide, this polypeptide constituting the sequence SEQ ID No. 125 according to the invention.

The ORF5 polyketide synthase polypeptide of sequence SEQ ID No. 125 has 43% amino acid identity with the epoD sequence of Sorangium cellulosum referenced under the accession number "gb AAF62883.1" in the GenBank database.

ORF6 sequence

The orf6 nucleotide sequence is 4306 nucleotides long and constitutes the sequence SEQ ID No. 120 according to the invention. It corresponds to the sequence starting at the nucleotide in position 29766 and ending at the nucleotide in position 34071 of the sequence SEQ ID No. 114.

The sequence SEQ ID No. 120 contains a partial open reading frame encoding the 1434-amino acid ORF6 polypeptide of the polyketide synthase type, this polypeptide constituting the sequence SEQ ID No. 126 according to the invention.

The polypeptide of sequence SEQ ID No. 126 has 43% amino acid identity with the epoD sequence of Sorangium cellulosum referenced under the accession number "gb AAF62883.1" in the GenBank database.
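Tabulating the coordinates given above for SEQ ID No. 114 shows how tightly the six reading frames are packed; a negative gap means that consecutive ORFs overlap. The Python sketch below uses only the coordinates stated in the text. Reading the repeated 4-nt overlaps as ATGA-type stop/start overlaps, an arrangement often associated with translational coupling, is an assumption that would have to be checked against the sequence itself.

```python
# 1-based inclusive coordinates on SEQ ID No. 114, as stated in the text.
ORFS = {
    "orf1": (1, 4615),
    "orf2": (4633, 12933),
    "orf3": (12936, 18227),
    "orf4": (18224, 24685),
    "orf5": (24682, 29769),
    "orf6": (29766, 34071),
}


def spacing(orfs: dict[str, tuple[int, int]]) -> list[tuple[str, str, int]]:
    """Return (upstream, downstream, gap) for consecutive ORFs; a negative gap is an overlap."""
    ordered = sorted(orfs.items(), key=lambda item: item[1][0])
    result = []
    for (name_a, (_, end_a)), (name_b, (start_b, _)) in zip(ordered, ordered[1:]):
        result.append((name_a, name_b, start_b - end_a - 1))
    return result


if __name__ == "__main__":
    for up, down, gap in spacing(ORFS):
        label = f"overlap of {-gap} nt" if gap < 0 else f"gap of {gap} nt"
        print(f"{up} -> {down}: {label}")
```

Orf6 ending exactly at position 34071, the last base of the insert, is consistent with its description as a partial reading frame.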

EXAMPLE 15: Construction of shuttle vectors of the integrative and conjugative BAC type in Streptomyces

15.1 Construction of the vector pMBD-1

The BAC vector pMBD-1 was obtained according to the following steps:

Step 1: The vector pOSV010 was digested with the enzymes PstI and BstZ17I in order to obtain a 6.3-kb nucleotide fragment.

Step 2: The vector pDNR-1 was digested with the enzymes PstI and PvuII in order to obtain a 4.145-kb nucleotide fragment.

Step 3: The 6.3-kb nucleotide fragment derived from the vector pOSV017 was fused by ligation with the 4.15-kb fragment derived from the vector pDNR-1, so as to produce the vector pMBD-1, as illustrated in Figure

15.2 Construction of the vector pMBD-2

The vector pMBD-2 is a BAC-type vector containing a "φC31 int-Ωhyg" integrative box.

φC31 is a temperate phage with a broad host range whose attachment site (attP) is well defined. The φC31 int fragment is the minimal fragment of the actinophage φC31 capable of mediating the integration of a plasmid into the chromosome of Streptomyces lividans.

Ωhyg is a derivative of the Ω interposon that confers hygromycin resistance in E. coli and S. lividans.

BAC vectors containing the φC31 integration system are described by Sosio et al. (2000) and in PCT patent application No. 99/6734 published on 29 December 1999.

The BAC vector pMBD-2 was constructed according to the following steps:

Step 1: Construction of a φC31 int-Ωhyg integrative box in an E. coli multicopy plasmid.

The φC31 int fragment was first amplified from the plasmid pOJ436 using the following pair of primers: the primer EVpc311 (SEQ ID No. 109), which introduces an EcoRV site at the 5' end of the φC31 sequence, and the primer BIlOc31F (SEQ ID No. 110), which introduces a BglII site at the 3' end of the φC31 sequence.
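A restriction site is commonly introduced at one end of a PCR product by adding the site, plus a few extra bases, to the 5' end of the corresponding primer; a site carried by the reverse primer ends up at the 3' end of the amplified sequence. The Python sketch below illustrates this with the EcoRV (GATATC) and BglII (AGATCT) recognition sequences. The template, the annealing length and the two-base clamp are hypothetical, since the actual primers SEQ ID No. 109 and 110 are given only in the sequence listing.

```python
# Recognition sequences (5'->3'); both are palindromic, so they read the same on either strand.
SITES = {"EcoRV": "GATATC", "BglII": "AGATCT"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tailed_primer(annealing: str, enzyme: str, clamp: str = "GC") -> str:
    """Primer = clamp + restriction site + template-annealing sequence (5'->3')."""
    return (clamp + SITES[enzyme] + annealing).upper()


def pcr_product(template: str, fwd: str, rev: str, anneal_len: int) -> str:
    """Idealized PCR product (top strand) from a template and two 5'-tailed primers."""
    template, fwd, rev = template.upper(), fwd.upper(), rev.upper()
    if fwd[-anneal_len:] != template[:anneal_len]:
        raise ValueError("forward primer does not match the start of the template")
    if reverse_complement(rev[-anneal_len:]) != template[-anneal_len:]:
        raise ValueError("reverse primer does not match the end of the template")
    # The forward tail lands at the 5' end; the reverse tail, read on the top strand, at the 3' end.
    return fwd[:-anneal_len] + template + reverse_complement(rev[:-anneal_len])


if __name__ == "__main__":
    template = "ATGACCGTTCAGGCTTAA"  # hypothetical stand-in for the phiC31 int region
    fwd = tailed_primer(template[:6], "EcoRV")  # toy 6-nt annealing length
    rev = tailed_primer(reverse_complement(template[-6:]), "BglII")
    print(pcr_product(template, fwd, rev, anneal_len=6))
```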

The Ωhyg fragment was obtained by digesting the plasmid pHP45Ωhyg, described by Blondelet-Rouault (1997), with the BamHI enzyme.

Next, the φC31 int-Ωhyg integrative box was cloned into the vector pMCS5 digested with the enzymes BglII and EcoRV.

Step 2: Construction of the vector pMBD-2

The bacterial artificial chromosome pBACe3.6 described by Frengen et al. (1999) was digested with the enzyme NheI and then treated with the enzyme Eco polymerase.

Next, the vector pMCS5 φC31 int-Ωhyg was digested with the enzymes SnaBI and EcoRV so as to recover the integrative box.
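Whether two DNA ends can be joined depends on the ends each enzyme leaves. In section 15.1, for instance, each fragment carries one PstI end and one blunt end (BstZ17I and PvuII both cut bluntly), and BamHI and BglII are a classic pair whose different sites leave the same 5'-GATC overhang. The Python sketch below classifies the ends produced by the enzymes named in this example from their recognition sites and top-strand cut positions. It models only palindromic sites, and treating the NheI end as blunt after the "Eco polymerase" step assumes that step is a fill-in reaction, which the text does not state.

```python
# Recognition site (5'->3') and top-strand cut position, counted in bases from the
# site's 5' end (e.g. BamHI G^GATCC -> 1). All sites listed here are palindromic.
ENZYMES = {
    "BamHI": ("GGATCC", 1),
    "BglII": ("AGATCT", 1),
    "NheI": ("GCTAGC", 1),
    "PstI": ("CTGCAG", 5),
    "PacI": ("TTAATTAA", 5),
    "EcoRV": ("GATATC", 3),
    "SnaBI": ("TACGTA", 3),
    "PvuII": ("CAGCTG", 3),
    "BstZ17I": ("GTATAC", 3),
}


def end_type(enzyme: str) -> tuple[str, str]:
    """Return ("blunt", ""), ("5'", overhang) or ("3'", overhang) for an enzyme's ends."""
    site, top_cut = ENZYMES[enzyme]
    # For a palindromic site the bottom-strand cut mirrors the top-strand cut.
    bottom_cut = len(site) - top_cut
    if top_cut == bottom_cut:
        return ("blunt", "")
    if top_cut < bottom_cut:
        return ("5'", site[top_cut:bottom_cut])
    return ("3'", site[bottom_cut:top_cut])


def compatible(enzyme_a: str, enzyme_b: str) -> bool:
    """Ends can ligate if both are blunt or both carry the same self-complementary overhang."""
    return end_type(enzyme_a) == end_type(enzyme_b)


def after_fill_in(enzyme: str) -> tuple[str, str]:
    """Model a polymerase fill-in: 5' overhangs become blunt, other ends are unchanged."""
    kind, overhang = end_type(enzyme)
    return ("blunt", "") if kind == "5'" else (kind, overhang)


if __name__ == "__main__":
    for a, b in [("BamHI", "BglII"), ("PstI", "PstI"), ("BstZ17I", "PvuII"), ("NheI", "SnaBI")]:
        print(a, b, compatible(a, b))
    print("NheI after fill-in:", after_fill_in("NheI"))
```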

The detailed map of the vector pMBD-2 is represented in Figure 31.

15.3 Construction of the vector pMBD-3

The vector pMBD-3 is an integrative (φC31 int) and conjugative (oriT) BAC-type vector carrying the selection marker Ωhyg.

The map of the vector pMBD-3 and the method for constructing it are illustrated in Figure 31.

The vector pMBD-3 was obtained by amplifying the oriT region from the plasmid pOJ436 using the pair of primers of sequences SEQ ID No. 111 and SEQ ID No. 112, which contain PacI restriction sites.

The nucleotide fragment amplified using the primers SEQ ID No. 111 and SEQ ID No. 112 was cloned into the vector pMBD-2 predigested with the PacI enzyme. The scheme for constructing the vector pMBD-3 is illustrated in Figure 31.

15.4 Construction of the vector pMBD-4

The detailed map of the vector pMBD-4 is represented in Figure 32.

The vector pMBD-4 was obtained by cloning the φC31 int-Ωhyg integrative box into the vector pCYTAC2.

15.5 Construction of the vector pMBD-5

The scheme for constructing the vector pMBD-5 is illustrated in Figure 33.

The vector pMBD-5 was constructed by recombining the nucleotide fragment located between the two loxP sites of the vector pMBD-1, illustrated in Figure 33, with the loxP site contained in the BAC vector designated pBTP3; a detailed map of the plasmid pBTP3 is represented in Figure 34.
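The pMBD-5 and pMBD-6 constructions rely on recombination between loxP sites, which in practice is catalysed by the Cre recombinase. The string model below is a deliberately simplified Python sketch, assuming two directly repeated loxP sites in the donor and a single loxP site in the recipient BAC: excision releases the DNA between the donor sites, and integration inserts it, flanked by loxP sites, at the recipient's site. The loxP sequence is the standard 34-bp site; the toy donor and target sequences are hypothetical.

```python
# Standard 34-bp loxP site: 13-bp inverted repeat, 8-bp asymmetric spacer, 13-bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def excise(donor: str) -> tuple[str, str]:
    """Model Cre excision between two loxP sites assumed to be in direct orientation.

    Returns (donor_remainder, cassette). The remainder keeps one loxP site; in a real
    reaction the cassette is released as a circle carrying the other loxP site.
    """
    first = donor.find(LOXP)
    second = donor.find(LOXP, first + len(LOXP)) if first >= 0 else -1
    if second < 0:
        raise ValueError("donor must contain two directly repeated loxP sites")
    cassette = donor[first + len(LOXP):second]
    remainder = donor[:first + len(LOXP)] + donor[second + len(LOXP):]
    return remainder, cassette


def integrate(target: str, cassette: str) -> str:
    """Model integration of the excised circle into the single loxP site of a target."""
    site = target.find(LOXP)
    if site < 0 or target.find(LOXP, site + 1) >= 0:
        raise ValueError("target must contain exactly one loxP site")
    end = site + len(LOXP)
    # Result: ... loxP - cassette - loxP ...
    return target[:end] + cassette + LOXP + target[end:]


if __name__ == "__main__":
    donor = "AAAA" + LOXP + "GGGGGG" + LOXP + "CCCC"  # hypothetical donor plasmid
    target = "TTTT" + LOXP + "TTTT"                    # hypothetical recipient BAC
    remainder, cassette = excise(donor)
    print(cassette, integrate(target, cassette).count(LOXP))
```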

15.6 Construction of the vector pMBD-6

The vector pMBD-6 was constructed by recombining the nucleotide fragment located between the two loxP sites of the vector pMBD-1 into the loxP site of the BAC vector pBeloBAC11, as represented in Figure

TABLE 1 Location of the sampling sites and characteristics of the soils used in the various experiments.

The direct microbial counts using staining with acridine orange were carried out before and after grinding the soil.

| No. | Origin | Texture | Sand | Loam | Clay | Organic matter (g/kg of dry soil) | pH | Cells before grinding (a) (x10^9/g dry weight of soil) | Cells after grinding (a) (x10^9/g dry weight of soil) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Australia | Sandy clay | 62 | 22 | 6 | 49.7 | 5.8 | 6.5 (0.9) | 2.9 (1.3) |
| 2 | Peyrat le Chateau, France | Sandy clay | 61 | 26 | 13 | 48.2 | 4.9 | 7.3 (0.6) | 5.4 (0.8) |
| 3 | St-Andre coast, France | Sandy compost | 50 | 41 | 9 | 40.6 | 5.6 | 10.0 (0.7) | 7.5 (1.4) |
| 4 | Chazay d'Azergue, France | Clayey sandy compost | 34 | 47 | 19 | 13.9 | 5.8 | 7.8 (1.1) | 4.2 (0.6) |
| 5 | Guadeloupe, France | Clay | 27 | 26 | 47 | 17.0 | 4.8 | 1.4 (0.4) | 0.5 (0.1) |
| 6 | Dombes, France | Clayey sandy compost | 20 | 67 | 13 | 30.3 | 4.3 | 7.5 (0.5) | 5.6 (0.9) |

(a) n = 3; standard deviation in parentheses.

TABLE 2 Primers and probes used for the PCR amplification and the dot-blot hybridization

| Primer or probe | Type | Target (a) | Sequence (5' to 3') | Reference |
|---|---|---|---|---|
| FGPS431 | probe | Universal (1392-1406) | ACGGGCGGTGTGT(A/G)C | Amann et al., 1995 |
| FGPS122 | primer | Bacteria (6-27) | GGAGAGTTTGATCATGGCTCAG | Amann et al., 1995 |
| FGPS350 | primer | Streptosporangium (616-635) | CCTGGAGTTAAGCCCCCAAGC | This study |
| FGPS643 | probe | Streptosporangium (122-142) | GTGAGTAACCTGCCCC(T/C)GACT | This study |
| R499 | primer | Bacillus anthracis | TTAATTCACTTGCAACTGATGGG | Patra et al., 1996 |
| R500 | primer | Bacillus anthracis | AACGATAGCTCCTACATTTGGAG | Patra et al., 1996 |
| C501 | probe | Bacillus anthracis | TTGCTGATACGGTATAGAACCTGGC | Patra et al., 1996 |
| FGPS516 | primer | S. lividans OS48.3 | TCCAGATCCTTGACCCGCAG | This study |
| FGPS517 | primer | S. lividans OS48.3 | CACGACATTGCACTCCACCG | This study |
| FGPS518 | probe | S. lividans OS48.3 | CCGTGAGCCGGATCAG | This study |

(a) The positions on the E. coli 16S rRNA gene are given in parentheses. For B. anthracis and S. lividans, the primers and probes target chromosomal sequences specific to the respective organisms; these sequences are not located in the 16S rRNA gene. The cassette containing the target region of S. lividans is described by Clerc-Bardin et al. (unpublished).
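Two of the oligonucleotides in Table 2 are degenerate: a position written (A/G) or (T/C) stands for either base, so the oligonucleotide is in fact a mixture. The Python sketch below expands that notation into the individual sequences and gives a rough melting temperature with the Wallace rule, Tm of about 2 °C x (A+T) + 4 °C x (G+C), which is only a first approximation for short oligonucleotides. The parsing convention for the parentheses is inferred from the table.

```python
import itertools
import re


def expand_degenerate(oligo: str) -> list[str]:
    """Expand positions written as '(A/G)' into every concrete sequence they encode."""
    # With a capturing group, re.split alternates: literal, choices, literal, choices, ...
    parts = re.split(r"\(([^)]*)\)", oligo.upper())
    options = [[part] if i % 2 == 0 else part.split("/") for i, part in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*options)]


def wallace_tm(seq: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


if __name__ == "__main__":
    degenerate = [("FGPS431", "ACGGGCGGTGTGT(A/G)C"), ("FGPS643", "GTGAGTAACCTGCCCC(T/C)GACT")]
    for name, oligo in degenerate:
        print(name, [(v, wallace_tm(v)) for v in expand_degenerate(oligo)])
```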

TABLE 3 Amount of DNA extracted from different soils after lysis treatments according to protocols 1 to 5 (µg DNA/g dry weight of soil ± standard deviation) (a). Soils 1, 2, 3 and 6: n = 3; soil 4: n = 1.

| Soil (number and origin) | Values for lysis protocol numbers (b) 1, 2, 3, 4a, 4b, 5a |
|---|---|
| 1. Australia | 17±2, 52±2, 32±5, 16±3, 33±2, 59±1, 27±0 |
| 2. Peyrat | 29±2, 58±1, 40±2, 29±2, 18±3, 56±1, 15±1 |
| 3. St-Andre coast | 36±7, 60±6, 148±10, 94±7, 38±6, 73±5, 47±6 |
| 4. Chazay | 9, 16, ND, 32, 15, 15 |
| 6. Dombes | 26±3, 43±1, 66±1, 160±7, 102±5 |

(a) Quantification by phosphorescence imaging after dot-blot hybridization with the universal probe FGPS431 (Table 2).

(b) 1: no treatment; 2: dry grinding of the soil; 3: Cr Ultra-Turrax homogenization; 4a: G H microtip sonication; 4b: G H + cup horn sonication; 5a: Cr H NT chemical/enzymatic lysis.
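Dot-blot quantification of this kind is commonly read against a calibration series of known DNA amounts spotted on the same membrane. The Python sketch below fits a straight line to such standards by least squares, converts a sample signal back to a DNA amount and scales it to µg per gram of dry soil. The linear response, the calibration values and the extract and spotting volumes are all hypothetical and are not data behind Table 3.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must cover more than one DNA amount")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_signal(signal: float, slope: float, intercept: float) -> float:
    """Invert the calibration line: DNA amount (same unit as the standards) for a signal."""
    return (signal - intercept) / slope


def ug_per_g_soil(ng_in_spot: float, spotted_ul: float, extract_ul: float, soil_g: float) -> float:
    """Scale the amount in one spot to the whole extract, then to µg per g of dry soil."""
    return ng_in_spot * (extract_ul / spotted_ul) / soil_g / 1000.0


if __name__ == "__main__":
    standards_ng = [0, 10, 20, 40]   # hypothetical calibration series
    signals = [5, 105, 205, 405]     # hypothetical imager readings
    slope, intercept = fit_line(standards_ng, signals)
    ng = amount_from_signal(255, slope, intercept)
    print(round(ug_per_g_soil(ng, spotted_ul=2, extract_ul=100, soil_g=0.5), 2))
```

Samples whose signal falls outside the range of the standards would need dilution rather than extrapolation.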

Claims

1. Process for preparing a collection of nucleic acids from an environmental sample containing organisms, the said process comprising the following sequence of steps: (i) production of a suspension by dispersing the environmental sample in a liquid medium and then homogenizing the suspension by gentle stirring; (ii) separating the organisms from the other inorganic and/or organic constituents of the homogeneous suspension obtained in step (i) by centrifugation on a density gradient; (iii) lysis of the organisms separated out in step (ii) and extraction of the nucleic acids; and (iv) purification of the nucleic acids on a caesium chloride gradient.

2. Process according to claim 1, characterized in that the nucleic acids are DNA molecules.

3. Process for preparing a collection of recombinant vectors, characterized in that the nucleic acids obtained by the process according to claim 1 or 2 are inserted into a cloning and/or expression vector.

4. Process according to claim 3, characterized in that the nucleic acids are separated as a function of their size prior to inserting them into the cloning and/or expression vector.

5. Process according to claim 3, characterized in that the average size of the nucleic acids is made substantially uniform by physical rupture, prior to inserting them into the cloning and/or expression vector.

6. Process according to claim 3, characterized in that the cloning and/or expression vector is of the plasmid type.

7. Process according to claim 3, characterized in that the cloning and/or expression vector is of the cosmid type.

8. Process according to claim 7, characterized in that it is a cosmid which is replicative in E. coli and integrative in Streptomyces.

9. Process according to claim 8, characterized in that it is the cosmid pOS7001.

10. Process according to claim 7, characterized in that it is a cosmid which is conjugative and integrative in Streptomyces.

11. Process according to claim 10, characterized in that the cosmid is chosen from cosmids pOSV303, pOSV306 and pOSV307.

12. Process according to claim 7, characterized in that it is a cosmid which is replicative both in E. coli and in Streptomyces.

13. Process according to claim 12, characterized in that it is the cosmid pOS700R.

14. Process according to claim 7, characterized in that it is a cosmid which is replicative in E. coli and Streptomyces and conjugative in Streptomyces.

15. Process according to claim 3, characterized in that the cloning and/or expression vector is of the BAC type.

16. Process according to claim 15, characterized in that it is a BAC vector which is integrative and conjugative in Streptomyces.

17. Process according to claim 16, characterized in that the vector is chosen from BAC vectors pOSV403, pMBD-1, pMBD-2, pMBD-3, pMBD-4, pMBD- and pMBD-6.

18. Process for preparing a recombinant cloning and/or expression vector, characterized in that the step of inserting a nucleic acid into the cloning and/or expression vector comprises the following steps: opening the cloning and/or expression vector at a chosen cloning site, using a suitable restriction endonuclease; adding a first homopolymeric nucleic acid to the free 3' end of the open vector; adding a second homopolymeric nucleic acid, whose sequence is complementary to the first homopolymeric nucleic acid, at the free 3' end of the nucleic acid from the collection to be inserted into the vector; assembling the nucleic acid of the vector and the nucleic acid of the collection by hybridizing the first and second homopolymeric nucleic acids of mutually complementary sequence; closing the vector by ligation.

19. Process according to claim 18, characterized in that: the first homopolymeric nucleic acid is of poly(A) or poly(T) sequence; and the second homopolymeric nucleic acid is of poly(T) or poly(A) sequence.

20. Process for preparing a recombinant vector according to either of claims 18 or 19, characterized in that the size of the nucleic acid to be inserted is at least 100 kilobases, preferably at least 200 kilobases.

21. Process for preparing a recombinant vector according to one of claims 18 to 20, characterized in that the nucleic acid to be inserted is contained in the collection of nucleic acids obtained by the process according to claim 1 or 2.

22. Process for preparing a recombinant cloning and/or expression vector, characterized in that the step of inserting a nucleic acid into the cloning and/or expression vector comprises the following steps: creation of blunt ends on the ends of the nucleic acid of the collection by removing the protruding 3' sequences and filling in the protruding sequences; opening the cloning and/or expression vector at a chosen cloning site using a suitable restriction endonuclease; creation of blunt ends at the ends of the nucleic acid of the vector by removing the protruding 3' sequences and filling in the protruding sequences, then dephosphorylating the 5' ends; adding complementary oligonucleotide adapters; inserting the nucleic acid of the collection into the vector by ligation.

23. Process for preparing a recombinant vector according to claim 22, characterized in that the size of the nucleic acid to be inserted is at least 100 kilobases, preferably at least 200 kilobases.

24. Process for preparing a recombinant vector according to either of claims 22 or 23, characterized in that the nucleic acid to be inserted is contained in the collection of nucleic acids obtained by the process according to one of claims 1 or 2.

25. Process according to one of claims 18 to 24, characterized in that the nucleic acids are inserted as obtained, without treatment with one or more restriction endonucleases prior to inserting them into the cloning and/or expression vector.

26. Collection of nucleic acids consisting of the nucleic acids obtained by the process of one of claims 1 or 2.

27. Nucleic acid, characterized in that it is contained in the collection of nucleic acids according to claim 26.

28. Nucleic acid according to claim 27, characterized in that it comprises a nucleotide sequence encoding at least one operon, or part of an operon.

29. Nucleic acid according to claim 28, characterized in that the operon encodes all or part of a metabolic pathway.

30. Nucleic acid according to claim 29, characterized in that the metabolic pathway is the polyketide synthesis pathway.

31. Nucleic acid according to claim 30, characterized in that it is chosen from polynucleotides comprising the sequences SEQ ID No 30 to 44 and SEQ ID No. 115 to 120.

32. Nucleic acid according to claim 27, characterized in that it comprises all of a nucleotide sequence encoding a polypeptide.

33. Nucleic acid according to one of claims 27 to 32, characterized in that it is of prokaryotic origin.

34. Nucleic acid according to claim 33, characterized in that it originates from a bacterium or from a virus.

35. Nucleic acid according to one of claims 27 to 29 and 32, characterized in that it is of eukaryotic origin.

36. Nucleic acid according to claim 35, characterized in that it originates from a fungus, a yeast, a plant or an animal.

37. Recombinant vector, characterized in that it is chosen from the following recombinant vectors: a) a vector comprising a nucleic acid according to one of claims 31 to 36; b) a vector obtained according to the process of one of claims 18 to 21; and c) a vector obtained according to the process of one of claims 22 to

38. Vector, characterized in that it is the cosmid pOS7001.

39. Vector, characterized in that it is the cosmid pOSV303.

40. Vector, characterized in that it is the cosmid pOSV306.

41. Vector, characterized in that it is the cosmid pOSV307.

42. Vector, characterized in that it is the cosmid pOS700R.

43. Vector, characterized in that it is the BAC vector pOSV403.

44. Vector, characterized in that it is the vector pMBD-1.

45. Vector, characterized in that it is the vector pMBD-2.

46. Vector, characterized in that it is the vector pMBD-3.

47. Vector, characterized in that it is the vector pMBD-4.

48. Vector, characterized in that it is the vector

49. Vector, characterized in that it is the vector pMBD-6.

50. Collection of recombinant vectors as obtained according to the process of one of claims 3 to 17, 21 and 24.

51. Recombinant cloning and/or expression vector, characterized in that it is contained in the collection of recombinant vectors according to claim

52. Recombinant host cell comprising a nucleic acid according to one of claims 27 to 36 or a recombinant vector according to claim 51.

53. Recombinant host cell according to claim 52, characterized in that it is a prokaryotic or eukaryotic cell.

54. Recombinant host cell according to claim 53, characterized in that it is a bacterium.

55. Recombinant host cell according to claim 54, characterized in that it is a bacterium chosen from E. coli and Streptomyces.

56. Recombinant host cell according to claim 54, characterized in that it is a yeast or a filamentous fungus.

57. Collection of recombinant host cells, each of the constituent host cells of the collection comprising a nucleic acid from the collection of nucleic acids according to claim 26.

58. Collection of recombinant host cells, each of the constituent host cells of the collection comprising a recombinant vector according to either of claims 37 and 51.

59. Process for detecting a nucleic acid of given nucleotide sequence, or whose nucleotide sequence is structurally similar to a given nucleotide sequence, in a collection of recombinant host cells according to either of claims 57 and 58, characterized in that it comprises the following steps: placing the collection of recombinant host cells in contact with a pair of primers which hybridize with the given nucleotide sequence or which hybridize with the nucleotide sequence that is structurally similar to a given nucleotide sequence; carrying out at least three amplification cycles; detecting any nucleic acid amplified.

60. Process for detecting a nucleic acid of given nucleotide sequence, or whose nucleotide sequence is structurally similar to a given nucleotide sequence, in a collection of recombinant host cells according to either of claims 57 and 58, characterized in that it comprises the following steps: placing the collection of recombinant host cells in contact with a probe which hybridizes with the given nucleotide sequence or which hybridizes with a nucleotide sequence that is structurally similar to the given nucleotide sequence; detecting the hybrid possibly formed between the probe and the nucleic acids included in the vectors of the collection.

61. Process for identifying the production of a compound of interest by one or more recombinant host cells in a collection of recombinant host cells according to either of claims 57 and 58, characterized in that it comprises the following steps: culturing the recombinant host cells of the collection in a suitable culture medium; detecting the compound of interest in the culture supernatant or in the cell lysate of one or more of the recombinant host cells cultured.

62. Process for selecting a recombinant host cell which produces a compound of interest in a collection of recombinant host cells according to either of claims 57 and 58, characterized in that it comprises the following steps: culturing recombinant host cells of the collection in a suitable culture medium; detecting the compound of interest in the culture supernatant or in the cell lysate of one or more of the recombinant host cells cultured; selecting recombinant host cells which produce the compound of interest.

63. Process for producing a compound of interest, characterized in that it comprises the following steps: culturing a recombinant host cell selected according to the process of claim 62; recovering and, where appropriate, purifying the compound produced by the said recombinant host cell.

64. Compound of interest, characterized in that it is obtained according to the process of claim 63.

65. Compound according to claim 64, characterized in that it is a polyketide.

66. Polyketide, characterized in that it is produced by means of expressing at least one nucleotide sequence comprising a sequence chosen from sequences SEQ ID No 30 to 44 and SEQ ID No. 115 to 120.

67. Composition comprising a polyketide according to claim 65 or 66.

68. Pharmaceutical composition comprising a pharmacologically active amount of a polyketide according to claim 65 or 66, in combination with a pharmaceutically compatible vehicle.

69. Process for determining the diversity of the nucleic acids contained in a collection of nucleic acids and most particularly of a collection of nucleic acids originating from an environmental sample, preferentially from a soil sample, the said process comprising the following steps: placing the nucleic acids of the collection of nucleic acids to be tested in contact with a pair of oligonucleotide primers which hybridize with any sequence of bacterial 16S ribosomal DNA; carrying out at least three amplification cycles; detecting the amplified nucleic acids using an oligonucleotide probe or a plurality of oligonucleotide probes, each probe hybridizing specifically with a sequence of 16S ribosomal DNA common to a bacterial kingdom, order, subclass or genus; where appropriate, comparing the results of the preceding detection step with the detection results, using the probe or the plurality of probes, for nucleic acids of known sequence constituting a calibration range.

70. Process according to claim 69, characterized in that the pair of primers which hybridize with any sequence of bacterial 16S ribosomal DNA consists of the primer FGPS 612 (SEQ ID No 12) and the primer FGPS 669 (SEQ ID No 13).

71. Process according to claim 69, characterized in that the pair of primers which hybridize with any sequence of bacterial 16S ribosomal DNA consists of the primer 63 f (SEQ ID No 22) and the primer 1387 r (SEQ ID No 23).

72. Nucleic acid comprising a 16S rDNA nucleotide sequence chosen from the sequences having at least 99% nucleotide identity with the sequences SEQ ID No 60 to SEQ ID No 106.

73. Process for producing a type I polyketide synthase, the said production process comprising the following steps: production of a recombinant host cell comprising a nucleic acid encoding a type I polyketide synthase comprising a nucleotide sequence chosen from the sequences SEQ ID No 33 to SEQ ID No 44 and SEQ ID No 30 to SEQ ID No 32 and SEQ ID No. 115 to SEQ ID No. 120; culturing of the recombinant host cells in a suitable culture medium; recovery and, where appropriate, purification of the type I polyketide synthase from the culture supernatant or from the cell lysate.

74. Polyketide synthase comprising an amino acid sequence chosen from the sequences SEQ ID No 45 to 59 and SEQ ID No. 121 to SEQ ID No. 126.

75. Antibody directed against a polyketide synthase according to claim 74.

76. Process for detecting a type I polyketide synthase or a peptide fragment of this enzyme, in a sample, the said process comprising the steps of: a) placing an antibody according to claim 75 in contact with the sample to be tested; b) detecting any antigen/antibody complex possibly formed.

77. Kit for detecting a type I polyketide synthase in a sample, comprising: a) an antibody according to claim; b) where appropriate, reagents required for detecting any antigen/antibody complex possibly formed.

DATED this 16th day of September, 2005

AVENTIS PHARMA S.A.

By DAVIES COLLISON CAVE
Patent Attorneys for the Applicants