Bacterial Polypeptide Family
This invention relates to a family of bacterial polypeptides which are required for growth of both gram negative and gram positive bacteria, the genes which encode them and the use of such polypeptides and genes as tools for identifying novel broad spectrum antibiotics.
New antibiotics are urgently needed in current medical practice as both serious bacterial infections and multiply antibiotic resistant strains are becoming increasingly prevalent (Proc. Natl. Acad. Sci. USA (1994) 91:2420-2427; New England J. Med. (1994) 330:1247-1251). The increase in number of serious infections has been ascribed to a variety of causes, including: 1) increasing age of the general population, 2) increasingly long and complex surgeries and 3) a growing immuno-suppressed population associated with cancer therapies, organ transplants and HIV infection. Overuse of antibiotics in both medical and agricultural settings, improper sanitation and a general lack of concern about antibiotic resistant organisms have all contributed to the increasing frequency of multiply antibiotic resistant bacteria. Taken together, these two trends suggest that we will soon be faced with bacterial infections which are resistant to all therapies. Indeed, the first report of vancomycin-resistant S. aureus has just been published (Lancet (1997) 350:1670-1673).
Identification of conserved essential proteins is a key step in the development of broad-spectrum antibiotics. If a target protein is conserved across taxonomic lines, the possibility that antibiotics acting on that protein will be effective on a wide range of bacteria is maximized. As examples, DNA gyrase and RNA polymerase are found in all bacteria, which helps to explain why quinolones and rifampicin are good broad-spectrum antibiotics. However, not all bacteria synthesize peptidoglycan, which explains why β-lactam antibiotics are ineffective against Chlamydia, Rickettsia and Legionella species. The recent publication of several complete eubacterial genomic sequences (Science (1995) 270:397-403; Science (1997) 277:1453-1474; Nature (1997) 390:249-256) allows the identification of bacterial proteins which have orthologues in all of the sequenced genomes. This approach has led to the identification of many
conserved protein families (Science (1997) 278:631-637). In some cases a biochemical function for the conserved family may be deduced from their predicted amino acid sequence. In other cases no function can be predicted for the protein family. However, it is impossible to predict the physiological role of a protein or protein family without detailed characterisation of at least one family member.
Following identification of a conserved bacterial protein family, the protein must be shown to be essential for bacterial viability if it is to serve as an antibiotic target. Genetic systems have been developed to demonstrate a gene's essentiality in both E. coli (J. Bacteriol. (1997) 179:6228-6237) and B. subtilis (Genes Dev. (1991) 177:4194-4197). In some instances these systems suffer either from a reliance on negative data (failure to disrupt a given gene) or from insufficient repression of the candidate gene, either of which can lead to misidentification of a gene's essentiality. Clean data from taxonomically diverse bacteria, such as gram negative and gram positive strains, offer the best evidence that a conserved bacterial protein family is essential for viability and will make a good broad-spectrum antibiotic target.
We have identified a family of conserved bacterial genes which we have designated the yihA gene family, after the name given to the E. coli gene family member. These genes have not previously been isolated, nor have the polypeptides been expressed, as no function has been ascribed to these genes. It has now been discovered that this family of genes encodes a family of polypeptides which are essential for the survival of their host bacteria.
The invention therefore provides an isolated polypeptide of the yihA family as defined below particularly for use in the identification of novel antibiotic agents. The polypeptides of the present invention are believed to be essential to the viability of a wide range of bacteria including both gram positive and gram negative bacteria.
Any one of the following three methods may be used to identify members of the yihA family as claimed herein:
BLAST searches (J. Mol. Biol. (1990) 215:403-10 and Meth. Enzymol. (1996) 266:131-141, 227-258, both incorporated herein by reference) may be carried out using the yihA family member sequences as described in Figure 1. Such searches involve using, in succession, each of the existing yihA protein family member sequences as a query sequence to identify other full length members of the yihA family of proteins. Such family members yield high-scoring segment pair (HSP) scores of greater than 100 in comparison to at least one member of the yihA family when the BLAST algorithm described in the reference above is used with a particular scoring matrix (a BLOSUM62 matrix - Proteins (1993) 17:49-61, incorporated herein by reference).
Profile based searches (Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994, incorporated herein by reference) may be carried out using position-dependent scoring matrices defined for the yihA family members. These searches use a table, compiled from a multiple sequence alignment, which describes distinctive sequences of amino acids as probability values for each residue at each position in the gene family, to identify other proteins which contain similar sequences of amino acids.
Motif based searches (Nucleic Acids Res. (1995) 24:189-196, incorporated herein by reference) may be carried out using PROSITE patterns defined for the yihA family members. These searches involve representing, as patterns, the conserved sequence elements identified in the profile searches.
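The profile based search described above can be illustrated with a minimal sketch. It is not the MAST program and does not use the matrices of Tables 1-4; the aligned segments, the pseudocount and the uniform background frequency are all hypothetical values chosen only to show how a table of per-position residue probabilities can be built from an alignment and then used to score a window of a candidate protein.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical aligned motif instances (NOT taken from Figures 2a-d).
EXAMPLE_SEGMENTS = ["GKSS", "GKST", "GKSA"]


def build_profile(aligned, pseudocount=1.0):
    """Return one {residue: probability} table per alignment column."""
    width = len(aligned[0])
    if any(len(segment) != width for segment in aligned):
        raise ValueError("all aligned segments must have the same length")
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for column in range(width):
        counts = Counter(segment[column] for segment in aligned)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


def log_odds_score(profile, segment, background=1.0 / len(AMINO_ACIDS)):
    """Sum of log2(probability / background) over the profile columns."""
    return sum(
        math.log2(column[aa] / background)
        for column, aa in zip(profile, segment)
    )


def best_window(profile, sequence):
    """Return (score, start) of the best-scoring window in the sequence."""
    width = len(profile)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the profile")
    return max(
        (log_odds_score(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    profile = build_profile(EXAMPLE_SEGMENTS)
    score, start = best_window(profile, "AAAGKSSAAA")
    print(f"best window starts at {start} with score {score:.2f}")
```

A real search would convert such scores to p-values against a sequence database, which this sketch does not attempt.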
The isolated polypeptides of the invention may therefore be characterised by:
i) an HSP score of greater than or equal to 100 when compared with one of the sequences of Figure 1 when the BLAST algorithm is used with a BLOSUM62 scoring matrix ; or
ii) containing a set of amino acid sequences which are positively identified when position dependent scoring matrices according to Tables 1-4 are used with MAST to yield a p-value of less than 1×10⁻³⁰; or
iii) comprising at least one of the following amino acid sequences:
E-X(4)-G-[GR]-[STAG]-N-X-G-K-S-[STAG];
[VILM]-A-X(2)-S-X(2)-[PT]-G-X-T-[RKQN]-X(2)-N-X-[FY];
where the letters denote amino acids in one letter code, the square brackets denote a single amino acid position, the amino acids within the square brackets are alternatives at that position, X is any one amino acid residue, and the numbers in the curved brackets refer to the number of residues at that position.
In a preferred aspect of the invention both of the amino acid sequences listed under iii) are present.
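As an illustration of the pattern notation defined under iii), the following sketch translates the two patterns into regular expressions and tests whether a sequence contains both. The function names and the two test sequences are hypothetical; the test sequences were constructed by hand to satisfy the patterns and are not yihA family members. Only the subset of PROSITE syntax used by these two patterns is handled.

```python
import re

MOTIF_1 = "E-X(4)-G-[GR]-[STAG]-N-X-G-K-S-[STAG]"
MOTIF_2 = "[VILM]-A-X(2)-S-X(2)-[PT]-G-X-T-[RKQN]-X(2)-N-X-[FY]"

# One pattern element: a single letter or a bracketed set of alternatives,
# optionally followed by a repeat count in curved brackets.
_ELEMENT = re.compile(r"(\[[A-Z]+\]|[A-Z])(?:\((\d+)\))?")


def prosite_to_regex(pattern):
    """Translate a hyphen-separated pattern into a compiled regex."""
    parts = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, count = match.groups()
        if residue == "X":
            residue = "."  # X stands for any one residue
        parts.append(residue + (f"{{{count}}}" if count else ""))
    return re.compile("".join(parts))


def has_both_motifs(sequence):
    """True when the sequence contains a match to each of the two patterns."""
    return all(
        prosite_to_regex(motif).search(sequence) is not None
        for motif in (MOTIF_1, MOTIF_2)
    )


if __name__ == "__main__":
    # Hand-built hypothetical sequence containing one instance of each motif.
    toy = "MK" + "EAAAAGGSNAGKST" + "LLL" + "VALLSPPPGATRLLNAF" + "KK"
    print(prosite_to_regex(MOTIF_1).pattern)
    print(has_both_motifs(toy))
```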
The invention also provides an isolated polypeptide sequence as set out in any of Figures 2a-d.
The polypeptides are preferably recombinant and ideally purified to homogeneity.
Also included as polypeptides according to the invention are variants, analogues and derivatives, particularly those in which a number of amino acids have been substituted, deleted or added. Polypeptides which have at least 70% identity to any of the polypeptide sequences according to the invention, in particular the sequences of Figures 2a-d, are encompassed within the invention. Preferably the identity is at least 80%, more preferably at least 90% and still more preferably at least 95%, for example 97%, 98% or even 99% identity to any of the sequences according to the invention, in particular the sequences of Figures 2a-d.
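The identity thresholds above can be made concrete with a short sketch. The specification does not state how percent identity is to be calculated, so the convention used here (identical columns divided by the number of aligned columns, counting a gap in one sequence as a mismatch) is an assumption, as are the function name and the toy sequences. The two inputs must already be aligned to equal length by some alignment method.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns in which both sequences carry the same residue.

    Assumed convention: columns that are gaps in both sequences are ignored,
    and a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Hypothetical ten-residue toy alignment differing at one position.
    identity = percent_identity("GKSSAAAAAA", "GKSTAAAAAA")
    print(f"{identity:.1f}% identity; meets 70% threshold: {identity >= 70.0}")
```

Other conventions, such as dividing by the length of the shorter sequence, give different values for the same pair, so the convention should be fixed before comparing against a threshold.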
Such polypeptides may also be fragments. In this regard a fragment is a part of a polypeptide according to the invention which retains sufficient identity of the original polypeptide to be effective for example in a screen. Such fragments may be fused to other amino acids or polypeptides or may be comprised within a larger polypeptide. Such a fragment may be comprised within a precursor polypeptide designed for expression in a host. Therefore in one aspect the term fragment means a portion or portions of a fusion polypeptide or polypeptide derived from a polypeptide according to the invention.
Fragments also include portions of a polypeptide according to the invention characterised by structural or functional attributes of a polypeptide according to the invention. These may have similar or improved chemical or biological activity or reduced side-effect activity. For example fragments may comprise an alpha helix or alpha-helix forming region, beta sheet and beta-sheet forming region, turn and turn forming regions, coil and coil-forming regions, hydrophilic regions, hydrophobic regions, amphipathic regions (alpha or beta), flexible regions, surface-forming regions, substrate binding regions and regions of high antigenic index.
Fragments or portions may be used for producing the corresponding full length polypeptide by peptide synthesis.
Specific polypeptides according to the invention include the polypeptides of Helicobacter pylori, Haemophilus influenzae, Mycoplasma genitalium, Mycoplasma pneumoniae, Streptococcus pneumoniae, Streptococcus pyogenes, Pseudomonas aeruginosa, Saccharomyces cerevisiae, Methanococcus jannaschii, Neisseria gonorrhoeae, Neisseria meningitidis, Staphylococcus epidermidis, Aquifex aeolicus, Bacillus subtilis and Escherichia coli.
The present invention further provides isolated polynucleotides which encode the polypeptides as defined herein, polynucleotides complementary thereto, or polynucleotides hybridising to any of the aforesaid polynucleotides. Isolated polynucleotides have been removed by separation from their natural environment and
those materials with which they are naturally associated. Preferably these polynucleotide molecules are provided in recombinant form (i.e. combined with one or more heterologous sequences).
Polynucleotide molecules which hybridise to polynucleotides encoding substances of the present invention, or to complementary polynucleotides thereto, preferably do so under stringent hybridisation conditions. One example of stringent hybridisation conditions which is sometimes used is where attempted hybridisation is carried out at a temperature of from about 35°C to about 65°C using a salt solution which is about 0.9 molar. However, the skilled person will be able to vary such conditions as appropriate in order to take into account variables such as probe length, base composition, type of ions present, etc.
The invention also provides polynucleotide variants, analogues, derivatives and fragments which encode polypeptides according to the invention. Polynucleotides are included which preferably have at least 70% identity over their entire length to a polynucleotide encoding a polypeptide according to the invention, most preferably those set out in Figures 2a-d. More preferred are those sequences which have at least 80% identity over their entire length to a polynucleotide encoding a polypeptide according to the invention. Even more preferred are polynucleotides which demonstrate at least 90% for example 95%, 97%, 98% or 99% identity over their entire length to a polynucleotide encoding a polypeptide according to the invention.
Polynucleotide molecules of the present invention may be used as probes for other members of the gene family or in anti-sense therapy to block or to reduce the expression of one or more of the polypeptides of the invention. Since these substances are believed to be essential to the bacteria expressing them, blocking or reducing their expression can provide an effective way of treating bacterial mediated diseases or disorders. Polynucleotides may also be used directly in screening and in generating whole cell screens by expression of a polypeptide of the invention.
As part of the isolation process or thereafter the polynucleotides may be joined to other polynucleotides, such as to form fusions, or to regulatory elements for expression. Isolated polynucleotides alone or joined to other polynucleotides can be introduced into a vector which itself will contain other elements of DNA or RNA for expression in a host cell. The invention therefore comprises a vector containing a polynucleotide generally operatively linked to appropriate expression control sequences.
Vectors for use in the invention include plasmid vectors, phage vectors and DNA or RNA viral vectors. These vectors may include gene sequences which render them inducible under certain conditions, such as manipulation of the environmental conditions under which the host cells are maintained, for example by temperature alteration or nutrient additives. Regulatory sequences include for example a promoter to direct mRNA transcription. Such promoters include for example E. coli lac, trp, tac and araBAD as well as the SV40 early and late promoters. Such systems and sequences would be well known to those skilled in the art.
Host cells expressing a polynucleotide of the present invention can be generated by any of the traditional routes such as transfection or electroporation; see for example Davis et al., Basic Methods in Molecular Biology (1986) and Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Lab. Press, Cold Spring Harbor, N.Y. (1989).
This invention also provides a method for identification of molecules such as antagonists, that bind to the polypeptide or a polynucleotide encoding a polypeptide of the present invention.
Selective whole-cell screens combine the sensitivity and specificity of in vitro biochemical assays with the direct demonstration of in vivo activity seen in whole cell screens. Biochemical assays for inhibition of polypeptide activity with purified polypeptides or bacterial extracts can be more sensitive than whole cell killing assays and provide direct evidence for a compound's mode of action. However, this approach requires that the target polypeptide is known and the activity of the polypeptide be
amenable to in vitro assays. Nor does it address other factors, such as membrane permeability or compound stability, which can limit a compound's effectiveness as an antibiotic.
Whole cell screening of compounds for killing activity will identify molecules which kill cells at the concentrations tested, but provides no information on the mode of action of the compound and may not have the sensitivity needed to detect less potent compounds. Bacterial strains which contain surrogate markers whose activity is linked to that of the target gene or which have been engineered to over-express or under-express the target polypeptide can be used for selective whole-cell screens.
Surrogate markers, easily assayed reporter molecules whose activity is tightly coupled to the activity of the polypeptide being studied, may be used as a means of assaying antibiotics. The invention further provides a host cell comprising a vector as defined herein and a reporter gene encoding a reporter molecule whose activity is linked to that of the polypeptide encoded by the vector. Examples of such systems include a transcriptional fusion of the E. coli lacZ gene to vanH promoter in a B. subtilis strain expressing VanS and R as a reporter for inhibition of cell wall biosynthesis (J. Bacteriol. (1996) 178:6305-6309), the use of lacZ transcriptional and translational fusions to rpoB and rpoC to monitor RNA polymerase activity (Mol. Microbiol. (1996) 19:483-493) and the use of a secA-lacZ gene fusion as a reporter for inhibition of secA activity (Genetics (1988) 118:571-579).
When the function of a gene is unknown, surrogate markers for the activity of the gene can be identified using at least two approaches. Two dimensional electrophoresis coupled with mass spectrometry analysis of isolated polypeptides (proteome mapping) has been used to identify specific polypeptides which increase in abundance in response to polypeptide or RNA synthesis inhibitors (Microbial & Comparative Genomics (1996) 1:375). Tightly regulated promoters used to demonstrate that the E. coli and B. subtilis conserved, essential polypeptides are essential can also be used to reduce the concentrations of these polypeptides. In a manner similar to that described above, proteome maps generated from bacteria depleted of the conserved essential genes can be used to detect polypeptides which change in abundance as compared to wild-type bacteria. Transcriptional or translational fusions to these polypeptides can be used as reporter molecules to screen for antagonists of members of the conserved essential gene family. As an alternative to proteome mapping, transposons or other mobile genetic elements containing reporter genes can be used to search for reporter molecules. Such an approach has been used to identify vancomycin responsive genes in S. aureus (J. Antibiot. (Tokyo) (1991) 44:210-217). As with proteome mapping, bacteria in which conserved essential genes are controlled by tightly regulated promoters can be used to screen for transposon-carrying strains in which expression of promoterless reporter genes is induced upon depletion of the polypeptides.
Once a reporter gene has been identified, screening of compounds for induction or inhibition of the marker can be undertaken. Standard broth or plate assays can be used in many different formats. Such assays will detect molecules which antagonise the response which couples the activity of the conserved, target polypeptide to the reporter molecule. Thus, the compounds identified may act directly upon the target polypeptide or on another stage in the pathway which leads to activation of the reporter.
Screens for inhibitors of the target which do not require the use of surrogate markers may be designed by manipulating expression levels of the target polypeptide. For example, quinolone resistant strains of E. coli have been made by over-expression of gyrA (FEMS Microbiol. Lett. (1997) 154:271-276), over-expression of alanine racemase has been shown to increase resistance to cycloserine in M. smegmatis (J. Bacteriol. (1997) 179:5046-5055), and multicopy plasmids carrying murZ have been shown to increase phosphomycin resistance in both E. coli (J. Bacteriol. (1992)
174:5748-5752) and A. calcoaceticus (FEMS Microbiol. Lett. (1994) 117:137-142). Similarly, strains more sensitive to antibiotics may be made by reducing expression levels of the polypeptide targeted by the antibiotic. Over- or under-expression of members of the conserved, essential gene family may be used to screen for antibiotics which act either directly on the gene or gene product or indirectly on the pathway in which it is involved.
Another example of an assay for antagonists is a competitive assay that combines the polypeptide of the present invention and a potential antagonist with membrane-bound binding molecules, recombinant binding molecules, natural substrates or ligands, or substrate or ligand mimetics, under appropriate conditions for a competitive inhibition assay. The polypeptide can be labelled, such as by radioactivity or a colorimetric compound, such that the number of polypeptide molecules bound to a binding molecule or converted to product can be determined accurately to assess the effectiveness of the potential antagonist.
The present invention therefore provides a method of assaying compounds for activity against bacteria comprising:
i) providing a polypeptide according to the invention;
ii) contacting said polypeptide with candidate inhibitory compounds; and
iii) measuring for binding to said polypeptide or fragment.
The present invention also provides a method of assaying compounds for activity against bacteria comprising:
i) expressing a polypeptide according to the invention in a host cell;
ii) contacting said cell with candidate inhibitory compounds; and
iii) measuring cell death.
The present invention further provides a method of screening for an antibiotic which method comprises:
i) transfecting a host cell with a vector comprising a polynucleotide encoding a polypeptide as defined herein;
ii) allowing the host cell to express the polynucleotide;
iii) increasing the level of expression of the polypeptide as defined herein; and
iv) assaying for increased resistance.
Alternatively the method may be carried out as above but the level of expression of the polypeptide is decreased and the cells are assayed for increased sensitivity to an inhibitor.
The present invention also provides a method of assaying compounds for activity against bacteria comprising:
i) generating a bacterial strain containing a reporter gene linked to the gene encoding a polypeptide according to the invention;
ii) contacting said strain with candidate inhibitory compounds; and
iii) measuring for induction or inhibition of said marker.
Potential antagonists include small organic molecules and ions which interact specifically with a polypeptide or polynucleotide, for example a substrate, cell membrane component, receptor, a fragment thereof or a peptide. Such molecules may include antibodies, antibody-derived reagents or chimaeric molecules.
Potential antagonists also may be small organic molecules, a peptide, a polypeptide such as a closely related protein or antibody that binds to the same sites on a binding molecule without inducing functional activity of the polypeptide of the invention.
The antibodies may be monoclonal or polyclonal. Techniques for producing monoclonal and polyclonal antibodies which bind to a particular polypeptide are now well developed in the art. They are discussed in standard immunology textbooks, for example in Roitt et al. (Immunology, Churchill Livingstone, 2nd Edition (1989)).
In addition to whole antibodies, the present invention covers variants thereof which are capable of binding to an epitope present on a substance of the present invention. The variants may be antibody fragments or synthetic constructs. Examples of antibody fragments and synthetic constructs are given by Dougall et al. in Tibtech 12:372-379 (September 1994). Antibody fragments include Fab and Fv fragments.
Other synthetic constructs include CDR peptides. These are synthetic peptides comprising antigen binding determinants. Peptide mimetics may also be used. These molecules are usually conformationally restricted organic rings which mimic the structure of a CDR loop and which include antigen-interactive side chains. Synthetic constructs include chimaeric molecules. Thus, for example, humanised antibodies or derivatives thereof are within the scope of the present invention. An example of a humanised antibody is an antibody having human framework regions but rodent or other non-human hypervariable regions. Synthetic constructs also include molecules comprising a covalently linked moiety which provides the molecule with some desirable property in addition to antigen binding. For example the moiety may be a label (e.g. a fluorescent or radioactive label) or a pharmaceutically active agent.
Other potential antagonists include antisense molecules (see Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides As Antisense Inhibitors Of Gene Expression, CRC Press, Boca Raton, FL (1988), for a description of these molecules).
In a particular aspect the invention provides the use of the polypeptide, polynucleotide or antagonist of the invention to interfere with the initial physical interaction between a pathogen and mammalian host responsible for sequelae of infection.
The invention further includes molecules which block the function of the polypeptides according to the invention or a polynucleotide encoding the same, identifiable by any of the above described methods.
An antagonist of the invention may be provided in pharmaceutical compositions which may include a carrier. They may be provided in unit dosage form. Such agents and pharmaceutical compositions are within the scope of the present invention. In order to prepare such pharmaceutical compositions the inhibitors will normally be provided in substantially pure form. They can then be combined with a carrier under sterile conditions.
The present invention also provides a method of treatment which comprises administering to a patient an effective amount of an antagonist of the expression or function of a polypeptide as defined herein.
The present invention further provides the use of an antagonist of a polypeptide as defined herein or a polynucleotide encoding the same for the manufacture of a medicament for the treatment of a bacterial infection.
Figures
Figure 1 shows the multiple sequence alignment of the yihA family members which may be used for BLAST based identification.
Figures 2a-d show both the position-dependent scoring matrices used for profile-based identification of yihA family members and examples of the motifs recognised by each matrix in the family members. Figure 2a shows examples of motif 1 in the yihA family. Figure 2b shows examples of motif 2 in the yihA family. Figure 2c shows examples of motif 3 in the yihA family. Figure 2d shows examples of motif 4 in the yihA family.
Figure 3 shows the PROSITE patterns which may be used to recognize yihA family members.
Figure 4 shows the outline cloning strategy for a gene disruption plasmid. The black box represents the adapter sequence.
Figure 5 shows growth dependence on arabinose of a conditional mutant in the E. coli gene yihA. An E. coli MG1655 derivative in which the chromosomal araBAD genes have been replaced with yihA and the native yihA gene has been deleted is shown on the upper half of each plate and a wild-type control is shown on the lower half of each plate.
Figure 6 is a diagram of the vector used to create conditional mutants in B. subtilis.
Figure 7 shows growth dependence on xylose of a conditional mutant in the B. subtilis yihA orthologue ysxC.
Examples
Example 1. Identification of conserved bacterial open reading frames.
The predicted open reading frames obtained from the complete E. coli genomic sequence (Science (1997) 277:1453-1474) were compared in a serial manner to the predicted open reading frames of the H. influenzae (Science (1995) 270:397-403), M. genitalium (Science (1995) 270:397-403), Synechocystis (Nuc. Acids Res. (1998) 26:63-67) and B. subtilis (Nature (1997) 390:249-256) complete genome sequences using the BLAST algorithm (J. Mol. Biol. (1990) 215:403-10). All matches with a BLAST score of greater than 75 were then analysed in a pair-wise fashion using the SIM algorithm (Advances in Applied Mathematics (1991) 12:337-357). The SIM score was then divided by a "selfSIM" score, a value obtained when the query protein is compared to itself using the SIM algorithm with the PAM200 matrix, to yield a similarity value of between 0 and 1.0. Proteins for which this similarity value was greater than 0.2 when the E. coli protein was compared to either the B. subtilis or M. genitalium genome were then compiled into a list and manually screened to identify proteins of unknown function. Those open reading frames which also had high similarity values in other bacteria were then considered as candidate genes and targets for gene disruption.
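The filtering arithmetic of Example 1 can be sketched as follows. The cut-offs (BLAST score greater than 75, similarity value greater than 0.2) and the two required genomes come from the text above; the record layout, the function names and every score in the demonstration are hypothetical and do not reproduce any data from the original analysis. The BLAST and SIM programs themselves are not run here; their scores are taken as inputs.

```python
from dataclasses import dataclass

BLAST_SCORE_CUTOFF = 75    # matches must score greater than this
SIMILARITY_CUTOFF = 0.2    # SIM / selfSIM must be greater than this
REQUIRED_GENOMES = ("B. subtilis", "M. genitalium")


@dataclass
class Match:
    """One E. coli query protein compared against one target genome."""
    query: str
    genome: str
    blast_score: float
    sim_score: float        # SIM score of query against its best match
    self_sim_score: float   # SIM score of query against itself


def similarity_value(sim_score, self_sim_score):
    """Normalise a SIM score by the query's self-comparison score."""
    if self_sim_score <= 0:
        raise ValueError("selfSIM score must be positive")
    return sim_score / self_sim_score


def candidate_queries(matches):
    """Queries passing both cut-offs in at least one required genome."""
    selected = set()
    for match in matches:
        if match.blast_score <= BLAST_SCORE_CUTOFF:
            continue
        if match.genome not in REQUIRED_GENOMES:
            continue
        value = similarity_value(match.sim_score, match.self_sim_score)
        if value > SIMILARITY_CUTOFF:
            selected.add(match.query)
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    demo = [
        Match("orf1", "B. subtilis", 120, 300, 1000),    # passes (0.3)
        Match("orf2", "B. subtilis", 80, 100, 1000),     # fails (0.1)
        Match("orf3", "H. influenzae", 200, 900, 1000),  # genome not required
    ]
    print(candidate_queries(demo))
```

The list returned corresponds to the stage before manual screening for proteins of unknown function, which remains a manual step.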
Example 2. Demonstration of essentiality of yihA genes in E. coli.
2A - In-frame deletion of selected genes in E. coli.
A disruption plasmid was constructed using DNA containing an in-frame deletion of the gene of interest plus ~900 base pairs of 5' and 3' flanking DNA for homologous recombination. This DNA was cloned into the gene-replacement vector pKO3 as follows: Two separate PCR reactions were used to amplify fragments of approximately 900 base pairs of 5' and 3' sequence flanking the gene of interest. Chromosomal DNA from E. coli strain MG1655 was used as the template. Primers 2 and 3 carry a 5' extension of a 33 bp adapter sequence:
adapter sequence forward direction 5'-gttataaatttggagtgtgaaggttattgcgtg;
adapter sequence reverse direction 5'-cacgcaataaccttcacactccaaatttataac.
Subsequently, the two PCR products were purified using the High Pure™ PCR Product Purification Kit (Boehringer Mannheim Inc., Mannheim, GE). Using the adapter sequence, the two PCR products were assembled in a second PCR reaction to give a single product. Following restriction enzyme digestion, preparative agarose gel electrophoresis and purification using the Jetsorb™ Gel Extraction Kit (Genomed Inc.), the final product was cloned into pKO3 using standard techniques. This clone is referred to as the disruption plasmid. All PCR reactions described in this section were performed with PWO™ DNA Polymerase (Boehringer Mannheim Inc., Mannheim, GE). In the final product the gene of interest was deleted from the start to the stop codon and replaced by the 33 bp adapter sequence [e.g. 5'-ATGgttataaatttggagtgtgaaggttattgcgtgTAA-3']. As a consequence the reading frame is maintained.
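The properties stated for the adapter can be checked with a short sketch: that the adapter is 33 bp long, that the reverse-direction sequence is the reverse complement of the forward one, and that the start codon, adapter and stop codon together form a short open reading frame with no internal stop codon. The sequences are those given above; the function names are hypothetical.

```python
ADAPTER_FWD = "gttataaatttggagtgtgaaggttattgcgtg"
ADAPTER_REV = "cacgcaataaccttcacactccaaatttataac"
DELETION_ALLELE = "ATG" + ADAPTER_FWD + "TAA"

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(_COMPLEMENT)[::-1]


def codons(seq):
    """Split a sequence into upper-case triplets, dropping any remainder."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def is_in_frame_orf(seq):
    """True if seq is ATG ... stop with length a multiple of 3 and no internal stop."""
    triplets = codons(seq)
    if len(seq) % 3 != 0 or len(triplets) < 2:
        return False
    return (
        triplets[0] == "ATG"
        and triplets[-1] in STOP_CODONS
        and not any(codon in STOP_CODONS for codon in triplets[1:-1])
    )


if __name__ == "__main__":
    print(len(ADAPTER_FWD))
    print(reverse_complement(ADAPTER_FWD) == ADAPTER_REV)
    print(is_in_frame_orf(DELETION_ALLELE))
```

Because 33 is a multiple of three, replacing the coding region between the start and stop codons with the adapter leaves downstream sequence in its original frame.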
2B - Construction of an in-frame deletion mutant of Escherichia coli
The disruption vector pKO3 (A.J. Link et al., J. Bacteriol. 179:6228-6237, 1997) is a derivative of pMAK700 (C.A. Hamilton et al., J. Bacteriol. 171:4617-4622). It features the repA (Ts) replication origin derived from pSC101 [permissive at 30°C but inactive at 42 to 44°C], the cat gene encoding chloramphenicol resistance and the sacB gene for counter selection against vector sequences in the presence of 5% sucrose.
The disruption plasmid described above was transformed into MG1655. Subsequently, chromosomal integrants (cointegrates produced by a single homologous recombination event) of the plasmid were isolated by selecting clones on chloramphenicol at 44°C.
Following two rounds of purification under the same conditions, the cointegrates were grown at 30°C in the presence of 5% sucrose to force resolution of the cointegrate and elimination of the plasmid from the cell. At this step, a preliminary assignment was made as to whether a given gene is essential or non-essential for growth of E. coli in complex media. The genotype of the chloramphenicol-sensitive clones obtained following cointegration and resolution of the disruption plasmid was determined by colony-PCR using primers c1 and c2 (see Fig. 4). In the case of a non-essential gene, the second recombination event can result in either a wild-type or a mutant genotype. Testing of 20 independent clones routinely showed a ~1:1 distribution of wild-type versus mutant genotypes in the case of a non-essential gene. Recovery of only the wild-type genotype in 50 independent clones was considered preliminary evidence for a gene's essentiality.
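The reasoning behind the 50-clone criterion can be illustrated with a small calculation. It assumes, as a simplification not stated in the text, that each clone resolves independently and that a non-essential gene gives the wild-type genotype with probability 0.5 (the ~1:1 distribution reported above). The function name is hypothetical.

```python
def prob_all_wild_type(n_clones, p_wild_type=0.5):
    """Probability that every one of n independent clones is wild-type.

    Assumes independent resolution events, each yielding the wild-type
    genotype with probability p_wild_type.
    """
    if not 0.0 <= p_wild_type <= 1.0:
        raise ValueError("p_wild_type must lie between 0 and 1")
    if n_clones < 0:
        raise ValueError("n_clones must be non-negative")
    return p_wild_type ** n_clones


if __name__ == "__main__":
    for n in (20, 50):
        print(f"{n} clones, all wild-type by chance: {prob_all_wild_type(n):.3g}")
```

Under these assumptions, recovering only wild-type genotypes from 50 clones of a non-essential gene would be expected with a probability below 1e-15, which is why such a result points towards essentiality. It remains preliminary evidence, since a deletion that is viable but strongly disadvantaged during resolution could also skew the ratio.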
2C - Construction of a conditional mutant and final proof that a given gene is essential for growth of E. coli
A vector, pRDC15, was designed which allows a copy of a putative essential gene to be placed in an ectopic position on the chromosome under the control of a tightly regulated promoter. The plasmid is a derivative of pKO3. In addition to the attributes of pKO3, pRDC15 carries a DNA fragment consisting of the araC gene, the arabinose promoter, a cloning site [BamHI-NheI-SfiI-XhoI-SphI-Sftl] and the polB gene. The wild-type copy of a putative essential gene was amplified by PCR and cloned into the vector pRDC15 using restriction sites NheI and XhoI. The resulting construct was used for gene replacement in a manner identical to the disruption plasmids described above. In this case the araC and polB genes of pRDC15 represent the homologous DNA for recombination at the araCBADpolB locus of the E. coli chromosome. Following cointegration and resolution, the araBAD genes in the E. coli chromosome are replaced by the wild-type copy of the gene of interest, which is now under the control of the arabinose promoter. This merodiploid strain is then used to construct an in-frame deletion of the wild-type target gene using the disruption plasmid described above in the presence of 0.2% arabinose. In this case, the deletion mutant can be obtained since a wild-type copy is expressed in trans from the arabinose locus. The resulting strain is a
conditional mutant as expression of the target gene is now dependent on the presence of arabinose. The inability of such a strain to grow in the absence of arabinose is a final proof that a given gene is essential for growth of E. coli. Figure 5 shows that the gene yihA is essential in E. coli.
Example 3 - ysxC is an essential gene in Bacillus subtilis
3A - Construction of a B. subtilis integrative plasmid for xylose controlled gene expression.
An integrative plasmid allowing the expression of genes under the control of a xylose inducible promoter was constructed as follows: A DNA fragment carrying the repressor gene xylR and the xylA promoter was PCR amplified from B. subtilis genomic DNA with the following primers:
pxyl-4: 5'-atcgctcgagAGATGCACCTTCTATACCCG-3'
pxyl-7: 5'-atcgaagcttAGCGATCCTACACAATCATG-3'
The primers were designed such that they introduced a unique EcoRI site at the 5' end of the PCR product and a unique BamHI site at the 3' end of the product. The PCR fragment was then cloned as an EcoRI-BamHI fragment into the B. subtilis integrative vector pDG648 to yield pRDC9 (Figure 6).
3B - Construction of the disruption plasmid.
A DNA fragment containing approximately 100 bp of sequence from the 5' region of ysxC was amplified by PCR from B. subtilis genomic DNA. The PCR primers were designed such that the resulting PCR product contains unique restriction sites at both the 5' and 3' ends. Subsequently, the PCR product was cloned into pRDC9.
3C - Construction of a conditional mutant.
The disruption plasmid was inserted into B. subtilis strain JH642. Chromosomal integration of the plasmid via single-reciprocal Campbell-like recombination at the ysxC locus was driven by selection on LB plates containing erythromycin (1 μg/ml), lincomycin (25 μg/ml) and 10 mM xylose. The resulting strain is a conditional mutant in which expression of ysxC is dependent on the presence of xylose in the growth medium.
3D - Confirmation that ysxC is an essential gene.
Confirmation that ysxC is essential for growth was obtained by streaking the ysxC conditional mutant on LB plates containing erythromycin (1 μg/ml) and lincomycin (25 μg/ml), with or without 10 mM xylose. The strain formed single colonies only on xylose-containing plates, thereby indicating that expression of ysxC is indispensable for growth (Figure 7).
Example 4 - Characterisation of the yihA polypeptide family
4A - Repetitive BLAST searches
Repetitive BLAST searches (Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic local alignment search tool. J. Mol. Biol. 215:403-10) were performed in which each of the yihA protein family members described below was used in succession as a query sequence to identify other members of the yihA family. Members are identified as proteins which yield high-scoring segment pair (HSP) scores of greater than 100 in comparison to at least one member of the yihA polypeptide sequences shown in Figure 1 when a BLOSUM62 scoring matrix is used.
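The successive-query procedure can be pictured as a simple closure computation. The following Python sketch is an assumption-laden illustration, not the procedure actually used: `search` is a hypothetical callable standing in for a BLAST run that returns the best HSP score per database hit, and the toy scores and the identifiers `unrelated` and the species suffixes are invented solely to exercise the loop.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set


def expand_family(seeds: Iterable[str],
                  search: Callable[[str], Dict[str, float]],
                  min_score: float = 100.0) -> Set[str]:
    """Use each family member in turn as a query until no new member appears."""
    family = set(seeds)
    queue = deque(family)
    while queue:
        query = queue.popleft()
        # search() is a placeholder for a BLAST search with BLOSUM62.
        for hit, score in search(query).items():
            if score > min_score and hit not in family:
                family.add(hit)
                queue.append(hit)  # new members become queries themselves
    return family


# Invented scores for illustration only; they are not data from the source.
toy_scores = {
    "yihA_Ec": {"yihA_Hi": 350.0, "ysxC_Bs": 140.0, "unrelated": 40.0},
    "yihA_Hi": {"yihA_Ec": 350.0},
    "ysxC_Bs": {"Y335_Mg": 120.0, "unrelated": 55.0},
    "Y335_Mg": {},
}
family = expand_family(["yihA_Ec"], lambda q: toy_scores.get(q, {}))
```

In this toy run the distant member `Y335_Mg` is reached only through `ysxC_Bs`, which is the reason for querying with every member in succession rather than with a single seed.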
Sources for each of the sequences set out in Figure 1 are given below:
H. influenzae - yihA, Swissprot accession number P46453
E. coli - yihA, Swissprot accession number P24253
S. epidermidis - Glaxo Wellcome S. epidermidis genomic sequencing project ORF Z0304001 (B. Kimmerly, unpublished data)
B. subtilis - ysxC, Swissprot accession number P38424
S. pyogenes - gnl|OUACGT|Contig301 from S. pyogenes genome sequencing project, B.A. Roe, S. Clifton, Mike McShan and Joseph Ferretti (http://www.genome.ou.edu/strep.html)
S. pneumoniae - Glaxo Wellcome S. pneumoniae genomic sequencing project contig SP07_00013 (G. Feger, unpublished data)
M. jannaschii - Y320, Swissprot accession number Q57768
M. genitalium - Y335, Swissprot accession number P47577
M. pneumoniae - Y335, Swissprot accession number P75303
S. cerevisiae - D9651.4, Swissprot accession number Q05473
N. gonorrhoeae - GenBank accession number g1914833
A. aeolicus - GenBank accession number g2984110
P. aeruginosa - gnl|PAGP|Contig639, the Pseudomonas Genome Project (http://www.pseudomonas.com)
N. meningitidis - contig GNMCY55F; sequence data for N. meningitidis was obtained from The Institute for Genomic Research website at http://www.tigr.org
H. pylori - HP1567, GenBank accession number g2314750
4B - Profile based searches
Multiple sequence alignments of the yihA family members have been used to identify short patterns of amino acid sequences which are common to all of the family members. Four motifs have been identified in the yihA gene family using the motif discovery tool MEME (Bailey, T. L. and Elkan, C., Fitting a mixture model by expectation maximization to discover motifs in biopolymers, Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology, pp. 28-36, AAAI Press, Menlo Park, California, 1994). Each of the four motifs is shown as it exists in each of the family members and is explicitly described as a position-dependent scoring matrix, or profile. Together these profiles can be used by the motif alignment and search tool, MAST, described in the same reference, to search databases for yihA family members, which are positively identified when p-values of less than 1 x 10^-30 are obtained. The p-values are based on a random sequence model that assumes each position in a random sequence is generated according to the average letter frequencies of all sequences in the peptide non-redundant database (ftp://ncbi.nlm.nih.gov/blast/db/) on September 22, 1996.
Tables 1 to 4 show the position-dependent scoring matrices used to define the yihA family. Values in each position-dependent scoring matrix are calculated by taking the log (base 2) of the ratio p/f at each position in the motif, where p is the probability of a particular letter at that position in the motif and f is the average frequency of that letter in the training set. Columns correspond to one-letter amino acid codes and rows correspond to positions in the motif.
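The log-odds definition above, and the way such a matrix scores a stretch of sequence, can be sketched as follows. This is a minimal illustration and not the MEME or MAST implementation: the function names are hypothetical, the three-letter alphabet and its probabilities are invented so that the arithmetic is exact, and the sketch assumes every p is greater than zero (the finite negative entries in the tables suggest that the probabilities were smoothed before taking logarithms, but that is an inference).

```python
import math
from typing import Dict, List


def log_odds_row(p: Dict[str, float], f: Dict[str, float]) -> Dict[str, float]:
    """One matrix row: log2(p/f) for each letter at a single motif position."""
    return {letter: math.log2(p[letter] / f[letter]) for letter in p}


def score_window(window: str, matrix: List[Dict[str, float]]) -> float:
    """Sum the per-position log-odds values for a window as long as the motif."""
    if len(window) != len(matrix):
        raise ValueError("window length must equal motif width")
    return sum(row[letter] for letter, row in zip(window, matrix))


# Invented toy values: position probabilities p and background frequencies f.
p = {"G": 0.5, "A": 0.25, "K": 0.25}
f = {"G": 0.125, "A": 0.5, "K": 0.25}
row = log_odds_row(p, f)          # G is 4x enriched, A is 2x depleted
toy_matrix = [row, row]           # a motif of width 2 using the same row twice
toy_score = score_window("GK", toy_matrix)
```

A positive entry therefore marks a letter that is more common at that motif position than in the background, which is why the large positive values in the tables pick out the conserved residues of each motif.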
Table 1 - Position-dependent scoring matrix.
Values in the position-dependent scoring matrix are calculated by taking the log (base 2) of the ratio p/f at each position in the motif where p is the probability of a particular letter at that position in the motif, and f is the average frequency of that letter in the training set. Columns correspond to 1 letter amino acid codes and rows correspond to the position in the motif.
log-odds matrix: alength= 20 w= 19 n= 2828 bayes= 7.551
A C D E F G H I K L M N P Q R S T V W Y
1 -4.627 -5.566 -1.994 3.934 -5.992 -5.13. -4.319 -5.263 -2.733 -6.028 -5.261 -4.113 -5.943 -3.377 -5.195 -5.180 -5.125 -5.356 -5.664 -5.723
2 -3.632 -3.194 -5.898 -5.772 -3.519 -5.982 -5.786 3.052 -5.545 -2.032 1.337 -5.403 -5.893 -5.608 -5.850 -5.283 -3.559 2.720 -5.293 -4.633
3 3.001 2.369 -4.617 -4.272 -3.291 -2.960 -4.078 1.008 -4.231 -2.612 -2.179 -4.156 -4.831 -4.133 -4.190 -2.080 -2.662 0.958 -4.024 -3.972
4 -4.441 -3.810 -6.755 -6.354 3.779 -6.364 -4.859 -1.516 -6.175 1.552 -1.063 -6.052 -5.833 -5.098 -5.765 -5.706 -4.489 0.962 -3.440 -2.903
5 2.260 -1.636 -4.165 -3.572 -1.822 -3.155 -2.660 -0.854 -3.287 0.502 2.231 -3.197 -3.684 -2.961 -3.158 -0.234 -1.826 1.831 -2.536 -2.261
6 -3.229 -4.353 -3.776 -4.543 -5.442 3.784 -4.184 -5.595 -4.364 -5.904 -4.855 -3.514 -4.954 -4.740 -4.265 -3.737 -4.690 -5.096 -4.672 -5.030
7 -3.596 -2.891 -4.449 -4.574 -4.716 -1.839 -2.006 -4.160 -1.434 -4.147 -3.764 -3.344 -3.774 -2.453 4.113 -3.831 -3.889 -4.786 -3.066 -4.173
8 -1.748 -1.931 -3.399 -3.981 -3.613 -3.218 -3.189 -3.780 -3.158 -3.970 -2.935 -1.905 -3.410 -3.351 -3.068 3.411 0.897 -3.748 -3.757 -3.487
9 -4.577 -4.085 -3.088 -4.952 -4.646 -4.342 -1.904 -4.290 -4.121 -5.108 -4.290 4.350 -4.739 -3.657 -4.372 -2.917 -3.619 -4.710 -4.165 -4.371
10 2.063 -2.724 -5.672 -5.589 -4.330 -3.735 -5.152 -1.555 -5.640 -3.411 -3.208 -4.913 -4.771 -5.121 -5.328 -3.060 -3.101 3.284 -5.428 -5.211
11 -3.229 -4.353 -3.776 -4.543 -5.442 3.784 -4.184 -5.595 -4.364 -5.904 -4.855 -3.514 -4.954 -4.740 -4.265 -3.737 -4.690 -5.096 -4.672 -5.030
12 -3.867 -3.862 -4.925 -4.583 -5.500 -4.863 -3.812 -4.287 3.990 -5.023 -4.091 -3.799 -4.740 -4.012 -0.799 -4.651 -4.128 -4.881 -4.315 -4.890
13 -4.038 -6.288 -6.639 -6.087 -5.193 -5.958 3.726 -3.427 -3.021 -3.852 -2.811 -1.774 -3.320 -3.?43 -2.936 3.310 1.414
15 1.994 -3.319 -6.193 -5.572 3.131 -5.434 -4.351 0.501 -5.279 1.257 -0.928 -5.197 -5.317 -4.498 -4.980 -4.710 -3.609 -1.791 -3.464 -3.333
16 -3.717 -3.345 -5.557 -5.469 -2.811 -5.755 -5.203 3.366 -5.150 1.590 -1.382 -5.084 -5.616 -5.013 -5.350 -4.988 -3.578 0.649 -4.414 -4.095
Table 2 - Position-dependent scoring matrix.
Values in the position-dependent scoring matrix are calculated by taking the log (base 2) of the ratio p/f at each position in the motif where p is the probability of a particular letter at that position in the motif, and f is the average frequency of that letter in the training set. Columns correspond to 1 letter amino acid codes and rows correspond to the position in the motif.
6 1.315 -3.131 -1.890 -1.433 -3.636 -0.123 -1.525 -3.408 2.806 -3.268 -2.398 -1.520 -3.090 0.825 -1.109 0.796 -1.936 -2.924 -3.543 -2.911
Table 3 - Position-dependent scoring matrix.
Values in the position-dependent scoring matrix are calculated by taking the log (base 2) of the ratio p/f at each position in the motif where p is the probability of a particular letter at that position in the motif, and f is the average frequency of that letter in the training set. Columns correspond to 1 letter amino acid codes and rows correspond to the position in the motif.
log-odds matrix: alength= 20 w= 55 n= 7288 bayes= 7.56739
A C D E F G H I K L M N P Q R S T V W Y
1 -3.934 -3.203 -3.997 -4.110 -0.772 -4.162 -3.343 -3.909 -3.898 -2.533 -2.718 -3.774 -4.478 -2.019 -2.222 -4.098 -3.943 -3.4B6 6.039 -1.474
2 -1.730 -3.370 -1.305 0.733 -3.473 2.151 -1.392 -3.123 0.731 -3.034 -2.181 -1.447 -2.907 2.255 -1.013 -1.775 -1.855 0.945 -3.411 -2.764
3 -2.000 1.756 -2.177 -1.473 -3.631 -0.022 -1.342 -3.203 2.339 -3.046 1.484 1.539 -3.146 -0.829 2.065 -1.981 -1.986 -2.845 -3.338 -2.801
4 0.197 -1.385 -3.750 -3.109 0.849 -3.085 1.460 -0.433 -2.791 0.816 3.712 -2.735 -3.240 -2.463 -2.685 -0.022 -1.413 0.719 -2.049 -1.725
5 -2.336 -2.185 -4.401 -0.157 -1.693 -4.023 -2.996 2.136 -3.501 2.448 -0.465 -3.6u2 -3.907 -3.072 -3.337 -3.164 0.088 -0.762 -2.772 -2.568
6 -1.558 -2.953 -1.447 2.354 -2.931 1.805 -1.265 -2.579 -0.536 -0.373 -1.813 -1.378 -2.788 -0.781 0.664 -1.621 1.rr3 -2.265 2.109 -2.391
7 0.111 -4.114 1.524 2.820 -4.021 -3.034 -1.861 -3.596 -1.272 -3.485 -2.611 -1.804 -2.991 0.991 -2.033 -2.083 0.274 -3.042 -4.072 0.924
8 -3.624 -3.259 -4.426 -4.323 1.439 -4.726 -1.099 -3.333 -4.143 -0.268 -2.691 -3.697 -4.537 -3.692 -3.822 -3.70b -3.959 -3.401 -1.033 4.503
9 -3.202 -2.837 -4.756 -4.018 -1.598 -4.753 -3.330 0.694 -0.719 3.054 -0.114 -4.149 -3.973 -3.054 -3.345 -3.896 -3.091 -1.498 -3.001 -3.026
10 -1.176 -2.763 -1.190 1.190 -2.792 -2.302 -0.893 -2.593 -0.410 -0.469 1.608 -0.°"5 -2.377 1.676 0.588 0.878 2.017 -2.179 -2.857 -2.159
12 -3.448 -4.066 -1.819 -3.977 -0.660 -3.566 -3.158 -3.616 -2.231
13 0.111 -4.176 1.526 2.380 -4.066 -3.094 -1.900 -3.614 1.778 -3.498 -2.623 -1.909 -3.001 0.996 -2.024 -2.130 -2.202 0.046 -4.104 -3.322
14 -1.881 1.779 -0.942 1.251 -3.569 -2.232 -1.293 -3.561 -1.197 -3.447 -2.593 2.570 -2.950 1.769 -1.757 1.497 -1.825 -3.126 -3.663 -2.793
)7 49? _ 1114 ! 400 2 <14( 5 71 •> i ,10 2 801 4 1 )' -? 095 -4 5C4 -3 7(1 0 674 -1 1 I) ill ) 11 ? 911 1 |1<I ) 1 HOB - ', 701 -4 645
38 -5.310 -4.123 -5.717 -5.836 3.300 -5.661 -1.848 -4.275 -5.286 -3.732 -3.628 0.710 -5.577 -4.311 -4.677 -4.909 -4.984 -4.358 4.483 2.548
39 1.j28 -2.HI - ' 484 4.848 2.411 -4.750 -3.726 1.r41 -4.545 1.844 -0.609 -4.470 -4.698 ϊ.904 4.299 -3.970 -2.946 -1.304 -3.043 -2.928
40 -1.951 -3.441 -1.932 0.508 -3.651 -2.975 3.060 -3.254 2.310 -3.077 -2.240 -1.645 -3.094 1.802 -0.526 -1.957 -1.971 0.986 -3.375 -2.811
41 0.737 -2.472 0.374 0.375 -0.515 -2.336 1.464 -2.468 -0.710 -2.359 -1.570 -1.029 -2.435 -0.708 -1.174 0.058 0.938 -2.139 -1.961 3.008
42 -0.547 -1.644 -2.512 -2.296 1.711 -3.019 1.293 -1.842 -2.275 -0.816 -1.258 -2.268 -2.787 -2.246 -2.177 -0.050 0.106 -1.749 -0.550 3.741
43 -2.070 -3.618 1.532 0.624 -3.858 2.269 -1.490 -3.840 -1.439 -0.442 -2.881 1.704 -3.104 0.917 -2.059 -1.649 -2.048 -3.378 -3.935 -3.042
44 -2.732 -2.530 -4.827 -4.444 -2.206 -4.543 -3.727 3.099 -3.939 0.766 -0.935 -4.055 -4.579 -3.830 1.720 -3.707 -2.657 0.836 -3.376 -3.060
45 -0.210 -3.173 -3.137 -2.991 -3.775 -3.343 -2.837 -3.571 -2.788 -3.408 -3.176 -3.264 4.034 -2.590 -2.998 -2.275 -2.559 -3.123 -4.296 -4.141
46 -1.096 -1.575 -0.912 -3.271 0.410 -3.733 -2.671 0.487 -3.279 -1.847 -1.461 -3.387 -3.329 -3.219 -2.973 -3.017 -0.475 3.375 -3.325 -3.318
47 -1.334 -1.801 -2.194 0.276 -1.754 -2.731 2.946 1.772 -1.351 1.128 -0.721 -1.788 -2.844 0.793 -1.683 -1.717 0.385 -0.863 -2.318 -1.878
48 -0.207 -2.002 -4.133 -3.894 -2.913 -4.368 -3.392 1.652 -3.925 -0.559 -1.822 -4.015 -3.935 -3.913 -3.650 -3.694 -2.109 3.289 -4.065 -3.997
49 -2.847 -2.579 -5.366 -4.797 -2.054 -4.756 -3.855 1.613 -4.518 2.072 -0.948 -4.427 -4.725 -4.043 -4.362 -3.969 -2.823 2.239 -3.352 1.093
50 1.318 1.712 -4.308 -3.647 0.781 -3.626 -2.645 0.687 -3.335 2.154 1.658 -3.294 -3.646 -2.883 -3.133 -2.761 -1.929 -0.756 -2.394 -2.169
51 -2.241 -2.252 -3.594 -4.028 -3.774 -3.825 -3.207 -2.770 -3.223 -3.897 -2.266 -2.041 -3.867 -2.922 -3.251 1.317 3.584 -2.395 -3.813 -4.069
52 -3.540 -3.578 -4.644 -4.201 -5.184 -4.616 -3.450 -3.932 3.944 -2.689 -3.715 -3.458 -4.474 -3.554 -0.458 -4.311 -3.789 -4.508 -4.028 -4.556
53 3.217 -0.823 -3.447 -0.897 -0.292 -1.809 -2.875 -0.486 -3.013 -2.266 0.560 -3.076 -3.999 -2.947 -2.983 -0.892 -1.855 -0.986 -2.896 -2.998
54 -3.227 -3.646 4.064 -1.034 -4.141 -3.671 -2.470 -4.063 -3.784 -4.397 -3.716 -0.857 -4.576 -3.351 -3.793 -3.229 -1.684 -3.849 -3.918 -3.682
55 -3.722 -3.725 -4.788 -4.431 -5.370 -4.738 -3.668 -4.141 3.979 -4.880 -3.943 -3.651 -4.610 -3.853 -0.650 -4.506 -3.983 -4.735 -4.181 -4.751
Table 4 - Position-dependent scoring matrix.
Values in the position-dependent scoring matrix are calculated by taking the log (base 2) of the ratio p/f at each position in the motif where p is the probability of a particular letter at that position in the motif, and f is the average frequency of that letter in the training set. Columns correspond to 1 letter amino acid codes and rows correspond to the position in the motif.
log-odds matrix: alength= 20 w= 11 n= 2948 bayes= 8.42265
A C D E F G H I K L M N P Q R S T V W Y
1 -3.251 -2.238 -4.266 -4.273 4.412 -4.520 -3.589 -2.262 -4.336 -1.531 -1.815 -4.133 -3.973 -4.350 -4.429 -2.348 -3.957 -2.565 -1.706 -0.484
2 -1.492 -1.656 -3.075 -3.660 -3.319 -2.959 -2.883 -3.481 -2.845 -3.673 -2.640 -1.597 -3.127 -3.047 -2.764 3.350 1.017 -3.475 -3.467 -3.189
3 1.357 -1.700 -3.628 -4.024 -3.973 -2.362 -3.467 -4.035 -3.291 -4.205 -3.211 -2.436 -3.343 -3.451 -3.508 3.217 -0.385 -3.124 -4.097 -3.973
4 -1.177 -1.778 -1.669 0.660 -1.660 -2.557 -1.342 -0.886 -1.111 1.591 -0.557 -1.589 1.218 1.041 -1.493 -1.542 -1.246 1.374 -2.258 -1.823
5 -1.178 -2.769 -0.870 0.822 -2.840 -2.048 -0.762 -2.695 2.664 -2.568 -1.681 1.078 -2.327 -0.349 -0.720 0.472 0.645 -2.274 -2.856 -2.124
6 -3.300 -3.580 -4.306 -3.397 -5.045 -4.242 -2.601 -3.834 3.788 -4.167 -3.321 -2.986 -4.249 - - 218 1.205 -3.746 -3.350 -4.104 -3.840 -4.086
7 0.309 -1.990 -1.651 -0.858 0.045 -2.442 -0.735 1.287 0.696 -1.620 -0.853 -1.265 -2.463 3.158 -1.010 -1.366 -1.221 -1.206 -2.321 -1.888
8 -1.982 -3.297 -2.620 -3.364 -4.418 3.699 -3.109 -4.447 -3.195 -4.879 -3.686 -2.322 -3.970 -3.610 -3.132 -2.548 -3.557 -3.938 -3.660 -3.973
9 -2.759 -2.728 -3.818 -3.828 -2.304 -4.251 -3.613 3.-32 -1.540 -0.408 0.447 -3.395 -4.271 -3.502 -3.681 -3.302 -2.474 1.261 -3.271 -2.772
10 -3.199 -3.601 4.081 -0.980 -4.085 -3.653 -2.431 -4.009 -3.759 -4.341 -3.662 -0.818 -4.529 -3.320 -3.751 -3.218 -3.612 -3.800 -3.857 -3.631
11 -1.899 -4.815 1.200 3.072 -4.647 -3.178 -2.177 -3.839 1.599 -0.901 -2.898 -1.886 -3.017 -0.736 -2.292 -2.380 -2.441 -3.183 -4.622 -3.780
4C - PROSITE based searches
The conserved sequence elements identified with MEME can also be represented as PROSITE patterns using the conventions outlined in PROSITE: A dictionary of protein sites and patterns (http://www.expasy.ch/sprot/prosite.html) and Bairoch A., Bucher P., Hofmann K. The PROSITE database, its status in 1995. Nucleic Acids Res. 24:189-196 (1995). YihA family members are positively identified when an exact match to either of the two PROSITE patterns, pattern 1 or pattern 2, described in Figure 3 is obtained.
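Exact matching against a PROSITE pattern can be illustrated by translating the pattern into a regular expression. The sketch below is an assumption-based example rather than the tool used here: `prosite_to_regex` is a hypothetical helper that handles only the common syntax elements (single residues, `x`, `[..]` alternatives, `{..}` exclusions, repeat counts and the `<` and `>` anchors), and the pattern `[AG]-x(4)-G-K-[ST]` is a well-known PROSITE example used purely for demonstration; it is not pattern 1 or pattern 2 of Figure 3.

```python
import re

# One pattern element: a residue specification plus an optional repeat count.
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern (common syntax only) into a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""   # N-terminal anchor
        suffix = "$" if element.endswith(">") else ""     # C-terminal anchor
        match = _ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            residue = "."                                 # any amino acid
        elif residue.startswith("{"):
            residue = "[^" + residue[1:-1] + "]"          # excluded residues
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(prefix + residue + repeat + suffix)
    return "".join(parts)


# Demonstration pattern and an invented peptide, for illustration only.
demo_regex = prosite_to_regex("[AG]-x(4)-G-K-[ST]")
demo_hit = re.search(demo_regex, "MKGPPGSGKSTL")
```

Because a PROSITE pattern either matches or does not, this test is stricter than the profile search of section 4B, where a sequence is scored against each motif and accepted on a p-value.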