AU2004231248A1 - Streptococcus pneumoniae Polynucleotides and Sequences - Google Patents

Streptococcus pneumoniae Polynucleotides and Sequences

Info

Publication number
AU2004231248A1
Authority
AU
Australia
Prior art keywords
protein
gene
sequence
nucleic acid
dna
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
AU2004231248A
Other versions
AU2004231248B2 (en
Inventor
Steven C. Barash
Gil H. Choi
Patrick J. Dillon
Brian A. Dougherty
Michael Fannon
Charles A. Kunsch
Craig A. Rosen
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Human Genome Sciences Inc
Original Assignee
Human Genome Sciences Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU69090/98A external-priority patent/AU6909098A/en
Application filed by Human Genome Sciences Inc filed Critical Human Genome Sciences Inc
Priority to AU2004231248A priority Critical patent/AU2004231248B2/en
Publication of AU2004231248A1 publication Critical patent/AU2004231248A1/en
Application granted granted Critical
Publication of AU2004231248B2 publication Critical patent/AU2004231248B2/en
Priority to AU2008203821A priority patent/AU2008203821A1/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Classifications

    • CCHEMISTRY; METALLURGY
    • C12BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12QMEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/689Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for bacteria
    • CCHEMISTRY; METALLURGY
    • C07ORGANIC CHEMISTRY
    • C07KPEPTIDES
    • C07K14/00Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/195Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria
    • C07K14/315Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria from Streptococcus (G), e.g. Enterococci
    • C07K14/3156Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from bacteria from Streptococcus (G), e.g. Enterococci from Streptococcus pneumoniae (Pneumococcus)

Landscapes

  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Biophysics (AREA)
  • Engineering & Computer Science (AREA)
  • Genetics & Genomics (AREA)
  • Zoology (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • Wood Science & Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Pulmonology (AREA)
  • Biotechnology (AREA)
  • Immunology (AREA)
  • Microbiology (AREA)
  • Gastroenterology & Hepatology (AREA)
  • Medicinal Chemistry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Micro-Organisms Or Cultivation Processes Thereof (AREA)
  • Peptides Or Proteins (AREA)
  • Medicines Containing Antibodies Or Antigens For Use As Internal Diagnostic Agents (AREA)

Description

P100/01 1 28/5/91 Regulation 3.2
AUSTRALIA
Patents Act 1990
ORIGINAL
COMPLETE SPECIFICATION STANDARD PATENT
Name of Applicant: Human Genome Sciences, Inc.
Actual Inventors: Charles A. KUNSCH, Gil H. CHOI, Patrick J. DILLON, Craig A. ROSEN, Steven C. BARASH, Michael FANNON, Brian A. DOUGHERTY
Address for service is: WRAY ASSOCIATES, Level 4, The Quadrant, 1 William Street, Perth, WA 6000
Attorney code: WR
Invention Title: "Streptococcus pneumoniae Polynucleotides and Sequences"
The following statement is a full description of this invention, including the best method of performing it known to me:-

Streptococcus pneumoniae Polynucleotides and Sequences

FIELD OF THE INVENTION
The present invention relates to the field of molecular biology. In particular, it relates to, among other things, nucleotide sequences of Streptococcus pneumoniae, contigs, ORFs, fragments, probes, primers and related polynucleotides thereof, peptides and polypeptides encoded by the sequences, and uses of the polynucleotides and sequences thereof, such as in fermentation, polypeptide production, assays and pharmaceutical development, among others.
BACKGROUND OF THE INVENTION
Streptococcus pneumoniae has been one of the most extensively studied microorganisms since its first isolation in 1881. It was the object of many investigations that led to important scientific discoveries. In 1928, Griffith observed that when heat-killed encapsulated pneumococci and live strains constitutively lacking any capsule were concomitantly injected into mice, the nonencapsulated strains could be converted into encapsulated pneumococci with the same capsular type as the heat-killed strain. Years later, the nature of this "transforming principle," or carrier of genetic information, was shown to be DNA. (Avery, O.T., et al., J. Exp. Med. 79:137-157 (1944)).
In spite of the vast number of publications on S. pneumoniae, many questions about its virulence are still unanswered, and this pathogen remains a major causative agent of serious human disease, especially community-acquired pneumonia. (Johnston, et al., Rev. Infect. Dis. 13(Suppl. 6):S509-517 (1991)). In addition, in developing countries, the pneumococcus is responsible for the death of a large number of children under the age of 5 years from pneumococcal pneumonia. The incidence of pneumococcal disease is highest in infants under 2 years of age and in people over 60 years of age. Pneumococci are the second most frequent cause (after Haemophilus influenzae type b) of bacterial meningitis and otitis media in children. With the recent introduction of conjugate vaccines for H. influenzae type b, pneumococcal meningitis is likely to become increasingly prominent. S. pneumoniae is the most important etiologic agent of community-acquired pneumonia in adults and is the second most common cause of bacterial meningitis behind Neisseria meningitidis.
The antibiotic generally prescribed to treat S. pneumoniae is benzylpenicillin, although resistance to this and to other antibiotics is found occasionally. Pneumococcal resistance to penicillin results from mutations in its penicillin-binding proteins. In uncomplicated pneumococcal pneumonia caused by a sensitive strain, treatment with penicillin is usually successful unless started too late. Erythromycin or clindamycin can be used to treat pneumonia in patients hypersensitive to penicillin, but strains resistant to these drugs exist. Broad spectrum antibiotics (e.g., the tetracyclines) may also be effective, although tetracycline-resistant strains are not rare. In spite of the availability of antibiotics, the mortality of pneumococcal bacteremia in the last four decades has remained stable between 25 and 29%. (Gillespie, et al., J. Med. Microbiol. 28:237-248 (1989)).
S. pneumoniae is carried in the upper respiratory tract by many healthy individuals. It has been suggested that attachment of pneumococci is mediated by a disaccharide receptor on fibronectin, present on human pharyngeal epithelial cells. (Anderson, et al., Immunol. 142:2464-2468 (1989)). The mechanisms by which pneumococci translocate from the nasopharynx to the lung, thereby causing pneumonia, or migrate to the blood, giving rise to bacteremia or septicemia, are poorly understood. (Johnston, et al., Rev. Infect. Dis. 13(Suppl. 6):S509-517 (1991)).
Various proteins have been suggested to be involved in the pathogenicity of S. pneumoniae; however, only a few of them have actually been confirmed as virulence factors. Pneumococci produce an IgA1 protease that might interfere with host defense at mucosal surfaces. (Kornfield, et al., Rev. Inf. Dis. 3:521-534 (1981)). S. pneumoniae also produces neuraminidase, an enzyme that may facilitate attachment to epithelial cells by cleaving sialic acid from the host glycolipids and gangliosides. Partially purified neuraminidase was observed to induce meningitis-like symptoms in mice; however, the reliability of this finding has been questioned because the neuraminidase preparations used were probably contaminated with cell wall products. Other pneumococcal proteins besides neuraminidase are involved in the adhesion of pneumococci to epithelial and endothelial cells. These pneumococcal proteins have as yet not been identified.
Recently, Cundell et al. reported that peptide permeases can modulate pneumococcal adherence to epithelial and endothelial cells. It was, however, unclear whether these permeases function directly as adhesins or whether they enhance adherence by modulating the expression of pneumococcal adhesins. (DeVelasco, et al., Micro. Rev. 59:591-603 (1995)). A better understanding of the virulence factors determining its pathogenicity will need to be developed to cope with the devastating effects of pneumococcal disease in humans.
Ironically, despite the prominent role of S. pneumoniae in the discovery of DNA, little is known about the molecular genetics of the organism. The S. pneumoniae genome consists of one circular, covalently closed, double-stranded DNA and a collection of so-called variable accessory elements, such as prophages, plasmids, transposons and the like. Most physical characteristics and almost all of the genes of S. pneumoniae are unknown. Among the few that have been identified, most have not been physically mapped or characterized in detail. Only a few genes of this organism have been sequenced. (See, for instance, current versions of GENBANK and other nucleic acid databases, and references that relate to the genome of S. pneumoniae such as those set out elsewhere herein.)

It is clear that the etiology of diseases mediated or exacerbated by S. pneumoniae infection involves the programmed expression of S. pneumoniae genes, and that characterizing the genes and their patterns of expression would add dramatically to our understanding of the organism and its host interactions. Knowledge of S. pneumoniae genes and genomic organization would improve our understanding of disease etiology and lead to improved and new ways of preventing, ameliorating, arresting and reversing diseases. Moreover, characterized genes and genomic fragments of S. pneumoniae would provide reagents for, among other things, detecting, characterizing and controlling S. pneumoniae infections. There is a need to characterize the genome of S. pneumoniae and for polynucleotides of this organism.
SUMMARY OF THE INVENTION
The present invention is based on the sequencing of fragments of the Streptococcus pneumoniae genome. The primary nucleotide sequences which were generated are provided in SEQ ID NOS:1-391.

The present invention provides the nucleotide sequence of several hundred contigs of the Streptococcus pneumoniae genome, which are listed in tables below and set out in the Sequence Listing submitted herewith, and representative fragments thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan. In one embodiment, the present invention is provided as contiguous strings of primary sequence information corresponding to the nucleotide sequences depicted in SEQ ID NOS:1-391.

The present invention further provides nucleotide sequences which are at least 95% identical to the nucleotide sequences of SEQ ID NOS:1-391.

The nucleotide sequence of SEQ ID NOS:1-391, a representative fragment thereof, or a nucleotide sequence which is at least 95% identical to the nucleotide sequence of SEQ ID NOS:1-391 may be provided in a variety of mediums to facilitate its use. In one application of this embodiment, the sequences of the present invention are recorded on computer readable media. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories such as magnetic/optical storage media.
The present invention further provides systems, particularly computer-based systems, which contain the sequence information herein described stored in a data storage means. Such systems are designed to identify commercially important fragments of the Streptococcus pneumoniae genome.

Another embodiment of the present invention is directed to fragments of the Streptococcus pneumoniae genome having particular structural or functional attributes. Such fragments of the Streptococcus pneumoniae genome of the present invention include, but are not limited to, fragments which encode peptides, hereinafter referred to as open reading frames or ORFs; fragments which modulate the expression of an operably linked ORF, hereinafter referred to as expression modulating fragments or EMFs; and fragments which can be used to diagnose the presence of Streptococcus pneumoniae in a sample, hereinafter referred to as diagnostic fragments or DFs.
Each of the ORFs in fragments of the Streptococcus pneumoniae genome disclosed in Tables 1-3, and the EMFs found 5' to the ORFs, can be used in numerous ways as polynucleotide reagents. For instance, the sequences can be used as diagnostic probes or amplification primers for detecting or determining the presence of a specific microbe in a sample, to selectively control gene expression in a host, and in the production of polypeptides, such as polypeptides encoded by ORFs of the present invention, particularly those polypeptides that have a pharmacological activity.

The present invention further includes recombinant constructs comprising one or more fragments of the Streptococcus pneumoniae genome of the present invention. The recombinant constructs of the present invention comprise vectors, such as a plasmid or viral vector, into which a fragment of the Streptococcus pneumoniae genome has been inserted.

The present invention further provides host cells containing any of the isolated fragments of the Streptococcus pneumoniae genome of the present invention. The host cells can be a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic cell, such as a yeast cell, or a procaryotic cell, such as a bacterial cell.
The present invention is further directed to isolated polypeptides and proteins encoded by ORFs of the present invention. A variety of methods, well known to those of skill in the art, routinely may be utilized to obtain any of the polypeptides and proteins of the present invention. For instance, polypeptides and proteins of the present invention having relatively short, simple amino acid sequences readily can be synthesized using commercially available automated peptide synthesizers. Polypeptides and proteins of the present invention also may be purified from bacterial cells which naturally produce the protein. Yet another alternative is to purify polypeptides and proteins of the present invention from cells which have been altered to express them.

The invention further provides methods of obtaining homologs of the fragments of the Streptococcus pneumoniae genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention. Specifically, by using the nucleotide and amino acid sequences disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.

The invention further provides antibodies which selectively bind polypeptides and proteins of the present invention. Such antibodies include both monoclonal and polyclonal antibodies.

The invention further provides hybridomas which produce the above-described antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.

The present invention further provides methods of identifying test samples derived from cells which express one of the ORFs of the present invention, or a homolog thereof. Such methods comprise incubating a test sample with one or more of the antibodies of the present invention, or one or more of the DFs of the present invention, under conditions which allow a skilled artisan to determine if the sample contains the ORF or product produced therefrom.
In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the above-described assays. Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers which comprises: a first container comprising one of the antibodies, or one of the DFs of the present invention; and one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting presence of bound antibodies or hybridized DFs.

Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents capable of binding to a polypeptide or protein encoded by one of the ORFs of the present invention. Specifically, such agents include, as further described below, antibodies, peptides, carbohydrates, pharmaceutical agents and the like. Such methods comprise steps of: contacting an agent with an isolated protein encoded by one of the ORFs of the present invention; and determining whether the agent binds to said protein.
The present genomic sequences of Streptococcus pneumoniae will be of great value to all laboratories working with this organism and for a variety of commercial purposes. Many fragments of the Streptococcus pneumoniae genome will be immediately identified by similarity searches against GenBank or protein databases and will be of immediate value to Streptococcus pneumoniae researchers and of immediate commercial value for the production of proteins or to control gene expression.

The methodology and technology for elucidating extensive genomic sequences of bacterial and other genomes has enhanced, and will continue to greatly enhance, the ability to analyze and understand chromosomal organization. In particular, sequenced contigs and genomes will provide the models for developing tools for the analysis of chromosome structure and function, including the ability to identify genes within large segments of genomic DNA, the structure, position, and spacing of regulatory elements, the identification of genes with potential industrial applications, and the ability to do comparative genomics and molecular phylogeny.
DESCRIPTION OF THE FIGURES
FIGURE 1 is a block diagram of a computer system (102) that can be used to implement computer-based systems of the present invention.

FIGURE 2 is a schematic diagram depicting the data flow and computer programs used to collect, assemble, edit and annotate the contigs of the Streptococcus pneumoniae genome of the present invention. Both Macintosh and Unix platforms are used to handle the AB 373 and 377 sequence data files, largely as described in Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, 585, IEEE Computer Society Press, Washington D.C. (1993). Factura (AB) is a Macintosh program designed for automatic vector sequence removal and end-trimming of sequence files. The program Loadis runs on a Macintosh platform and parses the feature data extracted from the sequence files by Factura to the Unix based Streptococcus pneumoniae relational database. Assembly of contigs (and whole genome sequences) is accomplished by retrieving a specific set of sequence files and their associated features using Extrseq, a Unix utility for retrieving sequences from an SQL database. The resulting sequence file is processed by seqfilter to trim portions of the sequences with more than 2% ambiguous nucleotides. The sequence files were assembled using TIGR Assembler, an assembly engine designed at The Institute for Genomic Research (TIGR) for rapid and accurate assembly of thousands of sequence fragments. The collection of contigs generated by the assembly step is loaded into the database with the lassie program. Identification of open reading frames (ORFs) is accomplished by processing contigs with zorf or GenMark. The ORFs are searched against S. pneumoniae sequences from GenBank and against all protein sequences using the BLASTN and BLASTP programs, described in Altschul et al., J. Mol. Biol. 215:403-410 (1990). Results of the ORF determination and similarity searching steps were loaded into the database. As described below, some results of the determination and the searches are set out in Tables 1-3.
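The trimming step attributed to seqfilter can be illustrated with a minimal Python sketch. The specification does not describe how seqfilter works internally, so everything below is an assumption made for illustration: the function names, the fixed 50-base window, and the choice to trim only from the two ends of a read. Only the 2% threshold comes from the text.

```python
def ambiguous_fraction(seq: str) -> float:
    """Return the fraction of bases that are not an unambiguous A, C, G or T."""
    if not seq:
        return 0.0
    unambiguous = set("ACGT")
    bad = sum(1 for base in seq.upper() if base not in unambiguous)
    return bad / len(seq)


def trim_ambiguous_ends(seq: str, max_fraction: float = 0.02, window: int = 50) -> str:
    """Drop whole windows from either end while they exceed max_fraction.

    Hypothetical stand-in for seqfilter: the window size and the
    end-only trimming strategy are assumptions, not the patent's method.
    """
    start, end = 0, len(seq)
    # Trim from the 5' end, one window at a time.
    while end - start >= window and ambiguous_fraction(seq[start:start + window]) > max_fraction:
        start += window
    # Trim from the 3' end, one window at a time.
    while end - start >= window and ambiguous_fraction(seq[end - window:end]) > max_fraction:
        end -= window
    return seq[start:end]
```

A read with noisy ends, such as 50 N characters on each side of a clean 100-base core, would come back as the clean core only.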
DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS
The present invention is based on the sequencing of fragments of the Streptococcus pneumoniae genome and analysis of the sequences. The primary nucleotide sequences generated by sequencing the fragments are provided in SEQ ID NOS:1-391. (As used herein, the "primary sequence" refers to the nucleotide sequence represented by the IUPAC nomenclature system.) In addition to the aforementioned Streptococcus pneumoniae polynucleotides and polynucleotide sequences, the present invention provides the nucleotide sequences of SEQ ID NOS:1-391, or representative fragments thereof, in a form which can be readily used, analyzed, and interpreted by a skilled artisan.

As used herein, a "representative fragment of the nucleotide sequence depicted in SEQ ID NOS:1-391" refers to any portion of the SEQ ID NOS:1-391 which is not presently represented within a publicly available database. Preferred representative fragments of the present invention are Streptococcus pneumoniae open reading frames (ORFs), expression modulating fragments (EMFs) and fragments which can be used to diagnose the presence of Streptococcus pneumoniae in a sample (DFs). A non-limiting identification of preferred representative fragments is provided in Tables 1-3. As discussed in detail below, the information provided in SEQ ID NOS:1-391 and in Tables 1-3 together with routine cloning, synthesis, sequencing and assay methods will enable those skilled in the art to clone and sequence all "representative fragments" of interest, including open reading frames encoding a large variety of Streptococcus pneumoniae proteins.
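As a rough illustration of what identifying an open reading frame in a contig involves, the following Python sketch scans all six reading frames for a start codon followed in frame by a stop codon. It is a deliberately naive stand-in, not the method of the zorf or GenMark programs named in the specification. The function names, the `min_codons` cutoff, and the restriction to ATG start codons are assumptions; bacteria also use alternative start codons such as GTG and TTG, which this sketch ignores.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30):
    """Return (strand, frame, start, end, orf) tuples for simple ATG..stop ORFs.

    start and end are 0-based offsets on the scanned strand, end exclusive.
    min_codons counts the codons before the stop codon (hypothetical cutoff).
    """
    results = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None and codon == "ATG":
                    orf_start = i  # first in-frame start codon opens an ORF
                elif orf_start is not None and codon in STOP_CODONS:
                    if (i - orf_start) // 3 >= min_codons:
                        results.append((strand, frame, orf_start, i + 3, s[orf_start:i + 3]))
                    orf_start = None  # look for the next ORF in this frame
    return results
```

Candidate ORFs found this way would still need the similarity searches described in the text before any function could be proposed for them.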
While the presently disclosed sequences of SEQ ID NOS:1-391 are highly accurate, sequencing techniques are not perfect and, in relatively rare instances, further investigation of a fragment or sequence of the invention may reveal a nucleotide sequence error present in a nucleotide sequence disclosed in SEQ ID NOS:1-391. However, once the present invention is made available (i.e., once the information in SEQ ID NOS:1-391 and Tables 1-3 has been made available), resolving a rare sequencing error in SEQ ID NOS:1-391 will be well within the skill of the art. The present disclosure makes available sufficient sequence information to allow any of the described contigs or portions thereof to be obtained readily by straightforward application of routine techniques. Further sequencing of such polynucleotides may proceed in like manner using manual and automated sequencing methods which are employed ubiquitously in the art. Nucleotide sequence editing software is publicly available. For example, Applied Biosystems' (AB) AutoAssembler can be used as an aid during visual inspection of nucleotide sequences. By employing such routine techniques, potential errors readily may be identified and the correct sequence then may be ascertained by targeting further sequencing effort, also of a routine nature, to the region containing the potential error.
Even if all of the very rare sequencing errors in SEQ ID NOS:1-391 were corrected, the resulting nucleotide sequences would still be at least 95% identical, nearly all would be at least 99% identical, and the great majority would be at least 99.9% identical to the nucleotide sequences of SEQ ID NOS: 1-391.
As discussed elsewhere herein, polynucleotides of the present invention readily may be obtained by routine application of well known and standard procedures for cloning and sequencing DNA. Detailed methods for obtaining libraries and for sequencing are provided below, for instance. A wide variety of Streptococcus pneumoniae strains that can be used to prepare S. pneumoniae genomic DNA for cloning and for obtaining polynucleotides of the present invention are available to the public from recognized depository institutions, such as the American Type Culture Collection (ATCC). While the present invention is enabled by the sequences and other information herein disclosed, the S. pneumoniae strain that provided the DNA of the present Sequence Listing, Strain 7/87 14.8.91, has been deposited in the ATCC, as a convenience to those of skill in the art. As a further convenience, a library of S. pneumoniae genomic DNA, derived from the same strain, also has been deposited in the ATCC. The S. pneumoniae strain was deposited on October 10, 1996, and was given Deposit No. 55840, and the cDNA library was deposited on October 11, 1996 and was given Deposit No. 97755. The genomic fragments in the library are 15 to 20 kb fragments generated by partial Sau3AI digestion and they are inserted into the BamHI site in the well-known lambda-derived vector lambda DASH II (Stratagene, La Jolla, CA). The provision of the deposits is not a waiver of any rights of the inventors or their assignees in the present subject matter.
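To make the library construction step more concrete, the Python sketch below locates Sau3AI recognition sites (GATC) in a linear sequence and reports the fragment lengths a complete digest would give. Sau3AI cuts immediately 5' of the GATC, leaving ends compatible with a BamHI-cut vector. The function names are hypothetical, and the sketch models complete rather than partial digestion: in a partial digest only some sites are cut, which is how fragments in the 15 to 20 kb range are obtained.

```python
def sau3ai_sites(seq: str) -> list:
    """Return 0-based positions where a Sau3AI recognition site (GATC) starts."""
    s = seq.upper()
    sites = []
    pos = s.find("GATC")
    while pos != -1:
        sites.append(pos)
        pos = s.find("GATC", pos + 1)
    return sites


def complete_digest_lengths(seq: str) -> list:
    """Return fragment lengths for a complete digest of a linear molecule.

    Each cut is placed immediately before the G of GATC.
    """
    bounds = [0] + sau3ai_sites(seq) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
```

Because GATC is a four-base site, a complete digest yields fragments of only a few hundred bases on average, which is why the digestion is kept partial when large inserts are wanted.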
The nucleotide sequences of the genomes from different strains of Streptococcus pneumoniae differ somewhat. However, the nucleotide sequences of the genomes of all Streptococcus pneumoniae strains will be at least 95% identical, in corresponding part, to the nucleotide sequences provided in SEQ ID NOS:1-391. Nearly all will be at least 99% identical and the great majority will be 99.9% identical.

Thus, the present invention further provides nucleotide sequences which are at least 95%, preferably 99% and most preferably 99.9% identical to the nucleotide sequences of SEQ ID NOS:1-391, in a form which can be readily used, analyzed and interpreted by the skilled artisan.

Methods for determining whether a nucleotide sequence is at least 95%, at least 99% or at least 99.9% identical to the nucleotide sequences of SEQ ID NOS:1-391 are routine and readily available to the skilled artisan. For example, the well known FASTA algorithm described in Pearson and Lipman, Proc. Natl. Acad. Sci. USA 85:2444 (1988) can be used to generate the percent identity of nucleotide sequences. The BLASTN program also can be used to generate an identity score of polynucleotides compared to one another.
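The percent identity thresholds above can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned (for example by FASTA or BLASTN) and are supplied as equal-length strings with "-" marking gaps. The function name and the convention of counting a gap opposite a base as a mismatch are assumptions; the alignment programs themselves may report identity over a different denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns with identical bases.

    Columns where both sequences have a gap are skipped; a gap opposite
    a base counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 95.0) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

A 100-base alignment with a single substitution would score 99% under this convention, satisfying the 95% and 99% thresholds but not the 99.9% one.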
COMPUTER RELATED EMBODIMENTS
The nucleotide sequences provided in SEQ ID NOS:1-391, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to a polynucleotide sequence of SEQ ID NOS:1-391 may be "provided" in a variety of mediums to facilitate use thereof. As used herein, "provided" refers to a manufacture, other than an isolated nucleic acid molecule, which contains a nucleotide sequence of the present invention; i.e., a nucleotide sequence provided in SEQ ID NOS:1-391, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to a polynucleotide of SEQ ID NOS:1-391. Such a manufacture provides a large portion of the Streptococcus pneumoniae genome and parts thereof (e.g., a Streptococcus pneumoniae open reading frame (ORF)) in a form which allows a skilled artisan to examine the manufacture using means not directly applicable to examining the Streptococcus pneumoniae genome or a subset thereof as it exists in nature or in purified form.
In one application of this embodiment, a nucleotide sequence of the present invention can be recorded on computer readable media. As used herein, "computer readable media" refers to any medium which can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media, such as floppy discs, hard disc storage medium, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these categories, such as magnetic/optical storage media. A skilled artisan can readily appreciate how any of the presently known computer readable media can be used to create a manufacture comprising computer readable medium having recorded thereon a nucleotide sequence of the present invention.
Likewise, it will be clear to those of skill how additional computer readable media that may be developed also can be used to create analogous manufactures having recorded thereon a nucleotide sequence of the present invention.
As used herein, "recorded" refers to a process for storing information on computer readable medium. A skilled artisan can readily adopt any of the presently known methods for recording information on computer readable medium to generate manufactures comprising the nucleotide sequence information of the present invention. A variety of data storage structures are available to a skilled artisan for creating a computer readable medium having recorded thereon a nucleotide sequence of the present invention. The choice of the data storage structure will generally be based on the means chosen to access the stored information. In addition, a variety of data processor programs and formats can be used to store the nucleotide sequence information of the present invention on computer readable medium. The sequence information can be represented in a word processing text file, formatted in commercially available software such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like. A skilled artisan can readily adapt any number of data-processor structuring formats (e.g., text file or database) in order to obtain computer readable medium having recorded thereon the nucleotide sequence information of the present invention.
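One way to picture recording sequence information as an ASCII file is the plain-text FASTA layout, sketched below in Python. The specification does not name FASTA as its storage format, so this choice is an assumption, as are the function names, the 60-character line width, and the record names used in any example; the sequences themselves would come from the Sequence Listing.

```python
from pathlib import Path


def write_fasta(records: dict, path: str, width: int = 60) -> None:
    """Write {name: sequence} pairs to an ASCII FASTA file, wrapping lines at `width`."""
    lines = []
    for name, seq in records.items():
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    Path(path).write_text("\n".join(lines) + "\n", encoding="ascii")


def read_fasta(path: str) -> dict:
    """Read an ASCII FASTA file back into a {name: sequence} dictionary."""
    records = {}
    name = None
    for line in Path(path).read_text(encoding="ascii").splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]          # header line starts a new record
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}
```

A flat text file like this suits sequential reading, whereas the database applications mentioned in the text would be the more natural choice when records must be queried by feature or by similarity search results.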
Computer software is publicly available which allows a skilled artisan to access sequence information provided in a computer readable medium. Thus, by providing in computer readable form the nucleotide sequences of SEQ ID NOS: 0 0 o 391, a representative fragment thereof, or a nucleodde sequence at least Z preferably at least 99% and most preferably at least 99.9% identical to a sequence nof SEQ ID NOS:1-391 the present invention enables the skilled artisan routinely to access the provided sequence information for a wide variety of purposes.
The examples which follow demonstrate how software which implements the BLAST (Altschul et al., J. Mol. Biol. 215:403-410 (1990)) and BLAZE (Brutlag et al., Comp. Chem. 17:203-207 (1993)) search algorithms on a Sybase system was used to identify open reading frames (ORFs) within the Streptococcus pneumoniae genome which contain homology to ORFs or proteins from both Streptococcus pneumoniae and from other organisms. Among the ORFs discussed herein are protein encoding fragments of the Streptococcus pneumoniae genome useful in producing commercially important proteins, such as enzymes used in fermentation reactions and in the production of commercially useful metabolites.
The present invention further provides systems, particularly computer-based systems, which contain the sequence information described herein. Such systems are designed to identify, among other things, commercially important fragments of the Streptococcus pneumoniae genome.
As used herein, "a computer-based system" refers to the hardware means, software means, and data storage means used to analyze the nucleotide sequence information of the present invention. The minimum hardware means of the computer-based systems of the present invention comprises a central processing unit (CPU), input means, output means, and data storage means. A skilled artisan can readily appreciate that any one of the currently available computer-based systems is suitable for use in the present invention.
As stated above, the computer-based systems of the present invention comprise a data storage means having stored therein a nucleotide sequence of the present invention and the necessary hardware means and software means for supporting and implementing a search means.
As used herein, "data storage means" refers to memory which can store nucleotide sequence information of the present invention, or a memory access means which can access manufactures having recorded thereon the nucleotide sequence information of the present invention.
As used herein, "search means" refers to one or more programs which are implemented on the computer-based system to compare a target sequence or target structural motif with the sequence information stored within the data storage means. Search means are used to identify fragments or regions of the present genomic sequences which match a particular target sequence or target motif. A variety of known algorithms are disclosed publicly and a variety of commercially available software for conducting search means are available and can be used in the computer-based systems of the present invention. Examples of such software include, but are not limited to, MacPattern (EMBL), BLASTN and BLASTX (NCBIA). A skilled artisan can readily recognize that any one of the available algorithms or implementing software packages for conducting homology searches can be adapted for use in the present computer-based systems.
As used herein, a "target sequence" can be any DNA or amino acid sequence of six or more nucleotides or two or more amino acids. A skilled artisan can readily recognize that the longer a target sequence is, the less likely a target sequence will be present as a random occurrence in the database. The most preferred sequence length of a target sequence is from about 10 to 100 amino acids or from about 30 to 300 nucleotide residues. However, it is well recognized that searches for commercially important fragments, such as sequence fragments involved in gene expression and protein processing, may be of shorter length.
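The inverse relationship between target length and chance occurrence can be illustrated with a rough calculation. The following is a minimal sketch only, assuming that database residues are independent and uniformly distributed and that only exact matches on one strand are counted; the function name and the 2,000,000-residue database size are hypothetical and are not taken from the present disclosure.

```python
def expected_random_hits(target_len, database_len, alphabet_size):
    """Expected number of exact chance matches of a target in a random database.

    Assumes independent, uniformly distributed residues (an idealization).
    alphabet_size is 4 for nucleotides and 20 for amino acids.
    """
    # Number of positions at which a target of this length can begin.
    positions = max(database_len - target_len + 1, 0)
    # Probability that a given position matches the target exactly.
    p_match = (1.0 / alphabet_size) ** target_len
    return positions * p_match


# Hypothetical database of 2,000,000 nucleotides.
for n in (6, 15, 30):
    print(n, expected_random_hits(n, 2_000_000, 4))
```

Under these assumptions a six-nucleotide target is expected to occur hundreds of times by chance, whereas a 30-nucleotide target is expected to occur essentially never, which is consistent with the preferred lengths stated above.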
As used herein, "a target structural motif," or "target motif," refers to any rationally selected sequence or combination of sequences in which the sequence(s) are chosen based on a three-dimensional configuration which is formed upon the folding of the target motif. There are a variety of target motifs known in the art.
Protein target motifs include, but are not limited to, enzymic active sites and signal sequences. Nucleic acid target motifs include, but are not limited to, promoter sequences, hairpin structures and inducible expression elements (protein binding sequences).
A variety of structural formats for the input and output means can be used to input and output the information in the computer-based systems of the present invention. A preferred format for an output means ranks fragments of the Streptococcus pneumoniae genomic sequences possessing varying degrees of homology to the target sequence or target motif. Such presentation provides a skilled artisan with a ranking of sequences which contain various amounts of the target sequence or target motif and identifies the degree of homology contained in the identified fragment.
A variety of comparing means can be used to compare a target sequence or target motif with the data storage means to identify sequence fragments of the Streptococcus pneumoniae genome. In the present examples, software which implements the BLAST and BLAZE algorithms, described in Altschul et al., J. Mol. Biol. 215:403-410 (1990), is used to identify open reading frames within the Streptococcus pneumoniae genome. A skilled artisan can readily recognize that any one of the publicly available homology search programs can be used as the search means for the computer-based systems of the present invention.
Of course, suitable proprietary systems that may be known to those of skill also may be employed in this regard.
Figure 1 provides a block diagram of a computer system illustrative of embodiments of this aspect of the present invention. The computer system 102 includes a processor 106 connected to a bus 104. Also connected to the bus 104 are a main memory 108 (preferably implemented as random access memory, RAM) and a variety of secondary storage devices 110, such as a hard drive 112 and a removable medium storage device 114. The removable medium storage device 114 may represent, for example, a floppy disk drive, a CD-ROM drive, a magnetic tape drive, etc. A removable storage medium 116 (such as a floppy disk, a compact disk, a magnetic tape, etc.) containing control logic and/or data recorded therein may be inserted into the removable medium storage device 114. The computer system 102 includes appropriate software for reading the control logic and/or the data from the removable medium storage device 114, once it is inserted into the removable medium storage device 114.
A nucleotide sequence of the present invention may be stored in a well known manner in the main memory 108, any of the secondary storage devices 110, and/or a removable storage medium 116. During execution, software for accessing and processing the genomic sequence (such as search tools, comparing tools, etc.) resides in main memory 108, in accordance with the requirements and operating parameters of the operating system, the hardware system and the software program or programs.
BIOCHEMICAL EMBODIMENTS
Other embodiments of the present invention are directed to isolated fragments of the Streptococcus pneumoniae genome. The fragments of the Streptococcus pneumoniae genome of the present invention include, but are not limited to, fragments which encode peptides and polypeptides, hereinafter open reading frames (ORFs), fragments which modulate the expression of an operably linked ORF, hereinafter expression modulating fragments (EMFs), and fragments which can be used to diagnose the presence of Streptococcus pneumoniae in a sample, hereinafter diagnostic fragments (DFs).
As used herein, an "isolated nucleic acid molecule" or an "isolated fragment of the Streptococcus pneumoniae genome" refers to a nucleic acid molecule possessing a specific nucleotide sequence which has been subjected to purification means to reduce, from the composition, the number of compounds which are normally associated with the composition. Particularly, the term refers to the nucleic acid molecules having the sequences set out in SEQ ID NOS:1-391, to representative fragments thereof as described above, and to polynucleotides at least 95%, preferably at least 99% and especially preferably at least 99.9% identical in sequence thereto, also as set out above.
A variety of purification means can be used to generate the isolated fragments of the present invention. These include, but are not limited to methods which separate constituents of a solution based on charge, solubility, or size.
In one embodiment, Streptococcus pneumoniae DNA can be enzymatically sheared to produce fragments of 15-20 kb in length. These fragments can then be used to generate a Streptococcus pneumoniae library by inserting them into lambda clones as described in the Examples below. Primers flanking, for example, an ORF such as those enumerated in Tables 1-3 can then be generated using nucleotide sequence information provided in SEQ ID NOS:1-391. Well known and routine techniques of PCR cloning then can be used to isolate the ORF from the lambda DNA library or Streptococcus pneumoniae genomic DNA. Thus, given the availability of SEQ ID NOS:1-391, the information in Tables 1, 2 and 3, and the information that may be obtained readily by analysis of the sequences of SEQ ID NOS:1-391 using methods set out above, those of skill will be enabled by the present disclosure to isolate any ORF-containing or other nucleic acid fragment of the present invention.
The isolated nucleic acid molecules of the present invention include, but are not limited to, single stranded and double stranded DNA, and single stranded RNA.
As used herein, an "open reading frame," ORF, means a series of triplets coding for amino acids without any termination codons and is a sequence translatable into protein.
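The ORF definition above can be expressed as a simple check. The following is a minimal sketch only, assuming DNA input on the coding strand and the three standard termination codons (TAA, TAG, TGA); the function name is hypothetical and this is not the GeneMark procedure used to compile the tables.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard termination codons (assumed)


def is_open_reading_frame(seq):
    """Return True if seq is a series of triplets with no termination codon."""
    seq = seq.upper()
    # An ORF must be a non-empty whole number of triplets.
    if len(seq) == 0 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    # Every triplet must be unambiguous DNA and must not be a stop codon.
    return all(set(c) <= set("ACGT") and c not in STOP_CODONS for c in codons)


print(is_open_reading_frame("ATGGCTAAA"))  # ATG GCT AAA
print(is_open_reading_frame("ATGTAAGCT"))  # contains in-frame TAA
```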
Tables 1, 2, and 3 list ORFs in the Streptococcus pneumoniae genomic contigs of the present invention that were identified as putative coding regions by the GeneMark software using organism-specific second-order Markov probability transition matrices. It will be appreciated that other criteria can be used, in accordance with well known analytical methods, such as those discussed herein, to generate more inclusive, more restrictive, or more selective lists.
Table 1 sets out ORFs in the Streptococcus pneumoniae contigs of the present invention that over a continuous region of at least 50 bases are 95% or more identical (by BLAST analysis) to a nucleotide sequence available through GenBank in October, 1997.
Table 2 sets out ORFs in the Streptococcus pneumoniae contigs of the present invention that are not in Table 1 and match, with a BLASTP probability score of 0.01 or less, a polypeptide sequence available through GenBank in October, 1997.
Table 3 sets out ORFs in the Streptococcus pneumoniae contigs of the present invention that do not match significantly, by BLASTP analysis, a polypeptide sequence available through GenBank in October, 1997.
In each table, the first and second columns identify the ORF by, respectively, contig number and ORF number within the contig; the third column indicates the first nucleotide of the ORF (actually the first nucleotide of the stop codon immediately preceding the ORF), counting from the 5' end of the contig strand; and the fourth column, "stop," indicates the last nucleotide of the stop codon defining the 3' end of the ORF.
In Tables 1 and 2, column five lists the Reference for the closest matching sequence available through GenBank. These reference numbers are the database entry numbers commonly used by those of skill in the art, who will be familiar with their denominators. Descriptions of the nomenclature are available from the National Center for Biotechnology Information. Column six in Tables 1 and 2 provides the gene name of the matching sequence; column seven provides the BLAST identity score and column eight the BLAST similarity score from the comparison of the ORF and the homologous gene; and column nine indicates the length in nucleotides of the highest scoring segment pair identified by the BLAST identity analysis.
Each ORF described in the tables is defined by "start" and "stop" nucleotide position numbers. These position numbers refer to the boundaries of each ORF and provide orientation with respect to whether the forward or reverse strand is the coding strand and in which reading frame the coding sequence is contained. The "start" position is the first nucleotide of the triplet encoding a stop codon just 5' to the ORF and the "stop" position is the last nucleotide of the triplet encoding the next in-frame stop codon (i.e., the stop codon at the 3' end of the ORF). Those of ordinary skill in the art appreciate that preferred fragments within each ORF described in the table include fragments of each ORF which include the entire sequence from the delineated "start" and "stop" positions excepting the first and last three nucleotides, since these encode stop codons. Thus, polynucleotides set out as ORFs in the tables but lacking the three 5' nucleotides and the three 3' nucleotides are encompassed by the present invention. Those of skill also appreciate that particularly preferred are fragments within each ORF that are polynucleotide fragments comprising polypeptide coding sequence. As defined herein, "coding sequence" includes the fragment within an ORF beginning at the first in-frame ATG (triplet encoding methionine) and ending with the last nucleotide prior to the triplet encoding the 3' stop codon. Preferred are fragments comprising the entire coding sequence and fragments comprising the entire coding sequence, excepting the coding sequence for the N-terminal methionine. Those of skill appreciate that the N-terminal methionine is often removed during post-translational processing and that polynucleotides lacking the ATG can be used to facilitate production of N-terminal fusion proteins which may be beneficial in the production or use of genetically engineered proteins. Of course, due to the degeneracy of the genetic code many polynucleotides can encode a given polypeptide.
Thus, the invention further includes polynucleotides comprising a nucleotide sequence encoding a polypeptide sequence itself encoded by the coding sequence within an ORF described in Tables 1-3 herein. Further, polynucleotides at least 95%, preferably at least 99% and especially preferably at least 99.9% identical in sequence to the foregoing polynucleotides, are contemplated by the present invention.
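The relationship between the tabulated "start" and "stop" positions, the ORF, and the coding sequence can be sketched as follows. This is an illustration only, assuming 1-based inclusive positions and an ORF on the forward strand of the contig; handling of reverse-strand entries is omitted, and the function name and the toy contig are hypothetical.

```python
def orf_and_coding_sequence(contig, start, stop):
    """Return (orf, coding_sequence) for a forward-strand table entry.

    start: first nucleotide of the stop codon just 5' to the ORF (1-based).
    stop:  last nucleotide of the stop codon at the 3' end of the ORF (1-based).
    coding_sequence is None when the ORF has no in-frame ATG.
    """
    region = contig[start - 1:stop].upper()
    # The region must hold two flanking triplets and a whole number of codons.
    if len(region) < 6 or len(region) % 3 != 0:
        raise ValueError("start/stop do not delimit whole in-frame triplets")
    # Drop the first and last three nucleotides, which encode the stop codons.
    orf = region[3:-3]
    # The coding sequence begins at the first in-frame ATG.
    for i in range(0, len(orf), 3):
        if orf[i:i + 3] == "ATG":
            return orf, orf[i:]
    return orf, None


# Toy contig: CC TAA GCT ATG AAA GGG TGA CC
toy_contig = "CCTAAGCTATGAAAGGGTGACC"
print(orf_and_coding_sequence(toy_contig, 3, 20))
```

In the toy example the ORF is the twelve nucleotides between the two stop codons, and the coding sequence is the portion of that ORF beginning at the first in-frame ATG.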
Polypeptides encoded by polynucleotides described above and elsewhere herein are also provided by the present invention, as are polypeptides comprising an amino acid sequence at least about 95%, preferably at least 97% and even more preferably 99% identical to the amino acid sequence of a polypeptide encoded by an ORF shown in Tables 1-3. These polypeptides may or may not comprise an N-terminal methionine.
The concepts of percent identity and percent similarity of two polypeptide sequences are well understood in the art. For example, two polypeptides 10 amino acids in length which differ at three amino acid positions (e.g., at positions 1, 3 and 5) are said to have a percent identity of 70%. However, the same two polypeptides would be deemed to have a percent similarity of 80% if, for example at position 5, the amino acid moieties, although not identical, were "similar" (i.e., possessed similar biochemical characteristics). Many programs for analysis of nucleotide or amino acid sequence similarity, such as fasta and BLAST, specifically list percent identity of a matching region as an output parameter. Thus, for instance, Tables 1 and 2 herein enumerate the percent identity of the highest scoring segment pair in each ORF and its listed relative. Further details concerning the algorithms and criteria used for homology searches are provided below and are described in the pertinent literature highlighted by the citations provided below.
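The 70% identity and 80% similarity figures in the example above can be reproduced with a short calculation. The following is a minimal sketch only, assuming two ungapped sequences of equal length; the similarity groups, the function name and the two peptides are hypothetical illustrations and are not the scoring scheme used by fasta or BLAST.

```python
# Illustrative groups of biochemically similar residues (an assumption).
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "ST", "NQ", "AG"]


def percent_identity_and_similarity(a, b):
    """Return (percent identity, percent similarity) for equal-length peptides."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(x == y for x, y in zip(a, b))
    # A position is similar if identical or if both residues share a group.
    similar = sum(
        x == y or any(x in g and y in g for g in SIMILAR_GROUPS)
        for x, y in zip(a, b)
    )
    return 100.0 * identical / len(a), 100.0 * similar / len(a)


# Two hypothetical 10-residue peptides differing at positions 1, 3 and 5;
# only the position 5 pair (V and I) falls within one similarity group.
print(percent_identity_and_similarity("MKTLVAGDES", "AKGLIAGDES"))
```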
It will be appreciated that other criteria can be used to generate more inclusive and more exclusive listings of the types set out in the tables. As those of skill will appreciate, narrow and broad searches both are useful. Thus, a skilled artisan can readily identify ORFs in contigs of the Streptococcus pneumoniae genome other than those listed in Tables 1-3, such as ORFs which are overlapping or encoded by the opposite strand of an identified ORF in addition to those ascertainable using the computer-based systems of the present invention.
As used herein, an "expression modulating fragment," EMF, means a series of nucleotide molecules which modulates the expression of an operably linked ORF or EMF.
As used herein, a sequence is said to "modulate the expression of an operably linked sequence" when the expression of the sequence is altered by the presence of the EMF. EMFs include, but are not limited to, promoters, and promoter modulating sequences (inducible elements). One class of EMFs are fragments which induce the expression of an operably linked ORF in response to a specific regulatory factor or physiological event.
EMF sequences can be identified within the contigs of the Streptococcus pneumoniae genome by their proximity to the ORFs provided in Tables 1-3. An intergenic segment, or a fragment of the intergenic segment, from about 10 to 200 nucleotides in length, taken from any one of the ORFs of Tables 1-3 will modulate the expression of an operably linked ORF in a fashion similar to that found with the naturally linked ORF sequence. As used herein, an "intergenic segment" refers to fragments of the Streptococcus pneumoniae genome which are between two ORF(s) herein described. EMFs also can be identified using known EMFs as a target sequence or target motif in the computer-based systems of the present invention. Further, the two methods can be combined and used together.
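Locating intergenic segments from tabulated ORF positions can be sketched as follows. This is an illustration only, assuming 1-based inclusive ORF boundaries on a single contig and treating an entry whose first position exceeds its second as lying on the reverse strand; the function name and the toy data are hypothetical.

```python
def intergenic_segments(contig, orfs):
    """Return (first, last, sequence) for each gap between adjacent ORFs.

    orfs is a list of (start, stop) pairs, 1-based and inclusive.
    """
    # Normalize so that each pair reads low-to-high along the contig.
    ordered = sorted((min(s, e), max(s, e)) for s, e in orfs)
    segments = []
    if not ordered:
        return segments
    covered_to = ordered[0][1]  # last position covered by any ORF so far
    for start, stop in ordered[1:]:
        if start > covered_to + 1:
            # Nucleotides strictly between the two ORFs.
            segments.append((covered_to + 1, start - 1,
                             contig[covered_to:start - 1]))
        covered_to = max(covered_to, stop)
    return segments


toy_contig = "AAACCCGGGTTTAAACCC"
print(intergenic_segments(toy_contig, [(1, 6), (13, 18)]))
```

Segments returned in this way could then be filtered by length, for example to the range of about 10 to 200 nucleotides mentioned above, before being tested as candidate EMFs.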
The presence and activity of an EMF can be confirmed using an EMF trap vector. An EMF trap vector contains a cloning site linked to a marker sequence. A marker sequence encodes an identifiable phenotype, such as antibiotic resistance or a complementing nutrition auxotrophic factor, which can be identified or assayed when the EMF trap vector is placed within an appropriate host under appropriate conditions. As described above, an EMF will modulate the expression of an operably linked marker sequence. A more detailed discussion of various marker sequences is provided below. A sequence which is suspected as being an EMF is cloned in all three reading frames in one or more restriction sites upstream from the marker sequence in the EMF trap vector. The vector is then transformed into an appropriate host using known procedures and the phenotype of the transformed host is examined under appropriate conditions. As described above, an EMF will modulate the expression of an operably linked marker sequence.
As used herein, a "diagnostic fragment," DF, means a series of nucleotide molecules which selectively hybridize to Streptococcus pneumoniae sequences.
DFs can be readily identified by identifying unique sequences within contigs of the Streptococcus pneumoniae genome, such as by using well-known computer analysis software, and by generating and testing probes or amplification primers consisting of the DF sequence in an appropriate diagnostic format which determines amplification or hybridization selectivity.
The sequences falling within the scope of the present invention are not limited to the specific sequences herein described, but also include allelic and species variations thereof. Allelic and species variations can be routinely determined by comparing the sequences provided in SEQ ID NOS:1-391, a representative fragment thereof, or a nucleotide sequence at least 95%, preferably at least 99% and most preferably at least 99.9% identical to SEQ ID NOS:1-391, with a sequence from another isolate of the same species. Furthermore, to accommodate codon variability, the invention includes nucleic acid molecules coding for the same amino acid sequences as do the specific ORFs disclosed herein. In other words, in the coding region of an ORF, substitution of one codon for another which encodes the same amino acid is expressly contemplated. Any specific sequence disclosed herein can be readily screened for errors by resequencing a particular fragment, such as an ORF, in both directions (i.e., sequence both strands). Alternatively, error screening can be performed by sequencing corresponding polynucleotides of Streptococcus pneumoniae origin isolated by using part or all of the fragments in question as a probe or primer.
Preferred DFs of the present invention comprise at least about 17, preferably at least about 20, and more preferably at least about 50 contiguous nucleotides within an ORF set out in Tables 1-3. Most highly preferred DFs specifically hybridize to a polynucleotide containing the sequence of the ORF from which they are derived. Specific hybridization occurs even under stringent conditions defined elsewhere herein.
Each of the ORFs of the Streptococcus pneumoniae genome disclosed in Tables 1, 2 and 3, and the EMFs found 5' to the ORFs, can be used as polynucleotide reagents in numerous ways. For example, the sequences can be used as diagnostic probes or diagnostic amplification primers to detect the presence of a specific microbe in a sample, particularly Streptococcus pneumoniae.
Especially preferred in this regard are ORFs such as those of Table 3, which do not match previously characterized sequences from other organisms and thus are most likely to be highly selective for Streptococcus pneumoniae. Also particularly preferred are ORFs that can be used to distinguish between strains of Streptococcus pneumoniae, particularly those that distinguish medically important strains, such as drug-resistant strains.
In addition, the fragments of the present invention, as broadly described, can be used to control gene expression through triple helix formation or antisense DNA or RNA, both of which methods are based on the binding of a polynucleotide sequence to DNA or RNA. Triple helix-formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Information from the sequences of the present invention can be used to design antisense and triple helix-forming oligonucleotides. Polynucleotides suitable for use in these methods are usually 20 to 40 bases in length and are designed to be complementary to a region of the gene involved in transcription, for triple-helix formation, or to the mRNA itself, for antisense inhibition. Both techniques have been demonstrated to be effective in model systems, and the requisite techniques are well known and involve routine procedures. Triple helix techniques are discussed in, for example, Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991). Antisense techniques in general are discussed in, for instance, Okano, J. Neurochem. 56:560 (1991) and Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988).
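The complementarity requirement for an antisense oligonucleotide can be illustrated by computing the reverse complement of a chosen mRNA region. The following is a minimal sketch only, assuming an unambiguous RNA sequence, a 0-based start offset and simple Watson-Crick pairing; the function name and the toy sequence are hypothetical, and no account is taken of secondary structure or other practical design considerations.

```python
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def antisense_rna(mrna, start, length=20):
    """Return the antisense RNA (5' to 3') for mrna[start:start + length].

    start is a 0-based offset; the default length of 20 is the lower end of
    the 20 to 40 base range mentioned in the text.
    """
    target = mrna[start:start + length].upper()
    if len(target) != length:
        raise ValueError("target region extends beyond the mRNA")
    if set(target) - set("ACGU"):
        raise ValueError("mRNA must contain only A, C, G and U")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return target.translate(RNA_COMPLEMENT)[::-1]


# Toy example with a short region for readability.
print(antisense_rna("AUGGCCAAGU", 0, 4))
```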
The present invention further provides recombinant constructs comprising one or more fragments of the Streptococcus pneumoniae genomic fragments and contigs of the present invention. Certain preferred recombinant constructs of the present invention comprise a vector, such as a plasmid or viral vector, into which a fragment of the Streptococcus pneumoniae genome has been inserted, in a forward or reverse orientation. In the case of a vector comprising one of the ORFs of the present invention, the vector may further comprise regulatory sequences, including for example, a promoter, operably linked to the ORF. For vectors comprising the EMFs of the present invention, the vector may further comprise a marker sequence or heterologous ORF operably linked to the EMF.
Large numbers of suitable vectors and promoters are known to those of skill in the art and are commercially available for generating the recombinant constructs of the present invention. The following vectors are provided by way of example. Useful bacterial vectors include phagescript, PsiX174, pBluescript SK, pBS KS, pNH8a, pNH16a, pNH18a, pNH46a (available from Stratagene); pTrc99A, pKK223-3, pKK233-3, pDR540, pRIT5 (available from Pharmacia).
Useful eukaryotic vectors include pWLneo, pSV2cat, pOG44, pXT1, pSG (available from Stratagene); pSVK3, pBPV, pMSG, pSVL (available from Pharmacia).
Promoter regions can be selected from any desired gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable markers.
Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial promoters include lac, lacZ, T3, T7, gpt, lambda PR, and trc. Eukaryotic promoters include CMV immediate early, HSV thymidine kinase, early and late LTRs from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art.
The present invention further provides host cells containing any one of the isolated fragments of the Streptococcus pneumoniae genomic fragments and contigs of the present invention, wherein the fragment has been introduced into the host cell using known methods. The host cell can be a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic host cell, such as a yeast cell, or a procaryotic cell, such as a bacterial cell.
A polynucleotide of the present invention, such as a recombinant construct comprising an ORF of the present invention, may be introduced into the host by a variety of well established techniques that are standard in the art, such as calcium phosphate transfection, DEAE-dextran mediated transfection and electroporation, which are described in, for instance, Davis, L. et al., BASIC METHODS IN MOLECULAR BIOLOGY (1986).
A host cell containing one of the fragments of the Streptococcus pneumoniae genomic fragments and contigs of the present invention can be used in conventional manners to produce the gene product encoded by the isolated fragment (in the case of an ORF) or can be used to produce a heterologous protein under the control of the EMF. The present invention further provides isolated polypeptides encoded by the nucleic acid fragments of the present invention or by degenerate variants of the nucleic acid fragments of the present invention. By "degenerate variant" is intended nucleotide fragments which differ from a nucleic acid fragment of the present invention (e.g., an ORF) by nucleotide sequence but, due to the degeneracy of the Genetic Code, encode an identical polypeptide sequence.
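Whether one nucleotide fragment is a degenerate variant of another can be checked by translating both and comparing the resulting polypeptides. The following is a minimal sketch only, assuming in-frame DNA coding sequences and the standard genetic code; the function names and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def is_degenerate_variant(a, b):
    """True if a and b differ in nucleotide sequence but encode the same polypeptide."""
    return a.upper() != b.upper() and translate(a) == translate(b)


# GCT/GCC both encode alanine and AAA/AAG both encode lysine.
print(is_degenerate_variant("ATGGCTAAA", "ATGGCCAAG"))
```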
Preferred nucleic acid fragments of the present invention are the ORFs and subfragments thereof depicted in Tables 2 and 3 which encode proteins.
A variety of methodologies known in the art can be utilized to obtain any one of the isolated polypeptides or proteins of the present invention. At the simplest level, the amino acid sequence can be synthesized using commercially available peptide synthesizers. This is particularly useful in producing small peptides and fragments of larger polypeptides. Such short fragments as may be obtained most readily by synthesis are useful, for example, in generating antibodies against the native polypeptide, as discussed further below.
In an alternative method, the polypeptide or protein is purified from bacterial cells which naturally produce the polypeptide or protein. One skilled in the art can readily employ well-known methods for isolating polypeptides and proteins to isolate and purify polypeptides or proteins of the present invention produced naturally by a bacterial strain, or by other methods. Methods for isolation and purification that can be employed in this regard include, but are not limited to, immunochromatography, HPLC, size-exclusion chromatography, ion-exchange chromatography, and immuno-affinity chromatography.
The polypeptides and proteins of the present invention also can be purified from cells which have been altered to express the desired polypeptide or protein.
As used herein, a cell is said to be altered to express a desired polypeptide or protein when the cell, through genetic manipulation, is made to produce a polypeptide or protein which it normally does not produce or which the cell normally produces at a lower level. Those skilled in the art can readily adapt procedures for introducing and expressing either recombinant or synthetic sequences into eukaryotic or prokaryotic cells in order to generate a cell which produces one of the polypeptides or proteins of the present invention.
Any host/vector system can be used to express one or more of the ORFs of the present invention. These include, but are not limited to, eukaryotic hosts such as HeLa cells, CV-1 cells, COS cells, and Sf9 cells, as well as prokaryotic hosts such as E. coli and B. subtilis. The most preferred cells are those which do not normally express the particular polypeptide or protein or which express the polypeptide or protein at a low natural level.
"Recombinant," as used herein, means that a polypeptide or protein is derived from recombinant (e.g., microbial or mammalian) expression systems.
"Microbial" refers to recombinant polypeptides or proteins made in bacterial or fungal (e.g., yeast) expression systems. As a product, "recombinant microbial" defines a polypeptide or protein essentially free of native endogenous substances and unaccompanied by associated native glycosylation. Polypeptides or proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation modifications; polypeptides or proteins expressed in yeast will have a glycosylation pattern different from that expressed in mammalian cells.
"Nucleotide sequence" refers to a heteropolymer of deoxyribonucleotides.
Generally, DNA segments encoding the polypeptides and proteins provided by this invention are assembled from fragments of the Streptococcus pneumoniae genome and short oligonucleotide linkers, or from a series of oligonucleotides, to provide a synthetic gene which is capable of being expressed in a recombinant transcriptional unit comprising regulatory elements derived from a microbial or viral operon.
"Recombinant expression vehicle or vector" refers to a plasmid or phage or virus or vector, for expressing a polypeptide from a DNA (RNA) sequence. The expression vehicle can comprise a transcriptional unit comprising an assembly of genetic regulatory elements necessary for gene expression in the host, including elements required to initiate and maintain transcription at a level sufficient for suitable expression of the desired polypeptide, including, for example, promoters and, where necessary, an enhancer and a polyadenylation signal; a structural or coding sequence which is transcribed into mRNA and translated into protein; and appropriate signals to initiate translation at the beginning of the desired coding region and terminate translation at its end. Structural units intended for use in yeast or eukaryotic expression systems preferably include a leader sequence enabling extracellular secretion of translated protein by a host cell.
Alternatively, where recombinant protein is expressed without a leader or transport sequence, it may include an N-terminal methionine residue. This residue may or may not be subsequently cleaved from the expressed recombinant protein to provide a final product.
"Recombinant expression system" means host cells which have stably integrated a recombinant transcriptional unit into chromosomal DNA or carry the recombinant transcriptional unit extrachromosomally. The cells can be prokaryotic or eukaryotic. Recombinant expression systems as defined herein will express heterologous polypeptides or proteins upon induction of the regulatory elements linked to the DNA segment or synthetic gene to be expressed.
Mature proteins can be expressed in mammalian cells, yeast, bacteria, or other cells under the control of appropriate promoters. Cell-free translation systems can also be employed to produce such proteins using RNAs derived from the DNA constructs of the present invention. Appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described in Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York (1989), the disclosure of which is hereby incorporated by reference in its entirety.
Generally, recombinant expression vectors will include origins of replication and selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. Such promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), alpha-factor, acid phosphatase, or heat shock proteins, among others. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences, and preferably, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium.
Optionally, the heterologous sequence can encode a fusion protein including an N-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product.
Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a desired protein together with suitable translation initiation and termination signals in operable reading phase with a functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and, when desirable, provide amplification within the host.
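As an illustration of what "operable reading phase" implies for the structural sequence, the following minimal Python sketch checks that a candidate coding sequence begins with an initiation codon, ends with a termination codon, and contains no premature stop in frame. The function name `is_in_frame` is hypothetical, and the sketch assumes ATG as the only initiation codon and the standard stop codons; alternative start codons used by some bacteria are not handled.

```python
# Hypothetical helper, not part of the disclosure: a simple reading-frame check.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard termination codons (assumed)


def is_in_frame(cds: str) -> bool:
    """Return True if cds starts with ATG, ends with a stop codon,
    has a length divisible by three, and has no internal in-frame stop."""
    seq = cds.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    # Split the sequence into consecutive codons in frame 0.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # Any stop codon before the last position would truncate translation.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])
```

A check of this kind would typically be run on the insert after linkers or fusion tags are added, since either can shift the frame.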
Suitable prokaryotic hosts for transformation include strains of E. coli, B. subtilis, Salmonella typhimurium and various species within the genera Pseudomonas and Streptomyces. Others may also be employed as a matter of choice.
As a representative but non-limiting example, useful expression vectors for bacterial use can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids comprising genetic elements of the well known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, pKK223-3 (available from Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM 1 (available from Promega Biotec, Madison, WI, USA). These pBR322 "backbone" sections are combined with an appropriate promoter and the structural sequence to be expressed.
Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter, where it is inducible, is derepressed or induced by appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an additional period to provide for expression of the induced gene product. Thereafter cells are typically harvested, generally by centrifugation, disrupted to release expressed protein, generally by physical or chemical means, and the resulting crude extract is retained for further purification.
Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts, described in Gluzman, Cell 23:175 (1981), and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines.
Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites may be used to provide the required nontranscribed genetic elements.
Recombinant polypeptides and proteins produced in bacterial culture are usually isolated by initial extraction from cell pellets, followed by one or more salting-out, aqueous ion exchange or size exclusion chromatography steps.
Microbial cells employed in expression of proteins can be disrupted by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing agents. Protein refolding steps can be used, as necessary, in completing configuration of the mature protein. Finally, high performance liquid chromatography (HPLC) can be employed for final purification steps.
The present invention further includes isolated polypeptides, proteins and nucleic acid molecules which are substantially equivalent to those herein described.
As used herein, substantially equivalent can refer both to nucleic acid and amino acid sequences, for example a mutant sequence, that varies from a reference sequence by one or more substitutions, deletions, or additions, the net effect of which does not result in an adverse functional dissimilarity between reference and subject sequences. For purposes of the present invention, sequences having equivalent biological activity, and equivalent expression characteristics are considered substantially equivalent. For purposes of determining equivalence, truncation of the mature sequence should be disregarded.
The invention further provides methods of obtaining homologs from other strains of Streptococcus pneumoniae, of the fragments of the Streptococcus pneumoniae genome of the present invention and homologs of the proteins encoded by the ORFs of the present invention. As used herein, a sequence or protein of Streptococcus pneumoniae is defined as a homolog of one of the Streptococcus pneumoniae fragments or contigs, or of a protein encoded by one of the ORFs of the present invention, if it shares significant homology to one of the fragments of the Streptococcus pneumoniae genome of the present invention or a protein encoded by one of the ORFs of the present invention. Specifically, by using the sequence disclosed herein as a probe or as primers, and techniques such as PCR cloning and colony/plaque hybridization, one skilled in the art can obtain homologs.
As used herein, two nucleic acid molecules or proteins are said to "share significant homology" if the two contain regions which possess greater than sequence (amino acid or nucleic acid) homology. Preferred homologs in this regard are those with more than 90% homology. Especially preferred are those with 93% or more homology. Among especially preferred homologs those with or more homology are particularly preferred. Very particularly preferred among these are those with 97% and even more particularly preferred among those are homologs with 99% or more homology. The most preferred homologs among these are those with 99.9% homology or more. It will be understood that, among measures of homology, identity is particularly preferred in this regard.
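Since identity is the preferred measure of homology here, the tiers above can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned to equal length (with "-" marking gaps) and that identity is counted over the full alignment length; both are assumptions of this sketch rather than definitions from the text. Only the thresholds legible in the passage (90%, 93%, 97%, 99%, 99.9%) are encoded, and the names `percent_identity` and `preference_tier` are hypothetical.

```python
# Hypothetical helpers illustrating the stated preference tiers.
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned to the same non-zero length")
    # A column where both sequences have a gap is not counted as a match.
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-")
    return 100.0 * matches / len(a)


# Thresholds stated as "X% or more" in the passage, highest first.
TIERS = [(99.9, ">= 99.9%"), (99.0, ">= 99%"), (97.0, ">= 97%"), (93.0, ">= 93%")]


def preference_tier(pct: float) -> str:
    """Map a percent identity onto the highest tier it satisfies."""
    for threshold, label in TIERS:
        if pct >= threshold:
            return label
    if pct > 90.0:  # the passage says "more than 90%"
        return "> 90%"
    return "below the stated tiers"
```

For example, two aligned 10-residue sequences differing at one position score 90.0% and therefore fall below the "more than 90%" tier.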
Region specific primers or probes derived from the nucleotide sequence provided in SEQ ID NOS: 1-391 or from a nucleotide sequence at least particularly at least 99%, especially at least 99.5% identical to a sequence of SEQ ID NOS: 1-391 can be used to prime DNA synthesis and PCR amplification, as well as to identify colonies containing cloned DNA encoding a homolog. Methods suitable to this aspect of the present invention are well known and have been described in great detail in many publications such as, for example, Innis et al., PCR Protocols, Academic Press, San Diego, CA (1990).
When using primers derived from SEQ ID NOS: 1-391 or from a nucleotide sequence having an aforementioned identity to a sequence of SEQ ID NOS: 1-391, one skilled in the art will recognize that by employing high stringency conditions (e.g., annealing at 50-60°C in 6X SSPC and 50% formamide, and washing at 65°C in 0.5X SSPC) only sequences which are greater than 75% homologous to the primer will be amplified. By employing lower stringency conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C in 0.5X SSPC), sequences which are greater than 40-50% homologous to the primer will also be amplified.
When using DNA probes derived from SEQ ID NOS: 1-391, or from a nucleotide sequence having an aforementioned identity to a sequence of SEQ ID NOS: 1-391, for colony/plaque hybridization, one skilled in the art will recognize that by employing high stringency conditions (e.g., hybridizing at 50-65°C in SSPC and 50% formamide, and washing at 50-65°C in 0.5X SSPC), sequences having regions which are greater than 90% homologous to the probe can be obtained, and that by employing lower stringency conditions (e.g., hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and washing at 42°C in 0.5X SSPC), sequences having regions which are greater than 35-45% homologous to the probe will be obtained.
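The homology cut-offs given for PCR priming and for colony/plaque hybridization can be collected into a small lookup, sketched below in Python. The table records only the percentages stated in the two preceding paragraphs; where the text gives a range (for example 40-50%), the sketch treats values inside the range as "borderline", which is an interpretive assumption. The names `THRESHOLDS` and `recovery_call` are hypothetical.

```python
# Hypothetical lookup of the stated homology cut-offs (percent homology to primer/probe).
# Each entry is (lower bound, upper bound) of the stated threshold.
THRESHOLDS = {
    ("pcr", "high"): (75.0, 75.0),
    ("pcr", "low"): (40.0, 50.0),
    ("hybridization", "high"): (90.0, 90.0),
    ("hybridization", "low"): (35.0, 45.0),
}


def recovery_call(pct: float, method: str, stringency: str) -> str:
    """Classify whether a sequence with the given homology is expected to be recovered."""
    low, high = THRESHOLDS[(method, stringency)]
    if pct > high:
        return "expected"
    if pct > low:
        # Inside a stated range such as 40-50%: outcome depends on exact conditions.
        return "borderline"
    return "not expected"
```

Under this reading, a sequence 80% homologous to a primer would be expected to amplify at high stringency, while one 85% homologous to a probe would not be expected to be recovered by high stringency hybridization.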
Any organism can be used as the source for homologs of the present invention so long as the organism naturally expresses such a protein or contains genes encoding the same. The most preferred organisms for isolating homologs are bacteria which are closely related to Streptococcus pneumoniae.
ILLUSTRATIVE USES OF COMPOSITIONS OF THE INVENTION
Each ORF provided in Tables 1 and 2 is identified with a function by homology to a known gene or polypeptide. As a result, one skilled in the art can use the polypeptides of the present invention for commercial, therapeutic and industrial purposes consistent with the type of putative identification of the polypeptide. Such identifications permit one skilled in the art to use the Streptococcus pneumoniae ORFs in a manner similar to the known type of sequences for which the identification is made; for example, to ferment a particular sugar source or to produce a particular metabolite. A variety of reviews illustrative of this aspect of the invention are available, including the following reviews on the industrial use of enzymes, for example, BIOCHEMICAL ENGINEERING AND BIOTECHNOLOGY HANDBOOK, 2nd Ed., MacMillan Publications, Ltd., NY (1991) and BIOCATALYSTS IN ORGANIC SYNTHESES, Tramper et al., Eds., Elsevier Science Publishers, Amsterdam, The Netherlands (1985). A variety of exemplary uses that illustrate this and similar aspects of the present invention are discussed below.
1. Biosynthetic Enzymes
Open reading frames encoding proteins involved in mediating the catalytic reactions involved in intermediary and macromolecular metabolism, the biosynthesis of small molecules, cellular processes and other functions include enzymes involved in the degradation of the intermediary products of metabolism, enzymes involved in central intermediary metabolism, enzymes involved in respiration, both aerobic and anaerobic, enzymes involved in fermentation, enzymes involved in ATP proton motive force conversion, enzymes involved in broad regulatory function, enzymes involved in amino acid synthesis, enzymes involved in nucleotide synthesis, and enzymes involved in cofactor and vitamin synthesis. These can be used for industrial biosynthesis.
The various metabolic pathways present in Streptococcus pneumoniae can be identified based on absolute nutritional requirements as well as by examining the various enzymes identified in Tables 1-3 and SEQ ID NOS: 1-391.
Of particular interest are polypeptides involved in the degradation of intermediary metabolites as well as non-macromolecular metabolism. Such enzymes include amylases, glucose oxidases, and catalase.
Proteolytic enzymes are another class of commercially important enzymes.
Proteolytic enzymes find use in a number of industrial processes including the processing of flax and other vegetable fibers, in the extraction, clarification and depectinization of fruit juices, in the extraction of vegetable oil and in the maceration of fruits and vegetables to give unicellular fruits. A detailed review of the proteolytic enzymes used in the food industry is provided in Rombouts et al., Symbiosis 21:79 (1986) and Voragen et al. in Biocatalysts In Agricultural Biotechnology, Whitaker et al., Eds., American Chemical Society Symposium Series 389:93 (1989).
The metabolism of sugars is an important aspect of the primary metabolism of Streptococcus pneumoniae. Enzymes involved in the degradation of sugars, such as, particularly, glucose, galactose, fructose and xylose, can be used in industrial fermentation. Some of the important sugar transforming enzymes, from a commercial viewpoint, include sugar isomerases such as glucose isomerase.
Other metabolic enzymes have found commercial use, such as glucose oxidases which produce ketogulonic acid (KGA). KGA is an intermediate in the commercial production of ascorbic acid using the Reichstein's procedure, as described in Krueger et al., Biotechnology 6(A), Rhine et al., Eds., Verlag Press, Weinheim, Germany (1984).
Glucose oxidase (GOD) is commercially available and has been used in purified form as well as in an immobilized form for the deoxygenation of beer.
See, for instance, Hartmeir et al., Biotechnology Letters 1:21 (1979). The most important application of GOD is the industrial scale fermentation of gluconic acid.
There is a market for gluconic acids, which are used in the detergent, textile, leather, photographic, pharmaceutical, food, feed and concrete industry, as described, for example, in Bigelis et al., beginning on page 357 in GENE MANIPULATIONS AND FUNGI, Bennett et al., Eds., Academic Press, New York (1985). In addition to industrial applications, GOD has found applications in medicine for quantitative determination of glucose in body fluids and, recently, in biotechnology for analyzing syrups from starch and cellulose hydrolysates. This application is described in Owusu et al., Biochem. et Biophysica Acta 872:83 (1986), for instance.
The main sweetener used in the world today is sugar which comes from sugar beets and sugar cane. In the field of industrial enzymes, the glucose isomerase process shows the largest expansion in the market today. Initially, soluble enzymes were used and later immobilized enzymes were developed (Krueger et al., Biotechnology, The Textbook of Industrial Microbiology, Sinauer Associated Incorporated, Sunderland, Massachusetts (1990)). Today, the use of glucose-produced high fructose syrups is by far the largest industrial business using immobilized enzymes. A review of the industrial use of these enzymes is provided by Jorgensen, Starch 40:307 (1988).
Proteinases, such as alkaline serine proteinases, are used as detergent additives and thus represent one of the largest volumes of microbial enzymes used in the industrial sector. Because of their industrial importance, there is a large body of published and unpublished information regarding the use of these enzymes in industrial processes. (See Faultman et al., Acid Proteases Structure Function and Biology, Tang, ed., Plenum Press, New York (1977) and Godfrey et al., Industrial Enzymes, MacMillan Publishers, Surrey, UK (1983) and Hepner et al., Report Industrial Enzymes by 1990, Hel Hepner Associates, London (1986)).
Another class of commercially usable proteins of the present invention are the microbial lipases, described by, for instance, Macrae et al., Philosophical Transactions of the Chiral Society of London 310:227 (1985) and Poserke, Journal of the American Oil Chemist Society 61:1758 (1984). A major use of lipases is in the fat and oil industry for the production of neutral glycerides using lipase catalyzed inter-esterification of readily available triglycerides. Applications of lipases include the use as a detergent additive to facilitate the removal of fats from fabrics in the course of the washing procedures.
The use of enzymes, and in particular microbial enzymes, as catalyst for key steps in the synthesis of complex organic molecules is gaining popularity at a great rate. One area of great interest is the preparation of chiral intermediates.
Preparation of chiral intermediates is of interest to a wide range of synthetic chemists, particularly those scientists involved with the preparation of new pharmaceuticals, agrochemicals, fragrances and flavors. (See Davies et al., Recent Advances in the Generation of Chiral Intermediates Using Enzymes, CRC Press, Boca Raton, Florida (1990)). The following reactions catalyzed by enzymes are of interest to organic chemists: hydrolysis of carboxylic acid esters, phosphate esters, amides and nitriles, esterification reactions, trans-esterification reactions, synthesis of amides, reduction of alkanones and oxoalkanates, oxidation of alcohols to carbonyl compounds, oxidation of sulfides to sulfoxides, and carbon bond forming reactions such as the aldol reaction.
When considering the use of an enzyme encoded by one of the ORFs of the present invention for biotransformation and organic synthesis it is sometimes necessary to consider the respective advantages and disadvantages of using a microorganism as opposed to an isolated enzyme. The pros and cons of using a whole cell system on the one hand or an isolated partially purified enzyme on the other hand have been described in detail by Bud et al., Chemistry in Britain (1987), p. 127.
Amino transferases, enzymes involved in the biosynthesis and metabolism of amino acids, are useful in the catalytic production of amino acids. The advantages of using microbial based enzyme systems are that the amino transferase enzymes catalyze the stereo-selective synthesis of only L-amino acids and generally possess uniformly high catalytic rates. A description of the use of amino transferases for amino acid production is provided by Roselle-David, Methods of Enzymology 136:479 (1987).
Another category of useful proteins encoded by the ORFs of the present invention includes enzymes involved in nucleic acid synthesis, repair, and recombination.
2. Generation of Antibodies
As described here, the proteins of the present invention, as well as homologs thereof, can be used in a variety of procedures and methods known in the art which are currently applied to other proteins. The proteins of the present invention can further be used to generate an antibody which selectively binds the protein. Such antibodies can be either monoclonal or polyclonal antibodies, as well as fragments of these antibodies, and humanized forms.
The invention further provides antibodies which selectively bind to one of the proteins of the present invention and hybridomas which produce these antibodies. A hybridoma is an immortalized cell line which is capable of secreting a specific monoclonal antibody.
In general, techniques for preparing polyclonal and monoclonal antibodies as well as hybridomas capable of producing the desired antibody are well known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques In Biochemistry And Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol. Methods 35:1-21 (1980); Kohler and Milstein, Nature 256:495-497 (1975)), as are the trioma technique and the human B-cell hybridoma technique (Kozbor et al., Immunology Today 4:72 (1983); pgs. 77-96 of Cole et al., in Monoclonal Antibodies And Cancer Therapy, Alan R. Liss, Inc. (1985)). Any animal (mouse, rabbit, etc.) which is known to produce antibodies can be immunized with the pseudogene polypeptide. Methods for immunization are well known in the art. Such methods include subcutaneous or intraperitoneal injection of the polypeptide. One skilled in the art will recognize that the amount of the protein encoded by the ORF of the present invention used for immunization will vary based on the animal which is immunized, the antigenicity of the peptide and the site of injection.
The protein which is used as an immunogen may be modified or administered in an adjuvant in order to increase the protein's antigenicity. Methods of increasing the antigenicity of a protein are well known in the art and include, but are not limited to, coupling the antigen with a heterologous protein (such as globulin or galactosidase) or the inclusion of an adjuvant during immunization.
For monoclonal antibodies, spleen cells from the immunized animals are removed, fused with myeloma cells, such as SP2/0-Ag14 myeloma cells, and allowed to become monoclonal antibody producing hybridoma cells.
Any one of a number of methods well known in the art can be used to identify the hybridoma cell which produces an antibody with the desired characteristics. These include screening the hybridomas with an ELISA assay, western blot analysis, or radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)).
Hybridomas secreting the desired antibodies are cloned and the class and subclass are determined using procedures known in the art (Campbell, A. M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)).
Techniques described for the production of single chain antibodies (U.S. Patent 4,946,778) can be adapted to produce single chain antibodies to proteins of the present invention.
For polyclonal antibodies, antibody containing antisera is isolated from the immunized animal and is screened for the presence of antibodies with the desired specificity using one of the above-described procedures.
The present invention further provides the above-described antibodies in detectably labelled form. Antibodies can be detectably labelled through the use of radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms, etc. Procedures for accomplishing such labeling are well known in the art (for example, see Sternberger et al., J. Histochem. Cytochem. 18:315 (1970); Bayer, E. A. et al., Meth. Enzym. 62:308 (1979); Engval, E. et al., Immunol. 109:129 (1972); Goding, J. W., J. Immunol. Meth. 13:215 (1976)).
The labeled antibodies of the present invention can be used for in vitro, in vivo, and in situ assays to identify cells or tissues in which a fragment of the Streptococcus pneumoniae genome is expressed.
The present invention further provides the above-described antibodies immobilized on a solid support. Examples of such solid supports include plastics such as polycarbonate, complex carbohydrates such as agarose and sepharose, and acrylic resins such as polyacrylamide and latex beads. Techniques for coupling antibodies to such solid supports are well known in the art (Weir, D. M. et al., "Handbook of Experimental Immunology" 4th Ed., Blackwell Scientific Publications, Oxford, England, Chapter 10 (1986); Jacoby, W. D. et al., Meth. Enzym. 34, Academic Press, N.Y. (1974)). The immobilized antibodies of the present invention can be used for in vitro, in vivo, and in situ assays as well as for immunoaffinity purification of the proteins of the present invention.
3. Diagnostic Assays and Kits
The present invention further provides methods to identify the expression of one of the ORFs of the present invention, or homolog thereof, in a test sample, using one of the DFs or antibodies of the present invention.
In detail, such methods comprise incubating a test sample with one or more of the antibodies or one or more of the DFs of the present invention and assaying for binding of the DFs or antibodies to components within the test sample.
Conditions for incubating a DF or antibody with a test sample vary.
Incubation conditions depend on the format employed in the assay, the detection methods employed, and the type and nature of the DF or antibody used in the assay. One skilled in the art will recognize that any one of the commonly available hybridization, amplification or immunological assay formats can readily be adapted to employ the DFs or antibodies of the present invention. Examples of such assays can be found in Chard, An Introduction to Radioimmunoassay and Related Techniques, Elsevier Science Publishers, Amsterdam, The Netherlands (1986); Bullock, G. R. et al., Techniques in Immunocytochemistry, Academic Press, Orlando, FL, Vol. 1 (1982), Vol. 2 (1983), Vol. 3 (1985); Tijssen, Practice and Theory of Enzyme Immunoassays: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1985).
The test samples of the present invention include cells, protein or membrane extracts of cells, or biological fluids such as sputum, blood, serum, plasma, or urine. The test sample used in the above-described method will vary based on the assay format, nature of the detection method and the tissues, cells or extracts used as the sample to be assayed. Methods for preparing protein extracts or membrane extracts of cells are well known in the art and can readily be adapted in order to obtain a sample which is compatible with the system utilized.
In another embodiment of the present invention, kits are provided which contain the necessary reagents to carry out the assays of the present invention.
Specifically, the invention provides a compartmentalized kit to receive, in close confinement, one or more containers which comprise: a first container comprising one of the DFs or antibodies of the present invention; and one or more other containers comprising one or more of the following: wash reagents, reagents capable of detecting presence of a bound DF or antibody.
In detail, a compartmentalized kit includes any kit in which reagents are contained in separate containers. Such containers include small glass containers, plastic containers or strips of plastic or paper. Such containers allow one to efficiently transfer reagents from one compartment to another compartment such that the samples and reagents are not cross-contaminated, and the agents or solutions of each container can be added in a quantitative fashion from one compartment to another. Such containers will include a container which will accept the test sample, a container which contains the antibodies used in the assay, containers which contain wash reagents (such as phosphate buffered saline, Tris-buffers, etc.), and containers which contain the reagents used to detect the bound antibody or DF.
Types of detection reagents include labelled nucleic acid probes, labelled secondary antibodies, or in the alternative, if the primary antibody is labelled, the enzymatic, or antibody binding reagents which are capable of reacting with the labelled antibody. One skilled in the art will readily recognize that the disclosed DFs and antibodies of the present invention can be readily incorporated into one of the established kit formats which are well known in the art.
4. Screening Assay for Binding Agents
Using the isolated proteins of the present invention, the present invention further provides methods of obtaining and identifying agents which bind to a protein encoded by one of the ORFs of the present invention or to one of the fragments and the Streptococcus pneumoniae fragments and contigs herein described.
In general, such methods comprise steps of: contacting an agent with an isolated protein encoded by one of the ORFs of the present invention, or an isolated fragment of the Streptococcus pneumoniae genome; and determining whether the agent binds to said protein or said fragment.
The agents screened in the above assay can be, but are not limited to, peptides, carbohydrates, vitamin derivatives, or other pharmaceutical agents. The agents can be selected and screened at random or rationally selected or designed using protein modeling techniques.
For random screening, agents such as peptides, carbohydrates, pharmaceutical agents and the like are selected at random and are assayed for their ability to bind to the protein encoded by the ORF of the present invention.
Alternatively, agents may be rationally selected or designed. As used herein, an agent is said to be "rationally selected or designed" when the agent is chosen based on the configuration of the particular protein. For example, one skilled in the art can readily adapt currently available procedures to generate peptides, pharmaceutical agents and the like capable of binding to a specific peptide sequence in order to generate rationally designed antipeptide peptides (for example, see Hurby et al., "Application of Synthetic Peptides: Antisense Peptides," in Synthetic Peptides, A User's Guide, W. H. Freeman, NY (1992), pp. 289-307, and Kaspczak et al., Biochemistry 28:9230-8 (1989)), or pharmaceutical agents, or the like.
In addition to the foregoing, one class of agents of the present invention, as broadly described, can be used to control gene expression through binding to one of the ORFs or EMFs of the present invention. As described above, such agents can be randomly screened or rationally designed/selected. Targeting the ORF or EMF allows a skilled artisan to design sequence specific or element specific agents, modulating the expression of either a single ORF or multiple ORFs which rely on the same EMF for expression control.
One class of DNA binding agents are agents which contain base residues which hybridize or form a triple helix by binding to DNA or RNA. Such agents can be based on the classic phosphodiester, ribonucleic acid backbone, or can be a variety of sulfhydryl or polymeric derivatives which have base attachment capacity. Agents suitable for use in these methods usually contain 20 to 40 bases and are designed to be complementary to a region of the gene involved in transcription (triple helix; see Lee et al., Nucl. Acids Res. 6:3073 (1979); Cooney et al., Science 241:456 (1988); and Dervan et al., Science 251:1360 (1991)) or to the mRNA itself (antisense; Okano, J. Neurochem. 56:560 (1991); Oligodeoxynucleotides as Antisense Inhibitors of Gene Expression, CRC Press, Boca Raton, FL (1988)). Triple helix formation optimally results in a shut-off of RNA transcription from DNA, while antisense RNA hybridization blocks translation of an mRNA molecule into polypeptide. Both techniques have been demonstrated to be effective in model systems. Information contained in the sequences of the present invention can be used to design antisense and triple helix-forming oligonucleotides, and other DNA binding agents.
5. Pharmaceutical Compositions and Vaccines
The present invention further provides pharmaceutical agents which can be used to modulate the growth or pathogenicity of Streptococcus pneumoniae, or another related organism, in vivo or in vitro. As used herein, a "pharmaceutical agent" is defined as a composition of matter which can be formulated using known techniques to provide a pharmaceutical composition. As used herein, the "pharmaceutical agents of the present invention" refers to the pharmaceutical agents which are derived from the proteins encoded by the ORFs of the present invention or are agents which are identified using the herein described assays.
As used herein, a pharmaceutical agent is said to "modulate the growth or pathogenicity of Streptococcus pneumoniae or a related organism, in vivo or in vitro," when the agent reduces the rate of growth, rate of division, or viability of the organism in question. The pharmaceutical agents of the present invention can modulate the growth or pathogenicity of an organism in many fashions, although an understanding of the underlying mechanism of action is not needed to practice the use of the pharmaceutical agents of the present invention. Some agents will modulate the growth by binding to an important protein thus blocking the biological activity of the protein, while other agents may bind to a component of the outer surface of the organism, blocking attachment or rendering the organism more prone to action by the body's natural immune system. Alternatively, the agent may comprise a protein encoded by one of the ORFs of the present invention and serve as a vaccine. The development and use of a vaccine based on outer membrane components are well known in the art.
As used herein, a "related organism" is a broad term which refers to any organism whose growth can be modulated by one of the pharmaceutical agents of the present invention. In general, such an organism will contain a homolog of the protein which is the target of the pharmaceutical agent or the protein used as a vaccine. As such, related organisms do not need to be bacterial but may be fungal or viral pathogens.
The pharmaceutical agents and compositions of the present invention may be administered in a convenient manner, such as by the oral, topical, intravenous, intraperitoneal, intramuscular, subcutaneous, intranasal or intradermal routes. The pharmaceutical compositions are administered in an amount which is effective for treatment and/or prophylaxis of the specific indication. In general, they are administered in an amount of at least about 1 mg/kg body weight and in most cases they will be administered in an amount not in excess of about 1 g/kg body weight per day. In most cases, the dosage is from about 0.1 mg/kg to about 10 g/kg body weight daily, taking into account the routes of administration, symptoms, etc.
The agents of the present invention can be used in native form or can be modified to form a chemical derivative. As used herein, a molecule is said to be a "chemical derivative" of another molecule when it contains additional chemical moieties not normally a part of the molecule. Such moieties may improve the molecule's solubility, absorption, biological half-life, etc. The moieties may alternatively decrease the toxicity of the molecule, eliminate or attenuate any undesirable side effect of the molecule, etc. Moieties capable of mediating such effects are disclosed in, among other sources, REMINGTON'S PHARMACEUTICAL SCIENCES (1980) cited elsewhere herein.
For example, such moieties may change an immunological character of the functional derivative, such as affinity for a given antibody. Such changes in immunomodulation activity are measured by the appropriate assay, such as a competitive type immunoassay. Modifications of such protein properties as redox or thermal stability, biological half-life, hydrophobicity, susceptibility to proteolytic degradation or the tendency to aggregate with carriers or into multimers also may be effected in this way and can be assayed by methods well known to the skilled artisan.
The therapeutic effects of the agents of the present invention may be obtained by providing the agent to a patient by any suitable means (e.g., inhalation, intravenously, intramuscularly, subcutaneously, enterally, or parenterally). It is preferred to administer the agent of the present invention so as to achieve an effective concentration within the blood or tissue in which the growth of the organism is to be controlled. To achieve an effective blood concentration, the preferred method is to administer the agent by injection. The administration may be by continuous infusion, or by single or multiple injections.
In providing a patient with one of the agents of the present invention, the dosage of the administered agent will vary depending upon such factors as the patient's age, weight, height, sex, general medical condition, previous medical history, etc. In general, it is desirable to provide the recipient with a dosage of agent which is in the range of from about 1 pg/kg to 10 mg/kg (body weight of patient), although a lower or higher dosage may be administered. The therapeutically effective dose can be lowered by using combinations of the agents of the present invention or another agent.
As used herein, two or more compounds or agents are said to be administered "in combination" with each other when either the physiological effects of each compound, or the serum concentrations of each compound, can be measured at the same time. The composition of the present invention can be administered concurrently with, prior to, or following the administration of the other agent.
The agents of the present invention are intended to be provided to recipient subjects in an amount sufficient to decrease the rate of growth (as defined above) of the target organism.
The administration of the agent(s) of the invention may be for either a "prophylactic" or "therapeutic" purpose. When provided prophylactically, the agent(s) are provided in advance of any symptoms indicative of the organism's growth. The prophylactic administration of the agent(s) serves to prevent, attenuate, or decrease the rate of onset of any subsequent infection. When provided therapeutically, the agent(s) are provided at (or shortly after) the onset of an indication of infection. The therapeutic administration of the compound(s) serves to attenuate the pathological symptoms of the infection and to increase the rate of recovery.
The agents of the present invention are administered to a subject, such as a mammal, or a patient, in a pharmaceutically acceptable form and in a therapeutically effective concentration. A composition is said to be "pharmacologically acceptable" if its administration can be tolerated by a recipient patient. Such an agent is said to be administered in a "therapeutically effective amount" if the amount administered is physiologically significant. An agent is physiologically significant if its presence results in a detectable change in the physiology of a recipient patient.
The agents of the present invention can be formulated according to known methods to prepare pharmaceutically useful compositions, whereby these materials, or their functional derivatives, are combined in a mixture with a pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive of other human proteins, e.g., human serum albumin, are described, for example, in REMINGTON'S PHARMACEUTICAL SCIENCES, 16th Ed., Osol, Ed., Mack Publishing, Easton, PA (1980). In order to form a pharmaceutically acceptable composition suitable for effective administration, such compositions will contain an effective amount of one or more of the agents of the present invention, together with a suitable amount of carrier vehicle.
Additional pharmaceutical methods may be employed to control the duration of action. Controlled release preparations may be achieved through the use of polymers to complex or absorb one or more of the agents of the present invention.
The controlled delivery may be effectuated by a variety of well known techniques, including formulation with macromolecules such as, for example, polyesters, polyamino acids, polyvinylpyrrolidone, ethylenevinylacetate, methylcellulose, carboxymethylcellulose, or protamine sulfate, adjusting the concentration of the macromolecules and the agent in the formulation, and by appropriate use of methods of incorporation, which can be manipulated to effectuate a desired time course of release. Another possible method to control the duration of action by controlled release preparations is to incorporate agents of the present invention into particles of a polymeric material such as polyesters, polyamino acids, hydrogels, poly(lactic acid) or ethylene vinylacetate copolymers. Alternatively, instead of incorporating these agents into polymeric particles, it is possible to entrap these materials in microcapsules prepared, for example, by coacervation techniques or by interfacial polymerization with, for example, hydroxymethylcellulose or gelatin microcapsules and poly(methylmethacrylate) microcapsules, respectively, or in colloidal drug delivery systems, for example, liposomes, albumin microspheres, microemulsions, nanoparticles, and nanocapsules, or in macroemulsions. Such techniques are disclosed in REMINGTON'S PHARMACEUTICAL SCIENCES (1980).
The invention further provides a pharmaceutical pack or kit comprising one or more containers filled with one or more of the ingredients of the pharmaceutical compositions of the invention. Associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.
In addition, the agents of the present invention may be employed in conjunction with other therapeutic compounds.
6. Shot-Gun Approach to Megabase DNA Sequencing
The present invention further demonstrates that a large sequence can be sequenced using a random shotgun approach. This procedure, described in detail in the examples that follow, has eliminated the up-front cost of isolating and ordering overlapping or contiguous subclones prior to the start of the sequencing protocols.
Certain aspects of the present invention are described in greater detail in the examples that follow. The examples are provided by way of illustration. Other aspects and embodiments of the present invention are contemplated by the inventors, as will be clear to those of skill in the art from reading the present disclosure.
ILLUSTRATIVE EXAMPLES
LIBRARIES AND SEQUENCING
1. Shotgun Sequencing Probability Analysis
The overall strategy for a shotgun approach to whole genome sequencing follows from the Lander and Waterman (Lander and Waterman, Genomics 2:231 (1988)) application of the equation for the Poisson distribution. According to this treatment, the probability, $P$, that any given base in a sequence of size $L$, in nucleotides, is not sequenced after a certain amount, $n$, in nucleotides, of random sequence has been determined can be calculated by the equation $P = e^{-m}$, where $m$ is $n/L$, the fold coverage. For instance, for a genome of 2.8 Mb, $m = 1$ when 2.8 Mb of sequence has been randomly generated (1X coverage). At that point, $P = e^{-1} = 0.37$. The probability that any given base has not been sequenced is the same as the probability that any region of the whole sequence $L$ has not been determined and, therefore, is equivalent to the fraction of the whole sequence that has yet to be determined. Thus, at one-fold coverage, approximately 37% of a polynucleotide of size $L$, in nucleotides, has not been sequenced. When 14 Mb of sequence has been generated, coverage is 5X for a 2.8 Mb sequence and the unsequenced fraction drops to 0.0067 or 0.67%. 5X coverage of a 2.8 Mb sequence can be attained by sequencing approximately 17,000 random clones from both insert ends with an average sequence read length of 410 bp.
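Similarly, the total gap length, $G$, is determined by the equation $G = Le^{-m}$, and the average gap size, $g$, follows the equation $g = L/n$ (the figures quoted here are reproduced when $n$ in this expression is counted in sequence reads, about 34,000, rather than in nucleotides). Thus, 5X coverage leaves about 240 gaps averaging about 82 bp in size in a sequence of a polynucleotide 2.8 Mb long.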
The treatment above is essentially that of Lander and Waterman, Genomics 2:231 (1988).
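The figures quoted in this analysis can be reproduced with a short calculation. The following is a minimal sketch, not part of the original disclosure: the function name `shotgun_stats` and its dictionary keys are hypothetical, and the expected number of gaps is taken as (number of reads) x e^-m, which is an assumption drawn from the standard Poisson treatment rather than a formula stated above.

```python
import math

def shotgun_stats(genome_len, n_reads, read_len):
    """Poisson estimates for random shotgun sequencing (illustrative only)."""
    total_bases = n_reads * read_len
    m = total_bases / genome_len        # fold coverage
    p_unseq = math.exp(-m)              # P = e^-m, chance a base is unsequenced
    total_gap = genome_len * p_unseq    # G = L * e^-m
    n_gaps = n_reads * p_unseq          # assumed: expected gaps = reads * e^-m
    avg_gap = genome_len / n_reads      # g = L / (number of reads)
    return {
        "coverage": m,
        "unsequenced_fraction": p_unseq,
        "total_gap_bp": total_gap,
        "expected_gaps": n_gaps,
        "average_gap_bp": avg_gap,
    }

# 17,000 clones read from both ends (34,000 reads) of 410 bp on a 2.8 Mb genome
stats = shotgun_stats(2_800_000, 34_000, 410)
for name, value in stats.items():
    print(f"{name}: {value:.4g}")
```

With these inputs the coverage comes out just under 5X, the unsequenced fraction is a little under 0.7%, and the average gap is about 82 bp, in line with the values given in the text.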
2. Random Library Construction
In order to approximate the random model described above during actual sequencing, a nearly ideal library of cloned genomic fragments is required. The following library construction procedure was developed to achieve this end.
Streptococcus pneumoniae DNA is prepared by phenol extraction. A mixture containing 200 µg DNA in 1.0 ml of 300 mM sodium acetate, 10 mM Tris-HCl, 1 mM Na-EDTA, 50% glycerol is processed through a nebulizer (IPI Medical Products) with a stream of nitrogen adjusted to 35 kPa for 2 minutes. The sheared DNA is ethanol precipitated and redissolved in 500 µl TE buffer.
To create blunt ends, a 100 µl aliquot of the resuspended DNA is digested with 5 units of BAL31 nuclease (New England BioLabs) for 10 min at 30°C in 200 µl BAL31 buffer. The digested DNA is phenol-extracted, ethanol-precipitated, redissolved in 100 µl TE buffer, and then size-fractionated by electrophoresis through a 1.0% low melting temperature agarose gel. The section containing DNA fragments 1.6-2.0 kb in size is excised from the gel, the LGT agarose is melted, and the resulting solution is extracted with phenol to separate the agarose from the DNA. DNA is ethanol precipitated and redissolved in 20 µl of TE buffer for ligation to vector.
A two-step ligation procedure is used to produce a plasmid library with 97% inserts, of which more than 99% were single inserts. The first ligation mixture (50 µl) contains 2 µg of DNA fragments, 2 µg pUC18 DNA (Pharmacia) cut with SmaI and dephosphorylated with bacterial alkaline phosphatase, and 10 units of T4 ligase (GIBCO/BRL) and is incubated at 14°C for 4 hr. The ligation mixture then is phenol extracted and ethanol precipitated, and the precipitated DNA is dissolved in 20 µl TE buffer and electrophoresed on a 1.0% low melting agarose gel. Discrete bands in a ladder are visualized by ethidium bromide staining and UV illumination and identified by size as insert (i), vector (v), v+i, v+2i, v+3i, etc. The portion of the gel containing v+i DNA is excised and the v+i DNA is recovered and resuspended into 20 µl TE. The v+i DNA then is blunt-ended by T4 polymerase treatment for 5 min. at 37°C in a reaction mixture (50 µl) containing the v+i linears, 500 µM each of the 4 dNTPs, and 9 units of T4 polymerase (New England BioLabs), under recommended buffer conditions. After phenol extraction and ethanol precipitation the repaired v+i linears are dissolved in 20 µl TE. The final ligation to produce circles is carried out in a 50 µl reaction containing 5 µl of v+i linears and 5 units of T4 ligase at 14°C overnight. After 10 min. at 70°C the following day, the reaction mixture is stored. This two-stage procedure results in a molecularly random collection of single-insert plasmid recombinants with minimal contamination from double-insert chimeras or free vector.
Since deviation from randomness can arise from propagation of the DNA in the host, E. coli host cells deficient in all recombination and restriction functions (Greener, Strategies 3 (1990)) are used to prevent rearrangements, deletions, and loss of clones by restriction. Furthermore, transformed cells are plated directly on antibiotic diffusion plates to avoid the usual broth recovery phase which allows multiplication and selection of the most rapidly growing cells.
Plating is carried out as follows. A 100 µl aliquot of Epicurian Coli SURE II Supercompetent Cells (Stratagene 200152) is thawed on ice and transferred to a chilled Falcon 2059 tube on ice. A 1.7 µl aliquot of 1.42 M beta-mercaptoethanol is added to the aliquot of cells to a final concentration of 25 mM. Cells are incubated on ice for 10 min. A 1 µl aliquot of the final ligation is added to the cells and incubated on ice for 30 min. The cells are heat pulsed for 30 sec. at 42°C and placed back on ice for 2 min. The outgrowth period in liquid culture is eliminated from this protocol in order to minimize the preferential growth of any given transformed cell. Instead the transformation mixture is plated directly on a nutrient rich SOB plate containing a 5 ml bottom layer of SOB agar (SOB agar: 20 g tryptone, 5 g yeast extract, 0.5 g NaCl, 1.5% Difco Agar per liter of media). The 5 ml bottom layer is supplemented with 0.4 ml of 50 mg/ml ampicillin per 100 ml SOB agar. The 15 ml top layer of SOB agar is supplemented with 1 ml X-Gal, 1 ml MgCl2, and 1 ml MgSO4 per 100 ml SOB agar. The 15 ml top layer is poured just prior to plating. Our titer is approximately 100 colonies/10 µl aliquot of transformation.
All colonies are picked for template preparation regardless of size. Thus, only clones lost due to "poison" DNA or deleterious gene products are deleted from the library, resulting in a slight increase in gap number over that expected.
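The final beta-mercaptoethanol concentration quoted in the plating procedure can be checked with a simple dilution calculation. This is a small sketch, assuming the volumes are additive; the function name `final_concentration_mM` is hypothetical.

```python
def final_concentration_mM(stock_molar, stock_ul, sample_ul):
    """Final concentration (mM) after adding stock_ul of stock to sample_ul.

    Assumes volumes are simply additive (an approximation).
    """
    total_ul = stock_ul + sample_ul
    return stock_molar * 1000.0 * stock_ul / total_ul

# 1.7 ul of 1.42 M beta-mercaptoethanol added to a 100 ul aliquot of cells
print(round(final_concentration_mM(1.42, 1.7, 100.0), 1))
```

The result is close to 24 mM, which agrees with the nominal 25 mM stated in the protocol to within rounding.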
3. Random DNA Sequencing
High quality double stranded DNA plasmid templates are prepared using a "boiling bead" method developed in collaboration with Advanced Genetic Technology Corp. (Gaithersburg, MD) (Adams et al., Science 252:1651 (1991); Adams et al., Nature 355:632 (1992)). Plasmid preparation is performed in a 96-well format for all stages of DNA preparation from bacterial growth through final DNA purification. Template concentration is determined using Hoechst Dye and a Millipore Cytofluor. DNA concentrations are not adjusted, but low-yielding templates are identified where possible and not sequenced.
Templates are also prepared from two Streptococcus pneumoniae lambda genomic libraries. An amplified library is constructed in the vector Lambda GEM-12 (Promega) and an unamplified library is constructed in Lambda DASH II (Stratagene). In particular, for the unamplified lambda library, Streptococcus pneumoniae DNA (100 kb) is partially digested in a reaction mixture (200 µl) containing 50 µg DNA, 1X Sau3AI buffer, and 20 units Sau3AI for 6 min. at 23°C.
The digested DNA is phenol-extracted and electrophoresed on a 0.5% low melting agarose gel at 2 V/cm for 7 hours. Fragments from 15 to 25 kb are excised and recovered in a final volume of 6 µl. One µl of fragments is used with 1 µl of DASH II vector (Stratagene) in the recommended ligation reaction. One µl of the ligation mixture is used per packaging reaction following the recommended protocol with the Gigapack II XL Packaging Extract (Stratagene, #227711). Phage are plated directly without amplification from the packaging mixture (after dilution with 500 µl of recommended SM buffer and chloroform treatment). Yield is about 10^3 pfu/µl. The amplified library is prepared essentially as above except the lambda GEM-12 vector is used. After packaging, about 3.5 x 10^4 pfu are plated on the restrictive NM539 host. The lysate is harvested in 2 ml of SM buffer and stored frozen in 7% dimethylsulfoxide. The phage titer is approximately 1 x 10^9 pfu/ml.
Liquid lysates (100 µl) are prepared from randomly selected plaques (from the unamplified library) and template is prepared by long-range PCR using T7 and T3 vector-specific primers.
Sequencing reactions are carried out on plasmid and/or PCR templates using the AB Catalyst LabStation with Applied Biosystems PRISM Ready Reaction Dye Primer Cycle Sequencing Kits for the M13 forward (M13-21) and the M13 reverse (M13RP1) primers (Adams et al., Nature 368:474 (1994)). Dye terminator sequencing reactions are carried out on the lambda templates on a Perkin-Elmer 9600 Thermocycler using the Applied Biosystems Ready Reaction Dye Terminator Cycle Sequencing kits. T7 and SP6 primers are used to sequence the ends of the inserts from the Lambda GEM-12 library and T7 and T3 primers are used to sequence the ends of the inserts from the Lambda DASH II library.
Sequencing reactions are performed by eight individuals using an average of fourteen AB 373 DNA Sequencers per day. All sequencing reactions are analyzed using the Stretch modification of the AB 373, primarily using a 34 cm well-to-read distance. The overall sequencing success rate is approximately 85% for M13-21 and M13RP1 sequences and 65% for dye-terminator reactions. The average usable read length is 485 bp for M13-21 sequences, 445 bp for M13RP1 sequences, and 375 bp for dye-terminator reactions.
Richards et al., Chapter 28 in AUTOMATED DNA SEQUENCING AND ANALYSIS, M. D. Adams, C. Fields, J. C. Venter, Eds., Academic Press, London, (1994) described the value of using sequence from both ends of sequencing templates to facilitate ordering of contigs in shotgun assembly projects of lambda and cosmid clones. We balance the desirability of both-end sequencing (including the reduced cost of a lower total number of templates) against shorter read-lengths for sequencing reactions performed with the M13RP1 (reverse) primer compared to the M13-21 (forward) primer. Approximately one-half of the templates are sequenced from both ends. Random reverse sequencing reactions are done based on successful forward sequencing reactions. Some M13RP1 sequences are obtained in a semi-directed fashion: M13-21 sequences pointing outward at the ends of contigs are chosen for M13RP1 sequencing in an effort to specifically order contigs.
4. Protocol for Automated Cycle Sequencing
The sequencing is carried out using ABI Catalyst robots and AB 373 Automated DNA Sequencers. The Catalyst robot is a publicly available sophisticated pipetting and temperature control robot which has been developed specifically for DNA sequencing reactions. The Catalyst combines pre-aliquoted templates and reaction mixes consisting of deoxy- and dideoxynucleotides, the thermostable Taq DNA polymerase, fluorescently-labelled sequencing primers, and reaction buffer. Reaction mixes and templates are combined in the wells of an aluminum 96-well thermocycling plate. Thirty consecutive cycles of linear amplification (i.e., one primer synthesis) steps are performed including denaturation, annealing of primer and template, and extension (i.e., DNA synthesis). A heated lid with rubber gaskets on the thermocycling plate prevents evaporation without the need for an oil overlay.
Two sequencing protocols are used: one for dye-labelled primers and a second for dye-labelled dideoxy chain terminators. The shotgun sequencing involves use of four dye-labelled sequencing primers, one for each of the four terminator nucleotides. Each dye-primer is labelled with a different fluorescent dye, permitting the four individual reactions to be combined into one lane of the 373 DNA Sequencer for electrophoresis, detection, and base-calling. ABI currently supplies pre-mixed reaction mixes in bulk packages containing all the necessary non-template reagents for sequencing. Sequencing can be done with both plasmid and PCR-generated templates with both dye-primers and dye-terminators with approximately equal fidelity, although plasmid templates generally give longer usable sequences.
Thirty-two reactions are loaded per AB 373 Sequencer each day, for a total of 960 samples. Electrophoresis is run overnight following the manufacturer's protocols, and the data is collected for twelve hours. Following electrophoresis and fluorescence detection, the ABI 373 performs automatic lane tracking and base-calling. The lane-tracking is confirmed visually. Each sequence electropherogram (or fluorescence lane trace) is inspected visually and assessed for quality. Trailing sequences of low quality are removed and the sequence itself is loaded via software to a Sybase database (archived daily to 8 mm tape). Leading vector polylinker sequence is removed automatically by a software program. Average edited lengths of sequences from the standard ABI 373 are around 400 bp and depend mostly on the quality of the template used for the sequencing reaction. ABI 373 Sequencers converted to Stretch Liners provide a longer electrophoresis path prior to fluorescence detection and increase the average number of usable bases to 500-600 bp.
INFORMATICS
1. Data Management
A number of information management systems for a large-scale sequencing lab have been developed. (For review see, for instance, Kerlavage et al., Proceedings of the Twenty-Sixth Annual Hawaii International Conference on System Sciences, IEEE Computer Society Press, Washington, D.C., 585 (1993)). The system used to collect and assemble the sequence data was developed using the Sybase relational database management system and was designed to automate data flow wherever possible and to reduce user error. The database stores and correlates all information collected during the entire operation from template preparation to final analysis of the genome. Because the raw output of the ABI 373 Sequencers was based on a Macintosh platform and the data management system chosen was based on a Unix platform, it was necessary to design and implement a variety of multi-user, client-server applications which allow the raw data as well as analysis results to flow seamlessly into the database with a minimum of user effort.
2. Assembly
An assembly engine (TIGR Assembler) developed for the rapid and accurate assembly of thousands of sequence fragments was employed to generate contigs. The TIGR Assembler simultaneously clusters and assembles fragments of the genome. In order to obtain the speed necessary to assemble more than 10^4 fragments, the algorithm builds a hash table of 12 bp oligonucleotide subsequences to generate a list of potential sequence fragment overlaps. The number of potential overlaps for each fragment determines which fragments are likely to fall into repetitive elements. Beginning with a single seed sequence fragment, TIGR Assembler extends the current contig by attempting to add the best matching fragment based on oligonucleotide content. The contig and candidate fragment are aligned using a modified version of the Smith-Waterman algorithm which provides for optimal gapped alignments (Waterman, M., Methods in Enzymology 164:765 (1988)). The contig is extended by the fragment only if strict criteria for the quality of the match are met. The match criteria include the minimum length of overlap, the maximum length of an unmatched end, and the minimum percentage match. These criteria are automatically lowered by the algorithm in regions of minimal coverage and raised in regions with a possible repetitive element. Fragments representing the boundaries of repetitive elements and potentially chimeric fragments are often rejected based on partial mismatches at the ends of alignments and excluded from the current contig.
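The oligonucleotide hash table step described above can be illustrated with a small sketch. This is not the TIGR Assembler implementation: the function name `candidate_overlaps` and the toy fragments are hypothetical, reverse-complement matches are ignored for brevity, and the score is simply the number of distinct 12-mers two fragments share.

```python
from collections import defaultdict

def candidate_overlaps(fragments, k=12):
    """Count distinct k-mers shared by each pair of fragments.

    fragments: dict mapping fragment id -> sequence string.
    Returns a dict mapping (id_a, id_b), with id_a < id_b, to the count.
    """
    # Hash table: k-mer -> set of fragment ids containing it
    index = defaultdict(set)
    for frag_id, seq in fragments.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)

    # Every k-mer seen in two or more fragments suggests a potential overlap
    shared = defaultdict(int)
    for ids in index.values():
        ordered = sorted(ids)
        for pos, a in enumerate(ordered):
            for b in ordered[pos + 1:]:
                shared[(a, b)] += 1
    return dict(shared)

# Toy data: f1 ends with the 14 bases that f2 starts with; f3 is unrelated
fragments = {
    "f1": "ACGTACGTTAGCCGATAGGCTA",
    "f2": "TAGCCGATAGGCTAACCGTTGA",
    "f3": "GGGGGGGGGGGGGGGGGGGG",
}
print(candidate_overlaps(fragments))
```

A fragment that shares k-mers with an unusually large number of other fragments would, in the scheme described above, be flagged as likely to lie in a repetitive element; candidate pairs would then go on to a full gapped alignment.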
TIGR Assembler is designed to take advantage of clone size information coupled with sequencing from both ends of each template. It enforces the constraint that sequence fragments from two ends of the same template point toward one another in the contig and are located within a certain range of base pairs (definable for each clone based on the known clone size range for a given library).
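The both-end constraint can be expressed as a simple check on two placed reads. The sketch below is only an illustration of the stated rule; the `PlacedRead` type, the function `mates_consistent`, and the coordinate convention are assumptions introduced here, with the 1.6-2.0 kb range borrowed from the plasmid library insert size given earlier.

```python
from dataclasses import dataclass

@dataclass
class PlacedRead:
    start: int     # leftmost contig coordinate of the read
    end: int       # rightmost contig coordinate (exclusive)
    forward: bool  # True if the read runs left-to-right along the contig

def mates_consistent(a, b, min_insert, max_insert):
    """Check that two reads from one template face each other at a sane distance."""
    left, right = (a, b) if a.start <= b.start else (b, a)
    # The left read must point right and the right read must point left
    if not left.forward or right.forward:
        return False
    # Implied clone size: outer edge of one read to outer edge of the other
    span = right.end - left.start
    return min_insert <= span <= max_insert

fwd = PlacedRead(start=100, end=550, forward=True)
rev = PlacedRead(start=1500, end=1950, forward=False)
print(mates_consistent(fwd, rev, 1600, 2000))
```

Here the implied clone size is 1,850 bp, so the pair satisfies the constraint; reads pointing the same way, or implying a clone far outside the library's size range, would be rejected.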
The process resulted in 391 contigs as represented by SEQ ID NOs: 1-391.
3. Identifying Genes
The predicted coding regions of the Streptococcus pneumoniae genome were initially defined with the program GeneMark, which finds ORFs using a probabilistic classification technique. The predicted coding region sequences were used in searches against a database of all nucleotide sequences from GenBank (October, 1997), using the BLASTN search method to identify overlaps of 50 or more nucleotides with at least a 95% identity. Those ORFs with nucleotide sequence matches are shown in Table 1. The ORFs without such matches were translated to protein sequences and compared to a non-redundant database of known proteins generated by combining the Swiss-Prot, PIR and GenPept databases. ORFs that matched a database protein with BLASTP probability less than or equal to 0.01 are shown in Table 2. The table also lists assigned functions based on the closest match in the databases. ORFs that did not match protein or nucleotide sequences in the databases at these levels are shown in Table 3.
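The three-way sorting of ORFs into Tables 1-3 follows directly from the thresholds stated above and can be written as a small decision function. This is a minimal sketch: the function name `assign_table` and its argument names are hypothetical, and the inputs are assumed to be the best BLASTN overlap (length and percent identity) and the best BLASTP probability for an ORF, with `None` meaning no hit was found.

```python
from typing import Optional

def assign_table(nt_overlap: Optional[int],
                 nt_identity: Optional[float],
                 blastp_prob: Optional[float]) -> int:
    """Return 1, 2, or 3 according to the thresholds given in the text."""
    # Table 1: nucleotide match of 50 or more bases at 95% identity or better
    if (nt_overlap is not None and nt_identity is not None
            and nt_overlap >= 50 and nt_identity >= 95.0):
        return 1
    # Table 2: protein match with BLASTP probability of 0.01 or less
    if blastp_prob is not None and blastp_prob <= 0.01:
        return 2
    # Table 3: no match at either level
    return 3

print(assign_table(120, 98.5, None))   # nucleotide match
print(assign_table(40, 99.0, 1e-20))   # overlap too short, protein match
print(assign_table(None, None, 0.3))   # no significant match
```

The nucleotide test is applied first, mirroring the order in the text, so an ORF with both kinds of match is reported only in Table 1.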
ILLUSTRATIVE APPLICATIONS
1. Production of an Antibody to a Streptococcus pneumoniae Protein
Substantially pure protein or polypeptide is isolated from the transfected or transformed cells using any one of the methods known in the art. The protein can also be produced in a recombinant prokaryotic expression system, such as E. coli, or can be chemically synthesized. Concentration of protein in the final preparation is adjusted, for example, by concentration on an Amicon filter device, to the level of a few micrograms/ml. Monoclonal or polyclonal antibody to the protein can then be prepared as follows.
2. Monoclonal Antibody Production by Hybridoma Fusion
Monoclonal antibody to epitopes of any of the peptides identified and isolated as described can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature 256:495 (1975) or modifications of the methods thereof. Briefly, a mouse is repetitively inoculated with a few micrograms of the selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody producing cells of the spleen isolated.
The spleen cells are fused by means of polyethylene glycol with mouse myeloma cells, and the excess unfused cells destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted and aliquots of the dilution placed in wells of a microtiter plate where growth of the culture is continued. Antibody-producing clones are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures, such as ELISA, as originally described by Engvall, Meth. Enzymol. 70:419 (1980), and modified methods thereof. Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al., Basic Methods in Molecular Biology, Elsevier, New York, Section 21-2 (1989).
3. Polyclonal Antibody Production by Immunization
Polyclonal antiserum containing antibodies to heterogeneous epitopes of a single protein can be prepared by immunizing suitable animals with the expressed protein described above, which can be unmodified or modified to enhance immunogenicity. Effective polyclonal antibody production is affected by many factors related both to the antigen and the host species. For example, small molecules tend to be less immunogenic than others and may require the use of carriers and adjuvant. Also, host animals vary in response to site of inoculations and dose, with both inadequate or excessive doses of antigen resulting in low titer antisera. Small doses (ng level) of antigen administered at multiple intradermal sites appear to be most reliable. An effective immunization protocol for rabbits can be found in Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991 (1971).
Booster injections can be given at regular intervals, and antiserum harvested when antibody titer thereof, as determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen, begins to fall. See, for example, Ouchterlony, O. et al., Chap. 19 in: Handbook of Experimental Immunology, Wier, ed., Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 µM).
Affinity of the antisera for the antigen is determined by preparing competitive binding curves, as described, for example, by Fisher, Chap. 42 in: Manual of Clinical Immunology, second edition, Rose and Friedman, eds., Amer. Soc. For Microbiology, Washington, D.C. (1980).
Antibody preparations prepared according to either protocol are useful in quantitative immunoassays which determine concentrations of antigen-bearing substances in biological samples; they are also used semi-quantitatively or qualitatively to identify the presence of antigen in a biological sample. In addition, antibodies are useful in various animal models of pneumococcal disease as a means of evaluating the protein used to make the antibody as a potential vaccine target or as a means of evaluating the antibody as a potential immunotherapeutic or immunoprophylactic reagent.
4. Preparation of PCR Primers and Amplification of DNA
Various fragments of the Streptococcus pneumoniae genome, such as those of Tables 1-3 and SEQ ID NOS: 1-391, can be used, in accordance with the present invention, to prepare PCR primers for a variety of uses. The PCR primers are preferably at least 15 bases, and more preferably at least 18 bases in length. When selecting a primer sequence, it is preferred that the primer pairs have approximately the same G/C ratio, so that melting temperatures are approximately the same. The PCR primers and amplified DNA of this Example find use in the Examples that follow.
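The primer selection guidance above (minimum length, matched G/C content, matched melting temperature) lends itself to a quick check. The sketch below is illustrative only: the function names, the example primer sequences, and the 4°C tolerance are hypothetical, and the melting temperature is estimated with the simple Wallace rule (2°C per A/T, 4°C per G/C), which is a rough approximation for short oligonucleotides and is not a method specified in this disclosure.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)

def wallace_tm(primer):
    """Rough melting temperature in deg C: 2 per A/T plus 4 per G/C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc

def primers_compatible(fwd, rev, min_len=15, max_tm_diff=4):
    """Check minimum length and similar estimated Tm for a primer pair.

    max_tm_diff is an assumed tolerance, not a value from the text.
    """
    if len(fwd) < min_len or len(rev) < min_len:
        return False
    return abs(wallace_tm(fwd) - wallace_tm(rev)) <= max_tm_diff

# Hypothetical 18-base primers, each with 50% G/C content
fwd = "ATGCGTACCGATTAGCCA"
rev = "TTGACCGGTACATCGGAT"
print(gc_fraction(fwd), gc_fraction(rev))
print(wallace_tm(fwd), wallace_tm(rev))
print(primers_compatible(fwd, rev))
```

Because both example primers have the same length and G/C content, their estimated melting temperatures coincide, which is the situation the text recommends aiming for.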
5. Gene expression from DNA Sequences Corresponding to ORFs
A fragment of the Streptococcus pneumoniae genome provided in Tables 1-3 is introduced into an expression vector using conventional technology.
Techniques to transfer cloned sequences into expression vectors that direct protein translation in mammalian, yeast, insect or bacterial expression systems are well known in the art. Commercially available vectors and expression systems are available from a variety of suppliers including Stratagene (La Jolla, California), Promega (Madison, Wisconsin), and Invitrogen (San Diego, California). If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular expression organism, as explained by Hatfield et al., U.S. Patent No. 5,082,767, incorporated herein by this reference.
The following is provided as one exemplary method to generate polypeptide(s) from cloned ORFs of the Streptococcus pneumoniae genome fragment. Bacterial ORFs generally lack a poly A addition signal. The addition signal sequence can be added to the construct by, for example, splicing out the poly A addition sequence from pSG5 (Stratagene) using BglI and SalI restriction endonuclease enzymes and incorporating it into the mammalian expression vector pXT1 (Stratagene) for use in eukaryotic expression systems. pXT1 contains the LTRs and a portion of the gag gene of Moloney Murine Leukemia Virus. The positions of the LTRs in the construct allow efficient stable transfection. The vector includes the Herpes Simplex thymidine kinase promoter and the selectable neomycin gene. The Streptococcus pneumoniae DNA is obtained by PCR from the bacterial vector using oligonucleotide primers complementary to the Streptococcus pneumoniae DNA and containing restriction endonuclease sequences for PstI incorporated into the 5' primer and BglII at the 5' end of the corresponding Streptococcus pneumoniae DNA 3' primer, taking care to ensure that the Streptococcus pneumoniae DNA is positioned such that it is followed by the poly A addition sequence. The purified fragment obtained from the resulting PCR reaction is digested with PstI, blunt ended with an exonuclease, digested with BglII, purified and ligated to pXT1, now containing a poly A addition sequence and digested with BglII.
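The primer design described above, with a PstI site on the 5' primer and a BglII site at the 5' end of the 3' primer, can be sketched in Python as follows. The recognition sequences CTGCAG (PstI) and AGATCT (BglII) are the standard ones for these enzymes; the 18-base annealing region, the two-base "GC" clamp placed ahead of each site, and all function names are assumptions introduced for illustration and do not come from this Example.

```python
PSTI_SITE = "CTGCAG"   # PstI recognition sequence
BGLII_SITE = "AGATCT"  # BglII recognition sequence

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def has_internal_site(orf: str, site: str) -> bool:
    """True if the ORF itself contains the restriction site, in which case
    digestion would also cut inside the insert."""
    return site in orf.upper()


def design_primers(orf: str, anneal_len: int = 18, clamp: str = "GC"):
    """Build a (5' primer, 3' primer) pair carrying PstI and BglII sites.

    anneal_len and clamp are hypothetical defaults for this sketch.
    """
    orf = orf.upper()
    if len(orf) < anneal_len:
        raise ValueError("ORF is shorter than the annealing region")
    # 5' primer: clamp + PstI site + start of the coding strand
    forward = clamp + PSTI_SITE + orf[:anneal_len]
    # 3' primer: clamp + BglII site + reverse complement of the ORF's end
    reverse = clamp + BGLII_SITE + reverse_complement(orf[-anneal_len:])
    return forward, reverse
```

Before ordering such primers, `has_internal_site(orf, PSTI_SITE)` and `has_internal_site(orf, BGLII_SITE)` would flag ORFs for which a different enzyme pair might be needed.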
The ligated product is transfected into mouse NIH 3T3 cells using Lipofectin (Life Technologies, Inc., Grand Island, New York) under conditions outlined in the product specification. Positive transfectants are selected after growing the transfected cells in 600 µg/ml G418 (Sigma, St. Louis, Missouri).
The protein is preferably released into the supernatant. However, if the protein has membrane binding domains, the protein may additionally be retained within the cell or expression may be restricted to the cell surface. Since it may be necessary to purify and locate the transfected product, synthetic 15-mer peptides synthesized from the predicted Streptococcus pneumoniae DNA sequence are injected into mice to generate antibody to the polypeptide encoded by the Streptococcus pneumoniae DNA.
Alternatively, and if antibody production is not possible, the Streptococcus pneumoniae DNA sequence is additionally incorporated into eukaryotic expression vectors and expressed as, for example, a globin fusion. Antibody to the globin moiety then is used to purify the chimeric protein. Corresponding protease cleavage sites are engineered between the globin moiety and the polypeptide encoded by the Streptococcus pneumoniae DNA so that the latter may be freed from the former by simple protease digestion. One useful expression vector for generating globin chimerics is available from Stratagene. This vector encodes a rabbit globin. Intron II of the rabbit globin gene facilitates splicing of the expressed transcript, and the polyadenylation signal incorporated into the construct increases the level of expression. These techniques are well known to those skilled in the art of molecular biology. Standard methods are published in methods texts such as Davis et al., cited elsewhere herein, and many of the methods are available from the technical assistance representatives from Stratagene, Life Technologies, Inc., or Promega. Polypeptides of the invention also may be produced using in vitro translation systems such as the In vitro Express™ Translation Kit (Stratagene).
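The fusion-and-cleavage idea described above can be modeled at the sequence level with a small Python sketch. The text does not name a protease, so the Factor Xa recognition motif (Ile-Glu-Gly-Arg, cleaved after the Arg) is used here purely as an assumed example; the function names and the short placeholder peptide strings in the usage note are likewise hypothetical and are not sequences from this specification.

```python
FACTOR_XA_SITE = "IEGR"  # assumed example site; cleavage occurs after the R


def build_fusion(carrier: str, target: str, site: str = FACTOR_XA_SITE) -> str:
    """Join a carrier moiety (e.g. globin) to a target polypeptide with a
    protease recognition site engineered between them."""
    return carrier + site + target


def cleave(fusion: str, site: str = FACTOR_XA_SITE) -> list:
    """Split a protein sequence after every occurrence of the site.

    This is a simplified model: it ignores cleavage efficiency and any
    sequence context that affects a real protease.
    """
    fragments = []
    start = 0
    while True:
        idx = fusion.find(site, start)
        if idx == -1:
            break
        cut = idx + len(site)
        fragments.append(fusion[start:cut])
        start = cut
    fragments.append(fusion[start:])
    # Drop an empty trailing fragment when the site ends the sequence.
    return [f for f in fragments if f]
```

For example, `cleave(build_fusion("MVHLSS", "MKTAYI"))` separates the carrier (still carrying the site) from the target. If the target itself contains the motif, the function returns more than two fragments, which signals that a different protease site would be a better choice.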
While the present invention has been described in some detail for purposes of clarity and understanding, one skilled in the art will appreciate that various changes in form and detail can be made without departing from the true scope of the invention.
All patents, patent applications and publications referred to above are hereby incorporated by reference.
Throughout the specification, unless the context requires otherwise, the word "comprise" or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.
2004231248 23 Nov 2004 TABLE I S. preumonlae Coding reglona containlng kno- sequences Contig j OR I Start to p etch match gene name I crcentl lISP nt o IF n I. In it I nl I Intl I cqelon IIr.r In~ D 1 ft) at AcssonI[dent j length lo Lese 4- 4)1 1003 igbll11ISI [treotcoccvs pneumona. pept ids methionine sulloxlde reductase IsrA) and 9l z 00 567 hmogering kinhs hoerlog Ithras genes. complete Cdi I 7210 IgbUG4O4j IStreptococcus pneumonia. 657 dextran iucoldese gene end insertion 9 I4 4 450 sequence 151202 tranaprsee gene. complete Cis 651 enbSI11~SPI I cap)*B.~.~rc l~lJ u gene. dTOP-rhantnose 9 0 1 I 6 659 6117 tIZbliSISPZeIn Spnawoniae das, capi*.C.s genes. dTrliawose9g 21 36 o uynotheaa genes Ard AllA gene I I I I 3 111 I 8719 9147 Ie.bIhI3liiIBPZ Spnmona dem cew- phIAecocrcn.JK genes. ilTOP-rhamnose 1 1 4241 24 S112 110419 111 Ibj~] lJI]IISI Sopneumonlae dl. caphIA.5.C.D.I VtG.I.i.J.XI genes. dl'b-rIaass 91 511 I Sig blceynthesla genes end II geneI I 111 1546 111019 1gb1U47526l 1 Streptococcus pnewmoniae ceuramlnldese B Inen9i gene. complete ols, end 99 I74 414 1 neuraelnidaso InenAl gene. partial cdI 114 112011 IgbI4)SII Streptococcus pneusonle nuralnldase a Inanbi gene, complete cds. *nd 99 1159 1 1159 neutamInldacs inanl gene. parrie cdi j 4 4 I 2 15 1)41 II~i gbU~l3Il Strepocccus pnetsonlee neuralnldaee B (nnBI gene. complete ril. and 99 9I16 gsej 1 neuraminlise (nenki gene. partirle l 1 i 132 111IS~I)21 It ptecoccus pesumonlee ne.remndec 5 (nnll gene, complte cdi. end I 9, 143 11 Uh I neuxamlnidese 14inAl gene, partial cds r ,l 3 I 11zIt) 117213 gbIU4Jl6j IStreptococcus pnaumoniae neureminidee. 1 Inanl gene, complete cds. ed I 11 21;; I I I 1 I 1 nrurammnldiea (nan~l gone, partial cs I I I III 11761 11191 Iblt1S)j SCreto.cs pneuenila nereinidase I Inanhl gene, coplet cdi, and Y ls gone 11p1il Ineurlnidase InnAl geiis. partial cds 1 1 46 116 I4Y jembIlk453IPI IStretococcus pnswsuonie dnafl rpo. 
cpOA ganes and OcF3 and OnF5 99 1 11O1 B 1141 1 I--4 4 211 2129 IeWNII3453ISPON IStreptococco. pnowumnle dn&C, rpot. cpoA genes and 01F3 and ORF5 1 99 016 1332 1 F 1 1127 11073I gu413s~Streptococcus pneu~soniae peptde metrhioning euioideO Ildutase Imerl and 1 2 I I 1 homoa&rine hinase hoeclog (thril genes, complete edi I I 1 I 1 7131 1 )64 IebIZlSllPISPRS I.pneusionlee DNA for Insertion sequence IS11 (231 bpi 5 I 10 240 I 13 1 7570 sIebIZll771SPIPS lS.pnaurae DNA for Insertion Sequence ISIISI 196 bpl 9 6 j 34$ 4 1211 1109 1 1981 It7 b l.lZm S 115 IS.pneuIonle. dex. cr Iet s.e Fqunce9F II Xgnes. 15)81 1966 bp I 96 I 65 1 41 11111o alhi genes1and 1li gen dI 465 j 7 I@ *301 I 16 ItubotI33JSIS0 S.pneu~noae deal. capiIADC. DE.Co.II.s.J.t genes. 1 i I I blosynthesis genes and ellA gene J 2004231248 23 Nov 2004 TABLE 1 S. pneumonia. ceding regions containing known sequences I D lp I r I n aejpecn NS ni 0v t C Ani t Io n Sta t Stop math match gene nam iden.t I Lngth Aegt.
1-- I" 1 9024 J 8106 jamb IJI I jbio~ynche5IjagIno. and AIIA gene -gns dVPriaoeI I~ 1 0 11 J 3 f 4 "07 IgbIL1'"'I Streptococcoe pIreulonie methlmv transferae* intri gene cluster, complete j 93 St) 122;" I1 1 cdin III I i I 548 915 Iab;s"so S.vnaaMnniu yor(IA ,C.0,E3. (tet, popX and ragkt genes 1 i 36 li12I ii I 24 1910 Ieeb~tlti~ o Is~pi umontae j C~t (teL, ppE and reR genies 1 9 tea 20 4 It 91138 34 ebIIII SOO R IS.Pneumonlee yCltA,8 CDZl, CteL, pispg and regR genes 9 15 2 1 243 ii 7 43 11 em~lilI~o I~nenis X~tA9CDg.II.pp and riS genes I I~ 1 31; W 1m It 1 6 lit I.bI l I OOt js.pae moniee yoctIA.S.CDEI. p p X and reg R genes 1 96 1 %1 211 -T a iiO 4l 2 1164 IamblI276UISO IStretooccus pno ieA.pCOe gen f or pe ncilin binng Irt~ 9w I 111 1 M Ibo 1on 1a penicillian bid n gentee end ll gee 1 I0 4 j 1 j 13 2 1 jem IiIu)I5ISPlG I5.pneumcnla* e x genes geneh&,n z i7 34 4 I- Me9 1 4 2M 18 lbIl2I 1 IStraPtococcuB pneuMeeniee4 ttachmentse USA sequco e nce IACA x9teta 1& 4no 4
I
20 1 937 ill UbIUII)3SI fpsrretCocua, cpnetie cdt ee ata d omeec tmltn Isi IglD19 1treptdcoprcuso ic~u oeCI itiyp e plro ysa. cc ard l asnd lpoe 16 j4 414 I I I I I ruor I cs1oADCL C1KK geneeees, c~lCJ-Ar enda tR J A-Gi gn 1 40 43 lmlI11 S treIS ptocusa s DNAumornertiLn geene partia cci, co ptnc c1atn 98 1 l11ca41 IL II I-;1b 4 IIF I3regaloator~ ie gnes cepst 1dt tliA-6r entog-lgne
I
2004231248 23 Nov 2004 TABLE I S. pneu"niaa Coding regions containing know Sequences I onti 9 tOr Strt Stop I xcch Jatch gone nIperentl HP t 0I o 10 1io intl Intl scession II 1 t length ongth I )IFS 604 bloP 6 11 Streptocorcus pneumcniae competeance stiulating C~ptide precu so !oC C 412 a 17 M CI hi ime Icnasa hoaolaog ,occoroa and response reoul tor ::01 0; COPE ICOMEJ gene, cmplete eds 0 M 1 21 5 54 biU7D62)0 Streptococcus pneuonia*s C co ptenc ee pati r ti rec u sor ~o iSCCIQOO CI n ~nl R01tlrr-rggcn, ~e SOlr~uence. end putative ps 1705 ID Jrino protesse Isphtris), SPSpoJ ispapaail, inltiator protein IepdnSal and bate subunit of ONA pojymorese III lspdnani genes, coepiete cdI S 4 5141 obi~rO j951 StrePtococcus pneuoninse I1 tJA-Arg gene, partial sequence, end putative 99 771 I'nne protease isphtra), SPfp isppoi nititor protein Epdnei and I b subunit o DNA poiymeravg III ppdnan gangs, Complete cdt 41Streptococcus pneu.ieg 501 tRmA-Arg gone, partil sequence, and putative j 9 J l I Ierine proteese isphtr(, SPSpoJ lepepoij, initiator protein ipdnes and I I subunlt of DNA polynerese III (pdnenI genes. complete cd1 120 IgS 10 0 95 F 2 f 11 I g roo000 6 5g Streptococcus preum onia g R @it tANA-A r gene. Partial aequence und ap t ve ftg 1 8 ilia beta Subunit Of DNA polymseo 11IFp n ens oplt d Ierine proseass isphtrel, SPSpoJ iapspol. initiator protein ispdnaj and -I-beta sr ubuniLt of DNA palynecase III Irpdl n genes, complots cdo s L 1 P 1C-- u m
L
4 In ccusg onu, nl.e 0,t tA-Arg gene, partial sequence, and Putative 99 13 it f berta u r i"(4ra1 "Po-p I-PSPOJI, Initiator Protein Ispdn o) and bets eubunD o A polymere III ispdnenj gtAw&, complete cda U 2 114 11;76 I1'24' 1eI1315zsIlipnnponiee DNA fer Insertion sequence 11)18 -17 -a 1 bi l l I S P S .V n euT nIL m D N A o r I n s e rto n s q u n q S 1 142 b p i 1 9 7 3 s i i s Icitn c I 4IM 011771 61SiS 1S. neumonlDNA For Insertion sequence 1£311 (11 t p bpend 96 1 11)1 I 1 neueoniei genes encodinog galacturonceyl trensferese end transposase end gg 45 I I I I I 1 ~inxertion-sequence isisis je2 j 41 r- 21 0 3 *o 4202 I hbjxSlc474lsrrL Is.pneumonle ply gene i oe p eueoiyein I 19 j 42 1422 2) 5 I M IbI 7j iS.pnew ia pneuoolyoLn gone, coplete- ;7 .11 28* 4 I gd- pnemoia i4ewunugbobulin A pIotase ligal1 gene, complete91 ii 2O eds 1 99 IS[ 340IS1~L 1 2 1 1 gIStreptococcus pneumonlao Irrlunog(obulln At prlesei Iia gene
IO
110 5 141 4 21 j1 617m 1 IvflauiO~lp Ist0rptcous p ode 1 (IIII ~ltll :t rCu nu.llimunolobuln AZ Plotelse ieu gene. complete I 1 I0 tiT 2004231248 23 Nov 2004 TABtLE 1S. osumoniag Coding r0g1iis containlng kumom sequences 1-11-1 C0*ILIQ ORF Stnr- Stop match match gene n145 Cl CCI' nt 1 0 1D Intl (nt pals 1 prcontC )4S nt I oltrgnt iden t length Ian n I a i4isis11454 ebZ~uI~lnhS-Plmeunnl deos, cap~lA,B,c.,,G,, ,j genes, dlCP-rhamnoee 9 blosynthesis goeaend allA geng 1 I -ao 6-2iebJtB~llSjl1s 6S.pneuroniee dens, caplIA,s.C.orF..ir.j..x 1 gene, d 1 np-rhmno e I III 111 Iiceynthesglos gene. and ellA gene iuco-el-r-r~lsequence 151202 trensposate gene, conmple cdogm n lsrjn9 I 1 1 enbjz~jlil51 bluoynLhcsl goee end 1AOC0EFGH gene genes, dTosP-vhamnose 59 426 I U 0 Streto coccus pneumon ie 555 dotren glucoeldese gene And Insertion 9 1 .50 1 4501 sequence 15120 2 transpaae gene. complete de I~ I 28 3 Ig 125 010bl 3295 tse IEptococcue pneumoniae 55 dextrem gucca sidas gene end Insertion 96 15 lo S; Jsequence ISl I transposse gene. Coplets cdo
I
4 m7 II207 neltodentrln perifeese I-eEC and maID1 genes, complete cda I167 bj [I 6iij I Streptocvccus pneumoniae *ltoiSelmtoextrin uptake ImaIc Andnd two j i4 195 gIs% I me tode xr n pesme se (.IC nd .a D genes. comple te eds I
I
3 1 4100 IbL ltS1I4 IStreptcoccus iPueonlee aelA gene, Complete Cd: mae gene, complete cd I is 1 111 82 I i 3 i. 190 44; I9bL~t9 Istreptoco)cus pnegmonlae matA gene, complete cdi:j meels gene, coeplete eds 98 9 13 99e 4 I 3 14 1 D 1 7 4 1507 IobIL laseI Igtrepcococcus pneumoniee Pl gentde om l ete c p Kdi o ma l d c g ne Icom let nd 1 1 1 11 1 1 1 homosor kln::hsio homolothell genes, CoMpIetC dc I r I Ill 1439 Itmb~-lnII Sonweonlee daP..ll.,C.D.C,,r.H.Ji genes. dtDP-rham noseee7 240 26 Is. ib I I I I I*Ioasg end ellA gene p J C 264 Srtcousposoletype I9F cepsular polyeecchiarlds blpeynth isi 4 Ia 16p SS1 99456 jDC pIIIDEGIIIJKUQO lA enes, complete de, a.d elsA geno:e, 9 I 11per tlea cdi Isrp~ccv den.cut~le Crl 8149 ceolC ic cdi ~R 0 clera cg 979 99 Scpe24, cpsiI, cpsidJ. cpelet, cpeldL, tlA genes I 18 11 L 16)70 Iembbl 5]33Sj SPze IS.pneumoni a t dee cep ilA.S.C. E .0 ,G 11.47 K genes, r 92 1 71) 1 biosyntheels genes and &lLA gene Slrrplpe~~~~~ceuli Duno y 1 capsular Dglyaacchliejic bls,~laoeie 6 i( 1 0 *2Peron. lcpsei9 teeCDCEFJrrcmzjio gens, complete cda. end iltA gene,
I
pertiel cdi 2004231248 23 Nov 2004 TABLE Is. pneumona. Coding regions containing known sequence.
I I 1 Cn t ig O F St er ct I $ap M tch aCi, en name I I I I ntl n Iprcent HSPnt Ieason I ent lengt I ngth 4- P I I r [37604 II'I I~bl~ll~ I S~pncueionla. at~m cpiliA. cpsIlg. cpejIC cplD, srlt., EP~Imr, nCPJ. I legt J ocrI n 1 CO&pAll' CO ClaJ. 'PO14K, CPit., tJAM genes length I 60-9-1 6 lop 0916 1 BS? IgbIulua766I 'strn~tOcoccus PiteumIonl surface antigen A variant prceur I and 1 58 I6 [3831I j hoe protein gents- complete cd. and ORI gen, paral cdi6 f I- 20 [19934. 1IBSE IgblLJSIO5 I Straptococcu, pneutoniae surerc adhesin A grecurmor Ipeaki gone. complete 99 969 161 I r -I c d i 9 6 9 .1 3 I7 274j 17g Iabz£77)l1sppo I.pneae parC, part and cransposeee genea end unknown ol 9 I 6 256 -a13 53 1 )7 4bjI'Jl PJ I LpneUMOntae pa C. part end trenaposae genes end unknoewn 0(1 i f 1900 I162 7 Z6773915PP ISn eumontaa parC. part and traneposama gen es mnd unknown ol 9 $1 65 7 S;3 S; on-br 4 3 1 61; 1 5933 sIe b lZs9ll s FppA 1 .pneumonlae parC. part and transpogase gen i and unknown o r[ 1 96 33 1 3 19 1 i S 1 99 i1 J129191 Sl pneIgmnulsj promoter region G74 1 00 64 JOU 4- I -9 t31 vj I reo pt ocecenea pne u o'n iaea peptid 1-a oe thl 6onln i affsd rencas 'a and en hogoserine kinasa hoeolog (trl genes. complete cd s I Io "92 S homaotre~tcO~cu; 1 homokog (thrl genes, eonplato cds
I
I0 t 25 g 1 b 2 36a16 1 JS.pneumonlae mismatch repair Ilexh gene, complete cdi 966 00 I 41 1 i 1017 Imbll7ol siAC Is.pneumonlaa reeF gene encoding AecA 1 119 I 1027 1 1035j 1 4t 12I I i1oi J 3 41bjt243031CI [LreptocoCCum pneuninLa@ con operon encodIng the dnA, rec, din. lytA 99 1 1306 1 gone, and dowmacre.. sequences I I r I I 4l 1 1 61 22 1 4041 iqIIIS2I IS.pneumon$*e autolymi n ClytAl gene, complete ed. '9 96) 1 963 -I 41[ 4 I j 309£ JgbIlNII631 Is.pnumoneeo autolysin Ilytl gone, complete eds 1 100 177 1 177 ii 1 S 1603 1 1160 1gbl13143u 1 JSpneuaontso aucolysin (lyli gene, complete cde 1 100 1 28 1g no58 4- I I I 6 s5 5143 Igbltq6s1 Istreptococcus pneumonia ORT. complete cds 1 28 1 I00 404 1 1- 9bj 0 20 S6 g~~8 1 IStreptococcus pneumonJa 5 OAF, complete cdo I 9$ 1 647 47 l ;112 6938 IsJL36660I Str eptococcus pnoumonoi t OP, complete cdi I 98 433 I 801 41 1 f j 6936 1 1119 IgbIL36660I Istreptococrus pneumoniaa ORF, complete cdi 1 300 1 204 4 1 t I 0 1 1 0 0 12 1 7 6 6 0 l gu I L 3 n I I S t r e p o c o c u s p e a m onn l oe Oe c o p l e t e c d s 543 5 7 9 16 8 I;9 7 L9 6 6 6 0 Int rFpt o o.cu e c mp l e t e C 3 3 12 0 I ii I 760 7979 Pnemona. e r *C 41 13 I169 J 6717 IambIZ1777TIS lSpnewmoniee DNA tor insertion sequence 151318 321 bol 1 353 45f3 1 o 2004231248 23 Nov 2004 TABLE I S. 
pneueonia* Coding regions Containing hnoan sequences Cont I e O tn ar, StOP m satch I march gene name ipercent) lISP tc GOP 'nc 10 It) i Intl I ntl aCeasion I i ent j Leng th length -4 S- I l pi; 9133 93 IeabtlI SIs IS.pneu4Monise 4) A tor inserti *on aeuec 11302 4964 bp) 95 1 IG 402 4 I 14 169 j 947 Zeb;"O;I Speuoie cAgne en Open reading (rame** too Lo 115i 4 14 5 J U190 7555 IembIZ6lO0iISP9I jB.pneumonas pcpA gene end open reading Items* 1 '9 366 1 366 I I6 S9 7601 IfbZlJIPSI~pesna i~o Insert$on sequence 31 1 12 bpi 1 17 1 43) 1 45 I 1~ Il 44$62 023 3b11211Ipemn DNA 1risrinsqec 51391 1966 bpl 1 95 3 61) 1 402 I 4 I'l 859 365 ebtZOl Is.pneuxonas pepA gene end open reading troae 100 Let 196 I 4, 1 1 40 617 IobIL39aitI Iltreptococcus pneumciee n'cyuvete oxidase ispxul gene._completes cda 1 99 1 1714 1 1794 49 I 3 1 331 2 603 Igb1L2016II Istraptococcus pnuuaonlee tsp7 gene partial cdi too 3276 2111 I-241--21-gbUAOl Streoptocac cus -pea ne -a 1 -$-SZdextren, 91UC_, g iuc od e gene, e nd-insrto 1 A91'art-' I~ )5I I I I Iequance 3S1202 transposessi gene, complete Crie I I I I I j 12566 Z40 j embjZltSjSIP5 5.pneumoniaa damb.elA~...~%GhlJs genes. 
d1TPP-rhsvwtoae1014 n II 1;4~ 131§ 1 mba SIts iayntheis genes end ellA geneI I I I 23 0, j2994 ImtI~i$sz Ssmna.0 d gene fisolae I ti 140 1 540 1- t ItI 1164 1 990P Iembjtl6OIIPNA2., 1treptococcue Pneueonlae ella gene I sit 29(5 I 1 2 1 nlq: Igbli211719 5.9onanone muiemach repair protein IexAl gene, complete cds o o I200 1 2717 43 I2 13) 2 613 1gbI11191 Is-pneuaontse mismatch repair protein heoAl gene, complete cda I 99 2110 2319 I 3 13 I2551 22 IgIl7S .1 ISpneumonise iaeIc; h repair protein lheAl ge ne, -complete cdi 99 242) 1 3 216 466 1 T129I ISonenonise~ mismatch repeir protein IhasAI gene, complete cds 1 %5 1 9 I 5 1 1 -1-6 4- 4 67 12 1 1143 417 IgbIloiflI Scraptococcue pneuaonIa hysiuronldase gone, complete cdi 1 99 1 2936 1 991 1 gb I 1 70 obN4401 ISpnesaoniee OpnIgeerio ncdn acnd dpnn, complete Cde 9 0 4 3 0 2 349 1140 IobIM14340I I5.pnomoniae tDpnt gene region encoding dpnC end dpn0, complete cds 100 443 403I
C
1 5 1 2di 91310 1g1i431 Spne:aoniw OMn gene region encoding dpnn, dpnk&, dpiie. Complete cdi 96 462 I)119 C 191 4316 1gb14042341 jS.pneumonias eeodeomyribonucies eogts comleecdi I 9 1 e81 5 92 2004231248 23 Nov 2004 TAHLE I S. pneuonnia Coding region: containing known sequences IF a tart Stop match match g4ne naa pe HSP nt e nt 1 ID Intl j t i CuGn Idnt leAn le n 'I ma :dCrnr I tph Imgth 0 1) 108 97 bIL3OS62I Strptococcus pneuwoniae CopS gene. pertici cd 1 91 24 167 1 .1 i 1 12 1;96 126341 mbIx4iI0aISPO ISpneueonlee macA-Desx 16---4 4 I- 2 4615007 55 IS.pneusonlse 153221 genes for Aps:. Subunit. AlPaca b subunit and A.7'Pe&* I 7 4 IC Io Iu I Imz 5I SP i c suunit I 12 I I Im Xl 360 bI ISPBO S. pneuon isa usi* A-box 1 I 1z 139 I 3 11 36' f7 1b13044791 1S.pnaUshonlee 0541 poiymecae I IpolAJ gene. complete ede 9 9 I 60 Jill 36 7I 1 48.1 5379 IP hIM36L80I StrepocaccuA pneolnl ao trann pasaso, icaff-A nd cowill d SAICAR ynth Ise is 38111 516 4 I p4iC) genes, complete cds IOSIA vtaee 9 1 6 p15 ebbIzI]2flSIPzs Sopneumonle deem. capLI.1.CDJ I genes. dilIP-rhamnose 93 624 1 62 1 biosynthesis gones Ind ellA guns 71 41 334( 231) IembiZ3ll ISel ISpneumonIL dam. cpt IA, .CD. IJ.xI gns. dftol-hnvlsose b I ci) blosyntheals genes Ind 611A gens I 18 1I09 1 eib151724l91Spe6 IS.neuaonle (161 clao/clss/ genes i '1 139 ;0 S _;u II )I II~s i~i I ~br~~~c~lrpr n onll h 1I t lnicr cl rm/eia$4 genes-~ 2 12- 1 2 13 11436 IgbltlSUl lstreptococcus pneumonia. ribnucl epase I lmnhil gne. 
complate cd 1 91 1 951 9691 82 1 3 136 122704 iebIUtlqlI IStrs occcus pneumonlia ribonuclease Il (rnh ai gene, complete ode I 980 1 32%9 I 5 I S 312 I 110 Ienblt77727IS P15 lS.pneomaonia 0541 bor Inerln sequence 11118 623 bol I '7o 1 2Oi19 110 466 I 6131 I ;GbIUih eI Straptoco cus pneumonl ae trsnspibo a (csA and emOl ad SAICt Cyfl tse I 1 2390 I 190 I I i I purCi genes, complet cdi II I 1 I B) Sai I1 5) IubI~3SlOOI irptococcus pnuMnlae trnposase, Icoe nd ComI and SCICAR ynhcsa i s Ite J I11 I I I II I b O IpurC gone$ g complte cdi I ii I11 1 1 51 1 1 ;S::aD tococcuS Dpeumon ae S -NrelCCAR synthetase 1 n, o~p~r 1 107 3715 15090112 1 gbIlO I IStreptecoccus pneuronlo be npa eeyIoxosamenld ol~ aed Lc g snt, co eplete J1 S65 I 3~ 3 1227p 12350S1) IblUf3I I Streptococcus pfleuoniak a -I-cteeosmn~ 5 istrili ee complete 99 3126 so )I I I I 2004231248 23 Nov 2004 TABLE I S. pocumonlie Coding regions containing known sequenes 4 tig InP tr I#1 S &c csio mac gnrecnt HSe It OR0n HO iso I~ddkntl iit eslnIdn lngh ength ~2 ohLS3Jl trptcocu Pl*C~lbe betA-?J-aCeerlheuose.inrsaue [&trill gene. complete 19 1 7021 ISveuenleOcli C*lA*C..,.agjjjgne.. dTOP-riu anneus 451 J 33 ub~I3I~ i~uIb1oaynthea13 gene. end ellA gene "1 11 SI9 S31 510 196 6l9 423 636 I i 5 j 357 3531 ighIH3nlsoI 5trepr~coccu# pneueonlaw tiransposaue. IcomA and camel end SAYCARsytea 9 53 I I I I lpurC) genes, complete eda Inets j 94 55 j 1 144; 4269 1 gb1441618OJ Istreptococcum pqeumonia. transpoase, Iconk end comll aQd LAICAR evntibetoe s"o 9* I 18714 I7L1103 IobIM2IIIOI10 Streptococcus pneumonia* trensposase, IcamA and cashl and skICAR eocnhetaze 9-1 2t11 216 I~ 1 i pusC) genes. comtplete cds I [IO We11 1 eyazS.ssz ~nth.::s e::snl.n:ie dO-hsna I I 31 b 10 I 1 4345 1emb136360IS~jPao IS.041t4$SnfLe Moek-*pa 89 1 31 1 1 10 5 70 1bl413 itptceu pneumonia* peptlde tethlonini 104 Coxidi ind11ct.,. 
Imorhi end I 9j 14 Io I I homoserIne kint hosolog ILire) genes, complete eds I 4 i3 tI2 1 1 1 -Po IabiIJ13flISPB $,pneumonia& doe. apascocrcgiatigene*, dTOQ-rh~mofe 9) 53 k I I I I I I boufynthaela genu. and allA gone I I~ J 12 913 1 7 i73 7 IemIIlseAa ISceoptecoccus pneumonla *uul locus conferring tmlnopteuln ruescstnc. 1 n9 1 e 1 9 9 3 I 2794 1 1712 Jambi IV73J7ISPAN IStreptoCoccug4 pneumonia* ant locus conferring aminoptarin resistance 1 99 10) Oil) is 4 113; 1 275 Iert13173)713SPAM Istraptococcua pnewnoniae Afl locus conferring amlnoptecln reistance LO 10 35 j 45 4- 4- 4- S $14 I 1 u 714 IeibIXllI37ISPaLm Iscreptococcus oneumonlae ami locusc conferring amlnoprerin res1ljanCe 100 1 5)4 1536 9 i 6 I 122 3277 IeebIll1ilea Ilcreptoceccui pniurooniae amiloIcur contoerring emlnopterin resistance I 190 I 31 im I3 I I1 1P 14 1 em12suils 901 I 4 11 Iobx4a~ SE IS~ rneumsnlae epuhA i an d dA genes lor 7 kDa protein and) membrane 99 2i 1 liii 301 j liE' 185 l7 I5 151 PNI p emna spuN and endA genes ioc I li~e pooton and m brane j 129002* I- 10,3O lebX41 1 5 j pneumonia. *PvA end endA Venes for I kna protein Anid ruembrane I o a0 1622 I I I I II endonucleage .s a 102 1 54 $041 IembIi§$lSPt Ireprcne~w pnaunoreea sodA gene 1 100 396 S tG 1104 1 1347 1556 IembIl7727ISrTS IS.pnoumonlac DNA% for Insertion sequience ISMS1 122 bpli I a 206 310 2004231248 23 Nov 2004 TABLE I 4. pnmonia. Coding regions containing known sequences Contig 1011F Sctt ttch -etch Ieeo. hr I Ipercentj NSKF ORF t I10 iD Incl Intl I acoslon geneIdent Lngth lngth JosI S0 J 5 6 51 6024 1emb1671?39j19 IS.pneuorolse peaC. part and Lean3poeese genes and unknown or( 96 3 1 354 I _1 I 0$ I 6 CI 506 30 9 5 je*bZ7l7lIPPA IS.rneuwonlae parC. pert and ttrnspPoaSe genes and unknown ott I 9 64 712 a la 1 1880 IeMblX1603JIPPE jS.pnetionitea 0*4 gmnS I 91 72 906 it 107 5 2113 I498; IbIXLl602719P'E .pneumonla. panA gene 1 99 j1652 201 j 91 j 10?3 45IImlLOI~eI~~vmnr ~rs~.r s lo~ r I1? 
1 5595 bi 13 161S PA SItr ptoto;u V Mae eanA gone for p I Milln binding protein 30 11 11 I I lacking H-tat. tponicillin e~sistant strinr9L 10761 0 I I M0Ei I mb IeebZl6l139SIPPA lS.pnumonisa parC. part end transpose.. genes and unknoo ccl I 5 291 I I is n1 line;O 120,22 IeenbIZ61139IISPP IS.pneueonlie parC. prt end transposase genes and unkon o I 9' I 191 31 Ia 20 3 lr 274 2243 lemb~l~~l2lISt S.pneuoniee CNN* lot inrln mqunce 25231 (546 bp I 96 5 2 811 209 4 364 2855 lebrl 1esu S.pneumoniae DeJA for Insertion sequence 52181 (3312 bi 1 96 1 $is I; 09 I 5 2642 I 3269 Inbajt2127S1IS.pnainiae DNA tsr Inrtin sequence igI329 (131 bpi I 6 1 It) I 16V 09 6 5)2 I 3594 10b11812S72 I.pnewonla iMAtfor sreirron srence IS118 ene coplet c j 1 97 1 353 i 313 jlje I 1bl341101) Sregoocua pneun.s traneporie.. (comA and comep and 5AW~R Cvnthetese 4 95 I 2(9 I 4;; I I (purC i mn9n coplete cs I I I I istr13 120 1 9781nIu8132sIet.61n994001go.A(ISmpnandmoniee deckSgeneRendnOA Ic j9 429 j 121j II 1I 10 198 IeebIxS9EoosrPA I5.pneusonse dock gene end OPF 99 1116 1226 13 1 1 Of 5 9 OA rX~(0 1SrL 5 atounlae drcA gone and ORF 1 99 Md. I 11 r -0p,_t 0 c-a u a I- and-'SA ICAR s t -a-5 41_j 5-0 -1 0 0 Imbl126781 Sreptomocus pneumnia. transpoese icond cl aid iCA rn I 1 40 I 210 I 1 ev 7 I I IS(p our s e g n co e te c I 9 2I 1 I 215 I2 11)0311092 Ig~trOQ~lI IStreptococcus pneumonia. 55? 
deetran, (icosidase gene end SIAserto 5 32 e I~ II r19) 1 )1 ~I sequenlrce in gene, omplete cs I I I I1 1i 896 I9340 1ebj71t1715A SrI.pnunonie renA gene h so p 2103 5406 F i and- g- 2 4311 1 325 leebItol29II9fl IS~rpneuonl neatA semmnlo iea shc prti 22(nX ee cmit d 1184 I l I 14321 199 lgIH320 Istrptococcu pneumonlee ranspoase, Icom end o ene and SICR syntloetes I3 *1 I 9 4 S ctraptococcu pnediionie heat shock protein 10 Idnati gene complte di I 122 2 3311 gb U1- and PneJ Cdr flJ gne, partial cdi I- I 2 213 41 2531 gb1U127301 Srpocolccus pnewsaonhee het hlck prten 70 idnsSI gene, coplete cds 8 9 I 1942 I 11 I I I I I eand On,) Idnaji gne, pertletI cd.
322 8 3055 5561 1b1U54041 streptococcus prenonlee1 Ki deetren olucaldese gene an Insertion 4 453 1 52 I1 I I I I I cewernce 1S3302 trnposca. gen, comlte cd 4- 2004231248 23 Nov 2004 TABLE I S. pnaeoniae Coding region$ Containing known sequences Coat sq oar St ar to cu Imacgeeae 2I SG op In m a ch match g naLI percent liSP at Ir n I D I I 1 In(V n gelon coplt cn h I I ,]Is I' 1249 ju104 eItflllll~lSIsI'zs IS1pneumonua. det. Capi iA.DtP.E. F.G.IHI.JgI genes. dTOPrlavsnoee I 5I bloynchesip gone. end &II gene Ie ill I I 4 lemblYOllSIuI j.pnuaoniee epA gene i 1 20) I 1 ttrpococcus pneumonlee choline binding protein A IbaPi gene. partial cd, a6 I m 1 1 I 2 140 8f IeaIYOIISPrt Is.pneU~Onlae apik gene iI 36 324 3 I1l 4 35 21 IghIAFo10 IrePtococcus pnoumoniac chollne binding protein A (cbpil gene. parilia cdt 1 9 2u 1ll I i 91 194 qIU jj56I 5tropzoCoCCu:,pneursnia. P2) glycerol-l-phosphe dehydroeriece binID 9~ I 2657 9II gene p&rtil d. end glycrol uptake tacilleLor igipPI and Ocr) genes.
1 1 ~coe'oiets cdi IVI 9 44 10421 IgbIUi25L5I Streptocccus pneumoniae P11 glycvpol-l-phosphata dehydrogeneec (I llpol 9 SIG I Ige, partial cdi. end glycerol uptake (laclitator igipFI and 0511 Inus.
I I I I 1 1 coplete cdi lit to 1010'. 11122 1 U 12 S 6 71 Strmplaesccu: pi~mmonI a Pll glycrol--pho sph~ ta dohydreagens so too 3 16 )is 1gn, partil c4, end glycerol uptake Cacilicacr6 igipri and 013 genes. 5M l~ 1222 I cmlee pn'moniae, tps1 giycapul ar-ph loaat dhyrogee gpa Isnh56J 2 l 1 I 1 I 6b10)1)91 joperan. icpilli~ACDgFGHiaXuoaaO gnee. complete cdi, and ili1 en, II partial eda Iii l~ 1 85 115 IebllJISPZI IS.nuniee dec, Crpi.C.ii,F. C, i. 2, J,i~ ger~nt, dTP- rherngiose 94 114 I l genes and lii gone I 1 I I 31 Its 871 161 obIZAl33Ssrza I .Dnauaonlae deags. cpi. C. ttcnrau gelis. dTDP-rlamoe 5 j 19 1 1, I II bioaynthel genes ard &IA gene I I1 I11 1 I *22b 93,7 IebILI172GISPZ I.pneuaonlia DNA tor insertion e"quence ISiLLI (131 Lpi 96 f46 1 4 u I 1 117 I 9645 JioSi IenbItlhl27ISFiS IS.pneauuonie DNA (or insertion nequence 11311 1821 bpi 9I 5~it]92 41 I1 lIt jiG~ I 155511202 IemabIlliOZISrbo I.pnuniaer wen-Bano I o 96 22 1 Il I 15)1 jmbItubl ;se Isr Isrpococus pnshieonlee incA gene I 9 I 3I J 197i 141 120171 IemblISISISI Itreptooccus pnaaonlee n a a isA gene
I
4r 14 Ii ju72 16 Imblt4SSUIpk ISrptooccus pnewsns eati gene m ~p~o d o1 r r I I I2 3 fl; Ol 1 10 I IttrpQt cco pr mnlaM utO? protein gn, cgnple e 1 21 71 r -1 1 4 6 1 a i1 1 1 1 130 102 IsbIfIOlISI Streptococcus paosionie uvs402 protein gene, complete cda j 5 1 11 ioel 2004231248 23 Nov 2004 TABLE I S. piunonlae Coding region; containing keicwn sequenres Contig RF Strtc I Stop I mth I match gene name percents egg ORP nt.
5I1 I Intl I r ntl ecuglon Idpetnt l enth la 7n 142 j S 202 115 lbIn)IgI I9rptococcug pneumonia. oustil prten gen, Coltc di I 10 I 15 576 i I e fiblZ35I)SISP AL JS.pneuoniao aliA gone for alik-1ik gene A 97 1as 19 34 71ii 1194 1g1jL205S61 Istreptococcue pneumonia@ p~pA gene, partial cds I It talk11 1524 zi I7599 ieabjZal21ojSPaz js.pneupeongee doe, cepah. cepn8 and cep)C genes and arts 99 101 1 5313 15 1 J0I90osfl I S:Ptococcs owlumonige penticillin-binding protein peonAl gene. com~plete I 19 j~ 9522 gbIN7Zl I Streptococcus pneumona. penicilTin-binding protein iee co pete '1 312 1 1d gene.8 1, J93 91 1167 146 I S 15 4 Imbll ZOO2 SP 8 1s.pneueonie pcpS and prpC gen ae I s I 1is6 I 56 4 146:mr13cjP813pomnbepp n pC gonas 1 ji 1 55 1 m I "Z 0' 1. I- t tn IebilSZOO31S1 n I.PnCuofli PCPS end pcpC genes( I15 uoasr n g ene 96oo trlgnecmlt d 1 401 S.pneusonile wig A- pit t en es 1110 mb-- X6- -60Zj5- -a0 94 1 195 1 2 4 IS 1 04 62 j b Mo0 srpodP n eld e tr sp a gcm n amladSiCg sy e and 6 I II ul g ep e pe o na pep I ds 1 t 17bo26ngSA hSoegeern. a se 61 o ibe genes; ae uu. m plet e d ubni anAl aed 6 16 I I14 10 lmb1662 8 Ispuu nia mss' 1 9400 1 "1 1 246 1 1 1101I 9Z 88ISAIobIliI 1gnsfStreptococcub pnenmoni---t-u-spo--se---- -and-co-- n- -e 5---1 I I i1 genes frlete subdnit Ih s b I sb iIa tSo 160 1 I 131 1 3 1 34 emtbjz2essiIPAT 1cnuot i P122 gene Cor ATase asubunit A Tln e b suuit end A Te c? j 90 6 720 I 1 I1 I ,01 146I emllb IZZSS1jp I SIR IO nla (1222 genesll forl d AT. alol s ubunit, A ras I ubunit end~ ATnas I 95 561 f SQI~ s I I I ubunit 1 1 1 I I 360 1312 1942 jue 12 S is r IS~neuoniae 6 l Acmigns9 1i22 4- 4 4 1 11 j7 I eebtxSllsea JS.pneuoniae orfigyrh end gyro gene encoding DNA givoe 11 sbunit I 1 4 a7 93I 6 Iasb9xinwl7Isv ISpneai orfigyrO and gyro gene encoding DNA gyrase b subunit j go 118 11 143 1 2 2155 bi 30 igb OStj IStre tp coeccs pneumo nia I s 5 ge x ne, a pe r CO c s 1 3?)v 2154 I- 2004231248 23 Nov 2004 TABLE I S. 
pneumonia@ Coding reOlons Containing kno sIQUegf~ I Con ig F Start Stop I match I match gene name Ipercentl lISP it car nt.
I D D IIntl i ntl F cahlion I Ian gth I t F 4- 1 I I I I IsIJiis MaPltese,: ma es gand a, n encoding p length length 1 mail end mall genes encoding membrane protein and I to 40 Z9 I~ 1 I 6 mylomaltase complete eda, end mnalt gene encoding piioepncryiaue I f isf i6~ jI j 74 14 lembIYI4631sac IStreprococcua pneumonia. dned, rpoD. cpch genes end OMn and ORF! too I )5IS 17s- 4 I4 I 32.1 1 1)2 Ie bIYl 3ta sp 01n IStreptococcu s pneumonia. doac, rpol cpoh genes and OAF) and oars 1 9) 963 I lies 4- 1 1;Q IO 7 3 Iem b Izt1 s4 I l trap ococcue pneumoniee dcCAa p O o pro 1e e n R I a d O F 9 1 I 6 ses 7) I 361 4 1 1144 18429 IembIZliS 5lISP.o IstrepocoCeue pnemvnlae eeCCA opron I 1 05 *71 -4 b I a S 67 a 4 14 1 43 Iemb ISSiISP k Isr pc oe us rncumonIae adoCtA operon 9 0 7 -3 03! 2258 1120 Sol IStreptococcus pneoinlne ta gene, partial cds I 9 2a5*e9-- I 1~I736 165$ IemhIll73Ijsrs I5.vneweoniae DNAt for Insertion aequence 1513108 (t1ll bp) 95 31 1 46 4- 17 3 462 J e 493 1blu145 Streotocoe e pna.aeonies format., &ctlrnfrs lexp'fi gefli, parflC it-,14 2520 I I i 1 pri
I
I'7b 6 1- 0 te1oocu pneumonia*eumnpoae. ICook end coeti and SAICAS erntheiage 69 31 i s 14 1S8)1 362) ImbIZ4lllO1spos IS.vneuaonlae doel CapIA, cipID and Cap3r genes end orie 1 91. es sill2 2s 9 4 S' I 391 1 26 IaabjIzlllsp I;A S.vnaumoinie peiC, part end transpose;, genes and kunknom art to10 57) 3005 0476I 1 425 ImzllsrIS.pneumoniae parC. pert and trenapoese genes and unkoo-i ott I 423 423 42 0- 1-4 II I 2 t emb~J13ISPtt Ig.pnnwmoniae deali, genesa rTcPa-rhoit,1,aej -4 I 1 bloeyncheals genes and ellA gene gns tFrana S 29 3i Is38 1 300 l ;e 55 IembIl 5ilsplv IS~pneumonioe gyfA gene I 9 al1 1230 a 10d idapat gene.) gepne. paria 99 2 I8 terccu nuolebe hc rten~ de)gnCapeers 49 0 end Onel idaaeJl gene. partial cds 4 2004231248 23 Nov 2004 TA B LE I S. pneumnIa. Coding regions containing known asquenco.
Ooa I rtIrt stop match watch gene namee 1( I kD Int Intl &cession I Ir~nn( Hengt n i proteide t long t 9' 0111 SS; gb Ilbl 727201 Straptococcu nOY i.n)a heat shack;protein 70 ldn51 gene, complete cd. 99 ]isI 16 i i I and'DOAJ Idn&Jl -qane, partial cdq I1 I- 1 4 9 I 1 0 12 U 2 0 1 j 5 t r e~ t o s p nenond hde a p a t l p r o t e n 10 Id ne i g en e c o m p l et e c d s I I and DnaJ Wei gone, partial cds 9 Streptococcus pneumonlas transp asewe IcowA and coIFO and SAICAR synthetASR I 1 281 )19 1 1 IpurCI genes, complete cds c I I I Ibiasynthesis genes end ellA gene 1 I1 I 1 1 2821II Ismbltl)Ijsnh~l I.pnh'anlse deig. Capt C. .cIi 1,51 genes. d~tiP-hanoae~".- I I Ibiosyntheula genes arnd elfA gene 1 I Io 5 I II I 1 I 1 ehIbtal))sISPl I.pnegmoniee deal, cpt 15.C II. t. J ne n.i dt35- rhamnosa61I 66I 0 1 1 1 7 b l oy n ps cl e g n e s e e a g n p r i a 1 4 1204 1 lid I I IgbILiGj lIi IStraptococcue pnumnlaq expil gnt. complete cds. race ona. 5. end I 9 1143 ii) 2 1 e 2rp cccus pnurIona peumO cc l surfa cu p rote Psp. lp spf en e. 90 Ile t III r i i I I I 1 1 Ub I3 6 11 ls LpnowmaniaoI completesequence DNA ga 4 1 1
g jSb1Z .3eunenlae dems, cgpl..C.rIO.Cn P-rhuiu.aeps I 14 1 41 iII biOsynthesIs genes and eILA gene 9 216 1 1 1 jeI2S)~f~J 1Is) z SZ~pneumspiIse deeD, CSpII.,C.D.E.'GI1.I,.3K) genes.r drlP-ll,anupsea j bicaynltheslg gaee~ end ellA. gen I 22 1t 3 I250 I Ma2 19bln26 6 1 S 15 IS cpne uenia e r mo e Squ gen e 99A 1 ion 102 1 22 1 I 411 4 IMb3z3fl0SnaBpnhwuoniaedaacrgo cmp e coroc.I gen5 dTp-ha 4 5 t 9a al I 39 22 I21 26 d) IebllzD~sl j Istptococcug prnoniae Idgne 1 99 I102 j 129 17 I I 90 IgbJNll~ri2II Ilpneimntae (eat, gene. C..F C1. ad: Irrr 95 I 464 604 24 3 C2 40 gI)10 Scaeptococcus pneueionIae trenaposase. Icons end cem6) and SAICAA syntherage 94 I7S I tpurc) genes, complete cds
1 biosynthnesis gnes end ais gena I I i) 1 3626 13 IgblM6IIac Istreptecoccua pnemoniee transposse. (cons and .111 and SAICAP syntherSe 99 I I I I I i~1 purel genes, complete cdi 9 J 36 9 S--4 zS1 1 5 1238 1 20S Inhbi)2m515P16 5.pnaunonias doe, caplIAS.C,9 F.0.c.r I.J.I genes, dlCP-rhamlnoes 95 1 420 01) biosynhoisu genes end slI gene4 I 2004231248 23 Nov 2004 TABLE 1 2. pneweonias coding regions containng known sequnces iontg jO r In In tItP a ac h match gene rms P;I~enLI 1,15 n 1 0" P M. I £00 Iembi)l(nlO lssna I Percent lISPn 01 On nt I 11a-r- b Z d t hh 201K genes5), dToP-rhasnoae 'ol a bloaynthesls genes and aIlA gene 155 j 1 I 9 1I IeehjlolDDspzg IS.pnuonaee acg nd pcpC genes "9 I 1044 151 1 I 29.3 3969 IeubI&I7IIISpP ISpneinni pC rt a~Ld arnp nd nnv t I 1 92 Sil 1 -9orolso pcS and CIP-ene 1 41 1 01 9 0 r ebpNm I I c c uc s n u n l s a l l o c u s Ic n e e n d Cs Hi L d S r C Rn r ensi stt n e 9 6 3 t I Ip o pncl ug s e complete co1l 41591204 qbJUMSGJ Streptococcs pnuaeonlae dihydropteroste synthese lauAl. dihydlrocolecs 9) 15 synthetis Iaulli, gumnoelpe triphoaphat cyclohydolase (eulc. aldolase- PI/Ophosphoklnese Ieulfl genes, complete e 1191 11)) I 1trptococcuS pnsumonlae dilydropterose synthese iaulA), dlhodrloiets 1)41 14--: syrothatess Isuill, quanosint triphosphato cyclhydrol ass (sulcl. do~l.
Pyrophosphokinase Iula genes. complete cdI 1 1 1 3601 IgbI 6 5r Streptoeccu s pcuaonls dlhydrptrobce s3ynthese lulAp, dlhydro(omare 90 114L synchetse Isulil, gumnolne triphoeptate cycLohvdrOlsee isuICI, a dolese- Pyrophophokmnase (su10 genes, cmplete cd4 14 9 gbIU16lSI j5trepocc:cu pcnjag dlhydroPrerosce synhese IsulA4, dlhrdllete 55 14£ 574 cynthotme ull, guanouine trlp0list cyclohydrolese (adO, eldulai- PytOphosphoklnaue IauO) genes. complete cde 167 4S 64 949 oblU6151 StrapLCO Ccu S Pn~unon1;a dibydropterosts orimhase (sijih). 41h-olclat 91 148 146 Iynchease lull), guenoslne trlphophlts cyclohydrolese isulICi, eldo:Les- ProphosphLIsxln e IulDi genes. complete cda 554 J 14 jblI1Sl IStreptococcue oeumonlas dihydrOPteOVAte Synhase fulA. dlhydrolo 4 -1192 1 mon--- s- 21 11 562 304 Lgjh96i61 lS~pneumoonios miameotch repair IhesilB gene, complete cd, 1 9) 1 1l0 1 1 blUD40u 71 Streptococcus pneuionlia 550 deecgra l.co4ldese gene and inserton 5O 410 I9 1 1 J 3 U4I r srusposese gins, complete cds I 410 2 1 1 2 1 0 05 1 5 2 5 l eo b l f11 1 1 S s I S.p n u q o nl e e de a kl Ic Jp9l l r c i Co n a l. 4 T O P -hm,;S 9 I 1 2 0 5 7 4 1 II~~ I I1asoenas genes end aliA gone
2L 15 abM SIM SPemrlsdexl. cepIJA,a,,. .caou .raei genes. dorjopha.moos s o 20241 231 Ir-- I aol Ii-- 1 embllg)Ifls r ru n pf flos 1Oi genes ad Ia gene I I l 4 1314 1 l099 IgJHI3eLtOI Streptococcus poumonlee traneposae. CcoeA and cotta) and SAICAR svnclwtase 0) 264 31 I) 1 10 I MIIIIS~ F1 ll .nl JI.nI Sh~. I ~I Il 1 I IipurCi genes, complete 4de 1 I I 9 2004231248 23 Nov 2004 TABLE 1 S. pneueoniae Coding regions containing known sequenceS ConU Jer Sart stop j match match gent flIhtian egt tnti ab S n d Iaana ahrt 29] j jI 261 Ij 774c1SPgy jS-pneuoniA4 Oyrf gen, and rnlknoun or( 1 98) I 1671 -11 -f1 a-blz47 oIfC~raSPnsp.pnoeaenlO ga&A Pi pnes and ori 511 99 29)124 45 emiIzins152un..
t g. genes 46 1leS blzs7o)7pjSP ISn.pneone siae oat pen ran g Cse*g nd u I t 1 6942 64 hoecierie Lines.hoiscio bllrynIIrduts g nd lnd gon
63 M le IZO lhS)I spn I S.neumoniae paCp part and openspose gedne, oad u 1r j3 1 4I 2 -t i1- j.1 6 bi SIS S.pneumwniae dexm, gpeI. m.c.0.Era gone, dlDP-rhaeioIs it '4 I bloaynthe.1 genes and eliA gentn I 262 I 362 1 I 54 I5 bltiueelr I IStrep ococcus pneumonia. ss tdets'an bionine ul toxida rgiu oui ea gene) and 7 onM I I I I I au ne 9ln ran hapolo (ehril gen e pit cdi I l 3 1 6 i 1I O N I9 b t x s l s I SK S p neu m on ia e d pI2I2 3 A. g s ri 4 C rps ub w ns d E c c b ub u n i t a pn*'Q----n~iaep~rC,'art-an -tronsosave-ents-ad-unknwn-ar 1 95 1 4)5 19 r- 340 4 116 160 lmbl msis a S.neumnla dG., genesp~. lJ~]0 dTDP-cha-m~ ~~ose 94 $71 1111 11 lemblZ831351Pll 1S6p:: iow: ontae domi). C&PIIA, fH, IJ.XI gonna. dTOPrh aanl s VS 63 11 11GO 126 IgbJU04 IPLI PI Istre tecoccus p neumoniam SZ dexten lucoldese gone and nzsrtloa 96 441 44L 1-I nce iS102 tr an~nposase gene. Complete d I 111 111 I~mbjx6S767jSPC S.pneumonla dome, cpsliA. cipsl(. cPS3C. cpajlf) eps)4. cpsl. cipal(C 94 523 Cpstil. cPS141, CPS14j, epallll. CD31L, tasA 9*-*s 2004231248 23 Nov 2004 TABLE 2 S. 
pnoumenlie PvtatLys coding regions oi novel prtlcns similar to known protein; ICondg omRF Sl~et Stop mat ch Imath gre nr a In Inh I ID (rU Int) I cossion I C fnm ~Iit lnt 4 4-- S22 2 160 j 1942 IeIr ~l I $ranslotlon elongation factor ty Streptococca oralls 100 too In I19 3 I 20S 1416921 Ineomycln phoephotransfrcse [Cloning vector PSLi91 Itoo ID 1 204 4260 1 I J Ivrrrilcl Irusision slongetion factor Tu Stroptacocea oalls 99 1 11)7 I 14 -I2 1394 I0@6I-91"- h- poh et-c pi I semo-phl u s iml tssnsasjP I S 96 109 2Ii 190 3l 2 ;0I10911 O-dependnt protesm proteolytc subunit IStroptoccus savsrlus5l I8 is j i l~s o~ Iiltrir i I1 n mo aonphOspb~te dehydrogen~rrse Ca~trotcoccus pyogonesi go 94 gal 136 2 1290 I 589 Ig-IiOSO 11OcZ gene product tunldsnti[sd cloning voctorl I 98 95 1 loo 1 ii I I 145 136 1v115155 phosvI$.-beta.D-gslarcoldage ICC 3.2.1 11 I Lsctococcuu ot Icroer IL s 1 9' 94 I 143 -4 -4 4 12 3 1044 I )1 191134799 lurecIl Pho.phorlbosyltranaltrss IStreptococcus ailivarlusl I 7 s 1 454 )2 8 65741 7456 ISPPIJIII;ERkS IGTP-OIN[NO POTCN EPA HOHOLOG. 96 $1 913 951 ~L 2741 pllll psphoeolpruvet; sUgs phosphoynnscrags sYtem enzyme I I5trptocccus 96 92 171Q 127 I 1 1 I 359 IgII5129, Ilnitlation factor IF-I ltctocoCcuo lactiS) 1 96 i 99 1 168 it1s jia 1101) 111154 19u1176111 5'3 D lassotEopoCCCLCtheruophilasl 1 9 1 7.
216 1 1 I1 594 11i 3456 lint rag ner v Cosggrgatlon-roei s t dheuin Itreptococcus gordonlil I g I 93 1 IA I319 2 its 1 4 1911208223 I heat-shock proteint 6/neoscyn phosphotransters fusion protein 1hsp82"neo[ 96 I 96 1 2 I i I I Ilunldentilted cloning vctorl l I I I 1 I I I 1 t1 I a622 110961 IgnhlPItdlIaot I ycuvats iormas-Iyass IStreptococcus matinsi 1 s s1 4L 2) Ilt~l~l illr~~lo~~lvn Id~l 15rposcu Iodsl 606 124 1g1416 enIacoocs a 2 6 a jM. 4 4 11 3410 1 -1-11110406 Ida [Strtocccus mactnsi I 911 a t 4 lm lK I ocsmtn lI P4 14 66 1 4- 1 9 i 731 19i171u41 Ithymldins kIngse IStrepococcua gordoni l 96 11 4j 145 I 0 I 64) 1 I t 1 9970 1UO#1gluoss Ipy SO Igu. rophos r yla s o 96 5 1 1 1 19 16 I 420 5548t 1I1113573so lIbl AI~eg lterOCOeccu Iseel It I 6 I 1 4596 311) [oil15)763 Ipismin eceptor streptoccu s pyogeneol 91 1 14 1 1084 'II I I 6204 IhIlO~idS IJotmsI-tetrahvrio~leteynthecaus IStreptococc tanul 91 j 6 36n4 4 2004231248 23 Nov 2004 TABLE 2 S. pneumoniae Putative coding regions of novel proteins slma r to known protetn.
t ID intl Intl I sm I "etno a name 11 4 4 S M oil;a a 15 0 I 470Si protein (9 -U1 Icl7lu s aubtlis I 1 1 53 12197 1gIg47342 Intitumr protein IStrPtOSCoccu pyoeneui 91 ii 124$ m ;39 1;09 IgnI1PIDI2 ILs Iribcsuual protein 57 Ilacll us .ubtlis I 91 1 16 390 I 6 1 1 3 6Z a It 109 AT4 6 I soo a pha prote n 91 I~tra cl s mutl s 93 is I6 I ;047 1109 1 1 3537 lo ;;ptid* a C 13treptococcus the mophilusl 1 93 9 1 11- I ISo 5 19 1 2 36 11 i t1 4 93 4 IAsPes, a cpha subunit 1Strp toccu a mutanl 9) 1 s 90 5sn 5 1 47 ;5 Is*Inopt. ldeam C iStretocc n Ih mp h o uys I 11 1 711 g 9 1 24 1 14 14 I oill9 Ilecb1~ Iletococue Iectlej I 93 j 5 1 jry p t h e; I I u s I 9 3s o 346 1 192 3 1 91129525i9 Irypophen synthae b eta t ubunIt tSynuchcyatlu p i iardo 1) jiti 4 4- l 3lla 3391 l 5926 g~j374i Ihvvrohtlcl I~seol rcsus inthuensee I 72 1 laS 111 $At 3 265 14 11495 Ilac [I cococcus leccisi j 2 1 03 j 70 O I n i rteoIcs II1 7 1 i I 7 I 1643 I 1040 Igi1iiS(io enuyae 15 lLaeoccu letlsi 1 92 63 a- -a 1 I I 5633 3937 IonlP IeII9on llbronecln d lne protln-J1le protain A retret'coccus gordonilj j 46 2 204 5462 liI~oaoi ignal eognit.on paricle C oi iirepoccus mutanl 1 911 4 193 10 1 I 4442 473 lp IiS1u51iSi Iriboianl protein 512 Bacillus stearotharmophl lua 91 Ij 3a A I 1 4I 16 0 I3 126871 Igroel, gene product ILaccococcu a IlI IS I91 2 j 1641 I I .s 1 il 1 2 ICp-ilta AS-dpendnt proean binding subunit lBo tauru[ I9i 79 2055 4 9 I 11070 92 1u11 53740 loucroac phosphoryisse ISetreocccus aYuransi 91 I 14 5429 I1114I0f Igl;lflh Innabrane protein IStreptococcus muterbI i1 hi 1425 IPilSOJ22IRSBS Irboooal protein L17 uaclilus *tearothermoplillue t1 31 1 j 9139 9390 Imili4lO6 3 hubst Ijacilius steerotharmophllual 11 i 91 up I I 1- 1501-- 117 I 76 I I~d I 4513 InhlPoldbOOd4 -ASTese beta subunit lEncaro cccus 1 0 I I 7 I) 1389 I 1 119 9714 11111563; Iplutalne mynheta type I Sreptococcus aglectlael I 36 i 01 1 119I 228 1112201991 idesran glucosidese PeAS lStreptococcua sull I 95 1 79 I s152m giJ15141 
TP-bndingprotin ltreptcoccm ftlans 91 1 as 293 S di) 440 g1ik;99I unknown protein llnsertion sequence [5861i j i 7k a 32 7 6344 0570 IpiII lacgyceroi Aak s homolog Streptocccu swotans 1 0 1 1 I 405 2004231248 23 Nov 2004 TABLE 2 S. pnetjnogniee PatiCve Codlrno region. of nov41 ptolqine LI-flar to unoon proteins* Iotioa Str Stp( -etch Imatch geneno*%Sm dntlgh 1 22 2 41 J 17 I,11u149fl unknown protein (Insertion sequence is506110 10 23 48 ;0l;1 ~11 gnIPIjmZldlOI lafctate osidase (Strptooccus iftial I 90 1 so II~ is C1p pro e 1)211i s b il s 90 1 5 2361 I I 1 subt;I is,1 3 1aela il t cp ral a 60 71oror4cgt 13 (Q3 to--cu 2 901L133 i06) S4 4- 4 S166 4 7497 12 ;11110i "4 CK-O-pho I& uhcs bteriesm9 1 6 1 0 4 16 1i 4 04 16 9 19 1 6,1 0 highOI~ I a ffiniuv ty br n h d c a n a ino a id spo rtcu prtct i n I~rp o a go 16 j 36 I 391 1 l~s 166 1611415* 1At pryopen st h et sbn tL c tococcu ltti 1 90 6 3 131a 396 1 as 15 3 36 lI(252fl42 sntba. zIVl4O Cop %lcreptococcus *sutensj 1 so 11 11561 1 305 1 3 In 11jit1525 aeprgn synthetese A cnAl jIaM ophllue Intiusneae) IQ so f41i 4 1 6 a 7511 M 13 I pIr1A414141A454 (ribosoesi protein 1.55 Bacillus starothaermophiius I 111 16 21 3 I 19 I 6241 130242 oIlS5792 1recp paptide [Stretococcus pneuonor'iba $1 n1 1 Igo3 IL~43340 39441 IgI1303I1l (AtPiO-tructose 6-phoaphAte I1-phosphotrfans teas ae ctoccccjs iactlsi 1 9 at 10)6 4- I 1 57 1l 9666 30669 1vnio0180093) i20-tormInv NACIB OgidaaSt. trptoacoccos mut*;nsl 1 89 11 964 a- 4 3I; 41 IS 2 6 164 1giJ114l20 Igig Ilaecills subtillul I i9 al 36 I S1 1 86 4M 421 IpP14111I;RLI6.. 505s 111OSONAI PROTEIN 4ii, I 9 12 420f 61 Ii I 1ii 619 Ii11141417 I-ribosomel protein 55 I1aciialas staarotacrmophilus) 1 69 16 603 71 617 S 15(g135320 1pr ILlateria *onacytogenesI j 89 1 0 3 02) 4 4 cc.C 4 S 1 12I100 1g1781 stringent resns- like protein iSLreptoorus aquisilaI I 22S) 122 5) 715 (.hl0oeao9o unknAon I L tretoecue pneuaeoniee3 1 89 1 613 h13 2004231248 23 Nov 2004 TABLE 2 S. 
pnelmonloe Putative coding regions of noval proteins similar to knon protlni ontilos IOn Start Stop match match gin. h cc nt jio 1 OAF 1 3 1 aeeicgo In naeI Sim ident I length to Ito I f",I C.11 MI r'a all IStop .0 Sin I t) I 1 66t 4 1 i;14314 j-oxoprolyl pepid a IStreptococcus praoftnesi a If 466 105 3134 114~23 louaiveu Itactocecus Iscrisi r a I 1i g assi I 183 6 603 $751i 19i114411 enzyme III I"ctococcue Lctial I &I s I]Ito 23 4 114; 2793 &13373 Iacanopptdasu C Satreptococcus thermaphiLusi I 83 IS I i *1un 36 I 1 431 III 19111216922 unhnon protein lintrton aequence 15663 1 a, 7o g o I 1 119 li;s IapIl3n05I$_s JNIST1DL-TAflA SYNlHrrASs IC 6.1.1.11 1411OIN--AMA L1GASSI (HISS I S 7B 33t 3 II 3 I 1646 IM 62 o0 iputacive ABC transporter subqnlt Cones istreptococcrb gordonlil 8j 7 978 I 85 520 1 703 1 1500 Ifull I Ilbosns rotiLi ItshL suea25 ;;~j;DwI;4I PW~tveFdutait% cchbromyces corvislol so is 558 16 I"I 1136 11054 11207015 IIbooo421 putativ protein LtIS Iistreprcocs gaI as I 471 10365 1 1 0 12 Inl101431070 jI~AIpt vean eurcoi ne as.QCU i fCl tidlu tertiwa I 11 I 03 1100 1gOI1 hWAll~lllldlI 5ply -stedras betatoei Hsbtpi Istreptacoccus Ii 5 74 t- j 10 9062 1flhlP101e1161er t luIew [Clacillueum rtul subtiliel 11 3 3 106 1563 1wA- oi ym ris beta-subunitlammphi lu IubtLI l I I so 7 27 12 1 13096 12062 gnI 1 1*111466 j unkno-n Ilecllu ff subtillsi IIt 1 144 1 3B 55 sohP O~ 2 4 o Pat t o i asa tepto'occ 1Maa Iyn a at 3 162 146 I 3723r jlar 3683 s~ss 1911?17 phosphat trnpot systemdl ATP-lingrotein InahhiaIrusJanetii i I maI 6* II: 360 J 5 6276 1Ini77i$75 IAIt ate. 
o ncidoa (sbunit croocs utanhl 56 1 63 626 18 4 5 j 26phospha65 transportsyste mptP-b1pdlnt protecn Iocthocu o ccci jamosclII 6 g2 4 116 I III 9 5414 63)1 15327 lamiopepras b nist cS therhlcus rutais 1i6 I2; 1 I 560 9 0114016 Iho vouut to ILaoocopncu lactlpc I I 76 1 31 III 1 10 3 11 ;11535213 ominopeptp~l ase C Istreptococrus theranophlul I m74 1 i 1 11 57 i l 0 as p~ro~l roten 1.17 I0dcilluo sublllsj as70 37 u k :w r t l 99 Ilunkon protein Ilnrtion sequence 156611 I 49 613 20i7 33 5 n InllOOId z Idnlosuccino anrtrog IBtaillus subtital 1 go 1 75 1 1359 319 4 I 656 3VI 1a116026 Ierlniiihreonine Iimage irhytophchore capaIci) Is I 5 34 4044lase rprsor IStretooccu euten1 11fI 56 162 2004231248 23 Nov 2004 TABILE 2 5. pneumonia. Putative coding regons oi noosi proteins sIInAI g knon proteins eontlg ORF Start Stop ia In h I Intl I csln match gene namoe It s II ident f lnngth 49 nl nls~ 102 Ig a:ni ukwn protein flnsevtlon sequenc 153611 I 81 72 370 I-6;;ll 613 7039 Igi-it4i lriboso'#I protein SII r cillos sublillel 47 74 621 1 1 11i877422 jgelactoklneue I5tfeptococvus mutansi-3 a 40O 703 age I0 0I41iO1140 j 5 in ongaiion lector U isecillu& suttiel 1 5' 76 1 10 -4 4"'knowii protein Inertlon sequence 1S16 li 17 9 S I o liD 1 ,29D3I ~Igi lnlriO i Ipnprielr -r ynthetas m begse subunit 1illlus Subtillel I 87 1 14 1117 314 1.044 j 516 IgI13263S j gucose Inhilbited dlvis~on protein homokog GIOA ILactococcus lactic 6775 12 3 3 8 14 lduct highly similar to elongation lactor FF-C bacillus btis 91 1369 a r, Ia0 4 3091 2)89 Inl gzI I03 unknolm prte ince 54611 V 37 I 72 391 3'11 1 27 1 650 Ig iII71791 J iS rlbomamel protein iPqdjocoQCCu e ditjj O LicIl 1 87 1 1) 1 31 51 194 10 gIjl04a I Iribososeel proteIn So I ac~IlUus ubtilisi 0 7 7) 4' I 59G 12 1,9 ig Its 9I l st rel seh Ittreptococcus pneueoniej I 6 jC 68 1764 15 4 3 IS~ I201 IpirIAO;ll9IaAS9 Iriboomel protein Ui Bacillus stearochsraopiiius 1 66 77 1 84 S 1 I 09 113610 IlI44074 Iedenyiste kin44a ILectococcus lactial 1 86 1 I 1 
651 4376 1 4 1 4656 1,11153709 Imnnitoi-opcIlic nayme Ilii IStreptococcus aLutanal I 86 1 161 esi Si 4 I 430 0 4956 Ieniln toS decerboylese iLectocoecus isctiu I 86 1 16 1 )I7 14.-16 34 I6 IN 7 t 660 IonilPZOlel37lSR lespartete trenscarbamyiuse lLactobecliius leic~tsnIIj *6 fie 91 I17 1 I 1~r 11)I I1111i 39i6 ls tacive abcaremlnte lpe in~laldu tetim 1 i 1 I 71 1] t1 1 01 4i7i I610 10flh1111S2283 FLNA-depondent RNA polyUmerese FStreptococcus pyogenesi I 6 1 d )723 I j 5 IGij66119) Ioilpoprotein diecyigiycscml r*Snterase IStrOPtococcus Utans)I 7 u 1 I 1 64' 1980 0g112)5sij Iglyceoal klnase ltenter ococus (escesiI 8 2 11 4 595 7 458) IgI 0) l~e ilast. sel tococcu isl anat hl 6 7-8 1i its 1 1 3 (l I l1) jI 6-phopho9Iuconete dehydrogsness Ictococcus lactiel 9 1 74 1 1434 2004231248 23 Nov 2004 TABLE 2 S. pneunonlas Putative coding regions of novel proteIns similar to known protein.
Contig IORF Store stop match I tch gene name I ID lID I Intl Intl scession IS ln I Iden I mngth a I 61 1 1675 164 1g1529 Unknown I~treovlgoccus sallvarlusl 4 I a 73 f 1 2 j I4~ iQ~26u IJAPdepn~L GlyCerldOrd.J-phophste dehvdrogn~s, isrerptcroccu. 86 13~ s mutes, 1 0 I 4;71 9 l115is1 l rnslational Incltation factor Ill itnterscoccue Ieeclun I 81 1 76 1 291) I 330 1 1A 17 I i l reg. aynthts A (cant? o phlus Inuentail 86 68 1 fa~c'ell-i 7_ 1 3 I 3909 11149909 1 ,a lll~ r(l Itr fo emspL..,.u Intisnse I as a tg- 4r.- 3s 1 245 1*3 11 610 4$ Iputetive ABC trnsporter subunit CS YB IStreptacocu s sordonill I 5 1s j2 l1i, I 1 1 1 37 Irl 3 IS 1,1120535 ICosse !Sretococcu gordonll I is 80 1~ 5 1 2 9 1 3 19 IgnlIPip0dioi z IrqiJ b cillus s btL ll35 7 I I, "19 ;i 62 14111 60 19 0 iL11531d6 Imannitol-phsoepha. dehydrogenese Istrepr.ococcu, sutanslj as 1 68 11 I ii III 11---.610 k5783 1g1114 Iphospholbosyl einolsidesole synthetese vP-_M leacillus aubilial I a I it 1 1404 ja' lK tpro tccu s m ontfo c eIj 4- S 128 I I 11of II v11 8I deh drogen crciohydroese Streptococcu thermoph liu s) J 11 8 76 I IR 1 211 4 IonhPIjdlo 114.. -PaSe alpha ubun l t o s h e 17 1 14 110 24 V 1 25 IP1 0Idl911 55 EES00 A )*t4IMiTNDS. subcIII Iease 61 1 1 9 (lI6 1 M synthetase jancillus, subtills] as 70 1421-~~r- (A601081rUCTIrIW4XJON SMIARPROUC 14 CCLI I, IFLENZI 4 I 10I ILid 1 1 i- 1 t 1'P 1 ;526 1117*14j4 IGptat i 20to protein La rtococcus ilacug 1 84 5 121 #9n -I I 5 I 126 Ia IIglriiiiO r-soiia Iltepococ f~lml uonile to2 3 I23 10 2 1 I( r0 13 0 YtIAtl v o yl e rmacf el yes 1ca tr *a i1. 
6 1 1 9)] 1 ICT synthet as Ibe cillu s u btl I 8 9 2 I 13 1H 1-11 11111979 IGVP-b-nding protein- i-l-c-i-u s- bt- 1 4 17i 3441 InI2 iD e3 I ;Iputetive AM 1 a y n ra in at l oe u cti r d uc] r n 84 so 1 01 1471 1 3 12 9 5g1~a s Oil t ol is p ro Ia I m c I sub tili s)8l1 6 U IgnIIP,~-- l_,o.,dioosed_~~~~ Itoh bldiin protin ec Iflcus abtihs3~-~~ 13 I 3963""'- 121 12 1 1 20 1911299163 ea nlne dehydrogn aeg I acilu 1ubrl du I a. g *o I 2004231248 23 Nov 2004 TABLE 2t S. pneumonia. Putative coding regions of novel protein. sitiler to known protein.
Contig ID | ORF ID | Start (nt) | Stop (nt) | match accession | match gene name | % sim | % ident | length (nt)
10 1 730 ;6791 gn jrP Id 0196 fluu t.OIAn 29 13trOPEOCOCCO3 "Utin I S r- Igntj~~~lD~~d~~oa,4I IturknI Isrpoccu purns a
S3 0 15300 1911412114 jphnA protein llscnerschia coil 4 4 p 4- I 6 25S 2072 ;il 31 56hP bindi4 I protein IStjeayiea. Ilreordto cllc 1 8I 59 10 4735 7514 Igii)Slvsi Io-po, ph- guie ~r poceu tpnoumch oniae 1 4 74 19 1 1 1 1 $16 oil $105 ;QIt o, a tlt.O [tIepDCOCUS paponafl t 1 4 to
42 23 4 44 luISee lOXrala; putative ILactoceccus, isetisl 6 4 I 6 6241 1 0 1 51 22e---i 4 .4 I 321 I IS' J II 1144 sac; IOaRni puttive ItacCo ctcu lctidl ti 4 4 5 44 5810 iS IiAPbsoII2 hoI pa Ss ynth Iacise euC btilis Ilnara 1 4 31121 j 192 Iail I4 1108r jgio4954 jtlperi e Ictoc ccus e cisiel I 16 A1 12431 it-; (43 1 4 6 11197753 1)72 I pct. g 1isI.a) t 13.3.1.2) acisol Aa LD b cliI I I4 7j 6 6; sinle trad-D- bndig-poten 47 M l11 I 42 45 1 10 i1520738 S jcn. qa p rein Streptococcu pro nm oaei l 83t~ s a) do03 ;;16 9;S1 9115 0 3 JcoaAI ptein 1 tre to cocu p eqr l ol 2 1 6 01 10 $1 lni Mlldtc1 oi lnkown Isocliies eubtilie) I 4 I 2 If 186240 (At 00290 a3 o 8 1 2 a* or( I to pc identical 5 a p t n residues Ci en ppro c 248 an protein Ytit _o tQi Sw P4)31 l cherichle 3)0 joii18694 a r pco204 6 310 005 1113---- I nl J En pre,; dicted coding region h30459 li4 aomophuius In en se; 6351 300 I 5 i i 5; ;S41 I~ li70 Itiroothtiesi nucleotidq binding ro in Cchoiepi6see l dlawij I NJ I t343 I ss ;ii 1 932 9 1112 g1378 Cor.Ii I ckriha Coli) j 83 ii 401 I 5 3 1853 I) C 111 1,lll149455g 1 rillZv (baclllu subuel) I '2 I 49 I Iii---t 1 4 6 3 S I2 3112 11 1 308 L3 6i l1 i h 3I 68 I 1. 86; 7.6.63 ii- ee-- oc- n- is 0 on nog atkil n Al ote as roton occus pneumansdol 1 41 1 $4 1 195 2004231248 23 Nov 2004 TABLE 2 S. pneumonia. utative coding reglons ul novel proteins similar to known protelne I Conti S rt Sop ac I match gene name match t a m I ?Aident Lenth t0 l ntl I--nt- ion no, III juts.: 11s I 1 IJ pnl IPI123522 tputtive POZ p o tein Btcillus aubtlihil 1 1 14 343 24 112 j S 931 j91j6734 I3-Ipti yi-gepticdase Itreptocoecus pyogoneil 83 1 79 12 4 2 Iglist Jim gin ainoprotase subunit Iclllu ubrdlta 81 5 361 4 140 4 31s J 300gl 67 s W~l. a asi surenicte ci ng reionu wu6na 1 81 1 also 314 hthnccu 4 371264l1437 Iramcaestetoocs ouenaj i1 p 336 I r 3cc C 83 Go 43 1 i 113 1114314? 
I3r5be-neare Apyrphosssoina [Seac&l ue c ytc1 5 41 1 1 1" 900 115 S jl4$446 o ein LUyn I c hius subtoll a)3 so )48 47 FleacillusI I mUbllI llus 5 b Mal3 I 6 1 11 1 7 11 1: 1 11 11 18 9 1OIIP33 10 07 Iriboemel poten 5 py tphs hiS tue subti l I cadoyi m I 60 i4 1015 1 1S 6 68 34 i jPID dtO1 rlj Putnw l ai n2s aub cil T 747wil C 4 23 I 13231111 110i104 jput(III(lved~l IEacillum ubtilisl 1 6 I 19 11 91 1i p1147 Is I inllPI dSt 9 jar I jtatcecu I utns -4-41 191740951 66 3 F I 40 I00 6ibooLmIsI r h.e bea uBa ntllu i lub cll r o I 471l I)I j3119i Ii)437 lIILI()0092 loIrl -hlh~ meebr h protein l Iceovobcru 115~ulc~ i 62 6 j 31 12 1 a 1 4;411 474$96 191114312 C5 7 ldeuritodnlpyil nepoolaeIsclu subtilial
71 117 113423 '14' I IPIe10 ou, Iu.tativelanct rn r us oneuubmonlee ;2 6 1626 1 9 740 6255 14treptoco ccus pr ieumo n inalu ucll~ 1 NJ.. 68 LidaI 1; 2 i d 1 13 8 1 1 6 I S 1r d1 1 1 d e x y r b o i p r i m d l e h o t l v s e 1 6 l l s s u b i l l lI a 5 9 1 I-4 O I~ o I t Ii) I Z31; 1457 11ni2iD410 iunkn0962owmhtl Ilacliu aublle5 I 6 631 j 4- sr I hl Lr I 1 1 -s11 33 1 7936 1 Jg n ilrT0 Id tOO516 j unknow 1 0acilpus subtglnxL f &2 1 6S t fig ,1 2004231248 23 Nov 2004 TABLE 2 S. pneumonia* Puative codlng regions of novel proteins simllar to known proteins RV g 01F I Strt I top I patrh match gune name sip i Idt j lngth 1D I InL I Intl aslon Intl I I 74 1 1 3771 lgnI uP dIOtlt9 Ieikollne aeyiopullulanase Isclui sp.I I 8 I1 1 771 I l £3 9 3694 351) Ienh IPlDlO~l t7 lunad protein product ISt reptococcus t hraphl Lu I I 83 I ,J 21 Ia 6 Ii 110"' 9354 l 14,3161 I5-sne Inruvrlahrl w- -1-vhos$p hta eyntkaee (Lactoc ccovlas l I 8:t i &7 1 Ila] 4- 1 I 9751 1g11400;1 Ihooou s to t.col I0S. Ibaclllvs subtlisi 62 16 1458 rI 15 8 110141 I 8412 InlI~PdIOl O 11A500311 phoaph-bet-gaiactoiidase I ILACtobacIllus pGSSril 82 14 1 114I It Ill I I I 132 IgnltIPIOIdiOO5 lt Iry l-tRIA aynchlis (Baiel lus ubl LalI I 83 I 11 133 IJ 4- I t 4 5 j 6246 IpilO01S6 ie C bct:;4 trcundi'l decoyrlbonucirase ItC 3i1.2i.11 CliA ciialn S 63 19 1901 I 4183 3503 1131 38 IA1too0054) conserved hypothetical protein 1Illllcobacter pylori 82 I 68 I tel 117 Jl $481 1 1441 lent PTIdi tlg9 IIABOOIl3 HorS Itacherlchia colil 8 1 I 55 1 1161 iS) I a 576 1prl50834114)8 Irbosomal protein St BacIllus atarotharnophilus 2 70 9)i I S I I 3811 2 CoA type I remtrlctIon-eodifIcatlon enzyme S subunit. 
Ituchviacl le ccli I I a, f8 I 30 j 146 IonhIPIDdl 6 I3rlbaceel prtein $15l I Ctlu subt Ilps I a1 1 -6 2 I3 i s6 ;T 5413 0DIi O7t ltrsvvtopbanrl-tJOe srthataue lClostridlus IonOIsPorLhI at Ii 1 0 930 *D 20 11 11000 1230 Ipnh1rID~lOOS83 ltranerltlon-repsc coupling factor jilacllus subtiliel I i I 63 JSII 25 2l 131 146 11058541 JDIutaLo P116 binding protein Irptococcus grdnll *l 43 1 11 I 4 2 j ]D101 1151 111402$t lend...r i9cllue aubtlsi I II 8 I 1)111 I I 8 1 1 11 )11 lured'l pttm.i a I8acluC c~ldolylcuai I II Elr I 1346 n i 4 1 I 2 1440 15n119101d100453 IH-nn eelph ate igoaer#e-. Itr1P1ooc5e5 1t.nl .sI 1.70 1014. 4 4 4 54 I 1 1l0 1 336 1vIgi46'59 Itenport prot1n IAerobctrum tumelaclnsl I I1 11 6$ 13 1136rI;147 ISad protein Itacococcu u ltlsi I 61 4 I 5161 I 6 12 814 2601 lIS$16856 Ilrne hydoyvnehylrslr&*e [Bcllus subtlll I SI 69 1172 r I,24 1 I )E000 i ii F'lorL predictedolr u3ln liPOlli Iii.) cobuctr pyloil I al 1 101 106 1 $3 I illlus pl strini w1II I S 4 I S i I 6877 1 9 i1115539 Iroup B ollgoppldese Pep! Iitreptococcu s agalacllal I II 66 I 1634 II Il 1116 1 0118212 ;lLISS94Il50 leoi uE protein tacla, aubtilla I a' I 6I 161 s 331 1 3614 1115$56 erIotl t (trptooccus tharsophllusl I #1 1 1971 2004231248 23 Nov 2004 TABLE 2 S. pneumonia. Putative coding regions of novel proteins slnmlar to known proteins Contig IOPR Strt Stop aech match ene name I @lm I I ident I imnoth.
i0 Do Intl In I caion mathntlg I 151 I 1 810 1 3 i 1 gU0)496 jeco t.ype retrlctlon-eodification eftee A Subunit Iechorclie cell) flu 59 1 362 ISI 1 11 1 122 1 7817 91,12139289 IGMP syntheti6. IBactliLu eubtitisi I i1 69 1116 4 4 1 10 1 1139 J 450 lgnh I DdoC00 I 1A001481 ru C ION UIKHOOW I ecIlu s ubtIt at Ss 2 i191 I 1759 j092 ;t119$2) ltrvptaphsn zynthae alpha subunit tLactococculacic) I I 867 1216 i 2 I 226 I 1614 Ioi1l5lS1l revrier rncrptame endonuciase irosphlia virlilsi I II I 1ii 4 4415 I 400 5I59,1164 s Iobgetal phoephctresfr enzymeS 11ts it a I JO G 4 I I 90 I 1I 4-63 190hIPIO1e3OuhSI IStroKI mthyiose %Sieaffoneiis enteriti) I 1i 60 1 660 1 I36 2 1 116 j S1 I1g1i49551 ltryptophen synthase beta subunic (LicetococCus lct1ii I 61 I 62 94 I 12 510 6766 14931 1 pntet,;,niate me--boleffi i-voproein (SrpCo-ccus utnsi I 10 64 I 471 I II lit 1 6050 1 S18 InllPl1Oj05362 Iunnemed protein product IStreptococcui tlermophliusl go 8 47 I 303 I- II 34 2 8451 9044 Ial 5O96 Ihcocn A trnoorLesyte i-utonestev geliduni I so 59 t Ole I 8 5 240 151 gI''i7 )01511 phospnsse ransport cysts.. AlP-binding protean Itle).noeovvvs Jennscl'ii I 0 18 I 53 127 1 4241 1 S79 Ig~ili5'109 1v41v1-tR14 synthese lscilius subtitle) so 60 I -4o 26 I 1 II 3)88 I011513668 In~ inluensee predicted coding regon 1110660 iIa.eophlwe Iniuenol u) nl64a 4i 32 3 0 1 31 IniIPiOIei64i2t Idhydreorote dehydrogenaae a I(.actococcvs 14ctl I 80 46 5 0Iol 1 9 j I I I 1)44 Ign1IcOle1Idiis Ihoe Isctococcus lact 80) 1 k366 1 52 I 5 1 1163 3 513 joi118384 JI&TP-bindlng subunit I5acillue subtilil I I0 1 57 1 1 54 I 4550 1 4744 I211li11S2d 11Af0042151 CuxICDPiall CuxdCDP honeoprotn Inus nusculus I go 1 40 19s I I 1 II9 1-09 jORFI. 
putative Islreptococcus pneunonial j 60 376 1 3 1 LZJO I 1511 IpIrIA1O2SIISS Iribososal protein L23 _Baclljus 1s0r siophIu 69 121 1 52) 1 5114 5502 Ip~IrIO)u1185 Iriosomul protein W4 Slctllua stearotfllarephliue I o 5 70 1 130 S- 64 9, 9654e ,1068~ IgiIlle3S IIAEOOO81s( conserved hypothetical protein Ifqleicobecter pyloril I so 16 5 804 I I I 1 54 1 2435 l1i1622991 Imannitot transport protein IB&clIluS Stearothrsopiallusi 1 80 1 1 fi as I I 9o 9 430 IgLIS3l0ii IpoisbeIde synthbse iaectilue subtilisl I go 46 1 I I 1 I 481V fl6 116653t)4 Ipeptide chain release feter I tIacillu3 subtillsi I Go 1 3 1093 I I1 6110 5 1436 IniIPIOldiOtIM Ihypothetlcet protein Isynschocystlis p.l I s 1 60 1it1 2004231248 23 Nov 2004 TABLE 2 S. pneumaolae Putative coding regions of novel proteins urimlar to known protein.
Contig ID | ORF ID | Start (nt) | Stop (nt) | match accession | match gene name | % sim | % ident | length (nt)
1- S Io 104 5 I 6314 11511 IpnlIPiIlel9Ill Ilutantiaas. of cabaoyl-phophete aytlra. lLcobacilus plantarul I 30 6 5 1 1104( 1a9 2 1h0 10 IgiI00i6 toP gene product Ilccills aubtllel 0 so 11 111 1 9 -2 I 14 j 443r 1 21953 1nhiOdialISS 130S ribosesaI protein 514 I lllus sbtilisi e 6s 294 37 Ii, 11 1111 IgiliO~l IHWDP-dpndmnrc glutamate dehydroganase jlardle Inteulnelil lS I 63 I rto II Iir~ 19(1 Irl~~~o Putative transpoiass 115treptacoccos pyagonul goD 70 14 140 I 1 11 9 1942 lii7iQ Puttiv tplr~rinsposeem8 Ireptococs (Srpoaneaul I 00 I 70a 14) I171 20 17 111177423 Igaelrctoae-l-P-urrdl trsn serase (Streptococcus autansi 80 I 5 I 1491 1171 44 772 I,1a13)7300 IcyClhlin -acted protein 66 1 1 el Ill ro si 2 II I1i95 acep I;acs-ococcu, leais]II go 61 -?i 222 1 d I 129 1114241 Iribosomel protein 54 lIheclisului kl go j 70 15i I- 34 23 161 I l' 1 11100 IecF protein Srp~scccua pyomneel I *o I s 3 I 1- 1 2 15 1 1511442360 C*pe edeoale rthoaphtse (Bais llussuttl e] I s 63 1 '0 I S 5 I 530 IgIoidle2S 2 I pJRve Intlaococcus ecel 54 ma 36 161 14 1249 I 2 I Iss 4 3201 Igni IPZi IUOP-gtelucoae lpirs bt) l ui I 19 1 1314 31 2 1241 623 InljII oiII 10 Ilty Ir nercoccu hirs edul AO clls I i 56 1 131I II 113 I 7111 55 cIOi acetate kineac Ieciltus ubtlle) i 58 11 I II) 7 I tSll I 3291 gl144234s Ilhydrudlpcoiir ate ructcIcli ubtl 7 J 6 24) I65 IIII 1 671 I I~i Ig1120232&01(S jribec.t proteiLIP I tabyloccuusr I 79 I 45 1 11" i 110 I 491 $73 IunhlDIdI101091 1hypotharical protein ynechocystls sp.1 1 63 13 391 IJ 1 no; 1I 00 3gIJllea Iooraese II [Bacillus subtille! 1 is 11 41955 3 ;11169 IonlIPIoI.25509) hypothetlcal protein (Bacillus eubtlilI I 65 214 I- I 6 I 122 11 118Ilrl I1I652582 Izrphaneca dehydroenawe Itcococu lrtls 179 1 I9I 1214
sI I 3 I )10 1t1241it9115316 Itrlosephuephate I souerass Itactococcue jettIeS I n 4 I 1
IS s I 1 c2;a ,2 IgiilPld10012 Il's protein Isaleonella typhimurlu.1 1 7 I 61 I 720 2004231248 23 Nov 2004 TABLE 2 S. p0u.Onlae Put,4-e, coding regions *I novl proten. samllar to known protgin n" iq Sea Cn lo, I Stop a match gene name alm 11 Ident Ilength, i ID Inl In eglon I I I A 194 1 13119'01113) I.-galactldaiie iST trpocOCCw mtjtansi I LsI tI 2164 F07 7 Sid4 4611408 0eeie.-lnn in-endpoeinnterococcae 1aeeailts)I a 2 656 610Ip 1 3 144112213 jgiItshmI I-phophogLycereti unaee~~rtorI~~lII~ 0 11 21 5a 13popoIcrt iinse ie1rmotoga motricimal 1 19 1 40 1 121] Iri 1 rr~ I 10l11 1(5osioo ICpe rItphyococcge eurewe It I* a" IIS 1 2 1052 10 I 9LS3423 I p tae l c cc u a sa 1 1 11 40 1 Il II 771 )lo I l~ putative ILicOCr ccus laCttil I 9 1 I 17 419 1 3j6ol 41 jputative IL-aoco ccus I Etll 61 16 7 33 121 j D Io07 LnIPlOIdI l 8I rimetion UrNcNOWu. JOScllve) eubtiiisj 9 5) 110 I 1 1 3561f 450 IItatiVf Ar-bindl'Ag prtein of AaBC-Cpa eIacllus aubtilil I 79 1 61 76 Is I 1 I 4241 1 3445 OLIIlns,1 5 Indoleglycerol phosphate eynchaee ILactoroccus lactlal 79 j 6 0 611 iii I 3 I 1601 2317 1,111404c~ Imnnoag parmease eubonit It-n-Man It chrichla colil 7t I 911 12 1 1643 11 Inh191D1e0004 Igu.rdcn-l he proten Itoccus lettl 16~ I 3 957 oig I 934 A00401 arginina uccinnte synth4sa IeCilMUS aubtilll ii 4 273 0 i a23 5)0 76 1 g)ils 5 I3s ribosomal Protein iredlococcUs acldilacricII 1 9 1 1 212 360 1 1 194 2 gi13164660 Ipolviturleotid. phosphoarylase igacillus subtilisi 9 1 6 9 64 I 2 4 3 I ey Ic n i-acou ny egacIllus ebt iej 2172 I II j 10I 71 I1I1943 Iputaiv Ipcrl tcoocs. 
lacIve lclu 1u~I I "P J 1 2 os'' fy9 gD-wbene pro uc Ieocce acldliact iii a~ 74cliu 59 1 124 I IS 8 7344 I 8314 I7 nhI; tO1 0 Iw) (cytine Iyntetac 1A IaCLIlu ebIBIll 1 76 6) 1 11 1 2 (1 1 lilli Il~os9 Iypoatne op hor jeiosyra~rrc iabccus acrdlal Ii 76 19 1 i I 122 1311 14 IOI di03i lyqit flacilsue sythk .eubtillaj i ll btlOI i 1 6 Ir 1 2971 l IOld'2 I29311 I0sienine dehyd ren 18cilsllo ub iltls 31 54 340 I 3 I II II 101 IglId~1 Ie~parceetfn 10rligeme 1rltpllh a co~ip I 1 "I IS I Ici 16 65 1 7 11I~ 1 s Vpoxsnthine Ph 3jphrlb csy[~ tra to 0 ILACEOCOCC.6 iJ 1 71 1 S9 54% I- 1 61 2 257 Ias 119 1G1ClrCld 1 6 )l5a8arllllococce aur 76 d o0 10)9 ii-1 1)0911 ;:Innille cis ~lnln dhydreq 19 cicllus sukkil sl 11. 55 1 I 9114 1 a p artate- L~i~trrr- fik Hose a Irscherichla cill I is 1 55 1 1 ;c Fl-i 625; C gill65764 aprpe Istap lyo c auraus 1 72 40 1 too] I 2004231248 23 Nov 2004 TABLES 2 neumonia Putative coding regions of novel proeina 1miiar tO known proein.
CotgJRV Sat Stp4ac match gene name I soe 1, ienc length I ntl nt 4. 0 111 21 gaol j0 9I I~ i u M 5 a y l h d a s i l 4 d h d o y 2-butanon e.-pho phato y nhoea $8 j 2e 12222 213 1 '1214]0 l 1 ~~i62~ p LOrIC transporter, a'rP-hlnding protein Iglool Ia 1 F5 -1 2 21 I 140 Igpi)m7 oerat membrane protein leactljue eubtilli I 8 4 7 4 51 ;ji 1143 1142 1airollsl Ityolaia prti4BdIi utla 1 0 51 '024 71 11, i-;ls6 JIts loI77 calcluat channel alpha-ID aubunit llomo saiensi 7 ls I 75 1 1 64231 7572 11141 Iuqatactosa-l-P-uridyl transferase lStreptococcus sutanap I is 62 1350 I L i 12 11212s 3l206 Igl1573607 IL-lucosa lgaeeres Ifuell iloaeophllus lnglutntee I 16 60 1102 I 02 3 242) 441 I ;ii$4 IOF Xj putetive IStreptococcus mutangiIIl4 4 j3Is ati 11110 91i1i3i,, Jtaphoriboeyl amlnoimidatola carboxy formyl lomlrnfraaloit I i I ~~onophoaphaca cwclohydrolasa {PU4-Hlsgg l8*cllua aubtilal I 81;I P0o 101 1275 IIi)6 Iphosphorlboeyi amiliadaole carboayrlase I IPUR-21 [Sacltlwe eubtliis 1 8 14; ;4 564 i I97 iiiID~ 10 2 olt 74 olla I t ot cill us u l I 6 50I 3 I 141 M2 4 I11321 07k 1i1I1444 IXMA polykersa alpha -cvra-ouuunit ibaci Alus aubtl Iis j i s I 5 599) Ii 20 491I 15i7j Iul aas If caode thCItom rnl0 J 1 234 It 6) 4 49 )S37 gle las o I00 IAEOO269j 1 11 -dependent HAD synthirtese tachar~lc toil I I iIs 46 540 140 ;;osi1 252 I;II1oooP7 phOh-beta4-glucoaloaae IClostridltum longisporuel 70 64 I loll I IO S 146;; 4Old JoiI14 64 &jIsna peptidase Itectcoccua lactlal I s7 42 J 171 I 152 1 1 791 le11611915 jNAon dehydroganaae aubunic thubri ~a 75 43 1 73 142 1 4 I 91 I 10 ill 101De)2)520 Iutative VmaP protein Ifiacillue subtlilal 78 J 4 1 sea ili it 611147Iiis liactos repressor tacR: alt.) C actocoru lata 1 13 40 705 I- 14 I52 1 9S3 I~l0Iof lneta so jtymouna. mobilil i's 61 1) 2004231248 23 Nov 2004 TABE 2 S. pneumonia. Putative coding regions of novel protein. zsilar to no. 
proteina Cntig font I Start Stp math I match gene nagh 110c Int Intl s ceualon det lngt at I IntlLa iu %tl$ 210 I 6761 7112 Iissio 151102 I0 gene Uroduc 6~UUCL I~I Ilclg utiif n 42 38 -271- 19 n I-I I D I -1-2-04-9 -1 c a--o--lsioglycoprot cin endop cldaaa; P16175 16601 is 60 1014 61 1 r 1 ia::6itc Itacillum eubrillII 111 1 6322 Iii) Iglll 1u43 nknn Iseciliu k til l it bti1142 121 I 1 9 1 ii? ;11111100 leohoL dehydrogenese I (ntamoeba hitoitlcaj 6 a1 6 270s 2) 3 214 )076 g1. 11511047 eporrnger instion and vmgetatkue growth protein leercil liffefiophliue 78 63 763 266 I 1 4 ui 0li jputativa transpoamee I trepto QCocw pyogenaml I 651 73 214 $3 IenlIfIDdiOI Irlbeao.Ir protein WI lemcihlue eubliSi is I 511 4- ii154 1 1019 1911289241 from: ORFI 11acillue autlial I is I 1 19 Slit I I I 117 1 794 1I11216729 Icado Iltaphylooccus tureusl 1 3 I S681 342 2 762I I261 2 1i118441 9 Ipboepi.atldviglvceropliosphte@ synthase Ihella, subtlisI I 1 9 498 161 IgIhSIaCO polynucleotlde phoephorylaso leaclilue subtlli It I8I 78 54 I i I 111923 flill 139815 Icar bo I t n so~rne bet. subunit lSynechococcwe PCCi4l I 771 61 901 0 I 2 1698 2I3S 19h1363 putative ILac tcoccs laell 1I 1? 
1 59 $isI r 17 II 1 6948 110 I1152 73; Ico A proteIn 15tlrptococcus pnoumonleI 60 401 jI) 1 9163 6 5 114) jLf raP isicillug subtilisi 1 17 4s I I 3 333 6 4 36 I 1 1 i 5 7 1 64 te h pa i e c e r u t la 1Iam pi i u ssiu n z 2 634l 16 Ii 1 1 4371 8054 IgiIIlbO9 IpyMuitidrug ;ar&atance protein LirA litactococcue lactisi j 5 S 224 4 5 1 2 607 L~ 12541 g 03 I l lprotein LI Iacilua tearothen'ophlmiai I 62 64 0, 6 7101 2 l i147511 ImAP lStr cc aual 1 77 2 so 370J I Ic9 .1 103Jli If~~Dd493 Inknown, Isacilkw autlillI 1J 5, 1 IU 1' I 4583 40I6 1n hlypothetical 12.2 kd protein I8scilius iubtpll4 77 J 09 1 14 fi1 11 IiII jaido'Phoophor slnoctaareeI %thanococcua JenneachiLI 71 56 1449 I 6~i 65111680Ih;sn CAPo lI5 I7cocu .LactI 1 77 1 d21 6 9I ;11 1012 ;11219Ioil92 153731 laar-bindinPoprotein istreptocoCcu# mutanuI l 1' &1 1264 2004231248 23 Nov 2004 TABLE 2 S. pneumonia. Putative coding regions of novel proteins simllar to kni. proteins C ntig I Istart I Stop mtch match gene name a al IN ident lan.9 in I tI IM, n I IC Ints j sa Inl 1 I 1 2 I 41 1176 qIi321 i8eD protein fKaeophLlug Inluenrsal I 1 17 1 I $161 4 L152 030 Ig 5lIl47JO ILhowita: reltance protein ftebI lHueeophllum Intiuenrzal 1 17 54 1 89 I 332 311 11 5721900 IP Ino permeass WAQrl Iftmaph ilwo inlutrizal 1 7 119 I ii 4 17926 I il I911L5731@ IDsh2n ptmec idaga illamopiiu i fluen tl u lslxc j7 73 Sn fgll runln-l-meehyltranelersse Itruol Iiiaeophilus Influncesi 7'1 Is 726 IgfllPIdlQ1I6 ISrb Illacllus subtilll I 1 2 I 119 1T 630 1373 lohIrPIOdltdI0l IYqit Iflatlikur eubtilisi I I 58 V44 I1 no i 18 InhPIIe3SOi) hrvohelrei prtin I8allus vbtihiei I1 2 Ii £36 369 11;P123 30) IqA ,lacilius ubtIlll I 1 j7 59 I 7501 0- iS1 9W gI ;11a28 Iyit*lnyllRNA synthetase [Bacllus subrcllsl I 1 j I I 1350 I H 1 1l15I 13,243 7115 putative trenspoha Ilreptococcue pyObflest I 17 6 199 II 941 16 16.3 I lDI eI570 63 IURFS 5 i 1-5Jl, IroaophiI vaicubal 711 so 1 Is 4 273 5 3 191156256 I 1 1, ;1 ie nocy59 4 7 i diO e I y fl- A thyrithetasa lsciiius 
mubtihlel 1 I7 61 2ISO 143 5 64t2 11kB WIl~*JL Idhvdroroate Oehydrogeasei t itctocoeuaisetlil 41 )r 150 '4I 7 110 1841tt ehdoentAl~~ oOcslatj 71 701 !fl1F1101094 horn019us o rcvrodiire~te rnsort flP-blnding proten P et of col 77 52 761 IBrslllqo subtills) 5 i- t i91 U .77;I 19 Igills514 jantkrani. se synthass al1ph syubunit ILsctoc ccus actisi II 7 J4 I67 I rcs INuuslusl 1 so I61 Ibacili subti- I 7 "I 09 3 I 1731 1214 Ign1PzDIdiOOS4 Ilb ooma Prote In 10 [Bacillus eubtilisi I 62 43 21 39 6 9130 rnsfer MA-Gin synhetase I acilium ztearorheraiophiiusl 71 so 732 1 .1 13 4 J 1 6 6 p ts v n e c h o c y s t is n I I 1 6 5 i 1 7 (21 2Is carrier protein (r YPteonaa ph l I 16 I 21401 9 1;06 4342 1 .1241 l Iespareoinri-IA syntheteac Bacilus aubt1Iis 6 61 1165 5 I 4531 41 IIiOIelIiSS 1hypethehlcel protein iCloatridlue perfrlingene j 76 93 1 I I 2I 9411~ 342~ I,119i672 Iploehete rneport system At-binding protin Ilethmnococcas isnnaectriii ~I 76 56 114 I- 2004231248 23 Nov 2004 TABLEL 2 T E pngvqoi[* Ntative coding regions of novel protlnir Wim Ir to knon protoins Contig lair start Stop I math I match goe name I l Ingth 10 JID 10 jInt n) ceulon I I Isnfl (nt 12 3-73 Iie renaicon initiation factor Irl IAA 1-171 [lecIllue ateerothrmophilull 16 64 376 4- I i 16 4 19 niz I ;i;li 6 ICApSQ Istephylococcum 16 eur1u 4 1 1 113 1 0 IgiD0431 glutemine ABC trnsp rte. poi41555 protein jsql's IHelico beccer Si) u T 1 ~pylonl)I I 111112521 IdCOavnihodipyr mdin. photolyase 1acililug Subtrlis j 76 56 904 I 1 1 11211051 gi~Ileii (cemu JStaphyiococcug hureusi 6 1 5 1$21 I 7 1* 564 19 Lg~lg5t 1.161 Itacherichia, ccLii1 1 1 1 1 4 1 5 16 41 I 62 5 I 4 091 IgnhItioeliJ302 IhwothuicIlI protein Iacillus eubtllis? 63 9 I *21 4441 91tI;14a L79 protIn IoAt 1-65 Isaillva *tubtlle) 74 51 219 4 ;32 117 q ~j lnhJPIO1@3e23)n lanebollc orritthine cecbaaoyitrensiora.. 
ltactobacllqs Planterm) 76 I 61 1064 5297 1 6005 I nlhI~iidi01d2O j1yrIedine nucleca ida phoaphorylaoe Ill elu Itu torthrophiusi I i 1293 1 I 5, I1267 1gn190124262 I unknown lMycobactwrtus tuberculosia57 3 7 14 1 1433 039 I~fllPtli aOi IC tDImoellum 02 betug l ssde P2608 I0e 1 Ibacbilius eubti l 16 6 I0 '"13 IC I I I 1 I 7936 I~lOl O I59EOO33140 1nu e ved hypot hetical protein hleicobacter jiylorlj I ;64it 294 *-l1-l-r l 19 A £3 15 05'Oi9 114ts6 1gifits3go tIelno permee (dagAi IHaLemophlve infiuanzael I 16 j t I gg S I I I j eu~~I sbtillalaytht, 5u-; tosatcon bciui L 1 II 03 l 6I~no l ro I l lu aubti l siso 1 79 1 I 11 Il 15 -1~019 **~sII Isucr~ o-n-6-phoptoat hvdroeieIstreoco Io 1 1 i 8743 ls 0 U[Srptccc m1as 16 5t 1440 11 16 Jiisi s0i IthnO~~l~ patve protein i9acillus sobtiilal 76 16 645 #2 4 1749 S3 Ihhl1b .i-eia-gluc~n branching enzyme ig1g05 fiieemophliua $ntluenuee I 4 46 231 94 S; 1 365 &gJ144313 I.0 kd air INienld C Ct) 1 74 1 73 ns 6-3--gl-e jon umococca l surface protein A IStrept coccus p n o lI 7 4 59 J 74 -s t 1qlz Il i u ubtil 1 76 '1 4117 1 41 71 7797 91g 194 1i4 Ipun ne nucleoside p osphor 0 5 I s tIlus subtilis) j is 1 G 0 25 1 131 6miiSell 1 I; 17411 40HC0 ycoplaspi pneunon ao "GMP hontolo l, rm 940it lum j 16 471 37 I I I I l mycopkr D.u. c~c~l ul~s rots onnumonk aelnl.l7( 59(I 4 2004231248 23 Nov 2004 TABLE 2 B. 
pnreumonias Putative coding regions cO novel protens $ktllar to know proteins CetS ~s trt J Sop watch patch 9ene naeI'* 1 WeaIdnt length 0 I 1 Ifntl Inl aiI ionI i n l~ 140 114 1172 j1251 jljli~ O J yI*C;W(1t12301 hocllu s ilcll 1 76 j i2 s*2 05 gi113795 transfer RO-?yr sinthetaae ISacillus subtili) i 61 7 L- r t i I I 1n1 1 Did l5ll9ilds IvegO laacIllug subtilisi 76 44 Ina I 110 I 51271 51 ij O IeIdi05 Iasrq 825 llu ii.1-811 l lilu autils I I 5 1710 Is II 7 14015 I 128 Igij3aBo Ianhrnlita Clnthase beta subunit Ilccococcuu tacttol 76 61 S68 I M3359 I2444 1g1-14i9 905 ID--utackc cid adding enyn- l t n-eroco ccu e lac-l-l I '4 116- 200 13 914 363P9 1143272 1l protein iBcilua subtiliil 1 7 Si 1716 I 0 1 2 Ic 27 aljlongo idestran glucouideae DewS Istrpeococcug soleSS 1 I $21 14 2Irnspoase Llt pt cccu s a uieumonlpl 16 S iag 321 3 I 2334 13411 l1111 75 lrtP-bindlng protein lCschrichia colil l s' 1074 3 1 1 I 2 I 724 ;LoiIariitS Ineur.Inidcas. S (Streptococcus pneueonlaej 76 p 60 72) 37111531032 IOPP'-tl0 [Ichsrichis C0111 16 to clii1 II 3S6 89 1 6 1 L a o I0-wlutamIc acid adding efzy.. 
SEnterococcus _tecalii 1 7i 60 41 67 I I 3 I 371 1 19o11 450 Iphosphoriboeyl anthrenLIat leomersue ltectococcue Lactli 1 69 1 367 I 1129 11491 Ig1I1S74293 5Inibrlel transcription regulation repressor Iplis1 Imaemophilug Itiliotse1 1)16 4 I1) 11113 jul94 IonlPIOdIOo lydili (Bacllus ulall I 75 II 61 4 I 110 I 8 6062 jgi 1S) Is3partats aminotranIes Ilecl lue sp.1 1 75 I 1221 12 SoIso 9 gil 949 scor)I IECR mothyia;; t 4C tococcus )5C I 1 75 56 1 14 I 8 266 I ;30k InlIPiOIdl10i3il lyqglIb SlCillus subtill I l 72 061 4 Isis 11 1 1 7281 19iII37111 S7 lor hypth eical protein; Plethd: onceptu l trnslaion upplid by 1561 I i 22 183-- 2728 I 015 Ii7 3 211 1 I ahor bacillus abtilal I I ;P-oc 10 1l 901 5 7829 #1g11153I11 [enyme scr-I Sltrp1 ooccss mutansi I 1 ?S 64 1 ties 31 362 ;30 giII213 I I ;I F~00 0210) pu:ative thloredoxn 12acllwo subtills is S3 33) 7 55 lgnI loliOS06o ILoramidopyrln idn e-wea glyoyiase IStreptococcu utansl 75 61 I 1 4 1735 1 44 1 LI13974 ilpe-Sir gnt product ISacllus sublll I i 1 53I ii Ir i 3 liO I 6470 I 9 -slI -II-S ;sacilsus *ubtl;is -6 02 2004231248 23 Nov 2004 TABLE 2 S. pnourmanie Putative coding regions or novel protelins 6amis to klnown prpteins CeaCIg V ir si idant I 1. t .o-a O-F I- r S-op match match gone naeI D Int (ntl cession Int) 3) 112 I 8 71M1 IPIrIA0020OFECL ;rrridoxin 16e-451 Clostridlua tihermacetcwr 1 75 5s 306 I8 I 16l|7Jgi20J1739 IF003l141) strong aiimilarity to the FASPIP2/CRaP/C ASP tanly of p 75 11 110 transporters (Caenorhabditis elegans 4I 6 23 i10 11537 1S114058 rhypotheticae 1llamophilue Intlusnuae) 7 1 *56 1 4 I31 I2jIS ;24066 9-190092 ;outer nmebrane protein ICapylobacter jjunli 1 75 56 663 F ,51 ss a I i I 2 I 21 Iqjnil InlIS-ILka gene Iictobcllus detbrseeuii l 5$ 14 831 Io 8I 11161 I5)71193 CG Site No. 1 6 2 0 1 lternate genem names he, hap her. 
rm apparent iremesahit 7$ 5166 in CenBrek Accession Numbar X06545 iEcherichte CoiiI a 5 IS 1966 207;19 |gijI661 eort gene product iLctobact Ilu a lras lchmlnnli I I7 so I 1114 I 47 19 I 148 7612 IgL1k1j9ol 1 1 lschcrichie oli c cs 1 i2 I .0 S071 5 1130S ibosomaia aubunir protein SIA IEscherichuia coil I 1 6 1 6 74123[ 63 8 ImiliZIDu ada98 nne phoyarbositruensuersueejscilesutia 5 7 713 9 12404 it l 9P Ic4d crboxlete transport protein ilscmophiiua Influenamil 15 ST 996 I I 2 910 0plOllg6SS ISnt i e 'llu t uril 6i -5 I si 4 ill911 2.2 ofoprotein E p t 0th 11 S| II 6 5 I 65)6 IjitlsslhlS Itto llhodobacter capsulatusl is 1 55 117 a a] 6 1938 1 2 21S IvnlIPla21IS29 iputative risK protein Ilacilus subtilie) I i I 1026 93 Iii 7366 I 531 1g13~9969 Iarthiny-tanA~ cynthetame ascllus sterothrophlual is so" 93 I3 1409 I8699 Igj)591493 jolatemin. transport ATP-blndin; protein Q iieihanococcus 3annaschiLi is 54 711 r- S -I 1 P 47 IlnhIPIPIe3lIO IlaY protein ifAc illus Subtle) j1 75 i I7 1741 103 6 194 InIPiD~es~bi9 lunkncon rlycobcterius ruberculouiei I is j 64 125, 592 915) g 160026 (repreceor protein i5trp~occus neuonieei I( 1 S 22! TI II) S I 2951 3961 InlPolDIiOiil ICOC ranporer subunit iynehcystis s.3. 175 s 1 111 104 1 69; 915 f1is002i ;repressor po ei cl ss eetosco ku geneumiexpeso Ure 54rpoc ru muat25"I i11 4 4- IOynectlacystis sp.1 55 931-- 11 5 2614 3000 Igi3iSQO4S Ii. 13anns1hi1 prd lct1d1cps o fg reo g on eJp ssi I r cAhn o ccus Je nneechl)u 75 54 1 I 367 S a is 110 10611211391116 IP-glycoprteln 5 itnf toeba hirolycica) 75 52 405 I( ~1 I399 I 9326 IPIo~IdOOSi unknown Ilaclilus subriliel I 5 S 1401 2004231248 23 Nov 2004 TABLE 2S pneumonia. Putative ceding regions oI noavl prcteLni similar to Lnown protla a Iontg lOIF )Irt I tp match I ach pans name t 41 Ident I lwo 10 I11i In Inl IIlnsion I isa I Is R0 173 14046 loads polypeptida, pert of CItA Lamily ICLtrobacter tceundiil 75 57 1410 I ;16 3 J1 10253891. JlVDF? 
gV'l ;l sA 4 (90MOC450 1 BACs llus subLILISI 7 611 I- Ii1 I 9869 1 21 Is~ pI0~) a2526lsi Il-ulum oa 4epmrca ithcblrs aubtillsi 1 7 61 541 17 II2S 574 Iuijl4JsnF Iplycarol dahydrogenase lacillus ateareshermophllual I 5 34 1222 I 172 9 ;1739---7313 IcnljPlOja2iIql junknowy, llyccberteriue tuberCUioai.I5 o 59 I1) 1 242 '9 IsnuIPI01e2)iE6 ;C1O00.6 (Ceenorhabdis etlegarial I isi so 1 soa) 286 2 3--4 2034 I I I ISS 3034 1 tlllaa ;;arstdinaputreucine transport ATP-binding protein JpetAl lameophIlu 75 4 In io 91 6 5123 j 4211 1,ilssi* Iphoahorboyl anthranllate rranerosa ILactococcga lacriul 12 61 1 1022 I 1 I I 1 I i) 1114~0i73 jhomcloo f £.coll rtiboomal protein U,1 Sillus. ubaa b i l 5 I i4 I 2 4;8 IpiJ)29J1S 1AP0,8220 1YtqL I acliu 1 a;ubtliat 1 75 I 59 4171 S $1 15 j1 1 IM uu~l~S ~uknwn p@tin Ilaciliva subtilisi 1 15 so 0- 37 jOA.. 1 190 73 a 493_.
p 271 2 1i7I I 626 191142011 IORyL IM I-acLlue aublll 492lir 6 0 Io 126121 117140 0i2 21 4A l a* 7- J- 52I 0401 S I 14492 I 51- p t b llus subIli 1 0 to 33 al I 3:iii )42 InIIdi Oli keg1 iarillue aubtliali 74 5 4 1 'i5 121 5085 a 80 u I12;3Si llutarayI-aeinopeptidaua tLaCtococcus lactisi 1 1i I 9 108 g s -i 24 2 I I 54; Io123141 I O I5I AaC transporter. pero M protein yasti [Helicobacter pylorl I 14 46 t93 -I-S i 3 H O-H 0 -xI- d a -a t- r q-p- c o- -d c-u- t a j s-I I 7- 18 ai ;1,1311 112,64 Ii1127024 I0F.. t~ilcharichia cctli I 74 1 57 1 53 1, I to 4 lD 8924 669 ;IlI~IolI ?-type adenoalne triphoaphatase iLihcarla uonocycognnsj 1 4 53 1 2216 1 5I)116 101 ImnlIPJ0esjiO Ica iirsphyylococcusL~ aur~eu 1~C. 14 1 64 144 1 2 ;1i12 427 jul i2293 1A patativa Ilgasa tialcllu aubtllis) 74 I 55 I 4156 I 76 101 I 1 065 IsnIPIDjdl l)25 lYqiS l lactika ubtillgi I 74 1 4I nso0 16 ;!11PS5 logo S5-1 2004231248 23 Nov 2004 TABLE 2 S, pnteoniao Putative coding regions of novel proteins similar to known protelna Contig JO I Start IStop I match I match gene name im Iidat length I 11 Inl Int Ceasion I I ?ou .1 I 0 iS I 1 4264 2137 In IP1OIe22524 IYlol protein IlecilIUS subcllsl 70 j2 109 1. 4;101 7 I 666 1 1532 InllOVuOS74l Irtethyitransleraae ILCCCoccue Jacctl laccigi2 131 I 78 146 IvnlIPIdOloo irqgZ Ilacillue eubilhej I S J iii I 1 1;6 o~~lnIIPtD)IdlOO7 130 -Arpedeollu subi 0lfserccv hrc 7t I 1 2 p e l o o- Iu41 11 ID iso. 250 InhPIDd100Sa jhig~oh leel kasgasln risne IIlacibfltu ubiia I I Ii I I 4 1 203 1g 1 12(7 5 5 Li ir ee produc rsisacube eilll I 74 I 55 1 246 17 l I 21 1;0141041 Ie ll a p shresubt rs en uIi 135 a" ;301 ;oil 3927 1 4-)r g 060 p Zct. 
111actilus ubtlisl745534 -a I -I 1- d en n tran oeor o I III 1u Ill8 I nlB7I ldsnig, Iacn&al prosducts soncillu e crrcul ians 14 62 68 31 1zye 1 20 I k187 1gaj223]20i Iru o eu 1u 40 1 ))lI I 1 111 1 I I vI131 repeecln .r rego ~rche.rne carvIsiaa 1~ l ~~plu I, Ie 20 4- 111 101) 2 ;;011 ll7 0 IP dependent translacator homolog Ilbj osbpklu n~ult l 1 )s 2 6449 1 0 27 1,1149272lu IAnslarenlnes 74ctlu lihnt s tin 1 2 1 I 6I41 92 u11166119 9 unkwcln Iproductom response regulator Istreptacoccu3 mu 14 6 1 421 l I1 57 21I1 j1n1P21032 1) 2 34 l 220 1 iqiO [Bacllus uubtblitI 1 60 1 7 I- 4- 4 11 1 2 522 I660 24 alIIpluSd Itol tan DMA1 binding proteint sacilLu sub Lis i I '3 I ss I 27 32 1 %$it 36S 8 19 1663.232 IelrodoI S I y i a p c l 7,1 I i protein In s I58 4- 122 l lj,19 l A -spa-ro iness JR&C ll hlitheni 12 1 64~.n n ur F 7 1 1 34 1 126 1 9 40)398 I r- g-r- 72-4 5 S106 g 06P 1*30 62 1 nnn~ mod protein product 15trepkococcus ther"Phliusl 1 73 47 522 j 24; Qnij;;DldOOS st 9 st a d DA bindin protein aacilluojubtlital13 5S 17 i 6 6- 2004231248 23 Nov 2004 TABLE 2 S. pneumonas. Putative Coding regions of novel proteins 81.1vIl to toon proteLns ICntig J O Strt Stop nchLi Match Yana name I. Ident I length 0I ID 1Il Intl I ceoI I Intl.
.0 1. -ui11173517 Ir iollein ynthace apha eubunit ActiftobcilLu pluropneumonel 73 55 1 631 I 5 2 3592 139 lnilPiIdlOiS87 Ici;On-tranportng Alaa PacL LlynechocystCl sp.I 1 3 I 60 t 2754 I 5 11,1 11794 111566 16n11r10102653e0 Iunknon Ilycobacterium tuberculogil '3 I 52 1 0sf r i I 1 I 7I 71 61131 Irbsoat protein LI6 (Ihcillus tarohrmophul 1U 231 I 60 551 I I B S 1 6 1 1 I 7 3 I 1 111311 6I ;G 13 i19,101t2651h3 LsoF L. eotcli caslr 1ea 73 12 360 1 1 0 S10 1 5157 J 733 3li576i1 fenvlope roteinllaumgn w dinleney viru pe l I 3 I 6 1779 4 1 I 72 3 I 51 g(123177 fAOOI~Oi ransprtr a cilue lubt licl I 73 I 1 645 76 7 I 7*15 1 55 IgnhPIDVdlpIJ lyq Iacillus uillil 3 0 76 II 110(409 9533 IllIsi306 luridine kinme luridine vionphouphoknscsl ludkc IHaa4ophiiis inluenzeel 1 1) $4 1 41 ID 1 (1113 )11I1377623 IaMnopptidA itSaCItllus SuhIlIel 13 60 I 1260 4 I? 5 j 3)6S I 666 Ien41PiOIdi0is56 5dhydrcyar14 dchydratere Ilynechocystla ap.I 1) 54 1722 II S 6912 7619 10nl1PI0131455i IrtuE Ii9yobactorium tubercullsi I 56 106; I 10 Iii Iio0i6 110440 I0i135610 )regulstry protein lnrcoocus IicaIsI I fl I 56 4651 I 126 6 ).12 1422a IViIli1til lor1i091 IStreptaroccuc thermspliiius; 7) 6) I uS 1 I 75 35 1111473261 ItrmnaOrt plrote tcIri~ha olil I 7 I 60 I (13i 1-4- Il 151110 1s1542E3 arins 0-aetyltransteras. 
fEC 2.3.1.301 BACillus steavothermophllus 113 55 636 162 1 S 5741 1 4991 InijPIojel Sii Iputetiue Ys protein tachi e gubtili1 1) I 0litI 164 3 2133 I 775190 1oiri550 Ihypothetical protein it5V:P75161 1 Ilathanococcus lannaschill 13 1 52 1 46 16 4 6 0 4 5 54 01 1 01 F 3 1sacilius subt ill l 7) 1 54 I 33 -7 I d Ioologus of vnIdentifad protein at 9, ccLi lacillus subtliej 73 J 46 1 11 I355 I SS 011622 Ioduacin poten 6 lnd thiobl- ltilI 56 563 704 i 6 j 106 I 6276 .iOIe i f s Pri n focili thurinqiensilj3 Ol I III 1 I I $3 07 191S59 ibeonsepo ting omlg etence specilic 044-binding protein 73 I 1211 461 -1 Io 23 1 I 64r 36 1611017 InlDdD0Ihooiog o Eucoii ibogomm proten 111 I.Cl cIiilus eubril Is 'I 1 24 2k I; 3-1 05 10i117i51 ladentine phocphoribosyitransfose Irscherichls colil 1 13 1 I1 sod 2004231248 23 Nov 2004 TABLE 2 S, poreuonise Putative coding regions of novel praoten. Sitler to knon protens Concig loar I Sr Cop match I match gene name I a i I t Ident I length (ID 1 Intl I Intl J ecession I Intl 1 2 695 gnllPIDldiOllta YfqiX lBacilus Subtilin 73 36 696 I 5 3 212 j 632 1p1rA03711R74C Irlboiomel protein 1-7/L1i iclrococcus luteus I 73 J 66 j 441 141 1 I I "q1 IA 00176) hypothetical 30.4 kD protein In man-cspC Inrergenic region 7) 71 (I7 S4 I- 9 I 1 1 |(Escherichts colt)I 356 1 12 4 jgi1J2149905 lo-glutelc acid adding enzyme (EnterococcuE tea fcill 1 7] 1 50 s S 1 3165 1 411 InlPIOdlOil81) aiLdeme Ilynechocystls sp1 72 I 127 9 7195 7647 I112il6416 |numB IEscherlchia coli- 7 4 1 I37 L)743 13300 IgnliPlb1li2Sl I|Islat r to hydro .ymyrlstoyl.(acyl carrier protelni dehydratse machllus 72 59 444 1 I I subtili) 32 1161 16 5224 Iunl|PIDiLdI1979 lrlbolome releasing factor (IsynechocysCtis sp.1 12 51 I13 I7 1131t 11425 InhIPlIDdl19o O laOp) Istreptococcus utanal j 12 55 667 4 3 7 1147 1637 1u11394603 Iaspartyl-tCRN synthee.e iThermus thsrmophllusl 712 2 15211 3S 23 I137 2 1 16015 p1r111641061H641 L-:ibu:oss-phosphate 4-eptmerasa 14aiD houslog -w Macophllus inf ventee 12 1 54 
;714 I 1 I1istain Ad K420) 3 1 5 5094 lI905 IgnPIOlS4B7l Iunnwn (InycobactCertua tubrculoels) 2 56 113 1 0 6 44* 4636 |1532 lactose ropresuor treptococcus mutans 72 1 51 1681 486 151 1353 lo31101oI0 inhibin beta-A-aubunt l0vis arel I 2 33 3207 79j924241 I8 129 172I) 122424 gI1stI2)I29 1A60006231 gluteinns ABC traneporcer. peresase protein I(lnPI IleIlIcobacter I2 I 49 I I I i I I I ylori I I 1 0 I S 52 3B IgilhSDIla JYnbA Itcllue aubtillul 1( 4 51 1 144 l 27 1,11213 )15F00B2201 YtbJ Ibecitlus SUbrltlal 34 1 123 .2 13 11 11396 1I11125fsl Ideoxyribodipyrinldine phctolyeee lacillus subtlla J 72 45 I 296 1 1 JI ,igI99siS (oRoi GOT erert (echerChla coll 72 S9 #a1 5 132 3191 IgnljP3D a209886 Jurcuric resistance operon regulatory protein IBeacillus subtilis 12 44 360 76 6 6221 5771 g11142450 ahrC protein lasclliv subtilel 72 5 M 1 9 Is 16065 4592 g112201279 1AT0062201 TrcO I(acLlus subtial I 6 506 1 72 46 3 414 7 14 I1i26 112309 lbh IPIPeD23502 Iputative PriA protein leecillus subtllgi I 7 5 2411 I I I I 1 4.61 1.i 50061 KYinl gene product ISwccharomyces rerevislael 72 50 1 3 9 I I7 516 764 gII29611 lkeletal muscle sodium channel alph-subunit r(Equues eaballue) 72 J 3 1 2004231248 23 Nov 2004 TABLE 2 S- pnevwmolse Putativ, coding regions of novel proteins siiulaer to known protein.
Contig ID | ORF ID | Start (nt) | Stop (nt) | match accession | match gene name | % sim | % ident | length (nt)
;1 2 004 11;7 (1111 1014131M? | putative Asp? protein [Bacillus subtilis] | 73 | 40 | m6
109t 1 1451 ILI loil II | alkaline phosphatase regulatory protein [Bacillus subtilis] | 5 13 M2
125 IgnIrl diols1l | glutamine-binding periplasmic protein [Synechocystis sp.] | 12 46 1198
1;35 j 24 6 I ?2 11 9 5 iA05 5 I a 1 II I 16 f9 11142 | V-type Na-ATPase [Enterococcus hirae] | z 46 ''51
1 140 Ito I 160t1 52 03 jgj42Z | luc £A [Synechococcus sp.] | 1 2 46 its
S toC £0 47 I;nlI?~;i;22941 | hypothetical protein [Bacillus subtilis] | 72 As 660
-U I 1i I, 5146 101147232 | TPP-dependent acetoin dehydrogenase beta-subunit [Clostridium magnum] | 72 1 5 toll
a£ Si 6' 1943lic | NAD(P)H-dependent dihydroxyacetone-phosphate reductase [Bacillus subtilis] | 72 j 34 1 141
111 1102614 9 675 JgraIPIDjdI2lL61 | YqgH [Bacillus subtilis] | 12 1 505;"
I 13006 I 94 j9i11)16i70 1AEOOOIIOI e462. 24 Pet Identical 144 gol to 315 residuemes to.2 1 I c 7 inmbincling prottin PBrt.3ACSU 5W: V32159 1451 aol lkscherlcbia 62 ;unknownc li ac hn t e l m coo cor I. &u*1ll 72 5o 4
32 1 '3 1 3407 1gI17it99 Ihter lo lnati s oiu n e cehelc e 1ol 7; 50 121 I II 1 3 12 449 I 2 0 1 prti hie lCh I IIsptl CBloatr I 40 G 1 44 I00 j 614 110481 [oni7i4 6399 l Iunnow lpn el ngatt onlu ct rlesal ihi Ioi 11 so 2 1 a 7 2 jili 1144 Iall iI5 e I3-ooecrl-kinoe Cere p0rosp teLaPn aareduce i1uh lacoae1 50 1 132 4 I 5 43 li Il'538 oc~ 3 5 4t o I P j j D I 4 8 I C t e r o c l l u b t g c o d A G j P 2 2 0 1 5I f lct l u Ivt a S4 1 05 6 7 1 2004231248 23 Nov 2004 TABLE: 2 T. pneumonia. Putatlve coding regions Of novel proteni similar to know, protelns conlo Ir OA Start Il Sto atch match gene name I eaI aidnt Ingth I ID loa Intl In aesllon nth I 1 4 1 5120 I a213 IgnlIFDAdOliIS IVqG I9&lluI aubtLIs11 39 1 11 540 n r t I Io] F -1 a I9 I 5 Pi111 1 simile; to the 20.Thd protein In TETO-IXQA region of 5. subtilis 11 5 4 II No Ii"2 113630 ig11531036 I$F.OISS ltsecheclchla cll) i ii I 48 1041 S 1 121 15015 112676 111 145 Idlpoptidyi peptidse Iv ILatococus latl I I 55 2340- ;10IiFOSII urface iE3ed protein ILactObaC8I e rhanou l I 71 4 1 ot 4 I505 i 2156 ignhI1 OiDl20 l IppaceIiVllue subtila l a1 i 5 3 ll I 6S 1L559o rd Ii5tail polyperde CA I-ll ain11acilua eubltl ns 1 -II 1 o 1541 II 121)J 0597 130360 IglIIOIO2S 106F_4l4: Ganeplot suggests Iresahl t near start but none found I 1 I 50 I 221 1 I: I I _l cherl chia, cliI I I I 4 tilt I 611f 1,1 1560133 llne decesbcwylase HeartLhue eubtIlls) 1 '13 46 Jia lis I a lir to ta !bta-&ienlne synthetase encoded by Ceneank Acceslon Number 5345), ctn s ArPIGI? binding motl) Iftearamoci buracria hlorelia 1 a il-I i 6 O;j) a l)0 4594 law) Isettus ncrveglcal I 421 111) 1 I rll- LI Cnha Ip. 1 14 4 3 1' 4! 
;511 i; I)t 651 i S Icrlyl-tRIIA synthitace ipro5i lIaaruopbllus in(luenteel ISl 2 1665 -I I 9 I'1iI 6516 lll1414d0 lanness permeese subunit Il-H-nan iEacharichla cciIi 71 I 5 1 4 1$ 462 304 Ign1111010132063 (I-.-elctcsvitrrnseresa itrpteoscu pneusontaei s15351 4---4 1OS 4 3619 4107 fg1113)11 0 iAFOi44iOI PpQ IStrepkococcus mutansl 1 1 58 I lI SI- I 1 II I 11)55 11355 I giliSt 2si tL2o I Ltteria monccytogenei I 11 01 601 I e I3 k 029 isi 1 9giI)i0iOI lucA, IRhisobius mlilol I 71 I 55 I 9S1 11 6 2 :4 1205 11 l6 903 1glutamine transport TP-bindlng protein GLNO [Salmonella typhimuriual f I 50 64; a 13 1 1 901 0 1 gnh1 I 5101Id1M02049 111. inluttU a hypothetical ABC transporter: P4400) 19141 Bfaclklus 51 j 195 I 4 I 1141 237 IgI1iliii 1 I 1
A
00 15 3 Aoples pneumonib Cruccse-bisphosphae aIdolase similar 71 4- 9to SwisgPro t Accesion Number P13243, from 9 subtlls (Mvcoplasna pnoumonleel p 140 1 5 $4315 I 4973 Ignh01IF IdI601 ho'o0ogue of hypothelC4L protin in a rapemycn synthesis Oat45 cluster of itci9 663 1 Streptomvcee hygroacopicus lSeclllua subtI I J; 1 141 '3 9 6. I IIAS CldlBOICCI ruwD0148 ctiUNC wIO N UNIYMOW. SIMILAR PRODUJCT 544 E COLl AND MYCOPLASIA 1 I 71 1 F 1 EUIIONIAC. Ibacillus aubtllsi .4 I 2004231248 23 Nov 2004 TABLE 2 S. pneumonia. Putative coding regions of novel protelnt slilar to known prolein I -5-to mach gene name I0 20A I Stat t Jntm j Ir~lnae f dent I ength ID To n) I I nt),nrl Li~ n~ -7 F-1 Ig-j;4ii3 ItIbososlt protein L13 IStaphylococcus Cirnrousl n 1 52 1 165 19 li 1 2205 3504 IV1I5351l fCody *rsaiiau. 2Ut111t1 153 1 201 2 246 j175 1g111737 J 1002)) hypothetical protein In purl 5' region lEaCherichia coii i S1 1137 4- -S 209 1 2022 1141 1141432 I ogn product [REsherlchl ol 73H I 43 1 32 1 310 j 9 111 371 11119116 l gen ;prduct (Bacillus eubiiol ~1 41 1161 1 23 -6 1069 1 nf 9j400 lRF gene product i3acl3;;a ::btilisp 8 ]1111 I 1 ;361 lqi3S5;S61 I ri. n I o Ida Feductase 91 subunit Nycob ;ctorlua tuberculosii I- I13 1 2001 112 2 ntIPtD1I03I32o 0 YqgA I9*Cillu, eubtills) I soi910 1 1053 jgnllulfldsoo,64 homologue of aepartoklnese 2 alpha end beta subunit LySC of 3. 
ubtill04 I9 I 2 1g6 712- g l known Ifhodobscter cepaulstu s 1 3 1 46is$ I 112 J 231 ;565 I r dAkbf lciltu s subti 33I I 5731 1 Ii IS IO Ihvpothtl c, I protein ISP:P32i16 d Iketl snococcse jenna chai l 11 i r 48 691 34 I I 6 gills911 I hvvothetical proteIn 4SP;P422971 Iettharioroccus lanrucichi it is 13 621 I 374 1 *l 39 I 11127526 ;clumping factor Iltaphylococcus eureusi i ti 2638 1" 1 '1 ILIi7as IV-ping factor (Staphylococcus eureusl I 1 3 3 637 69 8Il~ 1151 i INa;16;46 Unknown t3@-:1lus subtIisl I o UlS 9 075 Ignhl Ppo2iS l jPut i rLon dependent repressor IStaphylococcua epidermidksI I 70 j 46 ish 1 Ill 132024n 1u25 I4POdo~ lu teined open. reading Erese (Bacillus elearothiermophIlud)1 705 1 77 111 11 71) 11O( .i1 1)19 glPDdO0Q blotin carbosgyl C@.riez protein of acecyI-toA carbosvLase lSYneCboc ,I 71 6 1 1J I I? J 4 I 2610 I76 Ignlprojaioa Ir IfecIllnus sbtiI
116 IiI292447 146;9 SLI2203 7 JAF00 A 39 1 AlPase IlACIius Subtlls 1 14) 41 F 23 1~ 1195 I~Stz 1v3 15 Ildrlt~cp I~acchaeyces cerevihibel 1 70 so01 5S13I 0 F ;3 1 353 l I' bindhing Protein ot trensport AfPases ISecillus fiia'us3 0ji" 4- 6 4.115 3 1 b 1v 10 dpnd~ rprpor1r~hlo.~u .~drid r 70 51 ))G l 2004231248 23 Nov 2004 TABLE 2 S. pneumnlg Putative coding regions of novel proteins tillar to kno. proteins V C-or-- tt-'lig I i star Stop w mch gone name a-m Went lanot- V 1 nt j Intl 'cesson I Want J nInls 5- )70 (plllb~~J~- 1103 921 1113629 Inq ;tnd d D-An banding roe ino unldantlseny yu n. hrl ugin1 16 1 l t 33 I QGJ 951 ;ql~j1II~ homlgouF to D-Mmino acid dehydragenbse enzyme (Psudoonom struginoal 70 50 1119 I )i 6 4312 1o112055l IsCoY0D [treptogo ccut gordonll 1 1 I2~ IS as 111427 IgLjSaicis Io1rJ356 Icscharlchte colil 70 41 joi 11105 Ios 9846 1gL111133i4 Irlbolauln-pecnc deainas rioaclu lurpamnae o10 to i;1bollavin-specitic doaml,.,ose jIct inobocillus pluropnr~urnlaql 7051 1209 1 42 I 2 1 o1 4ual va Ilaclus ubtillel 70 5L 1333 -0 1-51 1V-3 I 43 1 272 1 613 g11i53483 I1talne trnsport IP~bndng protin 9 IHethnococcus Sannchiii 70 4i 4 74a.
1n i so 11 I I ID) UPl0Idi 4 subunit or ADP-glucbss ptroVhvsphoryIsae 3eBcillu s rOhstaophllu-i I 70 I4 1109 1 5 67 96 11IP3d100C3O Inacpullulenase a6cElilus sp.) -0 190 3 174 7 igni IPIoIe bo IiLnooetidoe IbactO Ccut lactisi D 42 I 100 l~~r l 10 I 8 10801 6 2;3 a lia77 I011 127074 ISNF jaillus carevel 1 70 1 Si I )iII 41l I 7 I 6802 le 15707 ts thionine Ch&sw-Synths* 1oct33 lhasvnOphiiua inllvr9eep 1 0 1 53 1 111 1 7 1- 153~ i lIFlDIdlOO7 4 unknown tbc lLus eubtilisi5 Is 7221 i- 1 41- 4- a* 11 100u 1 10911 I1i131703 9 i;AC0005241 carboxynorspormile decarbcayiass inspCI lcellcobacter pvioril I 70 S6 j *1 1 l5 I1 8-124 Igliuig oalar-ose-1Puiy rnirs srooocsRtni7 "I 3 I 34*4 I *125 Inl I9i~0~ Iunn.F 311llu EA -71 bacllusautiis 47 I leoo 4 1 lIG 9369 I 724 onhPIOe0lSo Iputalusl pkn2 protein Billrercrus aubtils, 1 70 52 J 2048 t9 1 o it 211Pi1157320 6 1$K-suatie renrosinoslasills tot i m llasmophllus inf0 S2 a146 I J 1 3 3 1 5 1 1 1,0 6 1 9 1 1 4 3 3 5 0 I A 1 5 0 i s a c c h a r o m y c e s c er o w s i a e l 7 0 1 5 9 1 5 1 II) I 2 S74 1088 Is-I4)8)0 [AlSO Iaccharomyce cerevislac 301 3 61 gnri P Idlo 595 unknown (SA ilux s bti 1 701 45I 5 1 S;9 4 2 1 nil I e 6# 4 7 I actep c e t iv e c a lc i u m e n t r y c h a n n u l I i o t a u r u s j I a 35 3 1 2 1 52 4S0 34S hllPlINl0141 lYqaT Illacilus eubt Iils!0 7 04 "3 I7 4368 I194 leil2*93)12 IAV a It ul 1 1047 -I[B ls a--tr--al I 1 1 4 -6 1r4 gintl2 Ittrapto6coccua pnaumoniseoj I a4 4 1i I 12 IuI1ai1t18 1- yp f~.TPG j~trccu ia 1 70 4 41 1 43 5 1 0 440 J 3 1ifn 73 6 Itranamembrane protein Itscberlchia coiIl 1 70 1 1 4)8 2004231248 23 Nov 2004 TABLE& 2 s. pnunonlae Putatve adinO regons o0 novel proteins smlar to krtom proteins Condgj(OF Et9r) Intl j lt I re 9~n F 0 1, S tar t S to p M t ch A te g ne 0 .e Si m j Idant leng th Bali 3 j 4 1e149535 loeianlns actlvatIng Ciryi Stactobaclilum casell 10 SI 1569 0 326 27 lenloId;ID ob4O j. 
rol ypothetleal protein; P)3a0S 12671 flactilus eubtili) 70 t 480 21 1 217 I2869 IqnIPtDlJ ii;21 )recOAP IDictyoutetlun diacoidung 70 45 ;I8 13 1 li$i 0n lgill]2874 unnwnlhodobacter casulatusl 0535 5 Ia 121 l17554 l1845) 1en111201 0 72334, jhypothstlrai protein iheclIlus subrillel I 9 1s 44 1 46I'll 1 51)4 1911330591 14AF0067201 ProJ iflurillus subtitl I 69 48 1 11431 l 22 1 1 199; d 2I t lent I omncillus subtilsSt -4 f 1 IlPI dIG 1 Junnown btolituu subt Iles 1 7 I 5011)1 5346 lent I PIDIdlO2on2 I CA00014161 rUNCTION LMON 0 10&Cli le S jbLl list 69s 21 4 51 9 -,LA I 1 I I 10?0 91114iOo jlalohot dehydrogesag ieC 1.I. 1A l AlCaligene, eutrophusi 11994 Igijllai I I 4io it 4 40 1 I )NoIiidy junction D$A holicag *iruvA5 tHamohi lue lensel 1 i 1 45216 1 1 1 l 11t 42 1 12$ 11 l 1e11i9 73 53 lDi A-3-wat'. iadcnlne g ycosidasm 1 1t 911 I mace ophilue in luen sac e 1 &o o 1 576 4 1 6 I941 I5490 lgIISS ui8 stai ch Ibectegal lycoen has I yinhas l cllus subtl l 69 I 7 148 I3 ;3 I 4 13 249J2 12i1 Is enlIDolen70 jhypotherical protein ieaciilua jubtllis) 1 69 36 J 6o 149 6 4 1323 6$425 10t129611i stimilar to' phospiiot rans erase system ensyme 11 i~schsrichle coati 1 69 50 1 J 1~ p-me-as- I I I 9 342 itcherichia call]i II simlIpjIer~ t l o lcaitognes Outopiu p2I1051 D-ribuioa.-S"phosphateI epleeress j 4 49 75 35I4 22 0] g11423 PlliA~ plywrase I~rh)cilg. suil I 49 I 50 3 Il1 118 In IIaPlOdl Ihypothutical protein ISynchocyrLis sp.
S1- 9---Incoh~u ~lun~l69 5 -0 As- 60 1i~os I~ lpu" Cive I. CtoOcll ry cos Istisi 42 465 I l 11 qn I ;icon J;1 I I -3220 1 I-sl ,4-galectosyl t- rrsns fera a I- trsptococcus pnt v nlaa l I a 39 j 44 I 2004231248 23 Nov 2004 TABLE 2 S. pneewone putative coding regln. of novel proteins simile. to kno protein* Contig jCXlcf St art Stop Match match Went namj D £o Intl nt s ecacc n mI I IIdent Intl 2 110 as 0 23 1 11 Io IdID0649 C rL o a 69 I 3 I I I I ILii EoroS gert.n produc Ilactococcus ]scal 1) 65 I 237 I 1 I 5 t 3622 410£ 113136050 Ifcoce 005.-on protein Ituc1 CHeanophlwa lntLuenzael I 'I 12 460 i t t j 4 71 1rtCl324IC334 IhlaC homolog Bacillus aubtill1 II Il 1111311~~~ pill~ll'-- I 16 11162 11331 911143l2 phosphorlboeyl glycnsede COrmyltransferagel IUR-Ni IBCiii Subtlll ii 69 rg 4 tit 2 121 I to6 Ial ,t I1N-rutponi. element binding factor 1 [mue e4ui1.wel I 1 4 9 9 1 1 S 1 1 6 1 6 1 4 3 7 1 0 1 1 5 4 7 1 1 a n a e r o b i c r t b o n h i i e o I I d e L r l p h o s p h e t e r e d w c c te n a c t i v a t i n g p r o t e i n C o r d i6 1'l1 9 i j S j 11 j (7 j9L11514112 I IltamoqphI lue Innuones I I I 191 S I s 1 J24 4022 Ign6IP1pjdioo262 ILlur protein (Salmonella typhiarliusl 6g St 74 I 4- i 2 J J I 118 4168 1901Pb~0141129 IVJJ IbacIIllu ewbtllipj--- II J 64121 2669 1 I 65 J III 1 nl IPIDIdlll Ierq incillusi subIlial 136Ira I C 1 I~ftLIrloldloOS@l unkno feacllas outbttllxl 61 Iss
149 liz 9136 159655 19 OSIS11 homology with E. coli and P. aeruginosa lysA gene; product of unknown function; putative [Pseudomonas syringae] it 5 1330
-p 13 4 )191 35) Igtil13o ilarno Illacillus uttilisil1 9443 365 217 84 q 2 lnhIPlIld a0S12ite mperature sensitie Cell diIIin [Bacillus aubtills1 65 49 1161 4 ISO I 143 Ilph 1m a51 lel he- v iso lun dentii d cloning wanton rl i s I s S6 12 1 I I I 9 0 I honuclbotl de retuctaig R-1 smell subunit 4cobec erlum tuberculosis fi 1 53 966 4 -r r 236 1 I 2 G61 IP£10225I022 InodulIn-26 sybtes. 1 59 1 4160 24 t- 660 typ IEnterococcuo hrel? 69 5 Isla 655 si I 51 1 640 1766 i1i14o5 IMtnhymas Irlaophlluu Influenzaee(1 69 1 41 1 1107 1 69 42 2307 243 2 65 FI 10nIPId01023S loaFS (barley yellow dwarf virus) 69 6-1 -49 211 30l 19367 1g112265721 Ieacrolide-ffrluw protein IStreptaoccus agalectiogi 1 69 j $a M2 ipllr.l- Ln (St~ r lc~ir ;Ila 1-I 1 ImnlPI *a2a2u Ipaptid., datormyles ICloecnldium ballarlncli: f 41 55 I 8 4 310 1 J 745 J 3 IsAl 91975 clumping factor IStaphylococcum aurquel I 65 21 747 4 2004231248 23 Nov 2004 TAB EI~ 2 S. PntUmOniae Putative Coding rag1pg at noual protein$ simllor to known proteins -h Conti O ISop mtc match gene name eu Idn ID In t In', M~r in I alm I Idont I o rit I h %F 1 -ar tl I I I 3 1 4 210 I 1i41d 0649 lOC-cadhrin Iroaophiia eAnogaa1 tet9 collis 1 1I II 1 2 O 04 300 Ign 1r1 oIdtoiat IA8C transporter I lYnec ocycsi sp.I1 I 8 61 3 g 10o3 12 s 1 2916 440 lgiI dielz Ibietidine kInasa (LaCtococcus lactiscremorie 8 4 31 1p r16971ps5 s Irlbobo;mi protein L9 Bacillus etearothenophiius 1 48 1 6 1 I II p'ra1u'uei A11AEp0011 o5; This 530 as ort is 33 pct Identlni C14 gapol to 525 46 4 165 resrdues of an Ippr 40 ii protein TilS-MACIN 544 P44608 lechericha Celli 1 12 4 479 I 8as 101155336s acetyl cholineaceraae IHOmko sapiensi fg6 182 I 12 3 I 24611 12S)97 1113962 IComE ORF3 i~aciiiuza aubtijhe) 1 61 )6 736j I 45 911131 66 1 46I1 51 600 AE0G46~hypothi caul nnl Z61ti n ibagr neg ne einI 6 ie I a Iri I(1 Ilr S4" irlci cl 1-71 5 I 5 rI 30(61 165Ivt.nodm14 ICC Site7 Ho01 19133 Ifiherlchi cll I 68 I st I 614-- 5633. 
;0s 1115 13 ;C Sio te Nemrae nt eri prtin ollh il 68 55 aI I 1 16 1 081 9 11 5733sI15353 taembrane a eIniorhai rltJo Ofiemophilu Influentlo 68 S1 3i 68 j- Ii 12* 17560 i i t 7 i ps-lt 54 occ iclus bti l s 61 51 3188 89n 5ii IIIIO *3 Ig 4I1 45 B rnpr, (ych fyC p I 43 -1 1 im Igli N 121)5 na40c)b i predce coin reion protei nlexhporoc(Baucillulsn chii 68 50 840 1020 I 1 7 ltsee Ic i p o s coili 68 41 243 1 6 1 1 7 1 1 0 1 401 1 1 gp l l 1 3 7 4 I p 1 d g o n e p r o du t I IB c l l l u 3 o ub t l i s ]6 8 l i e 8242 _I115___G974 a- ln roto n pot* ta bomeros Pe -d mana put dal so 3 22 1 6 2004231248 23 Nov 2004 TABLE 2 1. pflumOnj@a Putative coding rgions of novel procainr gil Ar to know.n pfoelns I IDIn I Int) ttl I Cnt4 .y t IStp matchion match gene name 91 141 i) Int Zi jot In Icsso lags 00 41 L6- r in nails I a 4 I Si r 2 lej I I)9 Igr'iI~DJdIO ooui ILVAI protein l~almooala typilwur;umI~~...~ 68 4 a I 1614 )726 I;Ij.ssuo It~uao ry pLotin Irtococ 117s2; 91 411 5- 47 4 2 1 1922 V114 01 Itrn smam n. rotin I I g 535 2 29 lun14505n a II) 11) 1111111~1tra nsmembran&n ProtaLn IBsacLll1_u__s 1 bllla 8 1 1 11 S 8 I sla 66 6 I- 4 WnillOP- loose hydrolasa 4- ftele -bac~llug 4 58 661 4- 164 3T S n I4 1 le 551 i ot hmae cel g otein ba tlus subti iiaj ll.)I ii I I 661 119J le 3546 I 14 IiCSsssi h p tetCa Protein lo cocc 4 ,s lcticis inep le te pesi Olj 1 J 4 117 JI ri S 270 4 I 4241 *j 9gi118i146 lepoe cost protein lie8CIllum subtillspI I I 5 4 1C 40 Q s 96 t2 jill58 1 4- 1 0 24 j i j11 5 I I 0II I d 130 hypothetial Protein inlanallaa hyohtlit 4 5170 gflijID~e~lp4 6 iellr i,,jj litocondril 4- tr0nu19111439 52rabldopsj, thaliana) MU Crev;trsa 4 s j*313Q 4*45 A--GS2J~py~gp l--I I 111 1I o 3 1 lie 3 7441 14 Ig14301937 ;o;A n Ibea c g llu uC lut lp3u;;.;;Tl 1 1 irndd-tfflSU-nyiIe-scl--ch ISO I II 2 11(111 IeiI$606tg ILC~ EIU 1UU I 44 37 residue of an a-roa. 
223 I proteinY6BA h.hnSW: PS2 t 3) 11 j 77 l IPlOel iggoP Ip~~uteiva ort(Bacll utl ri 1lls I1 I 236 1 1 ')it 1 108 IgI1, ,1 01 ,,,p1X w fbacilllre stoborlnarl 6 ad 1 637 1 1 15 10116146 thymidrIea esothse (I aoe lees).)ichh rl 6 I 1)7 1 774 Vn"1j~l0[ @2 199 u I a ef f cil us subtilisl is 47 774~ I 2004231248 23 Nov 2004 TABLE 2 5. pneumonliae Putative coding regions of novel proteinrdlmilar to knoun protein, Contig Rar Start 1 Stop I mrtch meatch gene nae sh '1 iden Ingth I E ntJ I inti ecestoni i in I8 6 2 ]417 4 k i|IS)133 (ouler menbrnu Integrity protein (tolA) (Hemnophllga inluanzeel 68 SI (4I 2 4 22 4497 |gi592161 I. jannechi predicted coding region 14.3l07 M teChanococcus jannaschill I 7 36 102 4 I ilt I |6 59 9 1139 5 JiAX000220) si1gnal' transducti on regul star- easctI I.lu asils I7 is I I 5 1 2 301 )1 1gi1333385 4A0005471 pare-Aminanensoate aynthoraae Ipabel IHelicobacter pylorl) I 4) I 94 172s S 19 1;8063 '575' IgLIiSIt lipb-ad gene product [Bacillus subrllsi I I 41 596 23 a j 74194 |69A IgII1920962 IpyrrolLoe--corboylate reduerase (Actclnidis dollciosal 1 71-1 1 so* 9 00 5 9072 19l1468745 |ptCR gene product (Bacillus brevisi i1 4 1 l3 1 I 2 i19S 565 j1423512 I(AFODISi8 PkIl IfDItyosteitue dlicoldaum I 7 49 795| n2 1l I 6649 110150 Ioil 029 10PI gene product ttacheoichis cclt 1 6I 47 I 1302 16 16 (114430 |146 gil$193142 |AOC transporter, probable ATP-binding saubunit itethanococc s jnnaschiji 67 41 717 38 9 4958 1 5192 (6gni JIe2*14803 r72e).) 
ICCenorhbdlie7 elegens I 67 47 -a-li 6 36 21 J-n I|,142 IgiIl 707 lcRF_o2le IEacherichia coill 1 67 5 71 91 41 9 1102 1 tI31 g11551710 Ibranchlinenayme fggloBI IEC 2.4.1 M) I18slllIus tearotherrophilusl 61 51 1246 1 4I 23 11344 I'1514 gl1413949 lipa-25d gene product lIacillus subtilikl 5 6 5 30 a3n so 2 I5)73) 95 lnlIPtDIdiOIJJO lVqi0 Icitlltun subitllal 67 5 622 53 1 1 :11 3 il15142i1 liabrial transcription regulation repressor jpIlej Iteemophilue Influen se) 61 0 429 I 55 I) 112740 111546 IaniPIDIet12S0 i|5r Y0017c ISaccharnoyces cerevilaie) 1 I 1 1S S61 9 9210 D33 1 nlP1D90e2lI lAtP-binding cassette transporter A (Stephylococcus surausl 67 50 66a 4 III 1 2 414 41 1glI1l9767 |Ivitellogealn Anaolis pulchallus) I7 36 -1 (9 1.111a1 4 I|;thoaph hoenoto vl7ruvaten nas -aphsotrasferse element rn Is Lactobaclie o -7 I 43 I I I I cucvatusi I I 1 2 15 3214 11276746 fAcyl carrier protein (Porphyra purpurel I 1 3 56 j I I 8140 6109 Igil1347744 PSR lEncerococus hiraal sl 45 I131 97 I 3 9116 151 pnIgt|PID(dlO223S 1(0006311 unnamed protein product 15treptococcus mutans 61 4 31 S103 I |61i 113 1g|1112765 |sItcce gen product (Eschechla coliJ 67 f 36 3 104 1 11p0 PS 1 ||11482t |l-D procein Ilaemophtlus Influenaei) I s 671 I 13 I 1 56 1 5956 gi1675so Iputtive celllobloess phosphatransferee enzyme III Isecillus eubtIlls I 67( 44 I 337 4- 2004231248 23 Nov 2004 TABLE 2 S. pneumcnle Putative coding region$ of novel proteIns similar to known proteins a ContiO I Stop match match gene name I aim I ident length 10 ID In tl I occsion iI Int 115 7 1607 I;11464 1 2 lblose phosphotranlraae enzyme IW Iselillus starotheramphluel I 67 S1 37 6127 7021 Igill7326 Itrensport protein hfecherichis ccliI 67 46 107 136 I 3 1 2215 1 2659 inhPIIdOSh1 Iunknown _1aclilue ubtilisl I1 9 I il I 6 11 2694 d C I Iii 101219 lph4 ll n kinaeN Iyntcto e lynachoyccre lsp.] 
67 41 102 I II 6 1 111 I9nlj9lo do2 Os1 1 Ianr3i9 [Sd i llus ceusi i t 466 1601 encoded by GenBan Accession Number W99400; inactigatIo n of the orBe 17n leads to W-eenaitvity and co decrmase ot homologous recomblnalton (pIMrmtdic tesl Ia tococum I I 6 13 )0 4505 IunhIPiDIdl0l 1jqR hscciilus subtillal 1 67 41 1 1407 1 -a61 .I 107 I 11 1 2.23 Oita 2679 gI usDdcaall r lVqk Ibacillus ebll 41 41 a I I lljh 17656 666 V1g1153641 loneumoroccal surface protein A (Streptococcus pflmumonisel I67 1 50 I i 3 1530 ;'723 Ii1114*975 I)bc IThermoen*rbacCerum thetmosuifurigtnesl 67 46 17794 C) 169 6 599 1421 [gnlP0Cr a2$176 Ifypthetist proten IOecll L us Iubili 1 7 6 7 I 05 1 3 I1663 2211 116011 IOrtFo161 Itschatlal clii 4-51 207 696 3456 1g11227437 IDcsalron r Iguletd IlpoprteIn precur r ICrynebscterlun dlpht'elsel I I 41 I i 7 I46 373 IiBSS Iputative cahIcI lose a hoephotrsnulersse, enzyme III (SaCIllus aubtillel 1 67 42 364 119 I2 71 663 1911164343 ]a unknown scillue subtilisi 67 4 3 1 22 25 I1 2 74 1i1151718 JspA Iltgeptococcus pneuwonlael 67 43i 746j I (I 26 ia 211 1g112313647 11A51005651~ Lassrgiss I bh.~os nlteiicobaccer py loli~ Iurl~ 67 I 1 64 I I I J5 1 5 1 InlIlOei 7S lun1no1n oillllcobecterlu tub erlc u j 6 5-6It 1 I I I 19 3 19n11916 54 Iunnon IeclLs sutiprc sralstl 66 43 7 -t ;198 5146 gni P101055 79 ninown IKV8abnC ium Uberculosisl 1 66 56 249 1 1 2211 gIjPII6756 4 S unknow IIt 2 ae shlu ck rot sei c 66ltrc cclii I4 ei 3 14 3 o SIM 10005 ;g I~.13956 11101c [Baillus Subtllsl 66 so 153159 I S4 74 qliI8544 jlAE00201 pageShuk prtei C JSlh9jC6a, oli I- 1- 1 1 1 i 9 U I S 1 1)11 1125f1 1g11576291 Ittebrial transcription rgulation repressor Ipllli l liacmophllue Intluentsl 66 6 606 I- 2004231248 23 Nov 2004 TABLE 2 pneuonlac Purachav coding regions of novel protelns '3silae to known proteins S Contlo JORF I Stnrt tch 90ne Sima Ident Length, to In I In %ion .91,n Intl 1 I 212 I 141 1nP 101e 1 01 nknomni Iiycobacterium tUberculosial 41 1422 I11449 1 1200 01. 
1520401 [cr12, j ;tort, codon Ilacllus thuringlnaim 46 *2 270 I I 1110974 9197 19112314131 IIAEOOO£S33 translation elongation factor Er-tre Itst) CHmllcobacter pyloril 66 49 j 1082 1 I 2 112 J 14 IgnI IPIOIdIOI24S JIAMOS 10055541 yabP l acIhiua subtilil b a6 is I I I112 1 is3 1gi1 9 6 Iinal paptide type 1,1 lLactocaccua atl) I As a I3 400 4 I I I 182 10 4 14n1Pf1dIe20411 I3iU g alut1I phosphate reducteae Iltreptococcua thernophiluei I 51 1269 I 2j3 173 19n1P012191 IIlacilus ivtllie3 2- 0 I I1.10 1 97 1 12 4)19 IAC00 21 1 RC transporter. A TF-bind ng protein IyhcOI leiliccb cter 6 40 447 o 1 I I 1 64 1 gilli Ior I 18acIlus culdolytlcul 1 64 I 764 I II CIsII 12~34 glll099l 44% IdentIty over 102 resIdue; with ?'ypohutlcal protein fron Sy6hocy6sla 6i 4 1119 Sp. accesslon D64004 C0D expreasion induced by environmental tress; sam I 1ilarity to glycomyl iranaferesear two potential membrane-ap~nnl I I bu 111Isellus eubthL JI 6 1 5416 I 4706 19P31911e210724 cr1 lLmttcIIliue aaks3 I I 23 )5 o II14 2 9IlhMf Itlts73352 IsoM. 1annaachil predlct coding region co02cu Is)thanooCCus )amnasehkl I d 429 1 36l,11s a !5 I~t hl~ 491 jiioi sincl-a ocats p roin DiIV Bailssbila 6~1 3 604 3-6-T O- I I l 1 I; -a-cs i 7 1 Hon a iu 1 5 1 9so1 8! ,l 741 dal Name, splansl tI r 9 lt IanIIPile432SI t IhypOtht&3I protein (Bacillus suilisi I 4 o I 192 I46 30 210 4 11 1 g322S1 i PRfl00590i Ypr ihiaobium op. NC5234) 4 I I 3 s I I t]@BS4S 1 cIo cel -b ding Ia aupyiobactcr jelunl I1 46 I 2 60 4 ig j I Idiot 11 p 62 c~l-binding 4ri s Ic to e n~p I od ysti sp. 64 4 150 I I 2441 11 IgnlI~iodome11 Iswtmnen-bndiae periplaelc proten IS~'chocyai sp. I I 44 I icI II j0 I 9740 91) In1IPzDI s;itd led, gene prodwct Iltsphyococcug curios) I I 553 72 I L 11993 11 2 IIAooos4 H. pylnrl predicted codlng region 4P0049 Ireiiccbactgg pyicril 1 *6 1 if1 1101 1 I ;l9 1 1 jl gi l Sypot kst I Illoaeophilus Inluenzael 6 1 3 79 1 p I. 
1 I I 1011137462; Iraicoinaidos mononuclaotLde transporter IpnuC IIdtiophljs Inlluengasi 1 66 i 1 1 11 5703 4215 19114131? Iput. EDO repressor protein lEacherlchla cokl 1 61 60 1029 2004231248 23 Nov 2004 TABLE 2 S. pneusonlae Putalv, coding aDlons of novel protins Inhile to known Proteins 4 SContig IORf Start Sto atch jath gne name I aim I ident Ilength I tO IgD Ii IntL I &esLon I LI 81 111 6I:;SS129 ;ta~ L11 i 9r1 f factcor iaacillus SubL11131 66 51 kill I I 3 POS 1239 1pir1C3361c314 IhJIc hoed.
9 macllus subtills 66 44 a15 I 6 3~f0 4 947 Z5 j911663s4 lahikim a linesI ILactoco Scs i Ltii I I 441 31 I 1 01 6060 IgiIZO9S7li Iputatine timabr~ai-associated protein lbctinomyces fleesiundLl I 1 52 I 43 111 1 931 il ]OILS 1 19 go cillm t lial 1 66 41 1 94 93 I 9 1 1 1 191 ,1 7331 14 i 9 ari I acl c. iubtilie3a 6 91 4 S 71 rsidues o& npr o a protein YCsFI.CSU EWI SW IJ172 llcherlchia lputaive cl division protein tesw Intaroco ccus hire 1 66 46 1245 106 134 11576 1143) 101140027 Ihoetolous to 2 coll aids lbacillus eabtilll 64 52 673 I i3 I 1736 1593 g1i1609332 I Ipr& iltaeophilus intivena.) 66 43 I 176 1 ii I I I 3 I 202 IgiI12716 Iltyrip laccharoeycae Carevisidel I 66 I 56 3001 r t 013 a V I I 1)2 3 3 5r6 QInI~Idl oIzs YqiV Ibaciilus subtll I I 61 16 I 5641 1 29 i3 11 1044 IenII lI il M lORP3I illluw aulatils Io 1 66 8 ]111 11 i 1 9 31 1i1726148 lorouth easoclated protein GAP-43 IXanopus keevlsi 41 13) 6 J (46 400 IgiI484661 Inn relatd protein ISach6r61"ceca sleatieled 66 39 1 67 1 10 3 1136 574 I I 5 ;Pho; gene ProduCt 36laci~ijs eubtilisi 1 66 1 16 1 663 1 0 Il 16316 1154 191 ei 18 I. 
30-msthylenetetrahYdrvtulte reduccese Itruinla csrotovoral I 66 41 515 t I 4 112 I;;n L 1136 In11 1d101 40 Itranpocase ISynechocsis apI9.1 56 91 S11 1 615 61j472)6 JTPP-dspendent acetoin dehydrogensee aIpha-subunit iCiostridlium sanues 66 4B 984 149 6 4436 5140 IgnhiPIld ;i0100 IPentoa-5-;hophet-3-epiuerais (Synachocystis sp.- I 1 I 1 9 I 11 071;4 131171 114311 Ipyruvate Iormate-lyeae activating anzyme IAA 1-24$1 iEscherichia ccii 66 62 4 4---4 I 166 4 57 7210 osnlI ~issi9 lori lntrococcus feecalilII iss I )a I I I 7 2 2340 259 gnhIliD 1l19) Inoloe gycopretin gplio lliusen iwvnodeiicincy virus type 1i1 46 Is 210 7T 1 3 ISS 3616 1g34933A gene product IBcillus eubtiliel 1 64 46 j 311 I~o I TI 3359 619 I0I() l11 5233 55 114 sis Ii 4 jhrobink receptor itricatulus ionglcegdatuui 46 38 1 23 2 2 0 I 4 I- 1 7 5 I -64 2 I 6 l l t r n a t e n a m e O I-D o -L l- ch-l-h- -o I- 6- I 2 3 4 2004231248 23 Nov 2004 TABLE 2 S. pneumonia. Putative coding regions at novel proteins slel1r to kno-t J~rctens C~nntig 10 S 51rf Stop Math astch gone name MINI Id n I n th Coni n I t InlI toc) &cession mint 22)1 I 100 i6 ;rlu 1 niP0l I111 Iinc linger protein ilactriophsge phiplel 1 66 I 6 1 gl 4 2j6j IgllliPg Ivutativa A5C transportr subunit t-PhY-lTCScUs spLderm-dls- 6 41 i 84) 1 2 672 IdbflIABOO7.21 IiASOOiLi 1dli Ileclus sszbriite 4566 I ao1 IbB 266 2 69 566 1~9l1120 utetive tranaposase tScreptococcus pyogenfal I 66 1 60 324 1 22 1 I 2 I 6 1g IO21i6966 jtn proteese Iletbnococua Jinnaechlil 66 4n 110 113909 jfl13l 1 iIlts4251 Ihv yotheticai Ill4smphilus in(luenei 1 65 j pi 11165 11,0 1g11421 54( (homologous to C coil rdC gene product and to unldenciisd protein from I Su I I I 1 I 7 2 1 64 405 plr1C4I46C641 hypotbtlca1 proteln ll-lu Inllunsas (strm Pd rn42I I 61 2 I 7 I i I 416 I liii IenhIP1lCdiOii~l J'qhU Icllue subllii 1 45 j 50 ST3j I IC I 2 1 ;a7 297 i Or-i Istraptaoccua pneumorilei 65 5 477 116 I 3 I 342 1 2222 IgnlI~IDlsligiOI Ihvothelct rotin Iscllius sutlli I 65 1 15 I 1 1 4 1 3615 
I 3357 IoniIPIOle)14810 Ihyothetical rten stphloecocus ecuril 65 40 459 7- 12 34 125176 13394 1911113030 Icos lActinobAcllus pieuropneumolniee 6s1 421 1 3 I 64 20 Is 14e66 Iil 9.i leasnorhobdil elegensi 4 5 is 1 1358 I 1 110062 1)0;56 9111350 IhIpothetlcsi Il Iemophllus lnlluenzael 6 5 1 795 j 1-6--1 I 4 122151 116683 1g11i ~153fl1 Ihvvothelcat Ilemophilus Infiueraal r1 1 1 1 4 125 118027 1i85fl Ion 11101e244414 jrcHRoc. lsniS i~Secehroyces cerelsiesi 65 3 1 lv 11 5334-1I 114110420 luoteive trnecrlptlonal regulator (Bacillus scEarothrmophilusi 65 ]a 1479 -6 5-7 i- i-l7 -96 3 isopentenyl trsnsleraae (Saccherowyey c r esviiacl 6 642 019 9 I52 115 1141? 115568 lg'1t429745 Ai. jennachli predicted coding region K.3052 Ilehnococcus Irnnechiil 1 6s 1 46 61 I I- 3963 6745 1411415i Ion sea 'isrepoocrus pyogoncam 65 42 783 r Go 4 3 0 3 3 ti16ll 2 IOtol0 jitchrch is colii 1 65 46 1 964 9 69 3 231 507 Ign IiIe 1145s luntocun IiAcliu suti isl 65 I 1095 f9 1 6029 IS121 1oi1109660 Idozyriboss-phosohste ldolate Ihaclilus sublial I 4s 5 705 6 651 1 971 hlllighly si7i224 l c rrlnlrese aigc ICP:to 1435441 Ilsp ieaphiu l Iugnace I i,I 124 I j 6 Inlr -Unhnoun. hig ly similar to s- -rl spe-idin- -yntheses ile-llus -ubtilisi 6- 39 I 64 2004231248 23 Nov 2004 TABLE 2 S. pneumonia. Putative coding regios of novel proteins simIlar to knomn protein, OIF SfLirtI StOP Match match gone nane t Ealm Ident lngth I1 intl Intl aceagloan l 74 I ITII I 4091 IgnIPOdk~flZJ lDNA A FAIR PROTEIN RECH (RECONDINATION PROTEIN NJ. lESChsliChls CcII I I o L171 6 j9 80,19 7815 IgIxdeorribonuclse. sall subunit lces] Ilsuamophilus lreliueaaze1 61 li I Is I 8 j 2210 1 2352 1,112231 11A001B2 conserved hypothetical protein (Hollcobacter piorlI 1 65 at Si 4 I4 Ili 7 1 gn 1107 19fl11710d11860 1dsirydroounate *ynteeg. 
Ilrflchccystie ap.j I 6S I4 1069 -a r~ -306 2421 IHNO-CoA reductass ICC .1..ISI iPseudomonas masYlonlll 61 11 1264 II 86 I2413 ;zlnwInl l *CCU$ IIunneC I"artococrusL"actlsI 1 63 t O 0 3 21 81 lol j IILAR 100 nID P Ol dk. I. II 1 J 0 1011 SMILAu T Oilla14 OF Er tEROCO CCUs FACCALIS TAANSPO5QN 1191 65 41 6311 4 I II I 6 6915 1 6111 I6nllD 1e46O 61 l3 fllln clis ld e diphoephats kIns iXenopum lasulal I 65 so j s450 0; -1 IO 1 iCR IgnhIP;l;ldiOlS Jquuoelne b4oaynths.es protein QueA ISvnechocvatia o.I I p.5 1 46 1014 I I 1 1 61 311 oI49, S29l loaF) lClosrldlus pertrlngensl I i 36 In1 2 1 122 ;5I1 l2 7190 g11151557 1W4-bindinp response regulator IermOtnOV gA martimal is g1 39 66 S I 3 1 -950 0n "Ot e S 7Y I u g r b n d l e g itn -po t p r o t i l 3- 4 u- o p- I- 4 1 9 8 rI 7 54 )01 s I 36151I1421329 Ihr1 rolipoamlde aretyltransferas. Iclostridlum vagnumi 65 41 11)7 1 1 1 2 1053 1931 IgnIIPlDIdO~LLI TqgII llaclilus subtrlisi 65 42 41 III 2 312 4691 10110197 j~oc type I irsrictlon mdlllcaion enzryme K mubunlt Itscherlhl. ccli 6 1 11i 0 141 Ja oabr il us sub I~illgri is 4 0 83 i 11 730h 1i0l9) ImeRans psrein 1urlirla Peru. I 1~7 I 211 463 19141012 1 Is 1 lit i- -0 1r 16 6 I 3192 3116l 19i111533 Isleullr to purln* nucleoads hosplorylase IdooI Itschrchta clii I S (s j 4 121 a 11 4 295 12220 lgn] I PDIeIj395OO_100lootide bindn l1poproten ISLrptococcua pususonigep I E 132 iI-. 4- 1 S 3D h5oe011594 t480 rantportea, probable A7P-binding subunit Inethanococcus janneachIll 1 65 40 1 19 I Obi4$SJ PO8ASE UDP-N-AC1Tfth16AxOTLALA24Yt,-cLu.GAMYL-. 1- 65 $1 141 I 160 Nn 'mIIdCAb2 1 C 4 .3.2.1Sk acillue subtilll 204 2 22 I It 1215 1g11143156 Imebrn. bound protein Ieacllitau eubllliuI 65 I 11 1032 210~b 114 19 gI9l n ga n d prou t e 1acillu a ubtili l 1 6 5 48 348 4 1 EI002;6; I2s;; ihts %41 5i oF n 1p 32 pct dntical s l gapal to 244 42 residues of an approx. 272 AS protein AG6R-ECOtI S4 P42902 iEachorlchia call II 2004231248 23 Nov 2004 TABLE 2 S. pneumonlae Putative coding region. 
oi novel proteins similar to kno. proteins ICntig OAF Start SLop match match ge na im i n 0 1 1 M, I i l I "5 IunIOa 1tOSit luninown Ibycobectlum tubrculkosis) I as5 513 31-5 I I; 1 iia 91I7II64 Isannuronso C-5-splipsrase lAto bater vicelendlLl 65 S7 I103 I 20 1 1 Igal lfgO IIIt doze K. sirogane.. hietidino utilization repressor: 11230 1199) OtIA binding 61 16 517 I I liactiltusaubtiliag 4 r ;-358 1 I I g 1ntPtOIaIg50 s IvieS protein I(ecillus smbtilsl I 6A 5 1 155 j 306 I I III 1 1 669419i114'96751 Inicotlnse-rnucr ecide pyrephosphorylse lR hodospirluoo rubruat £4 47 16 $65! 6902 IvnLIPlOIdlOLtll Imethionine minopepcidse ISynochocystie sp i 64 $2 675 1 I 4 1 I 366 igijoi 91O NA hellcat. LI ltyoplar..a gsn64iu1 2; It 4 324 I 26t~19 Inil~I iO~e1isi lOris Istepocccu pnsuronss3 1 6 I 46 S I is lic F~4 Iii144lSI IVcgSIci0YIgi ho.giog Iusriliua Sulbtiiial 1 64 1 45 42a *2 I 54 1 15 IniIoldioosu Iunktnovn Isilus subtiall II 6I 31 4 1;730 I*S. 1 1 2 74 11 156 Icint 066~k1 lacitios subtill I e boU t.
C s 3 9 49; s lu l I fi I 30 117 27 12 I3 150 1i0i I9 01 95I I msthyles Io u I s touibrio Vulgesl 1i 111 tusuo668 Yp Is c In e n I us P1 2 pclasI TI so it0 3plr21Iji$jcia hypothetleet 20.3K Protein IInaarton sequence ISillit Agrobcr 64rIus 1413 457 6 Ijgn.IIPZOIsI7ln IIAJ GOOOOSI1 glucose nit isbcilus eaget eri al 1 64 1 65 1 294 2 i-31c. Is ilia liis 113, 111390E 06O2201 IA-polymarmse III alpha-chan Ihailius suttillol I 64 46 1 147 3 1 2 I t$7 I 555 lil is Iypotbetical Ifasaphttue Influenza*) 64 1 47 1 601 51106 gI;IIS I6 alanyl-LItw ynhe tos ea IflaeeophJiue Inlusnse3 64 51 263t 66 :3S; 9, Ivi'spttive coilobloe phosphotrariaerasg enzayma 11" locittus subtilll 1 64 1 42 1 21571 68 5 5 1 6 p11436 t 44 47 11449101 pue~o p SI- ES I 535 6 IgnhI~PIDOl dI0 ICdd Iacillus ubtitisi I £4 $a j 2004231248 23 Nov 2004 TABLE 2 g Pneumoniae Putative coding regions of novel proteinfr 11h11r to knon proteins Coolly OAF Sartr St8op j match I atch gene cane i Lt Inr I 10 nt I i ntl sc ion 6 tI 5036 1 glj36460 jL-glutamtine-D-fructoa-4-phosphate amidtranuioree Ipacillus aubtiliml 1 64 50 1911 I &so Homo saplenel I 1 I11 Iltfl6 142131 19I 1175 metrhnl dehydrognaga iph-lO subuni balcllus a.l I c 64 3 i 6 1 iii$ 120 9 Ivn l 3I IldIjl jqiA aeecllu s aubtallal 1 64 is 216 Ill Iloor 5300 DIM01315 l31 q (A 1116CMUS lprrbtL118 cl u utLL16 ]1' flI 'I "06) InlFi)Iljal33S6 Ihypohetcal protein ImaCliusr subtllal 64 36 611 05 1 i6 1I 0ilmlar to 6. sureu m rcurylll reducase ichrlh a cll 64 1 4 1275 4 4 113;06 1 IDIG63186450hypothetcl proin iPloadloilit 111116 liyna l1oc41 op6l 5gn 5901 02 Ihn~ r oth S.ca prte Iwrcroncbaccrldumts phEc i is 1o f4 129ill 4 I(2 i 3IU2 4i~ 216 1gnl119101f212a4 lair YVL244- ltmcchroyc.. 
ceceviciec) 64 40 101 14 I 3l1 1760 Inl;lIdlIII ihvohticai protein Iynochcystia aj 6 50 4- V19 1 2 21 1 706 jnil 1iDjdiCIl4 etU peaclue Cubtiliol 56 63 12t 215 g~l pj jw~lA4 oir ID24 9mcha oece c reivs1 641 521 3 il ~l 3 gil2i6i unnon Ilcllue subilial 64 42 110 4 i1 Ill 1 D 1 pLr1JI 01IIJ1 hypothetical 0rot roin Inrtn seuence Isillil AgoO.c5rlu s-; uieclmosi lerain P0221 pisid I 6 913 I 3 I 3326 1 26S1 nI0I32 D L IvqU IA 2acl V II (Bacillu s ubtll 64 *4 159 1' I I 413Q 1 1645 il I3222a Imnlnato yropli1ph it doe rboxylase if6t ue norvegicul1 ISO 4 I- 0 g l i 3 Iun -n ene product i- -tob ac -llu l nni -l j I 1 .111 6430 I IeiI?120530 I iAPOOO43O dynamin-like potin Isoa saplensi 64r 26 354 oI g 3612 In3l D diO un ow Iresebrne acl lus uebllui 64 50 011S3 61 r D 4 0 4y e~ n l k protei imm114en l 64 361 1 -t 1 4 Ij1 Inl jlldl 89; I;Onologous to Gn transport system permfose proteins leacilvus subtili zi r, 162 6 563 Ida 6362 1 0 5%201. PutatIve 42 k a protein Streptococcus pyagnhal I 4 3 I I 483 4- 164 13 970 9?6 1 I V4 homologue o1 ferric anguibactin transport system peamorass protein FatO 6I4 d 1. aneulilerum leeciLlue eubtill40 17 I 3906 4,1 9i1534045 Initer minator I laciiusubills] 1 64 39 I 693j 109 IO 6154 ;50 I IS5J1 i jrelponse regulator lLactobmilius piantaruel 1 i 1 is I 34 1 4 I ;i 15 1 I iiii I 263 jg iO Iphosphoib"oaVl anthrnliate Isomrass i1actocccga lactil 64 4 46 651 2004231248 23 Nov 2004 TABLE 2 S. pneunonlae Putative coding regions of navel protelnri'fallar To known pmo~ten.
Contig ID | ORF ID | Start (nt) | Stop (nt) | match accession | match gene name | % sim | % ident | length (nt)
1 20) J I 7516 11 nlIPIOjeasjnsg O-ectylhowoserlne sullhydrylass ILeptosplra wnc44l I I l063 2 j 1 1571 I'I135'3313 Itullegenaae IprtC I xeeamophllus Inllueni.,j I I 42 3326 231 2 91 ;647 j07 I9- 1- I-cllu sut1~s I0 1- -7 -4 4- 3 10 Io 10 9 IpIrIJCi1iljcii h)pothetLcal 20,34 protein fintelton sequence 1511311 Agrobacterlum I tuml.ecens Istrln P0221 plesoid 1 620 1 Ig111377032 juntnowm I(eelllus subtllle1364 1 )1 1 419 39;iI 1 60 gtI50; colagename Imethanococcuv lannasehilt 6e 3 61 660t 4 1 263 31 1g119l65 j1 lolnpIleCchacmyces terelatjol 1 64 em 24 6t$1 0 I 0039 IgIIIS eSS IUnknown 18 ih"lus autlilsi i 64 41 I- 6 5 1 M 441) Ig Iis7Ia I 1 hypothetlsa1 Ih4aism hilus Inf 63 a0t t;2 ti I 31 9902 11 16 16 Inmbre proten [facilus *cidopullulyclcuui 61 42 $79 I S S IR~l 1 57 g17235 junIknain lfcecubacter alinuel 3t39 1 2 1 1031 309 IgnI l 01' 17 0 ji nu IL &Ctobac il Sue pl not au l I 6 1 60 1 2 11 56 5 I"s 61 giile Iunk n Ibteciliva eubtillel 1 6 s64- 26 1 4 9;011 Ole 1114 IATP-dependent nucloswe(ellu s sub~ilil 1 4 29 1 5 346 15 4192 Ig 1137lS I9 lun noint i aecilu i ll il 1 63 s 70 S 3 4 1 1 11131 1 70 g I tjd 118 Okf fteoocsubeIS 1 isj1 S 1 4 i 13 I 114-1 1 116 9Io M339 Iunknown lAcetobeeter xytinual 63 1 9 3 121 4- MgS j13 1491 Ig1 1 ihypothecical IllAeaophlus inluen zael 1 63 1 3 11 O 4- I 1I 1 13 1:191 1i11 1 ahic protein lacill lubtil f? 1 3 35 I 131 5 5 I 1 1 3 6 11 1 1 6 1 0 j n l I P Il10 11 5 0 2 R Ith r e o x n e r e d c t s o 1 1 c l l u s u t i l 1 1 6 1 1 d 0 0 15 3 Ill a 6 1 7 6554 115 g IS74 83 I le- I ca n protein llco l Ifl aee philus I nlluenzaes 1 1 I1 I11 I 110 Ie 12091719 Ioutniye IismbIeal-pasoclared protein IACti-i ffiyeas naealvndllj 6) j 41 906 4 4 4 I- 0 1 0a 940 1;1 1I7L Itucosidae lOictyomtekium Glscoldeuul 63 36 Io 4 I4- 2004231248 23 Nov 2004 TABLE 2 S. pneumoniae Putative coding regions of novol proteins irmilar to knoon proCekn.
cratig IOR Start (Stop match match gene name I aim Ident I length 0 £0 Intl I Intl I cession I I Int I I; 10 4 )053 55 1 oiig1 1 phoephoenolpyruvatc carboxylsie ICorynebacterium glutamicuml 43 46 2703 109 9 9119 8 554 |91.133099 |endornkeleaso ttl FlBacillus subtillyl 63 45 Sid4 T- 205 JI 95169 .554 911532091 lendolacis II r~seciii. subtlll 1 6 I 45 I 123 (6 134 46 IonlIPDdlII3)9 Itransposase lSynechocystIa ap.1 I 4| 39 1 153 I16 1 7 I 4511 5201 ignlIPCldiOiII cr12 titthanobcrtedrum theermauorophicaul I 6 50 |I 687 Il117 1547 gi4l72920 jv-type N-ATPate Itnterococcus hirad 3 1 7 I i 2 1 41100 4561j gliPIDIe3lO2S Ihypothetical protein (IBcillN subtillei I 43 44 (4816 0 19 5 Im711 1571 l|17i703 A 00011 Il 1 Thie 271 a. or( 1 24 pct identical 116 gepal to 265 1 Sresidue: o 0 an *wrox. 212 as protein VIODECOLI Swi P0997 (EsichrichIa colti I1 3 I 10) |14406 IgnIIPIel24SiS 1gM protee Icreptrcoccue sanguial sI dl 5£04 I 11 1 I gil341 1i 11)110 Ihypothetical 14.lkd protein Itscherichia coll)1 6) i 345 I 4 I7 7 2 1 42) 17 pip)2)3) lanknown Acetobacter lnl I inu] 4 I 4 l -r IT l7 3 13 4 i 1102 Ig9J591161 Icobalarin blcsynthesis protein N Inthanococcu ann.chI 61 1 136 1 219 1- I9 II I4 1 I21 s (IgnIDItihellai |1r IoiEntaroccccua hireal I 6) 3] 1203 0
234 5 1139 |127 101159162 Icobslamin blosynthesis protein W (Methanococcus jannaschill 1 £1 16 21) 249 I ii 257 g|11000451) |trup licllus aubtllial 1 63 1 41 177 1 261L2 1 I I121 1341 Il|i39646 IOrF Il cillu subtilisl I 3 44 I 231 I 292 3 1 2204 1 346 172131 Iunknon lAcetobeter xylnauml I I 37 6631 )t I 1 905 466 Il18 1 7424 JUDPMgalactese a-plerSe iStreptocmcu [ucns) a 63 44 I 420 324 1 3 2 I5 3 19l1141141 Ihlstidina pocipleruic binding protein P29 itampylbacier jalunl I 61 1 1 j I I II 21 1 Agi12232842 IiArciaa,3 No dal ni;ion lne founnd Iaribieopls thl and l 6) 33 3I 07 362 II Iat 374 9112339 juntnown (Acetrobecer xyllnuel 40 2921 I 345 1 3 I 34i 15 ll 15243 I0r1i29li Ho dafinition 1lne found lhrabldopala tholIeni1 61 I 3 201 S 2 I I I2491 1 266 onlIPlQel2I00 Ipnici)i-bindlng protein IBacillus subtllisi 62 j 42 2208 3 lIn 11"7 124231 IgniIPIO1e2S4S9) Ihypotheticl protein (Bacilius abtlll I 42 I "5 I us.
1 4 1ii 14120 112192 Ignh14De349iL4 IMnhS-lika protein inycobcterum leprael 1 £2 11 126 1 1 I 6319 I 7232 gnlPIDIdIO1I3l4 (lVqh Iacillus subtllsil 1 62 32 414( 1 |9 115444 01401 IgnnlPIodiOlIOt bats keatacyl-acyl carrier protein synthase llSynechocyatis sp. 63 43 140 2004231248 23 Nov 2004 TABLE 2 S. pneumonlae Putative coding regions at novel proteinhliller to known proteins aD20 Inl I Itl I Lh eceuh iln fl-e t a aim Ident length to [t If Intl Int) 76521 art I I I 131 Iit, I16229 1gni1l01e321514 Iputolue Fab protein ihecillup aubtlll 62 1 J 1 0 I P4( 1 ;1 S IllS!, Isilk76434 Ibete-hetoecvl-ACP synthee III ICuphe wrightIll 162 37 100 4 12 1 0 4702 1gi11572766 IAIG-A-ecific adenine glycosyiaee tautY Iiia..ophllu, Influenzae j 62 I '3 J 1303 I 33 1 9I 80l 6793s I pmnot, hmnata aetabol~lm Ilvoprotuln Ilethanococcue jenneachtl3 I 76 2 363 I 1 1 jit 9326 IpIrIJCllSIJCl hyptetCical 3.3K proten Iin.artin sequence ISlhlil Agrobecteluaw I 11 I tustacleni israin P0233 plasid Ti I 4 0 6 3442 1llisiot1i IN. lannaschll predicted coding region JO31I Iilthanococcue janno chll 1 62 43 L1 11 S 1635 gi(II49510 leak* In the expressin at kaetocn F, part at the lot opro n ljIGc ll I c iII uq I I I 5P.1
21 110 1 6':1 1S3 IgnIlPIOIdIOOSIO Isieller to B. subtila Dne li8cliluu iabrllial 1 62 1 43 1 121 I 3 1 16 3043 IoI13114379 IIt0001627I ABC ttrlsporter. ATP-binding protein lyhcOl Illlcobacter 62 41 1 1179 I iI I pylon 1 11 1 5 22211636 jg14iIB16 lipe-S2r gene product I9aciivs gubtIlial 1 62 44 600 1 I 31 i 1 19 1 I ,l66231l] 1.253 Ifechrlchie coll I 62 1 34I I 10 it4312 113326 IwflhIDIldIOLI04 {hypoihtlCel protein isynechocyatia opI j 62 43 n I 94 1 l II 1 313 I a;iiIpuetive Ieari llus stoilal 1 61 41 309 1 I 1247 A0011 6 o111o 100 pct Identical to the first It realdues ol the 100 6a 4 I 739 hypo4thclcl protein fragment YBB-ECOLI SW: PF043 Itacherlchia cop I I r .4 112 I 9712 9104 1;11662920 jrepresor protein IEnterococcua hiroel .62 32 j 439 51 i 54 I 7111 IgnhIPIoleOlltl jtystl methylcee ISeLonlla entricl 23 1 Isis 4 2 3 9'11 1029 I91IIIIi11 lntegral MeMbrane protein Becillus aubllsl i 62 1 41 693 I 116 11703 114704 1n112101e313026 Ihypotheticet protein Ibactliuq subtllsl) 62 1 0 99 S19 r 6 343 I 31 )1i1l0616631 junknon I~acococcus lactic Lectiul 63 32 j; 117 I6 1 S Id49 l ago ggi Ii,0)911 lpilin gone inverting protein iPIvHLI l.orplli lacunacci 62 1 q II I 70 I" 110002 110739 1,1199317 IbpiG gene product ieordotella peragl s 4 "as 1 1 1 1679 b Z035? 918015S S.ode for y C. elegans cD444 .1106: odd for by c. cnigons CDNAisi;O; I 61 IS93 j 2
362 to mllbioae carrier protein :?hiometilrgAhacALede porpe::g I6 3 IlCmnorhebdlitl ileganl' I I 3 126 a112 131765 InIPloIdllI311 IYqec Iaclilue aubtilist 6) 1 251 121 74 1 1 166 1103) 1,11155'113 Ihvyothet Icel leIcherlchle ow 1 62 1 j 36 1344 2004231248 23 Nov 2004 TABLE 2 S. pneueoniae Putative coiing regions of novel proteins similar to knou protein.
C n t i g O R F t r t s t o p m a t cn a ech mn I I d n t 7 t it in)1(' I.i:h~ I 0 I I 520 IbD 560 n11P101d0002 ltsaoo~(Itas rU~c'rzOn U3JKNCWI, Ihrclus Subtll 62 j l 44 24 I 7 Ii I 90639 7041 lI512463 Iprctuin-NHpiI-phcsphohlatidine-cgugur phcephocrrns*.rag. (tseherlchte clii 6l 42 2028 I. 4- 2106 346l I;nhIPIld1Ol49 IjAn (integral membrane protein? Iseudomonas asruginosel 62 I 42 9 Io 30 3 lii I 1IgiIPlO1eii)CiO Ihypothetical protein Isecllus eubtilsi 1 62 j 11 ran-5-p-O--FL-*- 1 I(1 30 22!) 1Z4 lo( jnhI~Dldlo4, 1)4. In~:lune hypothelel AB transporter; 944608 97143 )scihue 1 (1i 1 I I I I I ~subtt hl 6 4 I I ZD 2 0 3662 Ig1l3Z29 INlP ILctoCoccu s 62 1 64 1 1acis 12 4 31!4 4010 1g11574379 tIlc-I oparon protein (licl (Haephllue Intuenrel 2 19 9271 1112 161 4939 5649 1gi13524383 life-I 412 13 t 56 4 i I ISI ll I t i c -l o n p ro te in (li ced (Ifas e oph ihu. Ln il u en zea l 63 3 7 1 I 131 I 181)024 Innaero:b ice ribonunosloI de-t rsIphosphtea reductess Inrdol iimamophlus1 62--47- 1 134 I f. l 13139 gI11090126 leare ealnepeptidase ttsctobcill ua delbruocklil 62 1 6) 1 34 jii 6 3 1 118PIDjd4 3 1 cillos 4 L is 122 4 4) 4560 pIsmrlS Iss inc linger protein 1F6 Chilo iridescent virus 6 I 4 I C 451 ;,I0 4 03743 wunnon ILactocue ictil 62 409 I 49 I 2 il) 5f 1gij392i42 &aRC tronsporter, probble 'P-binding subunit jllthncgccus ,ennaechill 62 41 1 1 39 510 Is; 6055 InhIlD3 I~t a prtein 6 tcllue tlll 1 62 I 696 I 154 I 1 '45) 238 1nhP101e254144 Imebrane proten (Streptococcus pnewoniel 12 40 213 1 154 1 3'6I 923 IflIPIOjdl0aOSO Itrmnrnmbrmne (Bacllus subtIiiI 62 1 412 4. 4 I 3 2 J7P l7 29L 91I4941 ICEIl-6 5cr rTS IKiebsielle pnevtnieel 6 3 3 si -9-1 173 III 723 gi11i9)55 ptetVe ellobloes phosphoeenoisase eInyme III (Bacllus eutllol I 62 3 39I 123 3 2519 8 Jg151173 Icobalt trensport ATP-binding protein 0 Inletheococcus Ianneschtl( 1 62 1 4; 1 111 4 9 49; 1754 gttIS74 71 H. 
Iniluenree predicted coding region W11018 ltaeeophilue Inthuenee.; I 4 )I 1243 lot 1 6 4 j 1754 D 11115 IH. 1 62 I 42 1 852j 1 1~ iI l1 1 24(16 311 Ig1724) ee Letb~Ilu esl 1- 4 I S 1 4 311 gi121 9, 1A000733 iRhlt cbluu up. u234) 62 1 41 1 71641 11450566 rn brmne protein IBacllus uubtillsI 3 2j 7 924 202 2511 I 3421 18142219 IPIS gun. product (AA I )143 1 rcharichie ccii 6 1 43 1 891 210 I $1 1145 1g114"Is 10941 gone product bacillus eubtiltel I 6z 45 192 2004231248 23 Nov 2004 TABLE 2 9. pneuoniao Putative coding regions of novel protmlnse'lAilr to kno.n protein.
ncg bit 1 Start Stop match atch gene name I in Ident lngth 0 10D InI l IInl ateln I I iL 21111 I 3 [I 11402lll~ m~nna.. permeese subunit Ill-Nan laherlrla cclii £2 43 9 1 69 I 23 2 1 491 Io 3034 l)1o l j 1diollgO 1O;P iltreptococcu. mtane l 62 6 1441 22; 34 909 11I10083 iglycerol uptah e facilitator (SLreptocoecue pnsumonlaul 1 62 44 1 676 i 4 2so3 1 917 io I 21S9 j Aroo8ao Vtqi lead I lug aubtll* 64 isii I1 10 I 2651467 IgII~ lP~e4 Iohlctoklnase lArabldopu thalian 13 Z? -B I 62j 12j 11 j I 13 181153121 1 1AE00052 Nycoplainari pumon. hthetica I nprotein hoolog:~a siia to1 71 fi-k-E-00-0- l-r--to F IS-9-1 Swise-Pot AccesgionL Number P25355, fro. S. aubtita llycoo)iasma 4 63 60 j 11 F 3065 5 I 7 1,1115 15 bouter membrane integrity protein Itoifti Illaemophllua 10luueczael 1 £2 1 47 228 I 1it l~isio II~i9 10106162 IOXrFtll l~eclierlchlta ccli 61 41 120 6l flu~ 1225 1,112114415 tonll S.ynahocystis up. hypthetiel rotin, ec~dd by Gnfank 1 1 1 I 0 I Acceesion tJuubqr 064008 iecillius eubtillal2 1 732 04 111569 ece.. P etohcilu 1L IbIim p ]6 43 213 I 44 1 2 4061 I 437 InllPlDIdOlOl ri.ylo. repratsor ISyneclrcystis spi I is I 26 is -91 11 1 5361 26InIIldOISIqHImnlc utlt 61 1 42 I L1215 1I 57 r 6 394 8027 IgnlIPIDdlDlJi6 IYqlk ISellu ubtitllr I l Li 4 1066 55 6165 ISplP1S OIPQTC- ISPERMIDINE/PUJRESCINE TRANSPORT SYSTZE PERMEASS PRotElS ParC. 1 I 611 792 161 3 3 1 692 I$157'oi IoRLFl45 e~chrichia cclii iI 6I 4 I Ii 9 9 116 I 090 1 M1119 01 I PLI.3J gn e product IA). 
yLto Iupinus polyphllgsl I 61 1 4s 1 927 15107110511521 Ilr~ gene rduct Ilordtehla pertuselsi I 61 I 1 1272 I'I 9159 11102 hmnuIPDdlhhi2 Isconrprnldine daerbcyteoe ISynechocytla sp.I I 1 1 36 1 I 4 6 1-- 16 I I 1I8; 1003 IOIIPIdlOO3OS Itarnesyl diphosphate ynthese I8aciluu athsrothermophllu.I 1 Li 4 87 Iml I 3 I 4 9I14 3691 1911520913 Iunknown Iblcillus eubtLsl I I it 42 1210I 4 41 I).1)211 1161 gillIIul IPLEt00401 mathionyl-IRHA iormyitransletoas iticherichla Cc11I~ 44 9 S1 4 1 1 105 1 1S 31 1 35 IoniPIO1dlI t 51 lhnsothettcal protein ISyaEchocystio p Ip 4 J 44 1 105 15 7063 6476 t101695i7 Iputtlue ci operon regulator (Bacllus ubtlll 1 61 1 )6 1491 4 a 118 111 11 0 521 iprotein hietidine, knese Entorococcu. l eecahial I i 40 1336 2004231248 23 Nov 2004 TABLE 2 S. pneuaoniee Putative coding regions of novel protelnsurwils to known proteins Contig I OAF Se Istop I eth c e name 4 ilm idant.I length to 11D Intl I Intl I acc.lion I I(ntl 16 6 515 6735 nn1 sns 1,1 11A 1011 o Iu 1; Th ji as o1t 1 24 Pot identical (16 gep'l to 265 h61 1@260L esiduec s I n pp.O.. 212 ro tin VYID6E COL1 SWt P09997 IE chorlch is Colij I I F 6 9 n9nP1PIQ1dI1 I1 6 Yqt Il6&llIu u ubtlls] 41 1 639 Il 9 11 6SD 9 I It1226 OunnLY Slphylooccus hcii icull 1i I *l1 4 F9 111 14 55 91022 ukoit it~ ycocu httfitfh 61 41 I 10 f 139 I iii) I 511 InhP10016 Ibca-uictcldae (rhrmnsrcbuctr .thinolcu i I In 41 20 I 1) I 25)2 42 I, 010541 Ipnlcllin-blndn g protlni IA and IS isacllus aubtilisi I i I 2 211 I t 11248 t I 11 l fl4) Itetrehydcodlpiotilntai N-uccinyitranulreae (Escherichia colil I 61 1 42 702 114 1 f I fl2 I I ;Iji I4 Ign1IlPiId1OiSI2 lphcphclycoiste phomphataes ISynmchocystis op.) 61 )0 66,7 lflIi 3 1 i l0on Ipi~o]ll is ulil cci ibico. phoaphstrnse~rame system. cei; P46228 12201 I 1 I I i1 iot ~1)2 InuPID ll005E lunknown lsecillus subtilisi I] I I 61* t S- F--1 T 111 315 iL1051 hypoth.; o.1ln IG L8 5 6 1 Yny pl asma ganitalluml 1 I 24 I 3 1H I 27* 16 jIpItSPti4E Jn. 
jsnnaschli p~rdiceld codng region ftft0440 Iftt hnocccu lennamhl i LI 2 j 342 225 36 I 1 hypthetical tachqlchla cciii h1 CD 22 Is 24 121 5 21 j 602 IcI)0045 1ptreft ialva subtitii Io 61~ 401 51Z I 27 I 1 *462 414 lgn~lijlDdI0Gct1 109120io Esch~eichia clii 1 l 6k 1 260 257 0 I i 0 InlNP,-I.S- lunkn--n IyCb--t-ru-- -uberui-l-l 61 4211 I I l ]I pr CII- C fbhpotht~a 20l.3proten Ilnrlon aa.unce 101I111 Arbrtrus 62 I l i 1 fl iictn 2651rl piIPttI~lsOl poiji d T 1 o 10 1 1 1 '4 I 17 lc11l2L~iOt I AfOi6624i cntansifelatitrly to eyiranlasesc Itaonrhbditis siegansi 4 1 I I I 12 I 1066oi I II 1gi1153396 Ib-252 ewe'brnea ssoclaed protein ITrypncbome biucel auboroupi 41 I 38 I I 3 12 j24473 I5 I5Iunknown lOR..Otb uICh be rcul cll 60 I1 4) I 6 5 4 13 I111292256 IiAFOE22I YtoI 66ecIcioo ublisl 40 i5 1 1104 1 6 122 1L1524 11nai7 1smila0rt toocn aput-be putette ivaeI-ecoccus lae~ors) 60 44 750 SIl 13II 11105 1454( l~tSSGS96 Ilctcln r lIkcobslltu, apIl 1 601 32 251 1 06 111 1 g 111939 IID_1_;emdtOO5S4asoca e Innkncwftt (Bacillus subgrtup] 60 i 66 I 1200 1911149569 1 c jnl/ ItJlD511;unncv llcls lUbp.I 1 60 32 I 225 2004231248 23 Nov 2004 TABLE 2 S. pneumoniee Puretive coding regions of novql prateinls*91sr to knom proins Contig ORT Stutt Stop I a:tch I match gene name I l dIent Ilngth -DID I In etrn I I Inl -4 I 1O I 629 I *6 jgll I93225~ IAVQQrSflbl fla 1clII a ubtlil 60 16 £691 I 1 I 643) Intl C9 aIg1'0 n. 
ISabI e gene rplld gi dA el 611 4- 12 M3 I 60y 1)177 Iprotpei hin bI S rm cerev JLsl eeQ lS bC 60 I 36 1 2 4- 32 0 D;9& $44 9111293215 1 1AF1306210) YCO G (bacillus s ubtilia l 1 4 0 3 6 9 Ir 64 3 3 3269c Ion1lIWIU2362II unknos I Schlaoschroycee pnbl I 0 I t 4 1269 4- 36 I10 I hiIf 110165 Io ;I I1 94.6-3Tph-g u can branching inIyte a Iaclllus eublill ii I 60 isI 1 1 i6 119 I15111 pnIIPIDIeS)7l ain ILeCtebCCIl[Sa heLVetlcel I ro I so jj I 6 121 1)127 116)1 Ignll~dOZil11 ra30024663 urnamed proten product I(etnophllue actloycetewconllenl 1 60 22 25 I 30 1 1 I 0n1ID s2o6I3 1oar286w protein Imsudomoneg rurorl 1 60 31 26 4 63t ol Lg 13 .il I1I~D~bO6 lu h vi I callu btcinla 6nye(ailssbi lI d0 1 42 £401 4A 1 I 15169. 11 n .2051 1 fl It. onlno Illllu. lu holvatcusl so31 13 1 i 66 O 4 16 3 1650M II'S3$63 IH. Inluansa. rdcted coding r Ionuct 141054 IiH lohlleem lua iluenitanal 40 31 164I 4- I 0 Ii 61 2 1133 I Ignl IIDlej34O Ihypothetiosl protei UII ecllusM sbtilUlT 60 26I WL ~CI~CU.Is I 1 19 1 4- 4-- 1 1 418 1g115106 S3 Iip d gene 86 prdcti IBaCtI lm *ubttor I 60 33 151 97 67 lip.:nown caa proluf 6 (btlllu l~ so 4 2 6(I 111303 (11664 InhIFIOdiCl~lJ Iphoephatldete cyldylytrnulrse Iynechocystie sp.l 4 0 45 44(6 16 I' I" tolizr r~ lloo srne/hron. 
protein pohatece(O IFervldobacterum 60 I s s I O 1 6 I 401) I 123 161 17 laoncee pe ro ad s di gregion IH09llMen i1ecmi.Ie Cllc I 6 I 40 I 1614 I 66 4 1 I 601 55 [elpIlIptetical ro lee Ieni lubtliall 60 ]a lz 92 1 3 39 Ig)I2'6341li Ibmurine tranells uinlsll shrlcl tuU 60 45 13 3 1I 1 I 11 IrehIIdu:; 0*w~irn arnucnyll flTRrlcl CiGO# 5650 IEcerl 40 21 I 1126 1" I 7s51 I 4321 5 I gniO eI8 5 pag000 e61 prcl (clu- t d h es itil l bt subunit 1 3t n1 62 1 14 8 111509 111664 19MIPSOY dL01632 lphosphatidst cyt dylyltransfersira cynecho ysti n sp.1 1 60 45gcul s I 50 1 11 ior- o e- In@' t ,reo(' In-v qr*t--aIn" 't'as* I vJdo a ct-o,-Ium q_ I I 7 I 6136 1 32 I~i1251121 Iransketoleee Cthnococcue lennauchill 60dI I I I 102 1 JI 3063 I 263 IgPlr~l~OSe2OY9 Ihyprheticat protein (yobacteriwu tubrulcsl I so 4)i 13 2004231248 23 Nov 2004 TABLE: 2 S. pneumania Putative coding regions of novel proteinsI1 iler to known proteins 1 10 10l (Tnl I Intl j accesion n I I In I
106 9 111 I 9161 1cn31P101e31416 jyibs protein Ilaclilua sutitia I 0 o j 591 111 4 6361 j 0 661 191146667$ JnlIJ; Bi4S6_CIi.51 llycohecterlum lepral 1I 1 43 471 4 I 215 1~ 15 14 1,nhP503s12143 IJD01 ~L lucoldose 11 fHoso eaplengl s o 12 2212 I 122 I 1 I 41 5066 tnD141011 Ireposaa. 1ynechocyotte 'p.2 60 is 1 306 12 I 4510 5213 91e1I11711 l--g Itrnonta palliduol 3 I isII 4s 2 1 14112 Ill 0IILO2 5I 1hypothatIca1 protein Ifacilkus subtillul 40 34 1 i11l I 12 11gr 1 1 dgn1190d100490 lOaF ITheraus thereopiluul I 60 I 1 1 1741 1 ii11520 9uc 9IuiS1l14S IOKF-fi I sherci rou 60 0 1512 lol~tr l Ilchr~cli Ill alit I 51 4 eO I 2 t9 MI 1241 1912309527 Iprotein biriidine Lne Inrrococc s (aecalisl 60 I 11 I 1244 I I ii 1041 i j44 IiL ;ES Oar froe bp 1642 to loll: putative IHuen papiItomavi rus type 3I t I 34I il 5 IS 1 l 140 1114--2 I r-oin- -n-tlv e DAPIP synthese (iroFI (tacherichie calii 1 60 1 41 oLI 4 6 5 1P 9 OL10~1 putative, teacllue swbtL11s3 60 1 17 1 492 4 10 7242 a73 %gljI Deat3022 hypothetice protein leacilu aubtilia 1 60 7 1 I 163 5 344 1i ns 6 1 g 2 1in;Ar082201 brench-chain amino acid transporter aclius aubMist I 0 I SS I I i413 746 1i12104504 -putlve Oa-glucoae dehydroanms lechrh acl I 60 1 ,4 a i6 Iis 3 3516 1 7412 IcniIPVIldOOl11 Is negativa regulator of pho regulon iPaeudosonaa oarugioncai 1 60 11 645 4 1ill lgnlIPOIeo0090 product highly siilar to Bacillus anthreacie Veg protein I laclie 40 is 4 16) I 1 1 16461 lcnhlPIdlOi1i) LYqUN Imacikkus subtiltel 60 36 2 I 110 3 130 2 686 1g111571i9 II. 
Inlluentat predicted coding region _(12144 Ueeophilus influeneiol 1 60 3) 1441 111 7 4117 5901 11460601 IOutolllI jIEchorlchia coill I 60 44 1 1145 162 1 IIS0 2 i il[o 114127 repressor IStreptococcus pyogenaa phage Til 1 0 38)a 304 I191 jo 9 5444 142 ogil15514 Ictabolite control protein Bacillus eegateriul 0 j 42 1 011il 200 I 139 10s3 1g11416462 Icrmnscembrene protein IBcaillus subtlllrl 60 31 945 201 3 3695 1928 011471112 enzyme jlabc Pedlococcus pantosacaus 60 39 9641 214 Its 11 110439 1g911i51401 thypothetItsi tlie.'ophilus intluenzss 60 i 1 492 I ZIG I 4 I 2-14$ 3-4o n y n et t u e i6 4l 1 214s 3 3631 IgilCas]0 Invooln heavy chain kLnege A [Dictyostuliuo djecaidourno 60 31 1 I I- 2004231248 23 Nov 2004 TABLE 2 S. pneweonle. Putative coding regions of novel proteins Widiiar to knovn proteins Iontig O R rt Stop amath match gene name ia Ident length io IID 1tI ntl cetin I I n I 3- -11 I 1 3 l 9412tor IKIa l I~ bsptoll&u pnuonl s 0 41 733t I 4 1 721 3 Iglitisil Isr reglar iIebI~blal pauolsl 60 41 723 I c 4 241 I 2 166 hhI3OdIIl IScot type I restriction mod4iicstion enmyms M subunit (techerichia colil j 60 so 216 I 251 I I 01 4 IgIyU Juonon i5phyococcus eureusI so 36 661 I 11 96o 13 slotall Icy; Irpoocue gron il 6 0 31 g ts 2 a I I I663 o1;r1S3I5I09I11 I rohable transpoesae Sacillus *tearotharsopituys 60 a 2s I 171 274 I I 36 I 99 IeIll9211 fN-lthylaMs.1 n hiorohydrolese Iloathanococcus iannaech1 i6 741 9-- 1 01 463 2 ;g11737 IAC,00214i o05? Itacherichie Colli i 3 *6 9 II 319 I 1 301 IgnhIPDjeI)74 Ixrc rombinase lLactobclilu ileicheennil. 
60 2 344 1 5- I-OL Irepressor proteIn JISctoriphsoe TuClOOlI 60 32 410 I 56 1L129117 11At0032) Yl IlacIllum sublisi 2$334 3--I1 JII 111140 12)41 21 onnwa Iycohacteriu tuberculast 1 '59 1 issit I f~ Ld~ 51 it 14L Ocr~ 5 1 Ii I t.A I tO I 142) 4 v 14I133860 Isl rCL il ac b deticat 5 i I3 130j I 1 911142469 &I~lsPIrs oporom regulatory protein [Bacillus subilisl 59 )4915 1691 1 1 4 qnllPtD1tG~fO633 IPC~hh l ctrcpt oc rn c BiS pneu aniae lr 5V 1 4 111 1 I o I 641)s 9156 191(II01 luni~ ILclouc subtitlE l 19 3 t 1304 Is Ill Il~o~ l I 22 4~ I1393 1 onIP1269 sl (hoperal.rguar pro tein (Seclr bllluesitl) 5) 34 1 935 I 2I S illJ43 1n111202 IePCA isrptocccu pnuunimej311 I 1 1 641 19171.
IsI i0 t 2 I 10 1 IgnlIuoli23166 IIypothetlcal protein (Bacillus eubtiliul I 59 1 7ti F i1s 131 1 51 InIPZIlO~1 2 O Iunhnown Iet obecillu.us esel I 5t I 33 1104 3 1951 3111 l91a6185l a Ihnlotetnis prinI ili s 59 1 0 1 111 114 ilu 11~11 19111657647dl~~ I~apil iphtoocu ureusl i 1 55 I 3I 9101 IL 36 Ii 116014 1 l500l 135( Ir enecllpeice dn regionr I~~O~llU II t633l Iflerhanccocrua jannaschiit I I 160rt I 1 t;o d 329 lVlqall (bacillus oubtir (I~llal 41 951 r I io I ir 4 Ir II112 1 6371 1 7137z 1I12223 IiIroI22oi YtsK (Becillus subtill,I 1 19 34 j 916 61 42 I 1 j 3163 1911161464 5 1piinogcInae &sotIarIej sb~ls 0 te 3I; I I 21 .1 172* 1n 1001 2 qke*3 'ilecilluscn~g#Iooo bcllssutlal5 6 4 2004231248 23 Nov 2004 TABE 2 S pneumonia- Putative coding reglons of novel trotens ro knon proteins C' a I, g" FR F' 'sin I Condo Clp I OjSart top mac IIC t lh gennde In Igt 110 1 Inl I Intl acei on I I ient lenth i I 7 5199 9IgI '3621 lntthernate kinesm Icoski (Hiaemophiiu Inluense I i I 31 I 11 I 7 111 11111 j10OS I nlPiDIe023504 1pokttiue FoJ protein lBacrllus subtlisi I I 44 1 12I Ito Swii-Pot Accesson Num~br P1094. roe C. coil illycopiase j I iS II Io 8766 i 8121111590896 Iii. jannreii predicted coding regin IIJQIIO IhetanoCcocus lnnauhli I I I 6 24 112) 6 36 1 nIPiDIe900S hmonlgou, o 011 in nrdcv operns ci E~cl and S.tphiurium I 1 I I 11 1W42 ii i IgnllPiDI l I winienl lbycoaterium b tbercutouial I s$ I It I 2 1 2 123901 i11 8 1I16542122 protein with homology to Vail repreasor of LB.ubtIll ILacLobacillus I I *0 5161 delbrueckll I 148 3) 1l I g14 IgntljFIiPdO ?05 1A600i4131 FUNCTION tNKNOWN. SIMILAR PRODUC i 111i, INFLUIAC 51 1 Alt) I I I II SYNCCIIOCYSTIS. 
i(Bcillus subtiieiI 149 li0 I 23i3 1 834 IoII'0422 Icor-bindinw-iEctor I15itephyhococcus eureusi $9 40 1 10)3 164 I) 1613 16013 jInIPiDIdI0063 IerriC enquibactln-blndlng protein pretweor Fats C t argul l n 1 VL I 43 I till I *I Iilcl ue~ ubtliltm I 16411 2 1816 1 12 lmnhI;lPldlQ9;64 Ihomolague of ferric inoulbactin trensport System pererase pro tein ratC of I 3 1014 I I I V. anguilabrum Iacillum tubtiisiI IChnoihabditia 6Iens I 1 1866 42506 11129142 Irepressor proten Ibacr iophge Tved O ti c Ii 1 I2)60 i I1 6 B11 D 39 1 0 giI610i01 OI itsherichieple t li vget framaseift linking to oiii. not found SI i 1i 1 1 1)20 i 1711 lgtII6;31fI lhimtidine protein kinase iStreptococcus pneuoniaei I 1 37 1 1410 194 I ii 1019 LonhIPIldODS75 luninoman theciiue subt1Iiisi I sI I o 6031 4- IFJ 39 1; 205 1 I;niPiDIIe113O3 lhypotheticel protein Iacllue aabtitiie 59 I t 9OP 220 3 4 3 155 j IgnhId101 3 IYqhL IbecliluS subtillsl J I 6 ol ,I 55 IIs 110)IEOOO i64i~11 1106, thi 308 as r6 ie )1 pet denticel 515 gepsi to 31 51 redues o a an epprox. a6 as prIon PFLCECOL Si P1,2675 lscherrhi j 1 4 0 1.....I1400 I- ll-us subt ii I 29 l 2004231248 23 Nov 2004 TABLE 2 S. 
pneueoniao Putative coding ragions G1 novel protalnr'slatIar to known proteins C Dnl Io Strt IStop *Catch gn ne min ident length InID Int ae sU 1 a I nea on (n Shelysn ynchas ip.r 5 2 1 r- Sl 4 25 1 26 1126 IghIP l olont in lag clliue rpe a Ca 3 raa r cs s 5s 20 ol 1 0 1 101 1009 l PID dn I03 L1 ecM31 is 40rl 1 1 166 1911666062 Iputative Itactaoccus hactial I is I 41 14187 4n 23 1 23 1 1 4 I,111 30I1864 juno I cc tiip 1o mll e i I 6 1 26 116$ IpnhIld5iO1a I (ura Icel h su cyacl ap 581 II 5 512 I9S lIP1DldI0047, H 4a -ATag subunit J i~nterococcua hiraoi so 39 3 416 )a 5 0 90 I g I4I bindin o a t f %ranp art A as C IF imus I so 3 0 1 23 I 6 I 2152 1210 1unI~IDdIO i i Iunkncn Iiacihiw~ ~ndla avbtiiialm I I 4 1 174 S31; 901100 .M I c 11 an 1 66)4 ;11 46k; Sl5 I, 562 i Ieb~e~ n unnon Bl lu t aubt ll I 6 41 I scherichl Sp IS 5 I 32 3 S 1 o z l d s ub 1156 St 666 49 Iv010a 11'3 06' Itlreor protein FuuiC Elsrcacciii I~r 1B I 1 3 3069 2 ]g 28 glk64nodeln-2 lad l 20s sut lyci So4 Il 11 22 I46 I IM4 3601 ginnlIPTDII .14l unkno Iwn rlaclnlga subtilieS p C I I I 4831 4 2 0 1, 1 1 22 3 52 I O O 0I I O I yt o c hm ad l b o g et l i s p o e n l c A l c b c yi I 1 8 2 5 I I 1 4 1- 413 1 342 Ig I 9f s o a pr IYt i F E 5c a i*i Io i 23o4 j 1 6 81 Is 1351 611 I,31861ai lera sos cllesee oc us tr diI I-od li -2 f h -2 3 l ly in x5I 6o 1 31 1 311 -IF )0171 a sI o 1 1 9 4 S 9 4 3 3 3 0 F 1 t r a p t o c o c c s p n e o nJ O S I 1 5 6 4 4 1 1 1 7 36 I tl l l ll I 441 ;3 911 7 35 IS oqwrts. hydrolyssa, 2 6611 7 931 0 s Ddll 3 4 15te pt rrc c u go rrdon n~i l 50 )a I )1 6 2004231248 23 Nov 2004 TABLE .2 pnewsniae Putative coding egiono ol nove3 protens stlar to krnown proteins Contig OHC I start I stop Metch n match gen name I at 1 11 idunc length ID ID Intl I Int, cson n S112 I 27 1 1720 wc ene product Itacherich e coll j 1 31 Ill r I S clJ 1St) 5240 1gi1159 2 [NC rnsporte. 
probable Nrl-binding subunit ibethanococcuu iennemch&Ii I 5 j 646-; 120 I l 1.4421 110 Ign1IPiDldl zo jyqgk ifacillus ubtilisl I s1 4 190 126 i] 11) 11261] giJ629110 l0RP U a lntroocu h~treln L 56rp~ccu 42unnIo 4191s I 4- I 3 611 4939 gi1600301 jactoiide-eiflur determinan lStreptococcus pneursonlaolI Si1 11 ill I 1 1 lii 890 1gn11P101e169486 lUnrnown IBactlus sbliaL 1 56 I 36 760 160 I I 6 1 9865 114gn -1301 lost In oco Icls blcril I se 6 0si 1 I ii I 4946 I I Ign1IPIOIdIO24I' Ic-' rotion rIloo sienuIcll I~lrl Lurr 58 fl 1 S1 lj S- 4 11 I I1 l a Io147511 4 Iregulatorv protein IPediococcus pentoaceuei I 5 1 0 46 4 I I 4 121 474 7S lction-ral ated protein iCraterostlgme piantagineuml 1 56 1 5 231 I IO 1 I k 144 1640 1vnlPi0ljeI46722 Icompetmnce pheromone IStreptococcus gordooit 56 I iI 177 j 4 S192 2 J l IA 134 InlIPIOdtoo 5S6 tr6 at oPIdo Ilattul rate l 1 5 t 44 I 6 1 rI 206 1 1221 I 696 lgnInlll l l~ls Iproduct sri.lr to WrbA Ilactob~ clllu s sake) I 5I 1 15 59 555 IgntlPIoIeI0 soii IiwPthetice i protein jeacillus subtlilsl I 5o 13 i 179 211 5 I525iO 4321 0g146647e Icllobm pioephtr uire Onye1"ieila tuolesphle 63 3 It? I 1 14i 5106 InIIrdo?4 S. subtills celloblao phoephotransferese aystem coeil P46317 1991 s I 1 4's 1 Srunseembrane tBacillus aubc li1 123 I I Oil 1i1172717 Icell divisi'on nP-blndlng protein Iftch IHenophilua Intluensaol so I t I0I 1 b 76 1 2 iii1 1917~)30 Iae~ Ila~c~ilue aubtillsi 1 32 14 280 j1 1i l1 slklI iui I IAOaO Ill bypoth ttL C 29.6 hO protein In hrC- t oil Interenic regin j I 1 1 i 06 1 4 3 IgnhIPDI.334180 IY)bL protein Is IIlus su tlil 1 5I 1 47 1 843 360 3 1 15 1 2092 IsplPH3llj v ClIYOtitlZCAL 45.4 Ko PROT9IN IN TIIAMINASE I S'PCREION. 56 j 2 6SS I 36) S 21 266711 jPiIiG6lt1 IS antigen precursor IPI*amodiu. 
(alclparumi so1 51 1 294 1 5 06 3 113924 Ib-291 membrane associated protein Itrypanosors bruteS subgroup) 1 50 J )1 1 04 362 2 411 519 IhrothLrlJCa CIL h otlcaI 20.1 protein linrt ion sequene 1511211 Agrobecterlum I 56 41 2004231248 23 Nov 2004 TABLE 2 S. pneunonle PutitLve coding regions of novel protlna asher to known proteins Contg lar SartSto II. marh 1 tiatch gene nsne 9 gI~ I Idiot leInth ID Ila I Intl I IntlI 1 ression I Intl
110 J(I 61 7 1g11113i15 lhIooe to nfch l rabidcd opil thlineKl 1nt~~t~u 5'n~fhl I 1 Ia Ii 1 7a llo IonlI,~llO Ihofl i0e tote peuteulrianusl C~ll I o I 4 I 41 rI 31 4 203: I 3711 1I12293213t 1A02 Ytpi acillurl 51 027 64 I33Ii It 65034 1 644 (l~IPIO)elte hypothatical prinR ~acilu subtilis I 57 1 31 I 413 43 1 9 64JI 4t 763. I ;i115 3 2 9 Iypthtiel pro e I inhallu s eub m atrie 1 57 I i03 52 -i 111. 650 I 11574i524 Ihstphosereno phoephsncae ecHthaaopiccue jnnnflchlun 1 51 31 3 9 9 I S 21 1 gi iPT I,11630 I rriicere-sbcruedpo Iyantelnhn c(net ble drf 1v i li I S3 1 1 1 j5 115i6240 IiOnglOOO?44ndd-ON*-apcLe YJuclet IrAhlobHaio In.luez G31 I 31S nI l' iDa7 19) Ilrpoos1oss0 homlogous tof~r l swisart~soaEOLL hyoteccc rotei Ihvilusrutllu I 51 40 825 l 63 51 I -T 4935 9 l111162408 C*gOOO~tol IirJ (Rhizobiva ap. NGZ)41 319 s I li 73 1 15 1 k~s; I IJ0;3 onlIP P014190192 Ihomotoouw to wwissrcr ;YloA-ECO L1 hyporhseilc I PFQL@Srt 154CllUS Vubt IIl u 1 5'14082 II2S61 I tAIS jgnlI~lO tdIOO965 Ih~ologve of RADPII-flavn axiderducta1a Frp of Y. hI Irye I 8Ct e Ia I 4 147 62 9t 5551 5163 10(1(( short region of sImIlry to glycerepioheryl di nar Plmephdlestrsees 57 31 1 ;;1 I65000264i 1 52 pe Identical (1 PepI to 322 redues of1 fragmnt I I 4I 871 i~ 1" ~i5~i P 41 ~'Ii78759 yIEtEt Sw: P1244 (22 cal) l, Eihrihlrot I 1l I 3 13 3 I IS I li71 IglI00 Iuteor mlutT protein CIlthnocccus lannssch~ll I 1 7 1 33 I Ill I 6 16 ISMz6 I1* 1 go7811612 lhronlne synthase Iaradouis rhallral I 5 r i3u 144 2 93 1131 16ll 115L12 lei't5O03( IMra protiTn Bacinllus subr lil 51 1 44 1002 I1 )i2 I 1~ S 46 1901 1h111551353 In. Jenneshil prediced coding region (430671 IRehlocCue jannechll Io 5 1 3 I 41 I1 1lr'" 1132.1 rIA56G1I~~lle56 Imetura-parale-Infca erythrocyte eudace ntkgn '(IsA PIcamoium 1I 1 9 I1 2 1 3_ 1 130 1p.r1V141411r44& Ihypoibet2c1l protein 610255 Hesttalihus influensee (tran d itto1 1 1 3I 1 76 0i 13 1 0 I 10 1 HI. 
InIIsDldbOpldl 11A80coding cuticle ransprt syatem perneos protein (Chiorella vuluerhl I S? I 4'd It1I~ IL9617 11923 pkrIAS60SIA4S Imstur e-parsalteIntectd ertnhrcyteSurfce a genMESA- Prsmor )ok37 2 30 2IL O. I 9 9790 g;96 53192 pnupo ygn SIttrptooccusorSisonice I 57 1 39 I 540 1I Il 139 5361 Igl4O83 InagD gene product PA 1-2101 ItreachI oll I F 36 I 71F 2004231248 23 Nov 2004 TAB3LE 2 S. peumonlaq Purative coding regions of novel proteinm 4imilar to kno-n proteins a n-d-om-I-m-" -a t-19- -n IS br- n tool lContig 4 0RC 4 Stree Stph I atch gne nsai al51 3 lost I 2 ID IInl Intl I cess~on r elm I Ident lenth II in r 214 sI1 bba1146412 iSpa~endoardlte £wv..wdennt antigen Itrptococcue ~rol UOII tI to I i I 6 4 ~243.' Peptid. 1566 ciil(Lreoococcu a sobrnusiaus I 24 125 128101. 13405. 1e1150157 Ibeta-lucoolde germanei liollum aubtiliol 4 57 38 es p Ir 14 6195 7(30 f;l19S556o Iunktpocr I5chiooaechrpyce pombel I 1( 4-21 28o gly I raoseram Inpeat urlanul 4E Into 41 57 1 34 4 I
V I 4871 I 1 l;961 s b t ,0 I le Chie coli) 54777 I 1 571 35 97 6I 11 910 9149 Inll1d1001 29 4osrl j kcetobacter peateuriAnupl 5 I 421 461 4 ,I 1116 311n 4 210At 1n 8le a Ion ANAOD O I cII soll LICSE C IIO1 l 57 I 1 1004 1 1-3 I I I I I Ipac;ll (ro/p cslt eubtillo Lo ccu I~rr I73 11 ;43 2 J 011 45 1*0 1143979 f Lutrlicu smLI cElricplasid gn fo e rti 1coaiis16 4lr I 31.It 24S 1 0 S;1 111411 IEtiA-gn precrorac i lmlu lprl I 5 0 4 I 1 7 1 5 106 677 nil4045 rIyehU04 4leelha c il1 5 1 170 4 )6 1 0 1 5f 1 3 i ,910 oI '1b tu moVll cyp ti m gepr for re6 p 30 n 2 p Il 10 24 i874 Ig 101d10i0 sdw-ooe permease; I T*IIynechcyeti ttp o r~i LI~brl I 16I 34 1 I I9 0 4 3is 10n1t13g4 1 rOcurso opr o crl ov fproti ipom l b 56 is I 5540 32 439 421940 4241 4905 P1Dd5200 Is coI ROABl 5CS S 1 u bi 1 4 6741 1 3 5 1I24152 leasC 513$) t-2[ I acteriu timbasi I S6 4 39 1 3554 I 16 4 9 44 2 1842 19 l1 14 0190 7 i um-rei upre in p a, as Isnracoccug1 an.sc h 56 so 454 1t 4 2 3 4 1 4 3 4 350 I3 11PI1 i o is iol F5 ct actr pthtgriau a l 56 0 1 381 10 13 ]II 7 5 9 DNA ttP&Ir P F~dlu koul n RAD2 N ethanococcu s l a-'oeschiii 56 21 1 0 ;Id 1116 (nljrrpl101001 F l00 18l cetab acter N 5FIU F.18astaprianusl56 1 II 2004231248 23 Nov 2004 TABLE; 2 S. pneumonia, Putatlue coding regions ml novet protelnlfaallac to kno,.n proteins Sontig jo ar Start IStop match I marh gene name aim I idant length I10 ID It Intl csa osion I Intl P-00-- hm--theti-o-t-pr--t aI 3-r--g I on- n--aa I-s -r Io I0 Ill Ia434 mlciIOOJi jerolide-ufItuA determinant (Streptococcus pneumonia j 56 27 1152 4 O d III) I7i 11i I9nllIe3129 IPinU iLo5clllus plncruet q 10 1 1 4 12 9 Irans criptein acvatr tacillus auhbItia 1 6 3S 1674 3~594 IsniIPiJdlOllE lebrns roten (acllus atarchmmopnhlus 56 921 Ies ie 117 II lgnIIDldlollY lORE l*Caobaccer peturlanuet 1 16 I s e I 1 ilai! 
I 4940 10i11rfa Ipr %wct similar to c i RFA2 protin ilacllus subtll 1 5 I 42 0376 loS I M, I1 iI gnh I PlIdiot 11 jhypothuttcal protein (Synechocyscia 'p.1 I 56 1 I In w 1S11 I 9 2111 1it IljiiO i IORE o345 Itschalchi a cIC1 56 i 1t 1044 I l2 1 3" l111164103 I h-elfnl ty parlpleamle glutamine binding protein ISslmoneila 1 30 852 4 9% I 2 I I Ifl t1 e269 Iunnw tpiiloatru Iueclt a 1931I 1 11P901@240093 lunknown _[lYCOClarulun ubarculo is I 11 246 I 1-1 4;0 107 A10ht01d102 Ilsuman non-suedes nyain heavy chain IH140 sapinsi I I 33 I s2; j(10003iI N [R Rhisobiam ap. 14CF331 56 33 204 1) 5 1 uft IptrIa la lIaeO cr51 iicel 5o oi sb 54 2934 411 I 48 1 0 4674 I3669 Igni IP0o dlJjit 11 *lhothai Cl Protein I Synachocyai I s sp I 100 l ag1 1104 1739 1unh1IP1Id101099 phosphate transport yastem pemease proteIn FrtS lISyuectsocrSti a p.1 j SE I 36 1l 1( I 4449 2143 gn'liPIOl3odE2 Jprobiysic:Ir-:peIgIc recomine. ci th. reolve.. SIcily of *fly'we j r I )1 511l( cI3l 2S7l 03tIO 149i this 311 so cr( is 27 pct identIcal 1(1 gaps to j01 I 56 o I o residues of an 4pprox. 130 as protein YXKCBACSU Sw; F39140 tEMsherchla colil i i I I I 7 1 4979 5668 10i1391353 siml~r tol Sad ibs suLl i. hypedi. 10 lcfl protein, 1(1 Ltr rm.ylI1 54 I eai 6901) IsItucherichia colii 186 3fl; 7 Ivl1fl3zboo P11 permeass for annose ubunit IPsen Ivibrio furnial, 54 36 2466 181 2 I102 IplrISS7IO4IS1 Ivlrl4 protein Streptococcus pyogones (strain CSiOI, serotype .149t I 1 35 1584 I 2004231248 23 Nov 2004 TABLE 2 3, pneumonia. PotaCi.e coding reguis fi novel proteins similar to known proein.
1 '0 laD I (UI I Intl I 'cession Iangt Ia I I tnt sr~o 304 I. 233 i 2$1 9616 JORt-CI 62eacherichia coli) j 35 I S 21 Zy6 1 4 1 163 Igii;59661 JC1 irleseid p01VI 56 le i 1110 1 1 219 8 1 1 104 s ll14611? Iputative iuaci)lus aitial b 5ti1 ia SI S 2? 1 230 12 1409 1 145 IpirICSOflhICii hypothetical protein 2 ir S' region Streptococcus ntan. istrain 56 40 1071 1 1 N1175 eoaoype A) 56I pi I 231 63 *5 96I i I;;liaawas rhoptrv protein itieeeodiun rolil I 6 34 1 4 1'I 4* r 273 j 2 1542 2724 1,1114)061 jIep protein i8 eltus subiltel I1 6 31 182 I 53 I 526 IonhIPlDlI2SO Ihpoheticel protein a~cllus subtrlial I 4I 41 16 IA' 7 1 I 100 IgLi)669? pit Identical to the first 6 esidues of the 100 am I 56 44 555 hypothetical protein fragment ICOCCOLI SW: "4744 itEcherichie Coaut 44021 (ID 1 5733SJ 1 lutr mebrne integriy proelin t~ll IweemOphiiUs Inluenscl 51 1 285 1 T I 3 1 2 1 SI I5niPZDIellI3l1 Ihypotheticat protin Ilacillus subtilisi I 5 1 501 1i II Ii 1 IgnilPIDIdiOI71 Is negative regletor of pho resulon ilseudcoenas aeruginosal I ss iI 6so D- I c, 8 28 4 I134 1618 IgllIFZC 1el5I6S IST proten IDicyosceilun dincidumi IS I 40 207 13r 4 043 Igiai i lunknown prote in lAnabasna -ri-I 55 1 1 I 5461 4 It 3 lt 9655 110)0 1 Vil10901 Is.eubt-tlis genes ipasl. rnpA. 
SDkd, gIdA end 9dB [Bacillus subtilie[ I 55 1 a 49 1 I SW3 6183 10iII1i695 li(At000I76 hct-resporaive regulatory protein IEscherichis colii 55 s 21 56 1 51 1 4 1 2301 1 3M 19MIPIDlD dO LM j YbbA lbocllys subiliml 55 42 6 952 1 10886 iIiSisi a 1 protein iStaphyloccUs aureusi 1 5 22 1227 5I 1 1 151) I 0 19 1eii7t604Z Iaspr loP orrelia I I S5 bur0 ;1 5$6 gl t4916 mane iu ad ob lttanpot roei I.tanccc* ananhi l i 1 I 196 1 5756 )1i114586 Itenelum end cobait rnsprt protein IHthnococcus 3annechII 55 I "i I 963 4- I 1-i 9114394NS 108 lqilii2O iglyeosyl transfereeiINesseria eonLongitidiei I S 41 33 I 5 6 1iki 4229 IniPDIelosago IN4a alcohol dehldrcganeae e5aclius aubtlJIiM I 4 I 0 I 0 1104118 I 9820 IgnhIPIDIel324SS' Ihvvthetical protein 10acillus subtilisi 1 5S 1 6I 669 113 11 1l1l InhIPiDIe)1I496 Iunknon ileclllue subtilis) IS 1 34 I 76 111 113 I1130l0 10394 Igillsl3423 l-phophofr.Ctokimase IfruTill (Raaophilus Intiuanaaei I I 31 9Is I i I I 67~6 1 fbi0 IgiIIJ9OL~i 1iac0044u hypotheical 3.7 hD protin In IbpAgyre inergedlic region iJ Iiecherlichl colil 2004231248 23 Nov 2004 TABLE 2 S. pneumonia. Putltive coding region. of novel proteIntd911milar to knouw protein.
Cti la s? Strt Stp rentch ouch gne nn S0 cim Int Hunt length 9 23 3 j2119 ;902 IohPOdll3 Po-pepIdso flacMONlus Ilh~nf iu I ssss 55 1 31 lots I 1 II 3 I 2 3 j 110 igj1g ,21113 jrz 1acilhlo e'ibtl I.I I SS 3 I 4 196 1 6 1 6I 11$916 156 sglir Ofhypothe al *ain in a rap 'yc-ln synthaais pens cluster o If 55 26 1 4 I4l 3 j ISSI I 2134 7911413330 Idhyorollpoalde .dhydroganse IClosrrli.u mognuml I SS I I 11 l 1 1101 6933 InIPl e3)lS Idihydroorocese I-actobecLlus Ielchnertlil I 55 i1 1284 r--8921 9- 1 41 1 3433 jg 0571 lperlpheral membrane protein U Ischerichla olii I SS 1 20 690 I 6 7 li71 450 1g143579 ;transppose IXanthobacter eutotrophcul S$ 1s I t 460 I 43I'e It35'~ 111650 1n2171014101331 Yqjgl Iacllui sutllal 1 ss 32 915I 1- 101 310 IAC-1 -44OOGOII cnrvd hypotica6 1l integral mombrna protein Itellicobacier 55 34 504 I I py or 11 I 9 5I97 91129053 Illlat to C. coil ORF adjacent to hut oparon; aimilar to g9itA class ofl:9 7I regulatory protein. (EsLcherichla cllI Il 164 3754 2311 [InPlp~oI sai Ihypthetlcai rote. 5saciuluab ti iii I 55 i 3i I 54 t I" 13 I 3521 IvII'~3r i lut. resliae no I ISA I 2BiI (eaiclius rhrlginamil 55 Is 3 750 33 lip 01 235ki yplslO ln~n~v IC pro lei r iun u btilisl SS 7 41
I t I352 21 I u.n o iIe v aC t r Iv R A tberc uloi0e4 l (Baci l s I is I1 in Ii 347 1 I3663 5345 Ioi53S53 inol ved(ZI Iho cIn prei areton Ie cilluw ubtI11a~l i 55 i 2$ or S4-- l I9 i5 423 IIf 39 IgnIIPIO)lU34I Ir~lothelca ve rotein ilacsL sbtl Ii IS II 32 I 192 5 3219 I 301 Igl~hhliil1i lolt igenl convraeisdsagyt)1 55 I 9 l~~I I 3 I~ 14s 0161130 IglLS Itanirae p loln Isynthei mrl IleohluIiu;jI I 3 l11sl DI 4 301 0 271 SgntIPiOlellIll4 Ihypotbetleal protin oellus sub tilsi I S I if L 5 I 1L 21 1 1 3 744 IgnhlPiOldIOI(1 Itrnpoase Iynehoysl. rp. I 5 ss 3j 372 219 20 1113 456 10112d6301 loan gee product saiulus megerlluI 1 5 1 33 1060 1 15 T F1 3441 191119P37 I1glr1 p rodicit proinls mo s rl atlia 55 30043 I I2T393 T75,- 10 ;2 IonIIondI0O974 unknown ileclilu e eub il u 1 5. SS 41 72 I 2k6 j 650 630 01 5A ar I roD uc iLbr I lu hhurel 1 5 31 403 237 2 J32 1595 lol1150546 Iprc Iorp9yro4onee vnivhial W5 F39 I 468 2004231248 23 Nov 2004 TABLE 2 S. pneumoniae Putative Coding region1s Of novel PrOtoine1iaC to knoiwn Pro~eins Ca 0i I J0 St r aIs in- ato t- atch m atch gen na aI~ s I idant length I 309 2 I Igi5l4l;iL Ihypotheticelitamplu inueal I 5S n 5I 7s gill )I laceS Iticherichle coill 1 I3 6 364 I 3 I 236 346 jgj3~~l4 jTh-291 a obrene associated protein lrrypamoeo, brucel subgroups I 1 3 6 31 3 9.11 j 105 121~11 71 IS antigen precursor IPleamodlum lcpruo j ISs 40 1 I i0 1 I,I1229ini6 IIA?01201 signe transducIon protein kinaso hfleciilus subtiliel I 54 I 6 j 901 )RI41fal 111 91403 putative trianaclptlcnsi. regulator l6ecallus stoemctherapbilual 5 4 27 3 1121 p -i 5497 I AI I 92 1113911A9 Imthiony-taw Syfh'etsss lmaclilJUe sttatothereopi 1v. I 54 1 15 195 I 43 1 4 1 )HOO 1236? 
IgnhIPlDlel4I6,1 lAWe transporter rLsctobaclhius hwivetlcusl I S4 I 251 11# 11144 ;I 210 111762163 eIro lstehylcoccus ssulens) S4 291 12601 I 1 U11531 $177sii Iendo-l.4-bstai-xrlanaee ICailutoinonas itAll S I 36 1 I 6 I44 421 IsnilIP101d01217 hypothetical Ilacliasf subtIiial5 9 0 71 1 1c.$ 111'03 1g1501 or! lech'or l t c i I I S41 i I swj a I- 127146 127;)3I ;o i2014 lartonin receptor i ettus nurvegicusl so 31 t 1020 ~Z 2 1 144 109A j1j146623 uarnB gone product I[lasmid ri 543l5 I 6i 1 11046 j~ oblae mrxlsb va4 is 3 1 734I 7 110 11 4(43 11161 1l130034Z IOA'P 3 gene product I11radyr-hitubius japonicuol S 4 32 g
TABLE 2. S. pneumoniae: Putative coding regions of novel proteins similar to known proteins
Columns: Contig ID, ORF ID, Start (nt), Stop (nt), match accession, match gene name, % sim, % ident, length (nt)
[Table rows are illegible in the scanned source.]

TABLE 3. S. pneumoniae: Putative coding regions of novel proteins not similar to known proteins
Columns: Contig ID, ORF ID, Start (nt), Stop (nt)
[Table rows are illegible in the scanned source.]
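Tables 2 and 3 identify each putative coding region by a contig ID, an ORF ID, and start and stop nucleotide positions. The following is a minimal sketch of how such a row might be represented in Python. It assumes that the coordinates are 1-based and inclusive, and that a start position greater than the stop position denotes an ORF on the complementary strand; neither assumption is stated in this section. The class name `OrfRow` and the example values are hypothetical and are not taken from the tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrfRow:
    """One table row: contig ID, ORF ID, start and stop positions (nt)."""
    contig_id: int
    orf_id: int
    start_nt: int
    stop_nt: int

    @property
    def strand(self) -> str:
        # Assumption: start > stop marks an ORF on the complementary strand.
        return "+" if self.start_nt <= self.stop_nt else "-"

    @property
    def span_nt(self) -> int:
        # Assumption: coordinates are 1-based and inclusive at both ends.
        return abs(self.stop_nt - self.start_nt) + 1


# Hypothetical rows for illustration only.
rows = [
    OrfRow(contig_id=1, orf_id=1, start_nt=100, stop_nt=399),
    OrfRow(contig_id=1, orf_id=2, start_nt=900, stop_nt=601),
]

for row in rows:
    print(row.contig_id, row.orf_id, row.strand, row.span_nt)
```

The span computed here is the extent of the ORF on its contig; it is not necessarily the same quantity as the "length (nt)" column of Table 2.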
GENERAL INFORMATION:
  (i) APPLICANT: Charles Kunsch; Gil H. Choi; Patrick J. Dillon; Craig A. Rosen; Steven C. Barash; Michael R. Fannon; Brian A. Dougherty
  (ii) TITLE OF INVENTION: Streptococcus pneumoniae Polynucleotides and Sequences
  (iii) NUMBER OF SEQUENCES: 391
  (iv) CORRESPONDENCE ADDRESS:
    (A) ADDRESSEE: Human Genome Sciences, Inc.
    (B) STREET: 9410 Key West Avenue
    (C) CITY: Rockville
    (D) STATE: Maryland
    (E) COUNTRY: USA
    (F) ZIP: 20850
  (v) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE: Diskette, 3.50 inch, 1.4Mb storage
    (B) COMPUTER: HP Vectra 486/33
    (C) OPERATING SYSTEM: MSDOS version 6.2
    (D) SOFTWARE: ASCII Text
  (vi) CURRENT APPLICATION DATA:
    (A) APPLICATION NUMBER:
    (B) FILING DATE:
    (C) CLASSIFICATION:
  (vii) PRIOR APPLICATION DATA:
    (A) APPLICATION NUMBER:
    (B) FILING DATE:
  (viii) ATTORNEY/AGENT INFORMATION:
    (A) NAME: Brookes, A. Anders
    (B) REGISTRATION NUMBER: 36,373
    (C) REFERENCE/DOCKET NUMBER: PB3401PI
  (ix) TELECOMMUNICATION INFORMATION:
    (A) TELEPHONE: (301) 309-8504
    (B) TELEFAX: (301) 309-8512

Claims (9)

  2. A vector comprising an isolated nucleic acid molecule of claim 1.
  3. An isolated fragment of the Streptococcus pneumoniae genome, that specifically modulates the expression of ORF 5 of SEQ ID NO: 94, wherein said fragment consists of a nucleotide sequence from about 10 to 200 bases in length which is selected from the 200 consecutive bases which are 5' to ORF ID NO:3 or a degenerate variant thereof.
  4. A non-human organism which has been altered to contain an isolated nucleic acid molecule of claim 1.
  5. A non-human organism which has been altered to contain the fragment of claim 3.
  6. A method for regulating the expression of an isolated nucleic acid molecule of claim 1 comprising the step of covalently attaching to said nucleic acid molecule a second nucleic acid molecule consisting of an isolated fragment of claim 3.
  7. A method of preparing a homolog of an isolated nucleic acid molecule including ORF 5 of SEQ ID NO: 94 comprising the steps of: screening a genomic DNA library using a probe derived from ORF 5 of SEQ ID NO: 94 as a target sequence; identifying members of said library which contain sequences that hybridize to said target sequence; and isolating the nucleic acid molecules from said members identified in step
  8. A method of preparing a homolog of an isolated nucleic acid molecule comprising ORF 5 of SEQ ID NO: 94, the method comprising the steps of: isolating mRNA, DNA, or cDNA produced from an organism; using primers derived from ORF 5 of SEQ ID NO: 94 to amplify nucleic acid molecules from the isolated mRNA, DNA or cDNA; and isolating the amplified nucleic acid molecules produced in step
  9. An isolated polypeptide encoded by an isolated nucleic acid molecule of claim 1.
  10. An isolated polypeptide of claim 9 comprising at least 17, 20 or 50 amino acids.
  11. An antibody which selectively binds to any one of the polypeptides of claim 9 or
  12. A method for producing a peptide, polypeptide or protein in a host cell comprising the steps of: incubating a host containing a heterologous nucleic acid molecule whose nucleotide sequence consists of an isolated nucleic acid molecule of claim 1, under conditions where said heterologous nucleic acid molecule is expressed to produce said peptide, polypeptide or protein, and isolating the peptide, polypeptide or protein.

DATED this TWENTY-THIRD day of NOVEMBER 2004
Human Genome Sciences, Inc.
Applicant
Wray Associates
Perth, Western Australia
Patent Attorneys for the Applicant
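The fragment claim above recites a sequence of about 10 to 200 bases selected from the 200 consecutive bases located 5' to an ORF. As an illustration only, the sketch below shows one way such a 5' region could be located computationally on a contig. It assumes 1-based inclusive coordinates and assumes that a start position greater than the stop position denotes the complementary strand; the function names and the toy contig are hypothetical and do not come from the specification or its sequence listing.

```python
from typing import Iterator

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(contig: str, start_nt: int, stop_nt: int,
                    width: int = 200) -> str:
    """Return up to `width` bases lying 5' of an ORF, in the ORF's orientation.

    Assumptions: coordinates are 1-based and inclusive, and
    start_nt > stop_nt marks an ORF on the complementary strand.
    """
    if start_nt <= stop_nt:
        # Forward strand: the 5' side lies at lower coordinates.
        begin = max(0, start_nt - 1 - width)
        return contig[begin:start_nt - 1]
    # Complementary strand: the 5' side lies at higher coordinates.
    end = min(len(contig), start_nt + width)
    return reverse_complement(contig[start_nt:end])


def candidate_fragments(region: str, min_len: int = 10,
                        max_len: int = 200) -> Iterator[str]:
    """Yield every contiguous sub-sequence of `region` within the length bounds."""
    for length in range(min_len, min(max_len, len(region)) + 1):
        for offset in range(len(region) - length + 1):
            yield region[offset:offset + length]


# Toy contig for illustration only; not a sequence from the listing.
toy_contig = "ACGTTGCAAGGCTTAC"
forward_upstream = upstream_region(toy_contig, 9, 12, width=4)
reverse_upstream = upstream_region(toy_contig, 8, 5, width=4)
print(forward_upstream, reverse_upstream)
```

The length bounds of 10 and 200 follow the wording of the claim; whether a given fragment actually modulates expression is an experimental question that this sketch does not address.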
AU2004231248A 1996-10-31 2004-11-23 Streptococcus pneumoniae Polynucleotides and Sequences Ceased AU2004231248B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU2004231248A AU2004231248B2 (en) 1996-10-31 2004-11-23 Streptococcus pneumoniae Polynucleotides and Sequences
AU2008203821A AU2008203821A1 (en) 1996-10-31 2008-08-12 Streptococcus Pneumoniae Polynucleotides and Sequences

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US60/029960 1996-10-31
AU69090/98A AU6909098A (en) 1996-10-31 1997-10-30 Streptococcus pneumoniae polynucleotides and sequences
AU2004231248A AU2004231248B2 (en) 1996-10-31 2004-11-23 Streptococcus pneumoniae Polynucleotides and Sequences

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
AU33351/01A Division AU777190B2 (en) 1996-10-31 2001-03-30 Streptococcus pneumoniae polynucleotides and sequences

Related Child Applications (1)

Application Number Title Priority Date Filing Date
AU2008203821A Division AU2008203821A1 (en) 1996-10-31 2008-08-12 Streptococcus Pneumoniae Polynucleotides and Sequences

Publications (2)

Publication Number Publication Date
AU2004231248A1 true AU2004231248A1 (en) 2004-12-23
AU2004231248B2 AU2004231248B2 (en) 2008-05-15

Family

ID=39830189

Family Applications (2)

Application Number Title Priority Date Filing Date
AU2004231248A Ceased AU2004231248B2 (en) 1996-10-31 2004-11-23 Streptococcus pneumoniae Polynucleotides and Sequences
AU2008203821A Abandoned AU2008203821A1 (en) 1996-10-31 2008-08-12 Streptococcus Pneumoniae Polynucleotides and Sequences

Family Applications After (1)

Application Number Title Priority Date Filing Date
AU2008203821A Abandoned AU2008203821A1 (en) 1996-10-31 2008-08-12 Streptococcus Pneumoniae Polynucleotides and Sequences

Country Status (1)

Country Link
AU (2) AU2004231248B2 (en)

Also Published As

Publication number Publication date
AU2004231248B2 (en) 2008-05-15
AU2008203821A1 (en) 2008-10-02

Similar Documents

Publication Publication Date Title
EP1400592A1 (en) Streptococcus pneumoniae polynucleotides and sequences
US20070020746A1 (en) Staphylococcus aureus polynucleotides and sequences
WO1996033276A1 (en) NUCLEOTIDE SEQUENCE OF THE HAEMOPHILUS INFLUENZAE Rd GENOME, FRAGMENTS THEREOF, AND USES THEREOF
US5599693A (en) Methods and compositions relating to useful antigens of moraxella catarrhalis
Nagamune et al. Distribution of the intermedilysin gene among the anginosus group streptococci and correlation between intermedilysin production and deep-seated infection with Streptococcus intermedius
Frost et al. Group A streptococcal M-like proteins: From pathogenesis to vaccine potential
US6355450B1 (en) Computer readable genomic sequence of Haemophilus influenzae Rd, fragments thereof, and uses thereof
US6537773B1 (en) Nucleotide sequence of the mycoplasma genitalium genome, fragments thereof, and uses thereof
JP2000508178A (en) New compound
JP2002529046A (en) Enterococcus faecalis polynucleotides and polypeptides
JP2000511769A (en) New compound
US5700683A (en) Virulence-attenuating genetic deletions deleted from mycobacterium BCG
Smits et al. Cytolysins of Actinobacillus pleuropneumoniae serotype 9
US6468765B1 (en) Selected Haemophilus influenzae Rd polynucleotides and polypeptides
US20020120116A1 (en) Enterococcus faecalis polynucleotides and polypeptides
CA2195090A1 (en) Lkp pilin structural genes and operon of nontypable haemophilus influenzae
US20050131222A1 (en) Nucleotide sequence of the haemophilus influenzae Rd genome, fragments thereof, and uses thereof
Murphy et al. Conservation of outer membrane protein E among strains of Moraxella catarrhalis
Padmalayam et al. Molecular cloning, sequencing, expression, and characterization of an immunogenic 43-kilodalton lipoprotein of Bartonella bacilliformis that has homology to NlpD/LppB
US6348328B1 (en) Compounds
AU2004231248A1 (en) Streptococcus pneumoniae Polynucleotides and Sequences
CA2296814A1 (en) Treponema pallidum polynucleotides and sequences
JPH09248186A (en) Polypeptide chain lengthening factor gene tu of microorganism of lactobacillus and its use
Domain Functional Analysis of the

Legal Events

Date Code Title Description
FGA Letters patent sealed or granted (standard patent)
MK14 Patent ceased section 143(a) (annual fees not paid) or expired