WO2006091813A2 - Novel thermophilic proteins and the nucleic acids encoding them - Google Patents


Publication number
WO2006091813A2
Authority
WO
WIPO (PCT)
Prior art keywords
protein
nucleic acid
sequence
amino acid
seq
Prior art date
Application number
PCT/US2006/006593
Other languages
French (fr)
Other versions
WO2006091813A3 (en)
Inventor
Arcady Mushegian
Jing Liu
Konstantin Severinov
Tatyana Naryshkina
Original Assignee
Stowers Institute For Medical Research
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Stowers Institute For Medical Research
Publication of WO2006091813A2
Publication of WO2006091813A3

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N9/00 Enzymes; Proenzymes; Compositions thereof; Processes for preparing, activating, inhibiting, separating or purifying enzymes
    • C12N9/10 Transferases (2.)
    • C12N9/12 Transferases (2.) transferring phosphorus containing groups, e.g. kinases (2.7)
    • C12N9/1241 Nucleotidyltransferases (2.7.7)
    • C12N9/1252 DNA-directed DNA polymerase (2.7.7.7), i.e. DNA replicase
    • C CHEMISTRY; METALLURGY
    • C07 ORGANIC CHEMISTRY
    • C07K PEPTIDES
    • C07K14/00 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof
    • C07K14/005 Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof from viruses
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61K PREPARATIONS FOR MEDICAL, DENTAL OR TOILETRY PURPOSES
    • A61K48/00 Medicinal preparations containing genetic material which is inserted into cells of the living body to treat genetic diseases; Gene therapy
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N2795/00 Bacteriophages
    • C12N2795/00011 Details
    • C12N2795/10011 Details dsDNA Bacteriophages
    • C12N2795/10111 Myoviridae
    • C12N2795/10122 New viral proteins or individual genes, new structural or functional aspects of known viral proteins or genes

Definitions

  • the disclosed invention relates to the fields of molecular biology and biochemistry.
  • Thermophilic proteins and the nucleic acids encoding them are disclosed.
  • the thermophilic proteins are from, or derived from, a bacteriophage, YS40, that infects the thermophilic bacterium Thermus thermophilus. These proteins have enhanced stability, particularly at high temperatures.
  • In the last decade, bacteriophage (phage) genome sequencing projects have deposited more than 200 complete phage genome sequences in the public databases. While the hosts of these phages are phylogenetically quite diverse, only 10 completely sequenced phages are known to infect thermophilic microorganisms. Most of these thermophilic phages were isolated from a small number of archaeal species (Palm, P., et al. 1991; Wiedenheft, B., et al. 2004; Arnold, H.P., et al. 2000).
  • Bacteriophages may be the most abundant living entities on Earth, represented by about 10^31 individuals, as indicated by random sampling and sequencing of DNA from environmental sources (Hendrix RW. Bacteriophage genomics. Curr Opin Microbiol. 2003 Oct;6(5):506-511). It has been proposed that the origin of dsDNA bacteriophages is as ancient as DNA replication itself (Filee J, Forterre P, Sen-Lin T, Laurent J.
  • Bacteriophage YS40 infects the thermophilic bacterium Thermus thermophilus H8. Analysis of the YS40 genome revealed a dsDNA molecule of 152,372 bp with no terminal repeats or redundancies that contains 169 putative open reading frames, which express polypeptides longer than 50 amino acids, and three tRNA genes. The ability of YS40 to infect and propagate in T. thermophilus at permissive temperatures from about 56 to about 78°C suggests that proteins encoded in the YS40 genome may have enhanced stability, particularly at higher temperatures. In addition to greater stability, proteins of YS40 may also possess novel enzymatic characteristics with commercial applicability.
  • the present invention provides isolated proteins that include a thermophilic amino acid sequence at least 75%, or even at least 80%, 85%, 90%, 95%, 96%, 97%, 98% or even at least 99% identical to a YS40 amino acid sequence encoded by at least 25, or at least 30, 35, 40, 45, 50, 60, 70, 75, 80, 90, or even at least 100 contiguous codons from SEQ ID NO: 1-170; preferably from SEQ ID NO: 2, 4-64, 70-149, 151 or 153-170, or conservatively modified variants of the same.
  • thermophilic amino acid sequence is identical to the YS40 amino acid sequence.
  • the YS40 coding sequence is to a YS40 structural protein expressed by a nucleotide sequence selected from SEQ ID NOs: 1, 3, 65, 69, 71, 151 or 152.
  • the thermophilic amino acid sequence confers to the protein, at a permissible temperature of at least 36°C, more preferably at least 45°C, 55°C, 65°C, or even 75°C, an enzymatic activity.
  • Exemplary enzymatic activities of proteins of the present invention include, but are not limited to, decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase and peptidase activities.
  • YS40 coding sequence is selected from SEQ ID NOs: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 or 161.
  • the enzyme is a DNA polymerase
  • the YS40 coding sequence preferably is SEQ ID NO: 33.
  • the invention also contemplates nucleic acid embodiments where the nucleic acids encode proteins of the invention. It is desirable that the encoded thermophilic amino acid sequence of the encoded protein is identical to the YS40 amino acid sequence.
  • the YS40 amino acid sequence may be encoded by at least about 25, 50, 75 or 100 contiguous codons of the YS40 coding sequence, with the YS40 coding sequence being from SEQ ID NO: 1, 3, 65, 69, 71, 151 or 152.
  • the thermophilic amino acid sequence confers an enzyme activity to the encoded protein at a permissible temperature of at least 36°C, more preferably at least 45°C, 55°C, 65°C, most preferably at least 75°C.
  • the enzyme activity may be a decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase or peptidase, depending upon the YS40 coding sequence selected.
  • the YS40 coding sequence may be from SEQ ID NO: 2, 4-64, 70-149, 151 or 153-170, more preferably from SEQ ID NO: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 or 161.
  • the protein encoded by the nucleic acid is a DNA polymerase at a permissive temperature
  • the YS40 coding sequence selected is SEQ ID NO: 33.
  • Nucleic acids of the present invention may include a YS40 nucleotide from SEQ ID NO: 1-170.
  • the YS40 nucleotide sequence may encode a YS40 structural protein that does not take a random coil structure at a permissible temperature of at least 36°C, 45°C, 55°C, 65°C, or even at least 75°C, or a YS40 enzyme.
  • YS40 enzymes may display decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase or peptidase activities when analyzed at a permissible temperature of at least 36°C, 45°C, 55°C, 65°C, or even at least 75°C.
  • nucleic acids may optionally be operably linked to a regulatory sequence.
  • Such nucleic acids may also be used to transform a cell, and such recombinant cell types form part of the present invention.
  • inventions include recombinant vectors.
  • the recombinant vectors include a nucleic acid encoding a protein of the invention, as discussed above, operably linked to a promoter.
  • Introduction of the vector into an expression system produces a protein having, at a permissible temperature of at least 36°C, or even at least 45°C, 55°C, 65°C, or even at least 75°C, an enzymatic activity.
  • proteins may be characterized as being decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase, peptidase, or combinations thereof.
  • the promoter is preferably inducible or constitutive, and ideally is a strong promoter.
  • the YS40 coding sequence is selected from SEQ ID NO: 2, 4-64, 70-149, 151 or 153-170, or from SEQ ID NO: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 or 161, or the enzymatic activity is a DNA polymerase at the permissible temperature and the YS40 coding sequence is SEQ ID NO: 33.
  • the present invention also includes protein expression systems. These embodiments include a recombinant vector, as discussed immediately above, and produce the recombinant protein encoded by the vector when incubated under permissible conditions, including a permissible temperature. Protein expression systems of the present invention may be cell-based, or cell-free in nature.
  • further embodiments of the present invention are vectors that include no more than about 99.9% of the nucleotide sequence of SEQ ID NO: 171 and a non-YS40 nucleotide sequence of at least 20 contiguous nucleotides.
  • the non-YS40 nucleotide sequence is inserted into the vector sequence whereby it is flanked on the 3' end and the 5' end by at least 10 contiguous nucleotides of YS40 genomic sequence.
  • the non-YS40 nucleotide sequence may optionally be operably linked to a regulatory sequence, such as a promoter. Recombinant cell systems incorporating such a vector are also contemplated as part of the present invention.
  • the cells in such recombinant systems are Thermus thermophilus transformed with the vector.
  • the present invention also includes method embodiments for amplifying nucleic acids. These methods involve contacting a nucleic acid with a PCR reagent mixture including a protein of the present invention, as described above, where the protein has an enzymatic activity necessary for DNA amplification when incubated at a permissive temperature under permissible conditions. Variants on these embodiments include amplification methods where whole cells are the starting material. In these variants, the reaction mix includes at least one protein of the present invention that possesses an enzymatic activity that facilitates entry of PCR reagents into the cell.
  • Such enzymes usually lyse the cells, but do not have to, in order to form part of the present invention.
  • Method embodiments for decomposing a biodegradable material are also contemplated. These methods involve contacting the biodegradable material with at least one protein of the present invention as described above, that has an enzymatic activity necessary for decomposing the biodegradable material when incubated at a permissible temperature. Exemplary enzymatic activities suitable for this purpose include, but are not limited to, amylase, cellulase, nuclease, lipase, deaminase and peptidase.
  • kits are suitable for amplifying a nucleic acid.
  • the kits include a reagent that has at least one protein of the present invention as described above and a buffer solution. Proteins of the invention suitable for inclusion in kit embodiments have an enzymatic activity necessary for DNA amplification or DNA entry into the cell when incubated at a permissible temperature. Kits may optionally include primers suitable for hybridization with the nucleic acid being amplified, and/or control nucleic acids and primers for quantifying the reaction.
  • Open Reading Frame refers to a series of at least 25 contiguous codons, preferably beginning with the codon "ATG."
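As an illustration of the open reading frame definition, the following is a minimal sketch rather than the applicants' procedure. It assumes that a frame must begin with ATG, that it ends at one of the standard stop codons (TAA, TAG, TGA), and that only the forward strand is scanned. The function name `find_orfs` and the stop-codon rule are assumptions introduced for this example; the text itself only specifies the 25-codon minimum and the preferred ATG start.

```python
# Hypothetical helper: scan the forward strand for open reading frames.
# Assumptions (not from the source): ORFs must start with ATG and end at
# a standard stop codon; the reverse strand is not examined.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=25):
    """Return (start, end, frame) tuples; end is exclusive and includes the stop."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3  # codons before the stop codon
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs


if __name__ == "__main__":
    example = "CC" + "ATG" + "GCA" * 30 + "TAA" + "GG"
    print(find_orfs(example))
```

Raising `min_codons` to 50 would correspond to the "polypeptides longer than 50 amino acids" criterion mentioned for the putative YS40 open reading frames.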
  • Permissible temperature refers to a temperature at which a cell may grow and divide, or a protein is capable of retaining its tertiary structure and any innate enzyme activity, or enzymatic activity, the molecule may possess.
  • Modulate refers to the property of being able to quantitatively increase or decrease one or more chemical or physical characteristics of a molecule or process by at least 10% of the initial baseline characteristic in response to an environmental or metabolic change. Modulate may also refer to the ability to qualitatively alter a chemical or physical characteristic of a molecule or process in response to an environmental or metabolic change. Methods for determining modulation of chemical or physical characteristics of a molecule are well known in the art and include, but are not limited to, enzyme assays and spectroscopic analysis.
  • The terms "enzyme activity" and "enzymatic activity" are used interchangeably herein.
  • a reference to "displaying (an enzyme activity or enzymatic activity)" refers to a molecular characteristic where a biomolecule such as a protein or nucleic acid catalyzes a chemical reaction.
  • Exemplary enzyme or enzymatic activities displayed by YS40 proteins include, but are not limited to, decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase and peptidase.
  • The terms "polypeptide," "peptide" and "protein" apply to amino acid polymers in which one or more amino acid residues is an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers, those containing modified residues, and non-naturally occurring amino acid polymers.
  • amino acid refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function similarly to the naturally occurring amino acids.
  • Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine.
  • Amino acid analogs refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, e.g., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs may have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid.
  • Amino acid mimetics refers to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that functions similarly to a naturally occurring amino acid.
  • Conservatively modified variants applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, conservatively modified variants refers to those nucleic acids which encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially identical or associated, e.g., naturally contiguous, sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode most proteins. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine.
  • Such nucleic acid variations are "silent variations," which are one species of conservatively modified variations. Every nucleic acid sequence herein which encodes a polypeptide also describes silent variations of the nucleic acid.
  • Each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and TGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule. Accordingly, each silent variation of a nucleic acid that encodes a polypeptide is implicit in a described sequence with respect to the expression product, but not with respect to actual probe sequences.
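The idea of a silent variation can be made concrete with a short sketch. The codon table below is a deliberately small subset of the standard genetic code, written in the RNA alphabet (so tryptophan appears as UGG rather than TGG); the names `CODON_SUBSET`, `translate` and `is_silent_variant` are hypothetical and exist only for this illustration.

```python
# Small subset of the standard genetic code (RNA alphabet), enough to
# illustrate silent variations. A full table has 64 entries.
CODON_SUBSET = {
    "GCA": "A", "GCC": "A", "GCG": "A", "GCU": "A",  # alanine: four codons
    "GAA": "E", "GAG": "E",                          # glutamate: two codons
    "AUG": "M",                                      # methionine: single codon
    "UGG": "W",                                      # tryptophan: single codon
}


def translate(rna):
    """Translate an RNA string codon by codon; trailing partial codons are ignored."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODON_SUBSET[rna[i:i + 3]] for i in range(0, usable, 3))


def is_silent_variant(reference, variant):
    """True when two different coding sequences yield the same amino acid sequence."""
    return (
        reference != variant
        and len(reference) == len(variant)
        and translate(reference) == translate(variant)
    )


if __name__ == "__main__":
    print(translate("AUGGCAGAG"))                       # M-A-E in one-letter code
    print(is_silent_variant("AUGGCAGAG", "AUGGCCGAA"))  # synonymous codons swapped
```

Because AUG and UGG have no synonymous codons in this table, any change to them alters the encoded residue, which is why the text singles them out.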
  • As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alter, add or delete a single amino acid or a small percentage of amino acids in the encoded sequence are "conservatively modified variants" where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art.
  • Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the invention.
  • The following eight groups each contain amino acids that are conservative substitutions for one another: 1) Alanine (A), Glycine (G); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W); 7) Serine (S), Threonine (T); and 8) Cysteine (C), Methionine (M) (see, e.g., Creighton, Proteins (1984)).
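The eight substitution groups listed above can be encoded directly as sets. The sketch below is illustrative only; the function name `is_conservative` and the choice to treat an unchanged residue as "not a substitution" are assumptions made for this example.

```python
# The eight conservative-substitution groups from the list above,
# written with one-letter amino acid codes. Methionine (M) appears in
# two groups, exactly as in the text.
CONSERVATIVE_GROUPS = [
    set("AG"),    # 1) Alanine, Glycine
    set("DE"),    # 2) Aspartic acid, Glutamic acid
    set("NQ"),    # 3) Asparagine, Glutamine
    set("RK"),    # 4) Arginine, Lysine
    set("ILMV"),  # 5) Isoleucine, Leucine, Methionine, Valine
    set("FYW"),   # 6) Phenylalanine, Tyrosine, Tryptophan
    set("ST"),    # 7) Serine, Threonine
    set("CM"),    # 8) Cysteine, Methionine
]


def is_conservative(original, replacement):
    """True if replacing `original` with `replacement` stays within one group."""
    if original == replacement:
        return False  # assumption: an unchanged residue is not a substitution
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    for pair in [("D", "E"), ("M", "C"), ("M", "L"), ("A", "W")]:
        print(pair, is_conservative(*pair))
```

Note that the relation is not transitive: M pairs with both L and C, but L and C do not share a group.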
  • Homologous refers to two or more sequences or subsequences that have a specified percentage of amino acid residues that are the same (i.e., about 60% identity, preferably about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or higher identity over a specified region, when compared and aligned for maximum correspondence over a comparison window or designated region) as measured using a BLAST or BLAST 2.0 sequence comparison algorithms with default parameters described below, or by manual alignment and visual inspection (see, e.g., NCBI web site http://www.ncbi.nlm.nih.gov/BLAST/ or the like).
  • the definition also includes sequences that have deletions and/or additions, as well as those that have substitutions, as well as naturally occurring, e.g., polymorphic or allelic variants, and man-made variants.
  • the preferred algorithms can account for gaps and the like.
  • identity exists over a region that is at least about 25 amino acids in length, or more preferably over a region that is 50-100 amino acids in length. "Identical" may be used interchangeably with "homologous".
  • sequence comparison typically one sequence acts as a reference sequence, to which test sequences are compared.
  • test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Preferably, default program parameters can be used, or alternative parameters can be designated.
  • sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.
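The percent-identity calculation described above can be sketched for two sequences that have already been aligned. This is a simplified stand-in, not the BLAST computation: it assumes gaps are written as "-", that both aligned strings have equal length, and that identity is the number of matching columns divided by the number of aligned columns. That denominator is only one of several conventions in use, and the name `percent_identity` is hypothetical.

```python
def percent_identity(reference, test):
    """Percent identity of two pre-aligned sequences (gaps written as '-').

    Assumption: identity = matching columns / aligned columns, where
    columns that are gaps in both sequences are ignored.
    """
    if len(reference) != len(test):
        raise ValueError("aligned sequences must have the same length")
    columns = [(r, t) for r, t in zip(reference, test) if not (r == "-" and t == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for r, t in columns if r == t and r != "-")
    return 100.0 * matches / len(columns)


def meets_threshold(reference, test, threshold=75.0):
    """Check an identity threshold such as the 'at least 75%' level in the text."""
    return percent_identity(reference, test) >= threshold


if __name__ == "__main__":
    print(percent_identity("MKTA", "MKSA"))
    print(meets_threshold("MK-A", "MKTA"))
```

In practice the comparison window or designated region would be selected first, and the aligned strings would come from an alignment program rather than being written by hand.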
  • a “comparison window”, as used herein, includes reference to a segment of one of the number of contiguous positions selected from the group consisting typically of from about 20 to 600, usually about 50 to about 200, more usually about 100 to about 150 in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned.
  • Methods of alignment of sequences for comparison are well-known in the art.
  • Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol.
  • BLAST and BLAST 2.0 are used, with the parameters described herein, to determine percent sequence identity for the nucleic acids and proteins of the invention.
  • Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/).
  • This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased.
  • HSPs high scoring sequence pairs
  • Cumulative scores are calculated using, e.g., for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always < 0).
  • M reward score for a pair of matching residues; always > 0
  • N penalty score for mismatching residues; always < 0
  • For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached.
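The three halting conditions for word-hit extension can be illustrated with a simplified, ungapped, one-directional sketch for nucleotide sequences. It is not the BLAST implementation: the function name `extend_right` and the default values chosen for M, N and X are assumptions for illustration, and a real search would also extend leftward and evaluate the statistics of the resulting HSP.

```python
def extend_right(query, subject, q_pos, s_pos, seed_score, M=1, N=-3, X=5):
    """Extend a word hit to the right without gaps.

    q_pos and s_pos are the first positions after the seed in each sequence.
    Returns (best cumulative score, number of residues added at that best).
    M, N and X follow the text: match reward (> 0), mismatch penalty (< 0),
    and the allowed drop-off from the maximum score. Default values here
    are illustrative assumptions only.
    """
    score = best = seed_score
    best_len = 0
    n = 0
    # Condition 3: stop when the end of either sequence is reached.
    while q_pos + n < len(query) and s_pos + n < len(subject):
        score += M if query[q_pos + n] == subject[s_pos + n] else N
        n += 1
        if score > best:
            best, best_len = score, n
        elif best - score >= X or score <= 0:
            # Condition 1: score fell off by X from its maximum, or
            # Condition 2: cumulative score went to zero or below.
            break
    return best, best_len


if __name__ == "__main__":
    # Seed "ACGT" (4 matches, score 4) followed by three more matches,
    # then a run of mismatches that triggers the X drop-off.
    print(extend_right("ACGTACGTTTTT", "ACGTACGAAAAA", 4, 4, seed_score=4))
```

A larger X lets the extension tolerate longer mismatched stretches before giving up, which increases sensitivity at the cost of speed, consistent with the statement that W, T and X govern that trade-off.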
  • the BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment.
  • W wordlength
  • E expectation
  • the BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 90:5873-5877 (1993)).
  • One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance.
  • P(N) the smallest sum probability
  • a peptide is considered similar to a reference sequence if the smallest sum probability in a comparison of the test peptide to the reference peptide is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001.
  • Log values may be large negative numbers, e.g., 5, 10, 20, 30, 40, 40, 70, 90, 110, 150, 170, etc.
  • sequence similarity in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are, when optimally aligned with appropriate nucleotide insertions or deletions, the same or have a specified percentage of amino acid residues or nucleotides that are the same (i.e., 50% identity, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher identity to an amino acid sequence such as SEQ ID NO:2, or a nucleotide sequence such as SEQ ID NO:1), when compared and aligned for maximum correspondence over a comparison window, or designated region as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection.
  • This definition also refers to the complement of a test sequence.
  • the identity exists over a region that is at least about 25 amino acids or nucleotides in length, or even over a region that is about 50-100 amino acids or nucleotides in length.
  • selective hybridization will occur when there is at least about 55% homology over a stretch of at least about 14 nucleotides, more typically at least about 65%, 75%, 85%, or even at least about 90%. See, Kanehisa, Nuc. Acids Res., 12:203-213 (1984), which is incorporated herein by reference.
  • the length of homology comparison may be over longer stretches, and in certain embodiments will be over a stretch of at least about 17 nucleotides, generally at least about 20 nucleotides, ordinarily at least about 24 nucleotides, usually at least about 28 nucleotides, typically at least about 32 nucleotides, more typically at least about 40 nucleotides, 50 nucleotides, or even at least about 75 to 100 or more nucleotides.
  • Macromolecular structures such as polypeptide structures can be described in terms of various levels of organization. For a general discussion of this organization, see, e.g., Alberts et al., Molecular Biology of the Cell (3rd ed., 1994) and Cantor and Schimmel, Biophysical Chemistry Part I: The Conformation of Biological Macromolecules (1980).
  • Primary structure refers to the amino acid sequence of a particular peptide.
  • Secondary structure refers to locally ordered, three-dimensional structures within a polypeptide. These structures are commonly known as domains. Domains are portions of a polypeptide that form a compact unit of the polypeptide and are typically about 5 to 350 amino acids long. Typical domains are made up of organized sections of peptide such as stretches of β strands (that can interact to form β sheets) and α helices.
  • Tertiary structure refers to the complete three-dimensional structure of a polypeptide monomer.
  • Quaternary structure refers to the three dimensional structure formed by the non-covalent association of independent tertiary units.
  • a “label” or a “detectable moiety” is a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical means.
  • useful labels include fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and proteins or other entities which can be made detectable, e.g., by incorporating a radiolabel into the peptide or used to detect antibodies specifically reactive with the peptide.
  • the radioisotope may be, for example, ³H, ¹⁴C, ³²P, ³⁵S, or ¹²⁵I.
  • the labels may be incorporated into the antibodies at any position.
  • any method known in the art for conjugating the antibody to the label may be employed, including those methods described by Hunter et al, Nature, 144:945 (1962); David et al, Biochemistry, 13:1014 (1974); Pain et al, J. Immunol Meth., 40:219 (1981); and Nygren, J. Histochem. and Cytochem. , 30:407 (1982).
  • the lifetime of radiolabeled peptides or radiolabeled antibody compositions may be extended by the addition of substances that stabilize the radiolabeled peptide or antibody and protect it from degradation. Any substance or combination of substances that stabilizes the radiolabeled peptide or antibody may be used, including those substances disclosed in U.S. Patent No. 5,961,955.
  • recombinant when used with reference, e.g., to a cell, or nucleic acid, protein, or vector, indicates that the cell, nucleic acid, protein or vector, has been modified by the introduction of a heterologous nucleic acid or protein or the alteration of a native nucleic acid or protein, or that the cell is derived from a cell so modified.
  • recombinant cells express genes that are not found within the native (non-recombinant) form of the cell or express native genes that are otherwise abnormally expressed, under expressed or not expressed at all.
  • By the term "recombinant nucleic acid" herein is meant nucleic acid, originally formed in vitro, in general, by the manipulation of nucleic acid, e.g., using polymerases and endonucleases, in a form not normally found in nature. In this manner, operable linkage of different sequences is achieved.
  • an isolated nucleic acid, in a linear form, or an expression vector formed in vitro by ligating DNA molecules that are not normally joined are both considered recombinant for the purposes of this invention.
  • Once a recombinant nucleic acid is made and reintroduced into a host cell or organism, it will replicate non-recombinantly, i.e., using the in vivo cellular machinery of the host cell rather than in vitro manipulations; however, such nucleic acids, once produced recombinantly, although subsequently replicated non-recombinantly, are still considered recombinant for the purposes of the invention.
  • a "recombinant protein" is a protein made using recombinant techniques, i.e., through the expression of a recombinant nucleic acid as depicted above.
  • operably linked refers to a linkage of polynucleotide elements in a functional relationship.
  • operably linked refers to a functional linkage between a nucleic acid expression control sequence (such as a promoter, or an array of transcription factor binding sites) and a second nucleic acid sequence, wherein the expression control sequence directs transcription of the nucleic acid corresponding to the second sequence.
  • a nucleic acid is “operably linked” when it is placed into a functional relationship with another nucleic acid sequence.
  • amino acid refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids.
  • Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine.
  • amino acid analog refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid.
  • Amino acid mimetics refers to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that function in a manner similar to a naturally occurring amino acid.
  • Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.
  • amino acid sequence refers to the positional relationship of amino acid residues as they exist in a given polypeptide or protein.
  • coding sequence in relation to nucleic acid sequences, refers to a plurality of contiguous sets of three nucleotides, termed codons, each codon corresponding to an amino acid as translated by biochemical factors according to the universal genetic code, the entire sequence coding for an expressed protein, or an antisense strand that inhibits expression of a protein.
  • a "genetic coding sequence" is a coding sequence where the contiguous codons are intermittently interrupted by non-coding intervening sequences, or "introns." During mRNA processing intron sequences are removed, restoring the contiguous codon sequence encoding the protein or antisense strand.
  • Contiguous, in relation to sequences, refers to an uninterrupted sequence of bases or amino acids, each base or amino acid being immediately adjacent to its neighbors in the sequence.
  • The terms "expression vector" and "expression cassette" include any type of genetic construct containing a nucleic acid capable of being transcribed in a cell.
  • the expression vectors of the invention generally supply sequence elements directing translation of the coding sequence into a protein of the present invention, as provided by the invention itself, although vectors used for the amplification of nucleotide sequences (both coding and non-coding) are also encompassed by the definition.
  • expression vectors will generally include restriction enzyme cleavage sites and the other initial, terminal and intermediate DNA sequences that are usually employed in vectors to facilitate their construction and use.
  • the expression vector can be part of a plasmid, virus, or nucleic acid fragment.
  • fusion gene refers to the combination of one or more heterologous coding sequences joined in frame to form a single translational/transcriptional unit. Typically the heterologous coding sequences are joined end-to-end. The definition however includes fusion genes where one sequence, or fragment thereof, intervenes in another heterologous sequence.
  • heterologous when used with reference to portions of a nucleic acid or protein indicates that the molecule comprises two or more subsequences that are not found in the same relationship to each other in nature.
  • a heterologous nucleic acid is typically recombinantly produced, having two or more sequences from unrelated genes arranged to make a new functional nucleic acid, e.g., a promoter from one source and a coding region from another source.
  • a heterologous protein indicates that the protein comprises two or more subsequences that are not found in the same relationship to each other in nature (e.g., a fusion protein).
  • primers or “primer pairs” refer to oligonucleotide probes capable of recognizing and hybridizing to specific nucleotide sequences found in a target gene or sequence to be amplified by polymerase chain reaction (PCR). The degree of complementarity required between the primers and the target sequence determines the specificity, or stringency of conditions
  • a temperature of about 36°C is typical for low
  • annealing temperatures may vary between about 32°C and about
  • a temperature of about 62°C is typical, although high stringency annealing temperatures can range from about 50°C to
  • low stringency amplifications include a denaturation phase of about 90°C - 95°C for 30 sec - 2
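As a hedged illustration of how primer composition relates to annealing temperature, the sketch below estimates a melting temperature with the Wallace rule (2°C per A or T, 4°C per G or C), which is only a rough approximation for short oligonucleotides. The rule, the 5°C offset below the melting temperature, and the function names are assumptions introduced for illustration; they are not taken from the specification.

```python
def wallace_tm(primer: str) -> int:
    """Estimate primer melting temperature (deg C) with the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    if at + gc != len(primer):
        raise ValueError("primer must contain only A, C, G and T")
    return 2 * at + 4 * gc


def suggested_annealing_temp(forward: str, reverse: str, offset: int = 5) -> int:
    """Rule-of-thumb annealing temperature: lower Tm of the pair minus an offset."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset
```

A lower annealing temperature tolerates more primer-target mismatches (lower stringency), while a higher one demands closer complementarity.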
  • regulatory sequences refers to those sequences, both 5' and 3' to a structural gene, that are required for the transcription and translation of the structural gene in the target host organism. Regulatory sequences include a promoter, ribosome binding site, optional inducible elements and sequence elements required for efficient 3' processing, including polyadenylation. When the structural gene has been isolated from genomic DNA, regulatory sequences also include those intronic sequences required for removal of the introns as part of mRNA formation in the target host.
  • recombinant when used with reference, e.g., to a cell, or nucleic acid, protein, or vector, indicates that the cell, nucleic acid, protein or vector, has been modified by the introduction of a heterologous nucleic acid or protein, or the alteration of a native nucleic acid or protein, or that the cell is derived from a cell so modified.
  • recombinant cells express genes that are not found within the native (non-recombinant) form of the cell or express native genes that are otherwise abnormally expressed, under-expressed or not expressed at all.
  • DNA regions are "operably linked" when they are functionally related to each other.
  • DNA for a signal peptide is operably linked to DNA for a polypeptide if it is expressed as a precursor which participates in the secretion of the polypeptide; a promoter is operably linked to a coding sequence if it controls the transcription of the sequence; or a ribosome-binding site is operably linked to a coding sequence if it is positioned so as to permit translation.
  • operably linked means contiguous and in reading frame.
  • FIG. 1 shows single and multiround transcription at the T7A1-tR' and galP-tR' promoters.
  • Two transcriptionally competent open model promoters from different classes, -10/-35 class (T7A1) and extended -10 class (galP1), attached to a rho-independent terminator were used to
  • reaction 1 was incubated at 65°C (for T.th) and 37°C (for E.
  • transcription buffer contained core enzyme and sigma, and, in parallel, Ys18 and a promoter DNA fragment.
  • Reaction 2 was incubated at 65°C (for T. th) and 37°C (for E. coli) for 10 minutes and then mixed together. For both reactions 1 and 2, after 10 minutes of incubation at the same
  • FIG. 1A shows multi-round transcriptional inhibition by Ys18 of both T7A1-tR' and
  • FIG. 1B shows single-round (+ heparin) and multi-round (- heparin) transcriptional
  • FIG. 1C shows the relative transcriptional activity of the run-off assay in graphical form. Transcriptional activity without Ys18 present is 100% (dark bar). The addition of Ys18 in incremental amounts represses transcriptional activity in a dose-dependent manner (light bar). The degree of transcriptional repression by Ys18 differs with the amount of Ys18 added, promoter type, RNA polymerase core, reaction mixture and presence or absence of heparin.
  • FIG. 2 shows native binding experiments with histidine-tagged phage protein Ys18 and primary sigma factors from T. thermophilus and E. coli. Reactions, containing corresponding proteins in 20 μl of binding buffer (20 mM Tris-HCl, pH 8.0, 0.5 M NaCl, 2 mM imidazole, 5% v/v
  • FIG. 2A shows Ys18 HIS bound to the primary sigma factor from T. thermophilus ( ⁇ ⁇ ).
  • FIG. 2B shows Ys18 HIS bound to the primary sigma factor from E. coli (σ70).
  • FIG. 2C shows Ys18 HIS bound to the primary sigma factor from E. coli lacking region 4
  • the present invention provides novel proteins from the bacteriophage YS40. These novel proteins retain their functionality at mesophilic or thermophilic temperatures, and consequently allow biosynthetic and/or biodegradative processes to proceed at higher temperatures.
  • the YS40 bacteriophage infects Thermus thermophilus HB8, and grows over the temperature range of about 56 to about 78°C.
  • the bacteriophage has a large genome (165 Kbp, ⁇ 150 genes) containing multiple DNA polymerase genes. The phage reproduces above 70°C, and the thermophilic enzymes have an extrinsic structural stability.
  • YS40 proteins have a strong similarity to prokaryotic enzymes, including the length of their amino acid sequences, and the potential to encode most of the proteins required for its own replisome.
  • YS40 encodes its own A-type DNA polymerase (encoded by SEQ ID NO:134), which has a conserved region in its C-terminus including 3 motifs with invariant residues ranging from amino acid residues 825-1102. Like the Klenow fragment from E. coli DNA pol I, the YS40 A-type DNA polymerase has no N-terminal 5'-3' exonuclease domain.
  • gpl66 encoded by SEQ ID NO: 166
  • SEQ ID NO: 106 is an S-adenosylmethionine decarboxylase (key enzyme in biosynthesis of spermidine and spermine).
  • the molecules of this invention may find utility in a wide variety of applications including, but not limited to, synthetic nucleic acid synthesis, biodegradative processes and other applications requiring resilient molecules capable of retaining their integrity, including enzyme activity when present, at higher temperatures such as at least about 36°C, or even at least about 45°C, 55°C, 65°C, or even about 75°C.
  • the following sections detail embodiments of the present invention, and how they may be used in biometabolic reactions.
  • Identifying Open Reading Frames
  • Nucleic acids encoding proteins and peptides of the present invention may be identified by screening the YS40 genomic sequence for open reading frames (ORFs) using any method known in the art. Using these methods, nucleic acid coding sequences for proteins of the present invention, as found in wild-type and cultured bacteriophage YS40 strains, may be identified. These coding sequences and/or proteins may be further modified as described herein, to provide additional coding sequences of the invention.
  • the genome sequence of YS40 may be searched for ORFs using the hidden Markov model approach implemented in GeneMark program (See Besemer J. and Borodovsky M (1999), NAR, Vol. 27, No. 19, pp. 3911-3920).
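The sketch below is a deliberately naive stand-in for ORF screening: it scans all six reading frames for ATG-to-stop stretches of a minimum length. It is not the hidden Markov model approach of GeneMark and will both miss and over-call genes relative to it; the function names, the ATG-only start rule and the `min_codons` default are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(genome: str, min_codons: int = 25):
    """Return (strand, start, end, sequence) for each ATG..stop ORF.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    that was scanned. min_codons counts codons before the stop codon.
    """
    genome = genome.upper()
    orfs = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, seq[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Candidate ORFs found this way would still need the database comparison described in this section before any function is assigned.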
  • 170 open reading frames (ORFs) encoding preferred proteins of the present invention are predicted, as identified in Table 1, below:
  • Regions between the identified ORFs may be screened for additional genes using the Blastx and tBlastx programs (Schafer et al, 1997), and identified ORF sequences compared with sequences in available databases (e.g., GenBank, GenPept, and the database of unfinished microbial genomes at NCBI) to provide a putative activity or function to the protein encoded by the ORF.
  • ORFs may be used in expression systems to produce YS40 proteins of the present invention, or the proteins may be isolated from cultures of Thermus thermophilus infected with bacteriophage YS40. Alternatively, proteins and peptides of the present invention may be synthesized using solid or liquid phase techniques well known to those of skill in the art.
  • thermophilic amino acid sequence at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or even at least 99% homologous to a YS40 amino acid sequence encoded by at least about 25, 35, 45, 50, 65, 75, 85, 95 or even about 100 contiguous codons of a YS40 coding sequence selected from SEQ ID NO: 1-170.
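One way to make the percentage thresholds above concrete is a percent-identity calculation over two sequences that have already been aligned. The sketch assumes that "homologous" is scored as simple residue identity and that alignment columns containing a gap (`-`) are skipped; both are simplifying assumptions, and the specification does not fix a scoring method in this passage. The function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns in which either sequence has a gap ('-') are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared
```

A candidate would then be compared against a threshold such as 75% over a window of at least 25 residues.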
  • the YS40 coding sequence is selected from SEQ ID NO: 2, 4-64, 70-149, 151 and 153-170 and has an enzyme activity that is a decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase or a peptidase that is active at a permissible temperature of about 36°C, or even about 45°C, 55°C or 65°C, or is active at about 75°C.
  • the YS40 coding sequence may be from SEQ ID NO: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 or 161.
  • the YS40 coding sequence is from SEQ ID NO: 33, and is a DNA polymerase.
  • proteins of the present invention may be isolated from cultures of Thermus thermophilus infected with bacteriophage YS40. Proteins of the invention isolated in this manner will typically be encoded by the ORFs SEQ ID NOs: 1-170, more typically by SEQ ID NO: 2, 4-64, 70-149, 151 or 153-170, even more typically by SEQ ID NO: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 or 161, and most typically by SEQ ID NO: 33. In certain embodiments of the invention, the proteins contemplated are fragments of one or more of those proteins described above.
  • YS40 proteins of the present invention may be obtained by growing cultures of Thermus thermophilus infected with bacteriophage YS40 at a permissible temperature using media and techniques well known to those of skill in the art. During culture, preferably in the exponential growth phase of the bacteria, the culture is fractionated by separating the bacteria from the culture media using, for example, low speed centrifugation. If a lytic strain of YS40 is used, then the proteins of the invention may be harvested from the supernatant. If a non-lytic strain of YS40 is used, then the proteins of the invention may be harvested from the bacterial cells after lysis using, for example, a French press or other method well known in the art.
  • proteins of the invention may be further purified using any combination of a variety of techniques well known to those of skill in the art. (cf., Colley et al., J. Biol. Chem., 264:17619-17622 (1989), and Guide to Protein Purification, in Vol. 182 of Methods in Enzymology (Deutscher ed., 1990), Morrison, D.A., J. Bact., 132:349-351 (1977), or by Clark-Curtiss et al, Methods in Enzymology, 101:347-362 (1983), eds. R. Wu et al, Academic Press, New York, (for suitable media, see the catalogues of the American Type Culture Collection)). Additional isolation techniques are described in detail in the following sections.
  • Proteins and peptides of the present invention may be purified to substantial purity by standard techniques, including column chromatography, immunopurification methods, electrophoresis, centrifugation, crystallization, isoelectric focusing and others (see, e.g., Scopes, Protein Purification: Principles and Practice (1982); Ausubel, et al. (1987 and periodic supplements) Current Protocols in Molecular Biology; Deutscher (1990) "Guide to Protein Purification" in Methods in Enzymology vol. 182, and other volumes in this series; and manufacturers' literature on use of protein purification products, e.g., Pharmacia, Piscataway, N.J., or Bio-Rad, Richmond, Calif.; and Sambrook et al, supra). Standard purification techniques
  • the molecular weight of a protein of the present invention may be used to isolate it from proteins of greater and lesser size by, for example, using ultrafiltration through membranes of different pore size (for example, Amicon or Millipore membranes).
  • the protein mixture is ultrafiltered through a membrane with a pore size that has a lower molecular weight cut-off than the molecular weight of the protein of interest.
  • the retentate of the ultrafiltration is then ultrafiltered against a membrane with a molecular weight cut-off greater than the molecular weight of the protein of interest.
  • the recombinant protein will pass through the membrane into the filtrate.
  • the filtrate may then be chromatographed as described below.
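The two-step ultrafiltration logic can be sketched as a simple filter over molecular weights. This is an idealized model that assumes each membrane passes everything below its cut-off and retains everything at or above it, which real membranes only approximate; the function name and the example values are hypothetical.

```python
def two_step_ultrafiltration(proteins_kda, target, low_cutoff_kda, high_cutoff_kda):
    """Return the proteins left in the final filtrate.

    Step 1: a membrane with a cut-off below the target retains the target
            (smaller proteins pass through and are discarded).
    Step 2: a membrane with a cut-off above the target lets the target
            pass into the filtrate (larger proteins are retained).
    """
    target_mw = proteins_kda[target]
    if not low_cutoff_kda < target_mw < high_cutoff_kda:
        raise ValueError("cut-offs must bracket the target molecular weight")
    retentate = {name: mw for name, mw in proteins_kda.items()
                 if mw >= low_cutoff_kda}
    filtrate = {name: mw for name, mw in retentate.items()
                if mw < high_cutoff_kda}
    return filtrate
```

With a hypothetical mixture of 8, 45 and 150 kDa proteins and cut-offs of 30 and 100 kDa, only the 45 kDa protein remains in the final filtrate.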
  • Proteins of the present invention may also be separated from other proteins on the basis of size, net surface charge, hydrophobicity, and affinity for ligands.
  • antibodies raised against proteins may be conjugated to column matrices and the proteins immunopurified. All of these methods are well known in the art. It will be apparent to one of skill that chromatographic techniques may be performed at any scale and using equipment from many different manufacturers (e.g., Pharmacia Biotech).
  • Purification segments may be fused to appropriate portions of proteins of the present invention to assist in isolation and production.
  • the FLAG sequence or a functional equivalent, may be fused to the protein via a protease-removable sequence, allowing the FLAG sequence to be recognized by an affinity reagent, and the purified protein subjected to protease digestion to remove the extension.
  • Many other equivalent segments exist, e.g., polyhistidine segments possessing affinity for heavy metal column reagents. See, e.g., Hochuli, Chemische Industrie, 12:69-70 (1989); Hochuli, Genetic Engineering, Principle and Methods, 12:87-98 (1990), Plenum Press, N.Y.; and Crowe, et al. (1992) QIAexpress: The High Level Expression & Protein Purification System, QIAGEN, Inc. Chatsworth, Calif.; which are incorporated herein by reference.
  • Affinity tags may also be incorporated into protein constructs of the present invention as analytical tools. Affinity tags provide a convenient way of removing the protein construct from a sample at a desired time, or to detect the location of the protein construct in a sample. Many other applications of affinity tagged protein constructs will be readily apparent to one of skill in the art.
  • Protein constructs of the present invention may also contain a string of histidine residues, incorporated at the amino or carboxyl terminal of the novel protein.
  • the polyhistidine tag allows convenient isolation of the protein in a single step by nickel-chelate chromatography. When a protein that has been "his-tagged" is placed on the nickel column, the histidine residues form a chelate complex with the nickel bound to the column, immobilizing the tagged protein. Contaminating components of the solution comprising the tagged protein may be washed away prior to elution of the tagged protein with a suitable competing chelator, typically imidazole.
  • the polyhistidine tag may be added to the protein through the use of peptide linkers as described in detail below.
  • the tag may be linked to a protein by appending a nucleic acid encoding the tag onto the coding region of recombinant protein, the resulting construct being incorporated into a suitable expression vector that is subsequently used to transform an appropriate host cell. Protein produced in the transformed host cell may then be purified as noted above.
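Appending a tag-encoding nucleic acid to a coding region can be sketched at the sequence level as follows. The example assumes a carboxyl-terminal tag of six histidine codons inserted in frame immediately before the stop codon; the tag length, the choice of the CAC codon and the function name are illustrative assumptions, not requirements of the specification.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
HIS_CODONS = {"CAC", "CAT"}


def add_c_terminal_his_tag(coding_sequence: str, n_his: int = 6,
                           his_codon: str = "CAC") -> str:
    """Insert n_his histidine codons in frame before the stop codon.

    If the input has no stop codon, a TAA stop is appended after the tag.
    """
    seq = coding_sequence.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if his_codon not in HIS_CODONS:
        raise ValueError("his_codon must encode histidine (CAC or CAT)")
    if seq[-3:] in STOP_CODONS:
        body, stop = seq[:-3], seq[-3:]
    else:
        body, stop = seq, "TAA"
    return body + his_codon * n_his + stop
```

The resulting construct would then be placed in an expression vector as described above.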
  • Epitope tags are another useful sequence that may be included in a protein construct of the present invention.
  • the epitope tag may consist of an amino acid sequence that allows affinity purification of the activated protein (e.g., on immunoaffinity or chelating matrices).
  • all of the activated proteins from an activation library may be purified. By purifying the activated proteins away from other cellular and media proteins, screening for novel proteins and enzyme activities may be facilitated. In some instances, it may be desirable to remove the epitope tag following purification of the activated protein.
  • This removal may be accomplished by including a protease recognition sequence (e.g., Factor Xa or enterokinase cleavage site) downstream from the epitope tag on the activation construct.
  • Incubation of the purified, activated protein(s) with the appropriate protease will release the epitope tag from the protein(s).
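Tag removal at a protease recognition sequence can be modeled on a one-letter amino acid string. The sketch assumes the commonly cited recognition sequences DDDDK (enterokinase) and IEGR (Factor Xa), with cleavage after the final residue of the site, and cuts only at the first occurrence; real proteases may also cut at secondary sites. The function name and the example construct are hypothetical.

```python
CLEAVAGE_SITES = {
    "enterokinase": "DDDDK",  # cleaves after the lysine
    "factor_xa": "IEGR",      # cleaves after the arginine
}


def remove_tag(construct: str, protease: str):
    """Split a tagged protein at the first protease site.

    Returns (released_tag_with_site, remaining_protein).
    """
    site = CLEAVAGE_SITES[protease]
    index = construct.find(site)
    if index == -1:
        raise ValueError(f"no {protease} site found in construct")
    cut = index + len(site)
    return construct[:cut], construct[cut:]
```

For instance, the FLAG sequence DYKDDDDK itself ends in an enterokinase site, so a FLAG-tagged construct can be cut directly after the tag.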
  • a preferred method of producing proteins of the present invention is through recombinant expression of the proteins in a heterologous host system.
  • Such systems are preferably cellular in nature, but may be cell-free.
  • Preferable cell-based systems include bacterial hosts, most preferably E. coli hosts.
  • nucleic acids encoding proteins of the present invention are typically inserted into an expression vector suitable for the chosen host, with the coding sequence of the nucleic acid aligned in-frame and operably linked to suitable control sequences such as a promoter and a transcriptional terminator.
  • the expression vector is then inserted into the host cell, which is then cultured under conditions that allow for the expression of the protein of the invention.
  • the protein is preferably purified using techniques such as the examples provided below.
  • Proteins expressed in bacteria may form insoluble aggregates ("inclusion bodies").
  • purification of inclusion bodies typically involves the extraction, separation and/or purification of inclusion bodies by disruption of bacterial cells, e.g., by incubation in a buffer of 50 mM Tris/HCl pH 7.5, 50 mM NaCl, 5 mM MgCl2, 1 mM DTT, 0.1 mM ATP, and 1 mM PMSF.
  • the cell suspension can be lysed using 2-3 passages through a French Press, homogenized using a Polytron (Brinkman Instruments) or sonicated on ice.
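Preparing the lysis buffer named above from concentrated stocks is a C1V1 = C2V2 calculation, sketched below. The final concentrations come from the text; the stock concentrations in `STOCK_MM` are hypothetical placeholders chosen for illustration and should be replaced with the stocks actually on hand.

```python
# Final concentrations (mM) as listed in the text.
FINAL_MM = {
    "Tris-HCl pH 7.5": 50,
    "NaCl": 50,
    "MgCl2": 5,
    "DTT": 1,
    "ATP": 0.1,
    "PMSF": 1,
}

# Hypothetical stock concentrations (mM); not from the specification.
STOCK_MM = {
    "Tris-HCl pH 7.5": 1000,
    "NaCl": 5000,
    "MgCl2": 1000,
    "DTT": 1000,
    "ATP": 100,
    "PMSF": 100,
}


def stock_volumes_ml(final_volume_ml: float) -> dict:
    """Volume of each stock (mL) needed, plus water to reach the final volume."""
    volumes = {
        name: final_volume_ml * FINAL_MM[name] / STOCK_MM[name]
        for name in FINAL_MM
    }
    volumes["water"] = final_volume_ml - sum(volumes.values())
    return volumes
```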
  • the present invention contemplates a recombinant cell or other expression system including an isolated nucleic acid that contains a YS40 nucleotide sequence having the nucleotide sequence SEQ ID NO: 1-170.
  • the YS40 nucleotide sequence encodes either a YS40 structural protein that does not take a random coil structure at a permissible temperature of at least 36°C, or a YS40 enzyme that displays decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase or peptidase activity at a permissible temperature of at least 36°C.
  • the YS40 nucleotide sequence is operably linked to a regulatory element, preferably a promoter, most preferably a constitutive promoter.
  • inventions include a recombinant vector comprising an isolated nucleic acid encoding an isolated protein comprising a thermophilic amino acid sequence at least about 75% homologous to a YS40 amino acid sequence encoded by at least about 25 contiguous codons from SEQ ID NO: 1-170.
  • the YS40 amino acid sequence is operably linked to a promoter such that introduction of the vector into an expression system produces a protein having an enzyme activity selected from the group consisting of decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase and peptidase when assayed at a permissible temperature of at least about 36°C, more typically at least about 45°C, 55°C, or 65°C, and most typically at least about 75°C.
  • nucleic acids encoding proteins of the present invention in the production of the proteins.
  • These nucleic acids may be any coding sequence capable of expressing a protein of the present invention, when operably linked to appropriate control sequences, including a promoter.
  • the nucleic acid may include a partial deletion, substitution or insertion of the nucleotide sequence, or may have other nucleotide sequence ligated therewith at the 5'-terminus and/or 3'-terminus thereof.
  • nucleic acid sequences encoding proteins of the present invention may be isolated from Thermus thermophilus strains infected with bacteriophage YS40, or may be isolated from phage libraries constructed from the YS40 bacteriophage genome using methods well known by those of skill in the art.
  • cDNA or genomic libraries are constructed and screened to identify the correct sequence.
  • PCR amplification techniques can also be used to identify and isolate nucleic acid sequences encoding proteins of the invention and are discussed generally in PCR Protocols: A Guide to Methods and Applications (Innis et al, eds, 1990).
  • Nucleic acids encoding proteins of the invention may also be prepared using synthetic techniques. Chemical synthesis of linear oligonucleotides is well known in the art and can be achieved by solution or solid phase techniques. Moreover, linear oligonucleotides of defined sequence can be purchased commercially or can be made by any of several different synthetic procedures including the phosphoramidite, phosphite triester, H-phosphonate and phosphotriester methods, typically by automated synthesis methods. The synthesis method selected can depend on the length of the desired oligonucleotide and such choice is within the skill of the ordinary artisan.
  • the phosphoramidite and phosphite triester method produce oligonucleotides having 175 or more nucleotides while the H-phosphonate method works well for oligonucleotides of less than 100 nucleotides.
  • Oligonucleotides of the present invention can be synthesized chemically according to the solid phase phosphoramidite triester method described by Beaucage and Caruthers (1981), Tetrahedron Letts., 22(20): 1859-1862, e.g., using an automated synthesizer, as described in Needham-VanDevanter et al. (1984) Nucleic Acids Res., 12:6159-6168.
  • Oligonucleotides can also be custom made and ordered from a variety of commercial sources known to persons of skill in the art. Purification of oligonucleotides, where necessary, is typically performed by either native acrylamide gel electrophoresis or by anion-exchange HPLC as described in Pearson and Regnier (1983) J. Chrom. 255:137-149. See also Sambrook, J. et al. Molecular Cloning, A Laboratory Manual, 2d Ed. Cold Spring Harbor Laboratory Press, New York, 13.7-13.9 and Hunkapiller, M. W. (1991) Curr. Op. Gen. Devi. 1:88-92.
  • Nucleic acids encoding proteins of the present invention may be expressed in a variety of host organisms once they are operably linked in expression vectors suitable for the selected host organism. Suitable expression vectors typically comprise regulatory sequences operable in the host organism. These regulatory sequences are necessarily operably linked to the nucleic acid to control its expression.
  • the expression vector includes a promoter that is either inducible or constitutively drives transcription, and may optionally comprise other regulatory, replication or manipulation sequences to aid in the expression and incorporation of the nucleic acid into the expression vector, as required by the particular application being pursued.
  • expression vectors that contain, at a minimum, a strong promoter to direct transcription, a ribosome-binding site for translational initiation, a transcription/translation terminator, and unique restriction sites in nonessential regions of the plasmid to allow insertion of foreign nucleic acids.
  • Other factors may also be carried on the expression vector, such as selectable and/or scorable markers, such as those described below.
  • Suitable expression systems for use with the present invention are well known in the art. See, e.g., Pouwels, et al. (1985 and Supplements) Cloning Vectors: A Laboratory Manual, Elsevier, N.
  • Exemplary bacterial host organisms suitable for use in the present invention are well known in the art and include gram-positive and gram-negative bacteria such as Escherichia coli (cf. Sambrook et al, supra). E. coli strains are particularly preferred host organisms for expression of proteins of the present invention. Exemplary E. coli strains include BL21 (DE3), BL21-Gold (DE3), BL21 (DE3)-pLysS (Stratagene), MMLV-RT: JM109, DH5αf, XL1BLUE
  • Standard transfection methods are used to introduce expression systems for proteins of the present invention to host organisms (see, e.g., Morrison, J. Bact., 132:349-351 (1977); Clark-Curtiss & Curtiss, Methods in Enzymology, 101:347-362 (Wu et al, eds, 1983); Sambrook et al, and Ausubel et al, supra).
  • the proteins can be recovered from the cells or from the culture medium by standard protein purification techniques as described above.
  • Identifying host organisms that have successfully incorporated nucleic acids encoding a protein of the present invention is preferably accomplished through inclusion of a selectable marker gene into the vector or expression system used for producing the protein.
  • Selectable markers allow a transformed cell, tissue or animal to be identified and isolated by selecting or screening the engineered material for traits encoded by the marker genes present on the transforming DNA. For instance, selection may be performed by growing the engineered cells on media containing inhibitory amounts of an antibiotic to which the transforming marker gene construct confers resistance. Further, transformed cells may also be identified by screening for the
  • any visible marker genes e.g., the β-glucuronidase, green fluorescent protein,
  • luciferase, B or C1 genes that may be present on the recombinant nucleic acid constructs of the present invention.
  • selection and screening methodologies are well known to those skilled in
  • Physical and biochemical methods may also be used to identify a cell transformant containing the genetic constructs of the present invention. These methods include but are not limited to: 1) Southern analysis or PCR amplification for detecting and determining the structure of the recombinant DNA insert; 2) Northern blot, S1 RNase protection, primer-extension or reverse transcriptase-PCR amplification for detecting and examining RNA transcripts of the gene constructs; 3) enzymatic assays for detecting enzyme activity, where such gene products are encoded by the gene construct; 4) protein gel electrophoresis, western blot techniques, immunoprecipitation, or enzyme-linked immunoassays, where the gene construct products are proteins; 5) biochemical measurements of compounds produced as a consequence of the expression of the introduced gene constructs. The methods for performing these assays are well known to those skilled in the art.
  • Proteins of the present invention may also be synthesized chemically.
  • peptides may be synthesized either in solution, solid phase or a combination of these methods following standard protocols. See, for example, Wilken et al. (Curr. Opin. Biotech. (1998) 9(4):412-426), which reviews chemical protein synthesis techniques.
  • the solution and solid phase synthesis methods are readily automated.
  • a variety of peptide synthesizers are commercially available for batchwise and continuous flow operations as well as for the synthesis of multiple peptides within the same run. Briefly, the solid phase method consists of anchoring the growing peptide chain to an insoluble support or resin.
  • Solution phase peptide synthesis generally involves reacting individual protected amino acids in solution to generate protected dipeptide product.
  • the method of chemical synthesis employs a combination of chemical synthesis and chemical ligation techniques.
  • chemical synthesis approaches described above may be utilized in combination with various chemoselective chemical ligation techniques for producing the proteins of the invention.
  • Chemoselective chemical ligation chemistries that can be utilized in the methods of the invention include native chemical ligation (Dawson et al, Science (1994) 266:776-779; Kent et al., WO 96/34878), extended general chemical ligation (Kent et al., WO 98/28434), oxime-forming chemical ligation (Rose et al., J. Amer. Chem. Soc.
  • examples of chemical tags include metal binding tags (e.g., his-tags), carbohydrate/substrate binding tags (e.g., cellulose and chitin binding domains), antibodies and antibody fragment tags, isotopic labels, haptens such as biotin and various unnatural amino acids comprising a chromophore, some of which have been discussed supra.
  • a chemical tag also may include a cleavable linker so as to permit separation of the protein from the chemical tag depending on its intended end use.
  • Proteins of the present invention find application in a variety of processes, including biosynthetic and biodegradative processes, particularly those where performance of the process at a mesophilically or thermophilically compatible temperature is beneficial.
  • proteins of the present invention that catalyze reactions of import in nucleic acid synthesis are particularly suited for nucleic acid amplification processes.
  • Methods of "quantitative" nucleic acid amplification are well known to those of skill in the art.
  • quantitative PCR involves simultaneously co-amplifying a known quantity of a control sequence using the same primers. This type of quantitative amplification provides an internal standard that may be used to calibrate the PCR reaction.
  • One exemplary internal standard is a synthetic AW106 cRNA.
  • the AW106 cRNA is combined with RNA isolated from the sample according to standard techniques known to those of skill in the art.
  • the RNA is then reverse transcribed using a reverse transcriptase to provide cDNA.
  • the cDNA sequences are then amplified (e.g., by PCR) using labeled primers.
  • the amplification products are separated, typically by electrophoresis, and the amount of radioactivity (proportional to the amount of amplified product) is determined.
  • the amount of mRNA in the sample is then calculated by comparison with the signal produced by the known AW106 RNA standard.
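The final comparison step can be sketched as a simple proportion. The calculation assumes that the target and the internal standard are reverse transcribed and amplified with equal efficiency and that signal is linear in product amount; these are simplifying assumptions, and the function name and units are hypothetical.

```python
def estimate_target_amount(target_signal: float, standard_signal: float,
                           standard_amount: float) -> float:
    """Estimate target mRNA amount from co-amplified internal standard.

    The result is in the same units as standard_amount.
    """
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return standard_amount * target_signal / standard_signal
```

For example, a target band carrying three times the signal of a standard added at 2.0 units would be estimated at 6.0 units.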
  • Detailed protocols for quantitative PCR are provided in PCR Protocols, A Guide to Methods and Applications, Innis et al., Academic Press, Inc. N.
  • PCR polymerase chain reaction
  • LCR ligase chain reaction
  • RNA antisense
  • the oligonucleotide probes provided in the array are chosen to be complementary to subsequences of the antisense nucleic acids.
  • the target nucleic acid pool is a pool of sense nucleic acids
  • the oligonucleotide probes are selected to be complementary to subsequences of the sense nucleic acids.
  • the probes may be of either sense as the target nucleic acids include both sense and antisense strands.
  • the protocols cited above include methods of generating pools of either sense or antisense nucleic acids. Indeed, one approach can be used to generate either sense or antisense nucleic acids as desired.
  • the cDNA can be directionally cloned into a vector (e.g., Stratagene's pBluescript II KS (+) phagemid) such that it is flanked by the T3 and T7 promoters.
  • RNA of one sense the sense depending on the orientation of the insert
  • in vitro transcription with the T7 polymerase will produce RNA having the opposite sense.
  • suitable cloning systems include phage lambda vectors designed for Cre-loxP plasmid subcloning (see e.g., Palazzolo et al., Gene, 88: 25-36 (1990)).
  • Exemplary reagent mixtures for use in amplifying nucleic acids according to the methods of the present invention include a recombinant protein that has a thermophilic amino acid sequence at least about 75% homologous to an YS40 amino acid sequence encoded by at least about 25 contiguous codons of SEQ ID NO: 2, 4-64, 70-149, 151 or 153-170.
  • This thermophilic amino acid sequence confers to the recombinant protein an enzyme activity necessary for DNA amplification when incubated at a permissible temperature of at least about 36°C, more typically at least about 55°C, most typically at least about 65°C.
  • the YS40 amino acid sequence is encoded by at least 25 contiguous codons of SEQ ID NO: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 or 161. Most typically the YS40 amino acid sequence is encoded by at least about 25 contiguous codons of SEQ ID NO: 33, and the enzyme activity is DNA polymerase. [101] In some embodiments of the present invention, amplification of nucleic acids is contemplated as taking place directly from whole cells containing the nucleic acid to be amplified.
  • Exemplary proteins of the present invention possessing protease, lipase or other enzymatic activities that degrade biomolecules of a cell may be included in the amplification reaction. These enzymes, together with the elevated temperatures of the reaction, provide a means of breaching the cell membrane and allowing the nucleic acid within the cell to be amplified. Methodology for carrying out such reactions will be obvious to one of skill in the art, and may be adapted to virtually any cell system through routine experimentation.
  • thermophilic protein that has a recombinant amino acid sequence at least about 75% homologous to an YS40 amino acid sequence, which is encoded by at least 25 contiguous codons from SEQ ID NO: 2, 4-64, 70-149, 151 and 153-170, more typically SEQ ID NOs: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 or 161, most typically SEQ ID NO: 33.
  • the thermophilic protein encoded by SEQ ID NO: 33 is preferred.
  • When incubated at a permissible temperature greater than about 36°C, more preferably greater than about 55°C, and most preferably greater than about 65°C, the cell membrane is breached, allowing the amplification reagents to contact the nucleic acids of the cell, which are subsequently amplified.
  • proteins of the present invention are used in commercially important biosynthetic or biodegradative processes.
  • the present invention contemplates using proteins described herein in mesophilic and thermophilic processes for the synthesis or degradation of biomaterials.
  • these reactions may be carried out at elevated temperatures that are incompatible with growth of bacteria that may normally interfere with such processes, while providing accelerated enzymatic activity resulting from the higher temperature.
  • processes in which protein enzymes of the present invention may be used include, but are not limited to, waste water treatment, fermentation processes, composting, paper manufacture, etc. It will be readily appreciated by one of skill in the art that the proteins of the present invention find use in many processes in addition to those listed here, and may be applied to such processes through routine experimentation.
  • Methods of the present invention suitable for decomposing a biodegradable material involve contacting the biodegradable material with at least one recombinant protein that has a recombinant amino acid sequence that is at least about 75% homologous to an YS40 amino acid sequence encoded by at least 25 contiguous codons of SEQ ID NO.: 2, 4-64, 70-149, 151 or 153-170 of Table 1.
  • the recombinant amino acid sequence confers to the recombinant protein, at a permissible temperature greater than 36°C, more preferably greater than 55°C and most preferably greater than 65°C, an enzyme activity necessary for decomposing the biodegradable material, which may be a protease, amylase, cellulase, nuclease, lipase, deaminase or a peptidase.
  • the present invention also contemplates a Thermus thermophilus expression system for expression of foreign proteins at elevated temperatures.
  • Central to Thermus thermophilus expression systems of the present invention is an expression vector based on the YS40 bacteriophage genome.
  • the expression vector includes no more than 99.9% of the nucleotide sequence of SEQ ID NO: 171 or its complement. Inserted into this vector is a non-YS40 nucleotide sequence of at least 20 contiguous nucleotides.
  • This non-YS40 nucleotide sequence is inserted into the vector sequence such that it is flanked on its 3' and 5' ends by at least 10 contiguous nucleotides from the YS40 genome.
  • the non-YS40 nucleotide sequence may include a promoter suitable for expression of the protein encoded by the non-YS40 nucleotide sequence in T. thermophilus, and/or the non-YS40 nucleotide sequence may be operably linked to one or more regulatory sequences of YS40 bacteriophage.
  • T. thermophilus using any technique known to those of skill in the art, such as those described above.
  • the transformed T. thermophilus is then cultured at a permissible temperature under suitable conditions allowing expression of the protein encoded by the non-YS40 nucleotide sequence.
  • thermophilus are a preferred embodiment of the present invention
  • other cellular hosts are also contemplated, as are cell-free expression systems, such as reticulocyte lysates.
  • kits suitable for amplifying nucleic acid samples.
  • kits include a reagent containing at least one recombinant protein that has a thermophilic amino acid sequence at least about 75% homologous to an YS40 amino acid sequence encoded by at least about 25 contiguous codons taken from one of the sequences SEQ ID NO: 1-170.
  • the reagent has an enzyme activity necessary for DNA amplification or DNA entry into the cell, as described above, at a permissible temperature of at least about 36°C, more typically at least about 55°C, most typically at least about 65°C.
  • Kit embodiments also include a buffer solution for diluting the reagent and may optionally include universal primers and/or known calibration nucleic acids known to those of skill in the art.
  • This example describes one method of identifying coding sequences of the present invention starting from the genomic sequence of the YS40 thermophilic phage.
  • the genome sequence of YS40 was searched for open reading frames (ORFs) using the hidden Markov model approach implemented in GeneMark program (See Besemer J. and Borodovsky M (1999)., NAR, Vol. 27, No. 19, pp. 3911-3920). Using this technique, 170 open reading frames (ORFs) encoding preferred proteins of the present invention were predicted (Table 1).
  • Regions between the identified ORFs were then screened for additional genes using the Blastx and tBlastx programs (Schafer et al, 1997) to identify regions having similarity with available entries in GenBank, GenPept, and the database of unfinished microbial genomes at NCBI. This latter search did not identify any additional coding sequences.
  • the predicted YS40 ORFs have lengths between 43 and 1744 codons. As with most other phages, the genome of YS40 is tightly packed, with little space between ORFs and 46 cases of overlaps (from 1 to 40 bases in length) between the adjoining ORFs.
  • YS40 encodes several tRNAs.
  • tRNA scan-SE program Lowe, T.M. & Eddy, S. R.
  • YS40 encodes a number of enzymes that are involved in nucleotide metabolism. They are gp8, a homolog of mammalian/virus dUTPase (EC 3.6.1.23); gp9, related to flavin-dependent thymidylate synthase (EC 2.1.1.148); gp17, a GMP reductase, having sequence similarity to EC 1.7.1.7; gp24, a thymidine kinase, having sequence similarity to EC 2.7.1.21, PF00265; gp38, a deoxycytidylate deaminase, having sequence similarity to PF00383, EC 3.5.4.12; gp60, a dNMP
  • YS40 encodes most of the proteins required for its own replisome formation, namely gp27 and gp79, two helicases with DEAD signature in the Walker B motif; gp14, replication initiation helicase DnaB; gp23, bacterial DnaG-family DNA primase; gp26, RecB family exonuclease; gp33, type A DNA polymerase; and gp65, a terminal protein that may be covalently attached to the 5' terminus of the YS40 genomic DNA.
  • YS40 also encodes two recombination proteins: gp12, RecA/RadA recombinase; and gp114, recombination protein ERF.
  • gp33 contains a conserved nucleotidyltransferase domain and a 3'-5' exonuclease domain. However, like the Klenow fragment of E. coli DNA polymerase I, gp33 lacks the N-terminal 5'-3' exonuclease domain. Furthermore, in the YS40 genome, there are no gene products with detectable sequence similarity to single-stranded DNA binding protein from any known class (Ponomarev VA, et al. Mol Microbiol Biotechnol. 2003), nor to any DNA ligases from other bacteria or bacteriophages.
  • the protein gp65 is of particular interest in understanding the replication mechanism of YS40. It shows striking sequence similarity to the C-terminal portion of the podovirus phi29 terminal protein (TP) that is essential for the protein-primed DNA replication of the linear phi29 genome.
  • TP podovirus phi29 terminal protein
  • the 5'-terminal dAMP is linked via a phosphoester bond to the hydroxyl group of Ser 232 of the TP (Hermoso JM, et al. 1985), and this Ser 232 is absolutely critical for the priming activity of TP (Garmendia C, et al. 1988; Garmendia C, et al. 1990).
  • the overall architecture of YS40 genome is unique, as compared to other sequenced phage genomes.
  • the tendency towards tight clustering of genes coding for virion components, which is so prevalent in lambdoid phages and T4-like phage groups, is hardly observed in the YS40 genome.
  • gp150 encoding a putative Myovirus-like baseplate assembly protein
  • gp152 that encodes a putative Myovirus-like wac fibritin neck whisker
  • these two structural genes are located far away from other recognized YS40 structural genes, such as the genes coding for gp1 (distal tail fiber protein), gp3 (portal protein), gp62 (terminase large subunit) and gp69 (tail sheath protein).
  • YS40 is capable of withstanding temperature as high as 75°C in its Thermus host. Thus its molecular milieu is extremely resistant to elevated temperatures, such as those desirably employed in bioreactors, including PCR processes.
  • gpl 3 fragmentation protein ERF
  • ERF recombination protein
  • the best database match is from thermophilic microorganisms, including Thermotoga maritima, Thermoanaerobacter tengcongensis, and Methanocaldococcus jannaschii.
  • thermophile-affiliated YS40 proteins may give us more clues on the survival strategy of this phage under the extreme temperature.
  • gp5 S- adenosylmethionine decarboxylase
  • adoMetDC S- adenosylmethionine decarboxylase
  • thermophilic adoMetDC on a thermophilic species-specific clade, including both bacterial and archaeal species, such as Aquifex aeolicus, Thermoplasma, Picrophilus and Pyrococcus, in the phylogenetic tree built on the basis of multiple sequence alignment of adoMetDC enzymes (data not shown) suggests that thermophilic adoMetDC enzymes are evolutionarily specialized, therefore may be important for the survival of thermophilic microorganisms in extreme high temperature.
  • This example describes the ability of the phage protein Ys18, encoded by SEQ ID NO: 18, to negatively regulate transcription initiation by binding RNA polymerase sigma factors from T. thermophilus (T. th) and E. coli. Binding experiments and transcription experiments have been used to determine the function of Ys18.
  • The function of Ys18 was analyzed by using a run-off transcription assay to determine if Ys18 was involved in transcription (FIG. 1).
  • Reaction 2, in 10 μl of transcription buffer, contained core enzyme and
  • Reaction 2 was incubated at 65°C (for T. th) and 37°C (for E. coli) for 10 minutes and then mixed together. For both reactions 1 and
  • binding mixtures were then added to Ni-NTA agarose beads, equilibrated in the binding
  • Ys18HIS also bound to the RNA polymerase sigma factor from E. coli (σ70). In the presence
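
The internal-standard comparison described in the quantitative PCR items above can be illustrated with a short calculation. This is a minimal sketch, not the protocol of the cited reference: it assumes that the measured signal is proportional to the amount of amplified product and that the target and the control standard amplify with the same efficiency. The function and parameter names are hypothetical.

```python
def estimate_target_amount(target_signal: float,
                           standard_signal: float,
                           standard_amount: float) -> float:
    """Estimate the starting amount of target mRNA from an internal standard.

    Assumptions (illustrative only): signal is proportional to product, and
    target and standard are co-amplified with equal efficiency, so the ratio
    of signals equals the ratio of starting amounts.
    """
    if standard_signal <= 0:
        raise ValueError("standard_signal must be positive")
    if target_signal < 0 or standard_amount < 0:
        raise ValueError("signals and amounts must be non-negative")
    return standard_amount * target_signal / standard_signal


if __name__ == "__main__":
    # Hypothetical readings: 2.0 units of standard gave 500 counts,
    # and the target band gave 1500 counts.
    print(estimate_target_amount(1500.0, 500.0, 2.0))
```

In practice a dilution series of the standard is usually measured so that the proportionality assumption can be checked rather than taken on trust.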

Abstract

The disclosed invention relates to the fields of molecular biology and biochemistry. Thermophilic proteins and the nucleic acids encoding them are disclosed. The thermophilic proteins are from, or derived from, a bacteriophage, YS40, that infects the thermophilic bacterium Thermus thermophilus. These proteins have enhanced stability, particularly at high temperatures.

Description

NOVEL THERMOPHILIC PROTEINS AND THE NUCLEIC ACIDS
ENCODING THEM
FIELD OF THE INVENTION
[01] The disclosed invention relates to the fields of molecular biology and biochemistry. Thermophilic proteins and the nucleic acids encoding them are disclosed. The thermophilic proteins are from, or derived from, a bacteriophage, YS40, that infects the thermophilic bacterium Thermus thermophilus. These proteins have enhanced stability, particularly at high temperatures.
BACKGROUND OF THE INVENTION
[02] In the last decade, bacteriophage (phage) genome sequencing projects have deposited more than 200 complete phage genome sequences in the public databases. While the hosts of these phages are phylogenetically quite diverse, only 10 completely sequenced phages are known to infect thermophilic microorganisms. Most of these thermophilic phages were isolated from a small number of archaeal species (Palm, P., et al. 1991; Wiedenheft, B., et al. 2004; Arnold, H.P., et al. 2000). The only sequenced genome of a phage from a thermophilic bacterium is RM 378 that infects Rhodothermus marinus (Hjorleifsdottir, S., et al. Patent: WO 0075335-A 14-DEC-2000).
[03] Bacteriophages may be the most abundant living entities on Earth, represented by about 10^31 individuals, as indicated by random sampling and sequencing of DNA from environmental sources (Hendrix RW. Bacteriophage genomics. Curr Opin Microbiol. 2003 Oct;6(5):506-511). It has been proposed that the origin of dsDNA bacteriophages is as ancient as DNA replication itself (Filee J, Forterre P, Sen-Lin T, Laurent J. Evolution of DNA polymerase families: evidences for multiple gene exchange between cellular and viral proteins. J Mol Evol. 2002 Jun;54(6):763-773), and the analysis of the currently known bacteriophages may provide clues to early evolution of cellular and viral genomes. Phage genomes thus present a relatively unexplored source of genetic variation and enzymatic activities that may be of considerable commercial import.
SUMMARY OF THE INVENTION
[04] Bacteriophage YS40 infects the thermophilic bacterium Thermus thermophilus H8. Analysis of the YS40 genome revealed a dsDNA molecule of 152,372 bp with no terminal repeats or redundancies that contains 169 putative open reading frames, which express polypeptides longer than 50 amino acids, and three tRNA genes. The ability of YS40 to infect and propagate in T. thermophilus at permissive temperatures from about 56 to about 78°C suggests that proteins encoded in the YS40 genome may have enhanced stability, particularly at higher temperatures. In addition to greater stability, proteins of YS40 may also possess novel enzymatic characteristics with commercial applicability.
[05] Accordingly, the present invention provides isolated proteins that include a thermophilic amino acid sequence at least 75%, or even at least 80%, 85%, 90%, 95%, 96%, 97%, 98% or even at least 99% identical to a YS40 amino acid sequence encoded by at least 25, or at least 30, 35, 40, 45, 50, 60, 70, 75, 80, 90, or even at least 100 contiguous codons from SEQ ID NO: 1-170; preferably from SEQ ID NO: 2, 4-64, 70-149, 151 or 153-170, or conservatively modified variants of the same.
[06] In certain aspects, the thermophilic amino acid sequence is identical to the YS40 amino acid sequence. In some aspects, the YS40 coding sequence encodes a YS40 structural protein and is selected from SEQ ID NOs: 1, 3, 65, 69, 71, 151 or 152. The thermophilic amino acid sequence confers to the protein, at a permissible temperature of at least 36°C, more preferably at least 45°C, 55°C, 65°C, or even 75°C, an enzymatic activity. Exemplary enzymatic activities of proteins of the present invention include, but are not limited to, decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase and peptidase activities. For enzymes, the YS40 coding sequence is selected from SEQ ID NOs: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 or 161. Where the enzyme is a DNA polymerase, the YS40 coding sequence preferably is SEQ ID NO: 33.
[07] The invention also contemplates nucleic acid embodiments where the nucleic acids encode proteins of the invention. It is desirable that the encoded thermophilic amino acid sequence of the encoded protein is identical to the YS40 amino acid sequence. The YS40 amino acid sequence may be encoded by at least about 25, 50, 75 or 100 contiguous codons of the YS40 coding sequence, with the YS40 coding sequence being from SEQ ID NO: 1, 3, 65, 69, 71, 151 or 152. In some aspects the thermophilic amino acid sequence confers an enzyme activity to the encoded protein at a permissible temperature of at least 36°C, more preferably at least 45°C, 55°C, 65°C, most preferably at least 75°C. The enzyme activity may be a decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase or peptidase, depending upon the YS40 coding sequence selected. The YS40 coding sequence may be from SEQ ID NO: 2, 4-64, 70-149, 151 or 153-170, more preferably from SEQ ID NO: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 or 161. Ideally, the protein encoded by the nucleic acid is a DNA polymerase at a permissive temperature, and the YS40 coding sequence selected is SEQ ID NO: 33.
[08] Nucleic acids of the present invention may include a YS40 nucleotide sequence from SEQ ID NO: 1-170. The YS40 nucleotide sequence may encode a YS40 structural protein that does not take a random coil structure at a permissible temperature of at least 36°C, 45°C, 55°C, 65°C, or even at least 75°C, or a YS40 enzyme. YS40 enzymes may display decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase or peptidase activities when analyzed at a permissible temperature of at least 36°C, 45°C, 55°C, 65°C, or even at least 75°C. Such nucleic acids may optionally be operably linked to a regulatory sequence. Such nucleic acids may also be used to transform a cell, and such recombinant cell types form part of the present invention.
[09] Other embodiments of the invention include recombinant vectors. The recombinant vectors include a nucleic acid encoding a protein of the invention, as discussed above, operably linked to a promoter. Introduction of the vector into an expression system produces a protein having, at a permissible temperature of at least 36°C, or even at least 45°C, 55°C, 65°C, or even at least 75°C, an enzymatic activity. These proteins may be characterized as being decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase, peptidase, or combinations thereof. The promoter is preferably inducible or constitutive, and ideally is a strong promoter. In some embodiments the YS40 coding sequence is selected from SEQ ID NO: 2, 4-64, 70-149, 151 or 153-170, or from SEQ ID NO: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 or 161, or the enzymatic activity is a DNA polymerase at the permissible temperature and the YS40 coding sequence is SEQ ID NO: 33.
[10] The present invention also includes protein expression systems. These embodiments include a recombinant vector, as discussed immediately above, and produce the recombinant protein encoded by the vector when incubated under permissible conditions, including a permissible temperature. Protein expression systems of the present invention may be cell-based, or cell-free in nature.
[11] Further embodiments of the present invention are vectors that include no more than about 99.9% of the nucleotide sequence of SEQ ID NO: 171 and a non-YS40 nucleotide sequence of at least 20 contiguous nucleotides. The non-YS40 nucleotide sequence is inserted into the vector sequence whereby it is flanked on the 3' end and the 5' end by at least 10 contiguous nucleotides of YS40 genomic sequence. The non-YS40 nucleotide sequence may optionally be operably linked to a regulatory sequence, such as a promoter. Recombinant cell systems incorporating such a vector are also contemplated as part of the present invention. Preferably the cells in such recombinant systems are Thermus thermophilus transformed with the vector.
[12] The present invention also includes method embodiments for amplifying nucleic acids. These methods involve contacting a nucleic acid with a PCR reagent mixture including a protein of the present invention, as described above, where the protein has an enzymatic activity necessary for DNA amplification when incubated at a permissive temperature under permissible conditions. Variants on these embodiments include amplification methods where whole cells are the starting material. In these variants, the reaction mix includes at least one protein of the present invention that possesses an enzymatic activity that facilitates entry of PCR reagents into the cell. Such enzymes usually lyse the cells, but do not have to, in order to form part of the present invention.
[13] Method embodiments for decomposing a biodegradable material are also contemplated. These methods involve contacting the biodegradable material with at least one protein of the present invention as described above, that has an enzymatic activity necessary for decomposing the biodegradable material when incubated at a permissible temperature. Exemplary enzymatic activities suitable for this purpose include, but are not limited to, amylase, cellulase, nuclease, lipase, deaminase and peptidase.
[14] Finally, the invention also includes kits that are suitable for amplifying a nucleic acid. The kits include a reagent that has at least one protein of the present invention as described above and a buffer solution. Proteins of the invention suitable for inclusion in kit embodiments have an enzymatic activity necessary for DNA amplification or DNA entry into the cell when incubated at a permissible temperature. Kits may optionally include primers suitable for hybridization with the nucleic acid being amplified, and/or control nucleic acids and primers for quantifying the reaction.
DEFINITIONS
[15] Unless defined otherwise, all technical and scientific terms used herein have the meaning commonly understood by a person skilled in the art to which this invention belongs. The following references provide one of skill with a general definition of many of the terms used in this invention: Singleton et al., Dictionary of Microbiology and Molecular Biology (2nd ed. 1994); The Cambridge Dictionary of Science and Technology (Walker ed., 1988); The Glossary of Genetics, 5th Ed., R. Rieger et al. (eds.), Springer Verlag (1991); and Hale & Marham, The Harper Collins Dictionary of Biology (1991). As used herein, the following terms have the meanings ascribed to them unless specified otherwise.
[16] "Open Reading Frame", or "ORF," refers to a series of at least 25 contiguous codons, preferably beginning with the codon "ATG."
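
The ORF definition above can be illustrated with a short scan of a nucleotide sequence. This is a minimal sketch under stated assumptions, not the GeneMark hidden Markov model approach used elsewhere in this document: it reads only the forward strand, requires an ATG start (the definition merely prefers one), ends each ORF at the first in-frame stop codon, and does not count the stop codon toward the 25-codon minimum. The function name and return format are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
MIN_CODONS = 25  # threshold taken from the ORF definition in the text


def find_orfs(dna: str, min_codons: int = MIN_CODONS) -> List[Tuple[int, int]]:
    """Return (start, end) half-open index pairs of forward-strand ORFs.

    Each ORF begins at an ATG and extends through the first in-frame stop
    codon. ORFs with fewer than `min_codons` codons (stop excluded) or with
    no in-frame stop are ignored. Illustrative assumptions only.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] != "ATG":
                i += 3
                continue
            # Walk codon by codon until a stop codon or the end of the sequence.
            j = i
            while j + 3 <= len(dna) and dna[j:j + 3] not in STOP_CODONS:
                j += 3
            found_stop = j + 3 <= len(dna)
            if found_stop and (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3))
            i = j + 3
    return orfs


if __name__ == "__main__":
    toy = "ATG" + "GCA" * 24 + "TAA"  # 25 codons followed by a stop
    print(find_orfs(toy))
```

A fuller scan would also examine the reverse complement, since genes may lie on either strand.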
[17] "Permissible temperature" refers to a temperature at which a cell may grow and divide, or a protein is capable of retaining its tertiary structure and any innate enzyme activity, or enzymatic activity, the molecule may possess.
[18] "Modulate" refers to the property of being able to quantitatively increase or decrease one or more chemical or physical characteristics of a molecule or process by at least 10% of the initial baseline characteristic in response to an environmental or metabolic change. Modulate may also refer to the ability to qualitatively alter a chemical or physical characteristic of a molecule or process in response to an environmental or metabolic change. Methods for determining modulation of chemical or physical characteristics of a molecule are well known in the art and include, but are not limited to, enzyme assays and spectroscopic analysis.
[19] The terms "enzyme activity" and "enzymatic activity" are used interchangeably herein.
[20] A reference to "displaying (an enzyme activity or enzymatic activity)" refers to a molecular characteristic where a biomolecule such as a protein or nucleic acid catalyzes a chemical reaction. Exemplary enzyme or enzymatic activities displayed by YS40 proteins include, but are not limited to, decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase and peptidase.
[21] The terms "polypeptide," "peptide" and "protein" are used interchangeably herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers in which one or more amino acid residues is an artificial chemical mimetic of a corresponding naturally occurring amino acid, as well as to naturally occurring amino acid polymers, those containing modified residues, and non-naturally occurring amino acid polymers.
[22] The term "amino acid" refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function similarly to the naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O- phosphoserine. Amino acid analogs refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, e.g., an α carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs may have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. Amino acid mimetics refers to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that functions similarly to a naturally occurring amino acid.
[23] "Conservatively modified variants" applies to both amino acid and nucleic acid sequences. With respect to particular nucleic acid sequences, conservatively modified variants refers to those nucleic acids which encode identical or essentially identical amino acid sequences, or where the nucleic acid does not encode an amino acid sequence, to essentially identical or associated, e.g., naturally contiguous, sequences. Because of the degeneracy of the genetic code, a large number of functionally identical nucleic acids encode most proteins. For instance, the codons GCA, GCC, GCG and GCU all encode the amino acid alanine. Thus, at every position where an alanine is specified by a codon, the codon can be altered to another of the corresponding codons described without altering the encoded polypeptide. Such nucleic acid variations are "silent variations," which are one species of conservatively modified variations. Every nucleic acid sequence herein which encodes a polypeptide also describes silent variations of the nucleic acid. One of skill in the art will recognize that in certain contexts each codon in a nucleic acid (except AUG, which is ordinarily the only codon for methionine, and TGG, which is ordinarily the only codon for tryptophan) can be modified to yield a functionally identical molecule. Accordingly, silent variations of a nucleic acid that encodes a polypeptide are often implicit in a described sequence with respect to the expression product, but not with respect to actual probe sequences.
[24] As to amino acid sequences, one of skill will recognize that individual substitutions, deletions or additions to a nucleic acid, peptide, polypeptide, or protein sequence which alter, add or delete a single amino acid or a small percentage of amino acids in the encoded sequence are "conservatively modified variants" where the alteration results in the substitution of an amino acid with a chemically similar amino acid. Conservative substitution tables providing functionally similar amino acids are well known in the art. Such conservatively modified variants are in addition to and do not exclude polymorphic variants, interspecies homologs, and alleles of the invention. The following eight groups each contain amino acids that are typically conservative substitutions for one another: 1) Alanine (A), Glycine (G); 2) Aspartic acid (D), Glutamic acid (E); 3) Asparagine (N), Glutamine (Q); 4) Arginine (R), Lysine (K); 5) Isoleucine (I), Leucine (L), Methionine (M), Valine (V); 6) Phenylalanine (F), Tyrosine (Y), Tryptophan (W); 7) Serine (S), Threonine (T); and 8) Cysteine (C), Methionine (M) (see, e.g., Creighton, Proteins (1984)).
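
The eight substitution groups listed above lend themselves to a simple lookup. The following is a minimal sketch that classifies each position of two equal-length, ungapped protein sequences as identical, conservatively substituted, or non-conservatively substituted. The function names are hypothetical, and position-by-position comparison without alignment is an assumption made only to keep the example short.

```python
from typing import Tuple

# The eight groups from the text, written with one-letter amino acid codes.
# Methionine (M) appears in two groups, as it does in the source list.
CONSERVATIVE_GROUPS = [
    set("AG"), set("DE"), set("NQ"), set("RK"),
    set("ILMV"), set("FYW"), set("ST"), set("CM"),
]


def is_conservative_substitution(a: str, b: str) -> bool:
    """True when a and b differ but share at least one of the eight groups."""
    a, b = a.upper(), b.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_positions(ref: str, variant: str) -> Tuple[int, int, int]:
    """Count (identical, conservative, non_conservative) positions."""
    if len(ref) != len(variant):
        raise ValueError("sequences must have equal length")
    identical = conservative = other = 0
    for r, v in zip(ref.upper(), variant.upper()):
        if r == v:
            identical += 1
        elif is_conservative_substitution(r, v):
            conservative += 1
        else:
            other += 1
    return identical, conservative, other


if __name__ == "__main__":
    # K->R is conservative (group 4); L->F is not in any shared group.
    print(classify_positions("MKDL", "MRDF"))
```

Note that group membership is not transitive across groups: cysteine and leucine each pair with methionine, but they do not share a group with one another.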
[25] "Homologous," in relation to two or more peptides, refers to two or more sequences or subsequences that have a specified percentage of amino acid residues that are the same (i.e., about 60% identity, preferably about 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or higher identity over a specified region, when compared and aligned for maximum correspondence over a comparison window or designated region) as measured using the BLAST or BLAST 2.0 sequence comparison algorithms with default parameters described below, or by manual alignment and visual inspection (see, e.g., NCBI web site http://www.ncbi.nlm.nih.gov/BLAST/ or the like). The definition also includes sequences that have deletions and/or additions, as well as those that have substitutions, as well as naturally occurring, e.g., polymorphic or allelic variants, and man-made variants. As described below, the preferred algorithms can account for gaps and the like. Preferably, identity exists over a region that is at least about 25 amino acids in length, or more preferably over a region that is 50-100 amino acids in length. "Identical" may be used interchangeably with "homologous".
[26] For sequence comparison, typically one sequence acts as a reference sequence, to which test sequences are compared. When using a sequence comparison algorithm, test and reference sequences are entered into a computer, subsequence coordinates are designated, if necessary, and sequence algorithm program parameters are designated. Preferably, default program parameters can be used, or alternative parameters can be designated. The sequence comparison algorithm then calculates the percent sequence identities for the test sequences relative to the reference sequence, based on the program parameters.
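The percent identity calculation described in paragraphs [25] and [26] can be sketched in Python for a pair of sequences that has already been aligned. This is a minimal sketch, not the BLAST implementation: it assumes gaps are written as "-", and it assumes the denominator is the number of alignment columns in the chosen window, which is one of several conventions in use. The function name and parameters are hypothetical.

```python
def percent_identity(aligned_ref, aligned_test, start=0, end=None):
    """Percent of identical, non-gap columns within an alignment window.

    `aligned_ref` and `aligned_test` are equal-length strings from a prior
    alignment, with '-' marking gaps. `start` and `end` select the window
    (0-based, half-open); by default the whole alignment is used.
    """
    if len(aligned_ref) != len(aligned_test):
        raise ValueError("aligned sequences must have equal length")
    ref = aligned_ref[start:end]
    test = aligned_test[start:end]
    if not ref:
        raise ValueError("comparison window is empty")
    # A column counts as a match only if both residues agree and are not gaps.
    matches = sum(1 for a, b in zip(ref, test) if a == b and a != "-")
    return 100.0 * matches / len(ref)
```

A test sequence could then be compared against a threshold such as the "about 60% identity" floor by checking `percent_identity(ref, test) >= 60.0`.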
[27] A "comparison window", as used herein, includes reference to a segment of any one of the number of contiguous positions selected from the group consisting typically of from about 20 to 600, usually about 50 to about 200, more usually about 100 to about 150 in which a sequence may be compared to a reference sequence of the same number of contiguous positions after the two sequences are optimally aligned. Methods of alignment of sequences for comparison are well-known in the art. Optimal alignment of sequences for comparison can be conducted, e.g., by the local homology algorithm of Smith & Waterman, Adv. Appl. Math. 2:482 (1981), by the homology alignment algorithm of Needleman & Wunsch, J. Mol. Biol. 48:443 (1970), by the search for similarity method of Pearson & Lipman, Proc. Nat'l. Acad. Sci. USA 85:2444 (1988), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and TFASTA in the Wisconsin Genetics Software Package, Genetics Computer Group, 575 Science Dr., Madison, WI), or by manual alignment and visual inspection (see, e.g., Current Protocols in Molecular Biology (Ausubel et al., eds. 1995 supplement)).
[28] Examples of algorithms that are suitable for determining percent sequence identity and sequence similarity include the BLAST and BLAST 2.0 algorithms, which are described in Altschul et al., Nuc. Acids Res. 25:3389-3402 (1997) and Altschul et al., J. Mol. Biol. 215:403-410 (1990). BLAST and BLAST 2.0 are used, with the parameters described herein, to determine percent sequence identity for the nucleic acids and proteins of the invention. Software for performing BLAST analyses is publicly available through the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov/). This algorithm involves first identifying high scoring sequence pairs (HSPs) by identifying short words of length W in the query sequence, which either match or satisfy some positive-valued threshold score T when aligned with a word of the same length in a database sequence. T is referred to as the neighborhood word score threshold (Altschul et al., supra). These initial neighborhood word hits act as seeds for initiating searches to find longer HSPs containing them. The word hits are extended in both directions along each sequence for as far as the cumulative alignment score can be increased. Cumulative scores are calculated using, e.g., for nucleotide sequences, the parameters M (reward score for a pair of matching residues; always > 0) and N (penalty score for mismatching residues; always < 0). For amino acid sequences, a scoring matrix is used to calculate the cumulative score. Extension of the word hits in each direction is halted when: the cumulative alignment score falls off by the quantity X from its maximum achieved value; the cumulative score goes to zero or below, due to the accumulation of one or more negative-scoring residue alignments; or the end of either sequence is reached. The BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the alignment. 
The BLASTN program (for nucleotide sequences) uses as defaults a wordlength (W) of 11, an expectation (E) of 10, M=5, N=-4 and a comparison of both strands. For amino acid sequences, the BLASTP program uses as defaults a wordlength of 3, an expectation (E) of 10, and the BLOSUM62 scoring matrix (see Henikoff & Henikoff, Proc. Natl. Acad. Sci. USA 89:10915 (1992)) alignments (B) of 50, expectation (E) of 10, M=5, N=-4, and a comparison of both strands.
[29] The BLAST algorithm also performs a statistical analysis of the similarity between two sequences (see, e.g., Karlin & Altschul, Proc. Nat'l. Acad. Sci. USA 90:5873-5877 (1993)). One measure of similarity provided by the BLAST algorithm is the smallest sum probability (P(N)), which provides an indication of the probability by which a match between two nucleotide or amino acid sequences would occur by chance. For example, a peptide is considered similar to a reference sequence if the smallest sum probability in a comparison of the test peptide to the reference peptide is less than about 0.2, more preferably less than about 0.01, and most preferably less than about 0.001. Log values may be large negative numbers, e.g., 5, 10, 20, 30, 40, 40, 70, 90, 110, 150, 170, etc.
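The word-hit extension step of paragraph [28] can be sketched in Python for nucleotide sequences. This is a simplified, ungapped illustration and not the BLAST source code: each direction is extended independently starting from the seed score, the reward M=5 and penalty N=-4 are the BLASTN defaults quoted above, and the drop-off value `x_drop=20` and all function names are assumptions made for the example.

```python
def _extend_one_way(query, subject, qi, si, step, start_score,
                    match, mismatch, x_drop):
    """Extend from (qi, si) in one direction; return (score gain, length kept)."""
    best = running = start_score
    best_len = length = 0
    while 0 <= qi < len(query) and 0 <= si < len(subject):
        running += match if query[qi] == subject[si] else mismatch
        length += 1
        if running > best:
            best, best_len = running, length
        # Halt when the score drops by X from its maximum, or reaches zero.
        if best - running >= x_drop or running <= 0:
            break
        qi += step
        si += step
    # Falling out of the loop means the end of a sequence was reached.
    return best - start_score, best_len


def extend_word_hit(query, subject, q_pos, s_pos, word_len,
                    match=5, mismatch=-4, x_drop=20):
    """Ungapped extension of a word hit; returns (score, q_begin, q_end).

    The returned query span is 0-based and half-open.
    """
    seed = sum(
        match if query[q_pos + k] == subject[s_pos + k] else mismatch
        for k in range(word_len)
    )
    right_gain, right_len = _extend_one_way(
        query, subject, q_pos + word_len, s_pos + word_len, 1,
        seed, match, mismatch, x_drop)
    left_gain, left_len = _extend_one_way(
        query, subject, q_pos - 1, s_pos - 1, -1,
        seed, match, mismatch, x_drop)
    return (seed + left_gain + right_gain,
            q_pos - left_len,
            q_pos + word_len + right_len)
```

A smaller `x_drop` stops extension sooner and runs faster, while a larger value tolerates longer runs of mismatches, which is the sensitivity and speed trade-off the text attributes to the parameter X.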
[30] The terms "sequence similarity", "sequence identity", or "percent identity" in the context of two or more nucleic acids or polypeptide sequences, refer to two or more sequences or subsequences that are, when optimally aligned with appropriate nucleotide insertions or deletions, the same or have a specified percentage of amino acid residues or nucleotides that are the same (i.e., 50% identity, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or higher identity to an amino acid sequence such as SEQ ID NO:2, or a nucleotide sequence such as SEQ ID NO:1), when compared and aligned for maximum correspondence over a comparison window, or designated region as measured using one of the following sequence comparison algorithms or by manual alignment and visual inspection. This definition also refers to the complement of a test sequence. Preferably, the identity exists over a region that is at least about 25 amino acids or nucleotides in length, or even over a region that is about 50-100 amino acids or nucleotides in length. These relationships hold, notwithstanding evolutionary origin (Reeck et al., Cell, 50:667 (1987)). When the sequence identity of a pair of polynucleotides or polypeptides is greater than or equal to 65%, the sequences are said to be "substantially identical."
[31] Alternatively, substantial identity will exist when a nucleic acid will hybridize under selective hybridization conditions, to a strand or its complement. Typically, selective hybridization will occur when there is at least about 55% homology over a stretch of at least about 14 nucleotides, more typically at least about 65%, 75%, 85%, or even at least about 90%. See, Kanehisa, Nuc. Acids Res., 12:203-213 (1984), which is incorporated herein by reference. 
The length of homology comparison, as described, may be over longer stretches, and in certain embodiments will be over a stretch of at least about 17 nucleotides, generally at least about 20 nucleotides, ordinarily at least about 24 nucleotides, usually at least about 28 nucleotides, typically at least about 32 nucleotides, more typically at least about 40 nucleotides, 50 nucleotides, or even at least about 75 to 100 or more nucleotides.
[32] Macromolecular structures such as polypeptide structures can be described in terms of various levels of organization. For a general discussion of this organization, see, e.g., Alberts et al., Molecular Biology of the Cell (3rd ed., 1994) and Cantor and Schimmel, Biophysical Chemistry Part I: The Conformation of Biological Macromolecules (1980). "Primary structure" refers to the amino acid sequence of a particular peptide. "Secondary structure" refers to locally ordered, three-dimensional structures within a polypeptide. These structures are commonly known as domains. Domains are portions of a polypeptide that form a compact unit of the polypeptide and are typically about 5 to 350 amino acids long. Typical domains are made up of organized sections of peptide such as stretches of β strands (that can interact to form β sheets) and α helices. "Tertiary structure" refers to the complete three-dimensional structure of a polypeptide monomer. "Quaternary structure" refers to the three-dimensional structure formed by the non-covalent association of independent tertiary units. A "random coil structure," when referring to the structure of a protein or peptide, indicates a lack of higher level (secondary or tertiary) structure, or a relatively disorganized structural sequence between secondary structural motifs, such as β-sheets and α-helices.
[33] A "label" or a "detectable moiety" is a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical means. For example, useful labels include fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and proteins or other entities which can be made detectable, e.g., by incorporating a radiolabel into the peptide or used to detect antibodies specifically reactive with the peptide. The radioisotope may be, for example, 3H, 14C, 32P, 35S, or 125I. The labels may be incorporated into the antibodies at any position. Any method known in the art for conjugating the antibody to the label may be employed, including those methods described by Hunter et al., Nature, 144:945 (1962); David et al., Biochemistry, 13:1014 (1974); Pain et al., J. Immunol. Meth., 40:219 (1981); and Nygren, J. Histochem. and Cytochem., 30:407 (1982). The lifetime of radiolabeled peptides or radiolabeled antibody compositions may be extended by the addition of substances that stabilize the radiolabeled peptide or antibody and protect it from degradation. Any substance or combination of substances that stabilize the radiolabeled peptide or antibody may be used, including those substances disclosed in U.S. Patent No. 5,961,955.
[34] The term "recombinant" when used with reference, e.g., to a cell, or nucleic acid, protein, or vector, indicates that the cell, nucleic acid, protein or vector, has been modified by the introduction of a heterologous nucleic acid or protein or the alteration of a native nucleic acid or protein, or that the cell is derived from a cell so modified. Thus, e.g., recombinant cells express genes that are not found within the native (non-recombinant) form of the cell or express native genes that are otherwise abnormally expressed, under-expressed or not expressed at all. 
By the term "recombinant nucleic acid" herein is meant nucleic acid, originally formed in vitro, in general, by the manipulation of nucleic acid, e.g., using polymerases and endonucleases, in a form not normally found in nature. In this manner, operable linkage of different sequences is achieved. Thus an isolated nucleic acid, in a linear form, or an expression vector formed in vitro by ligating DNA molecules that are not normally joined, are both considered recombinant for the purposes of this invention. It is understood that once a recombinant nucleic acid is made and reintroduced into a host cell or organism, it will replicate non-recombinantly, i.e., using the in vivo cellular machinery of the host cell rather than in vitro manipulations; however, such nucleic acids, once produced recombinantly, although subsequently replicated non-recombinantly, are still considered recombinant for the purposes of the invention. Similarly, a "recombinant protein" is a protein made using recombinant techniques, i.e., through the expression of a recombinant nucleic acid as depicted above.
[35] The term "operably linked" refers to a linkage of polynucleotide elements in a functional relationship. With regard to the present invention, the term "operably linked" refers to a functional linkage between a nucleic acid expression control sequence (such as a promoter, or an array of transcription factor binding sites) and a second nucleic acid sequence, wherein the expression control sequence directs transcription of the nucleic acid corresponding to the second sequence. Thus, a nucleic acid is "operably linked" when it is placed into a functional relationship with another nucleic acid sequence.
[36] The term "amino acid" refers to naturally occurring and synthetic amino acids, as well as amino acid analogs and amino acid mimetics that function in a manner similar to the naturally occurring amino acids. Naturally occurring amino acids are those encoded by the genetic code, as well as those amino acids that are later modified, e.g., hydroxyproline, γ-carboxyglutamate, and O-phosphoserine. "Amino acid analog" refers to compounds that have the same basic chemical structure as a naturally occurring amino acid, i.e., a carbon that is bound to a hydrogen, a carboxyl group, an amino group, and an R group, e.g., homoserine, norleucine, methionine sulfoxide, methionine methyl sulfonium. Such analogs have modified R groups (e.g., norleucine) or modified peptide backbones, but retain the same basic chemical structure as a naturally occurring amino acid. Amino acid mimetics refers to chemical compounds that have a structure that is different from the general chemical structure of an amino acid, but that function in a manner similar to a naturally occurring amino acid.
[37] Amino acids may be referred to herein by either their commonly known three letter symbols or by the one-letter symbols recommended by the IUPAC-IUB Biochemical Nomenclature Commission. Nucleotides, likewise, may be referred to by their commonly accepted single-letter codes.
[38] The term "amino acid sequence" refers to the positional relationship of amino acid residues as they exist in a given polypeptide or protein.
[39] The term "coding sequence", in relation to nucleic acid sequences, refers to a plurality of contiguous sets of three nucleotides, termed codons, each codon corresponding to an amino acid as translated by biochemical factors according to the universal genetic code, the entire sequence coding for an expressed protein, or an antisense strand that inhibits expression of a protein. A "genetic coding sequence" is a coding sequence where the contiguous codons are intermittently interrupted by non-coding intervening sequences, or "introns." During mRNA processing, intron sequences are removed, restoring the contiguous codon sequence encoding the protein or antisense strand.
[40] The term "contiguous" in the context of polynucleotide or polypeptide sequences, refers to an uninterrupted sequence of bases or amino acids, each base or amino acid being immediately adjacent to its neighbors in the sequence.
[41] The terms "expression vector" and "expression cassette" include any type of genetic construct containing a nucleic acid capable of being transcribed in a cell. The expression vectors of the invention generally supply sequence elements directing translation of the coding sequence into a protein of the present invention, as provided by the invention itself, although vectors used for the amplification of nucleotide sequences (both coding and non-coding) are also encompassed by the definition. In addition to the coding sequence, expression vectors will generally include restriction enzyme cleavage sites and the other initial, terminal and intermediate DNA sequences that are usually employed in vectors to facilitate their construction and use. The expression vector can be part of a plasmid, virus, or nucleic acid fragment.
[42] The term "fusion gene" refers to the combination of one or more heterologous coding sequences joined in frame to form a single translational/transcriptional unit. Typically the heterologous coding sequences are joined end-to-end. The definition however includes fusion genes where one sequence, or fragment thereof, intervenes in another heterologous sequence.
[43] The term "heterologous" when used with reference to portions of a nucleic acid or protein indicates that the molecule comprises two or more subsequences that are not found in the same relationship to each other in nature. For instance, a heterologous nucleic acid is typically recombinantly produced, having two or more sequences from unrelated genes arranged to make a new functional nucleic acid, e.g., a promoter from one source and a coding region from another source. Similarly, a heterologous protein indicates that the protein comprises two or more subsequences that are not found in the same relationship to each other in nature (e.g., a fusion protein).
[44] The terms "primers" or "primer pairs" refer to oligonucleotide probes capable of recognizing and hybridizing to specific nucleotide sequences found in a target gene or sequence to be amplified by polymerase chain reaction (PCR). The degree of complementarity required between the primers and the target sequence determines the specificity, or stringency of conditions required for hybridization of the sequences. A temperature of about 36°C is typical for low stringency amplification, although annealing temperatures may vary between about 32°C and about 48°C depending on primer length. For high stringency PCR amplification, a temperature of about 62°C is typical, although high stringency annealing temperatures can range from about 50°C to about 65°C, depending on the primer length and specificity. Typical cycle conditions for both high and low stringency amplifications include a denaturation phase of about 90°C - 95°C for 30 sec - 2 min., an annealing phase lasting 30 sec. - 2 min., and an extension phase of about 72°C for 1 - 2 min. Protocols and guidelines for low and high stringency amplification reactions are provided, e.g., in Innis et al., PCR Protocols, A Guide to Methods and Applications, Academic Press, Inc. N.Y. (1990).
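The annealing temperature ranges given in paragraph [44] can be captured in a small Python lookup. This is only a sketch: the range bounds are the approximate values quoted in the text, the treatment of the bounds as inclusive is an assumption, and the names are hypothetical.

```python
# Approximate annealing ranges from the text, in degrees Celsius.
LOW_STRINGENCY_C = (32.0, 48.0)   # about 36 is described as typical
HIGH_STRINGENCY_C = (50.0, 65.0)  # about 62 is described as typical


def classify_annealing(temp_c):
    """Label an annealing temperature by the stringency range it falls in."""
    if LOW_STRINGENCY_C[0] <= temp_c <= LOW_STRINGENCY_C[1]:
        return "low"
    if HIGH_STRINGENCY_C[0] <= temp_c <= HIGH_STRINGENCY_C[1]:
        return "high"
    return "outside the stated ranges"
```

Because the text gives the bounds as "about" values, a temperature between the two ranges (for example 49°C) is reported as outside both rather than forced into either class.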
[45] The term "regulatory sequences" refers to those sequences, both 5' and 3' to a structural gene, that are required for the transcription and translation of the structural gene in the target host organism. Regulatory sequences include a promoter, ribosome binding site, optional inducible elements and sequence elements required for efficient 3' processing, including polyadenylation. When the structural gene has been isolated from genomic DNA, regulatory sequences also include those intronic sequences required for removal of the introns as part of mRNA formation in the target host.
[46] The term "recombinant" when used with reference, e.g., to a cell, or nucleic acid, protein, or vector, indicates that the cell, nucleic acid, protein or vector, has been modified by the introduction of a heterologous nucleic acid or protein, or the alteration of a native nucleic acid or protein, or that the cell is derived from a cell so modified. Thus, for example, recombinant cells express genes that are not found within the native (non-recombinant) form of the cell or express native genes that are otherwise abnormally expressed, under-expressed or not expressed at all.
[47] DNA regions are "operably linked" when they are functionally related to each other. For example, DNA for a signal peptide (secretory leader) is operably linked to DNA for a polypeptide if it is expressed as a precursor which participates in the secretion of the polypeptide; a promoter is operably linked to a coding sequence if it controls the transcription of the sequence; or a ribosome-binding site is operably linked to a coding sequence if it is positioned so as to permit translation. Generally, operably linked means contiguous and in reading frame.
BRIEF DESCRIPTION OF DRAWINGS
[48] The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present invention. The invention may be better understood by reference to one or more of these drawings in combination with the detailed description of specific embodiments presented herein.
[49] FIG. 1 shows single and multi-round transcription at the T7A1-tR' and galP-tR' promoters. Two transcriptionally competent open model promoters from different classes, -10/-35 class (T7A1) and extended -10 class (galP1), attached to a rho-independent terminator were used to qualitatively visualize efficiency of run-off transcription. Reaction 1, in 20 μl of transcription buffer (30 mM Tris-HCl, pH 8.0, 10 mM MgCl2, 40 mM KCl, 1 mM β-mercaptoethanol), contained core enzyme, sigma (σ) and Ys18. Reaction 1 was incubated at 65°C (for T. th) and 37°C (for E. coli) for 10 minutes, followed by the addition of promoter DNA fragments. Reaction 2, in 10 μl of transcription buffer, contained core enzyme and sigma, and, in parallel, Ys18 and a promoter DNA fragment. Reaction 2 was incubated at 65°C (for T. th) and 37°C (for E. coli) for 10 minutes and then mixed together. For both reactions 1 and 2, after 10 minutes of incubation at the same temperatures, 200 μM ATP, CTP, UTP, 20 μM GTP and 10 μCi of [α-32P] GTP were added, the reactions were incubated for the next 10 minutes, and terminated by an equal volume of 9 M urea loading buffer. In E. coli, if during the transcription reaction heparin is not added with the nucleotides, the RNA polymerase is able to initiate transcription from the promoter repeatedly (multi-round transcription). If heparin is added, the RNA polymerase is only able to transcribe once (single-round transcription). The open complexes formed by T. th RNA polymerase at the promoters used are very sensitive to heparin so single-round transcription cannot be performed.
[50] FIG. 1A shows multi-round transcriptional inhibition by Ys18 of both T7A1-tR' and galP-tR' with the T. th RNA polymerase core and σA. With both promoters, increasing amounts of Ys18 (triangle) repressed transcription in a dose dependent manner (run-off bands) when the components were mixed as in Reaction 1 (order 1). When the components were mixed as in Reaction 2, Ys18 did not significantly affect transcription of promoter T7A1-tR', but repressed transcription of promoter galP-tR'. The similar change in band intensity of run-off bands and terminator tR' bands indicated that Ys18 did not affect elongation or termination.
[51] FIG. 1B shows single-round (+ heparin) and multi-round (- heparin) transcriptional inhibition by Ys18 of both T7A1-tR' and galP-tR' with the E. coli RNA polymerase core and σ70. With both promoters, increasing amounts of Ys18 (triangle) repressed transcription in a dose dependent manner (run-off bands). In the presence of heparin, when the components were mixed as in Reaction 1 (order 1), Ys18 was not as active in repressing transcription as when the components were mixed as in Reaction 2 (order 2). In the absence of heparin, Ys18 was more active in repressing transcription when the components were mixed as in Reaction 2 (order 2) with the T7A1-tR' promoter. There was only a slight difference in repressing activity of Ys18 in the absence of heparin between the two reaction mixtures. The similar change in band intensity of run-off bands and terminator tR' bands indicated that Ys18 did not affect elongation or termination.
[52] FIG. 1C shows the relative transcriptional activity of the run-off assay in graphical form. Transcriptional activity without Ys18 present is 100% (dark bar). The addition of Ys18 in incremental amounts represses transcriptional activity in a dose dependent manner (light bar). The amount of transcriptional repression by Ys18 differs with the amount of Ys18 added, promoter type, RNA polymerase core, reaction mixture and presence or absence of heparin.
[53] FIG. 2 shows native binding experiments with histidine-tagged phage protein Ys18 and primary sigma factors from T. thermophilus and E. coli. Reactions, containing corresponding proteins in 20 μl of binding buffer (20 mM Tris-HCl, pH 8.0, 0.5 M NaCl, 2 mM imidazole, 5% v/v glycerol), were preincubated for 10 minutes at 65°C (for T. th σA) and 37°C (for E. coli σ70). The binding mixtures were then added to Ni-NTA agarose beads equilibrated in the binding buffer. Reactions were incubated for 10 minutes at room temperature. The agarose beads were pelleted by quick centrifugation and the unbound proteins were withdrawn. The beads were washed 3 times with the binding buffer containing 20 mM imidazole, and the bound proteins were eluted with the binding buffer containing 200 mM imidazole. Fractions were resolved by SDS-PAGE and stained by Coomassie (L = proteins loaded, U = proteins unbound, B = proteins bound to Ni-NTA agarose).
[54] FIG. 2A shows Ys18His bound to the primary sigma factor from T. thermophilus (σA). With both Ys18His and σA present in the sample (+ lane, L), σA was detected in the unbound (+ lane, U) and the bound fractions (+ lane, B). In the absence of Ys18His (- lane, L), σA was exclusively observed in the unbound fraction (- lane, U). σA cannot bind to the Ni-NTA agarose beads without Ys18His, indicating Ys18His was capable of binding to σA.
[55] FIG. 2B shows Ys18His bound to the primary sigma factor from E. coli (σ70). When both Ys18His and σ70 were present in the sample (+ lane, L), σ70 was detected in the unbound (+ lane, U) and the bound fractions (+ lane, B). In the absence of Ys18His (- lane, L), σ70 was exclusively observed in the unbound fraction (- lane, U). σ70 cannot bind to the Ni-NTA agarose beads without Ys18His, indicating Ys18His was capable of binding to σ70.
[56] FIG. 2C shows Ys18His bound to the primary sigma factor from E. coli lacking region 4 (σ70 1-549). When both Ys18His and σ70 1-549 were present in the sample (+ lane, L), σ70 1-549 was detected in the unbound (+ lane, U) and the bound fractions (+ lane, B). In the absence of Ys18His (- lane, L), σ70 1-549 was exclusively observed in the unbound fraction (- lane, U). σ70 1-549 cannot bind to the Ni-NTA agarose beads without Ys18His, indicating Ys18His was capable of binding to σ70 1-549 in a region other than region 4.
DETAILED DESCRIPTION
I. Introduction
[57] The present invention provides novel proteins from the bacteriophage YS40. These novel proteins retain their functionality at mesophilic or thermophilic temperatures, and consequently allow biosynthetic and/or biodegradative processes to proceed at higher temperatures.
[58] The YS40 bacteriophage infects Thermus thermophilus HB8, and grows over the temperature range of about 56 to about 78°C. The bacteriophage has a large genome (165 Kbp, ~150 genes) containing multiple DNA polymerase genes. The phage reproduces above 70°C, and the thermophilic enzymes have an extrinsic structural stability. Most of the YS40 proteins have a strong similarity to prokaryotic enzymes, including in the length of their amino acid sequences, and the phage has the potential to encode most of the proteins required for its own replisome.
[59] YS40 encodes its own A-type DNA polymerase (encoded by SEQ ID NO:134), which has a conserved region in its C-terminus including 3 motifs with invariant residues ranging from amino acid residues 825-1102. Like the Klenow fragment from E. coli DNA pol I, the YS40 A-type DNA polymerase has no N-terminal 5'-3' exonuclease domain. Other proteins encoded by YS40 include gp166 (encoded by SEQ ID NO: 166), which is similar to podovirus phi 29 terminal protein. gp166 may be involved in protein-primed DNA replication of the linear phi 29 genome (linked to 5' ends of both strands via phosphodiester bonds). gp106 (encoded by SEQ ID NO: 106) is an S-adenosylmethionine decarboxylase (key enzyme in biosynthesis of spermidine and spermine).
[60] Thus, the molecules of this invention may find utility in a wide variety of applications including, but not limited to, synthetic nucleic acid synthesis, biodegradative processes and other applications requiring resilient molecules capable of retaining their integrity, including enzyme activity when present, at higher temperatures such as at least about 36°C, or even at least about 45°C, 55°C, 65°C, or even about 75°C. The following sections detail embodiments of the present invention, and how they may be used in biometabolic reactions.
II. Identifying Open Reading Frames
[61] Nucleic acids encoding proteins and peptides of the present invention may be identified by screening the YS40 genomic sequence for open reading frames (ORFs) using any method known in the art. Using these methods, nucleic acid coding sequences for proteins of the present invention, as found in wild-type and cultured bacteriophage YS40 strains, may be identified. These coding sequences and/or proteins may be further modified as described herein, to provide additional coding sequences of the invention.
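As a rough illustration of what screening a genomic sequence for ORFs involves, the following Python sketch scans all six reading frames for stretches that begin with ATG and end at a stop codon. It is a naive scan shown only for orientation, not the hidden Markov model approach of GeneMark: it assumes ATG is the only start codon, the `min_codons=25` default is an illustrative choice, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_orfs(dna, min_codons=25):
    """Return (strand, start, end, sequence) for each ATG-to-stop ORF.

    Coordinates are 0-based, half-open, and relative to the scanned strand
    (the input for '+', its reverse complement for '-'). An ORF is kept if
    it has at least `min_codons` codons before the stop codon.
    """
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            start = None
            for i in range(frame, len(seq) - 2, 3):
                codon = seq[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3, seq[start:i + 3]))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Statistical gene finders improve on a scan like this by scoring codon usage and start-site context, which helps separate real genes from chance open frames.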
[62] By way of example, the genome sequence of YS40 may be searched for ORFs using the hidden Markov model approach implemented in the GeneMark program (see Besemer J. and Borodovsky M. (1999), NAR, Vol. 27, No. 19, pp. 3911-3920). Using this technique, 170 open reading frames (ORFs) encoding preferred proteins of the present invention are predicted, as identified in Table 1, below:
Table 1. Gene products of phage YS40 and their predicted molecular functions
[Table 1: images imgf000024_0001 to imgf000028_0001]
a Position of the ORFs in the phage YS40 genome; "-" indicates a leftward transcription orientation.
b Presence of transmembrane domains (TM) and coiled-coil regions is indicated.
[63] Regions between the identified ORFs may be screened for additional genes using the Blastx and tBlastx programs (Schafer et al, 1997), and identified ORF sequences compared with sequences in available databases (e.g., GenBank, GenPept, and the database of unfinished microbial genomes at NCBI) to provide a putative activity or function to the protein encoded by the ORF.
III. YS40 proteins
[64] Once identified, ORFs may be used in expression systems to produce YS40 proteins of the present invention, or the proteins may be isolated from cultures of Thermus thermophilus infected with bacteriophage YS40. Alternatively, proteins and peptides of the present invention may be synthesized using solid or liquid phase techniques well known to those of skill in the art. These proteins include a thermophilic amino acid sequence at least 75%, 80%, 85%, 90%, 95%, 96%, 97%, 98%, or even at least 99% homologous to a YS40 amino acid sequence encoded by at least about 25, 35, 45, 50, 65, 75, 85, 95 or even about 100 contiguous codons of a YS40 coding sequence selected from SEQ ID NO: 1-170. In certain instances, the YS40 coding sequence is selected from SEQ ID NO: 2, 4-64, 70-149, 151 and 153-170 and has an enzyme activity that is a decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase or a peptidase that is active at a permissible temperature of about 36°C, or even about 45°C, 55°C or 65°C, or is active at about 75°C. Activity at about 75°C is preferred but not requisite. Alternatively, the YS40 coding sequence may be from SEQ ID NO: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 or 161. In certain instances, the YS40 coding sequence is from SEQ ID NO: 33, and is a DNA polymerase.
IV. Purifying proteins
[65] As noted above, certain proteins of the present invention may be isolated from cultures of Thermus thermophilus infected with bacteriophage YS40. Proteins of the invention isolated in this manner will typically be encoded by the ORFs SEQ ID NOs: 1-170, more typically by SEQ ID NO: 2, 4-64, 70-149, 151 or 153-170, even more typically by SEQ ID NO: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 or 161, and most typically by SEQ ID NO: 33. In certain embodiments of the invention, the proteins contemplated are fragments of one or more of those proteins described above.
[66] Briefly, YS40 proteins of the present invention may be obtained by growing cultures of Thermus thermophilus infected with bacteriophage YS40 at a permissible temperature using media and techniques well known to those of skill in the art. During culture, preferably in the exponential growth phase of the bacteria, the culture is fractionated by separating the bacteria from the culture media using, for example, low speed centrifugation. If a lytic strain of YS40 is used, then the proteins of the invention may be harvested from the supernatant. If a non-lytic strain of YS40 is used, then the proteins of the invention may be harvested from the bacterial cells after lysis using, for example, a French press or other method well known in the art. Whichever approach is used, proteins of the invention may be further purified using any combination of a variety of techniques well known to those of skill in the art (cf. Colley et al., J. Biol. Chem., 264:17619-17622 (1989); Guide to Protein Purification, in Vol. 182 of Methods in Enzymology (Deutscher ed., 1990); Morrison, D.A., J. Bact., 132:349-351 (1977); or Clark-Curtiss et al., Methods in Enzymology, 101:347-362 (1983), eds. R. Wu et al., Academic Press, New York; for suitable media, see the catalogues of the American Type Culture Collection). Additional isolation techniques are described in detail in the following sections.
[67] Proteins and peptides of the present invention may be purified to substantial purity by standard techniques, including column chromatography, immunopurification methods, electrophoresis, centrifugation, crystallization, isoelectric focusing and others (see, e.g., Scopes, Protein Purification: Principles and Practice (1982); Ausubel, et al. (1987 and periodic supplements) Current Protocols in Molecular Biology; Deutscher (1990) "Guide to Protein Purification" in Methods in Enzymology vol. 182, and other volumes in this series; and manufacturers' literature on use of protein purification products, e.g., Pharmacia, Piscataway, N.J., or Bio-Rad, Richmond, Calif.; and Sambrook et al., supra).
Standard purification techniques
Gel filtration
[68] The individual molecular weights of proteins of the present invention may be used to isolate them from proteins of greater and lesser size by, for example, using ultrafiltration through membranes of different pore size (for example, Amicon or Millipore membranes). As a first step, the protein mixture is ultrafiltered through a membrane with a pore size that has a lower molecular weight cut-off than the molecular weight of the protein of interest. The retentate of the ultrafiltration is then ultrafiltered against a membrane with a molecular weight cut-off greater than the molecular weight of the protein of interest. The recombinant protein will pass through the membrane into the filtrate. The filtrate may then be chromatographed as described below.
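The two-step ultrafiltration logic above can be sketched as a small membrane-selection helper. This is an illustrative sketch only: the function name, the list of available cut-offs and the safety margin are hypothetical assumptions, the margin being included because real membranes do not cut off sharply at their nominal rating.

```python
def choose_cutoffs(target_kda: float, available_kda: list[float],
                   margin: float = 2.0) -> tuple[float, float]:
    """Pick two nominal molecular weight cut-offs (kDa) bracketing the target.

    Step 1 membrane: cut-off below the target, so the protein stays in the retentate.
    Step 2 membrane: cut-off above the target, so the protein passes into the filtrate.
    `margin` is an assumed safety factor between the target and each cut-off.
    """
    lower = [c for c in available_kda if c <= target_kda / margin]
    upper = [c for c in available_kda if c >= target_kda * margin]
    if not lower or not upper:
        raise ValueError("no membrane pair brackets the target with this margin")
    # The tightest bracket removes the most contaminants of other sizes.
    return max(lower), min(upper)


# Hypothetical 50 kDa protein and a hypothetical set of membranes.
print(choose_cutoffs(50.0, [10.0, 30.0, 100.0, 300.0]))  # (10.0, 100.0)
```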
Exchange chromatography
[69] Proteins of the present invention may also be separated from other proteins on the basis of size, net surface charge, hydrophobicity, and affinity for ligands. In addition, antibodies raised against proteins may be conjugated to column matrices and the proteins immunopurified. All of these methods are well known in the art. It will be apparent to one of skill that chromatographic techniques may be performed at any scale and using equipment from many different manufacturers (e.g., Pharmacia Biotech).
Tagging techniques
[70] Purification segments, or "affinity tags", may be fused to appropriate portions of proteins of the present invention to assist in isolation and production. For example, the FLAG sequence, or a functional equivalent, may be fused to the protein via a protease-removable sequence, allowing the FLAG sequence to be recognized by an affinity reagent, and the purified protein subjected to protease digestion to remove the extension. Many other equivalent segments exist, e.g., polyhistidine segments possessing affinity for heavy metal column reagents. See, e.g., Hochuli, Chemische Industrie, 12:69-70 (1989); Hochuli, Genetic Engineering, Principle and Methods, 12:87-98 (1990), Plenum Press, N.Y.; and Crowe, et al. (1992) QIAexpress: The High Level Expression & Protein Purification System, QIAGEN, Inc., Chatsworth, Calif.; which are incorporated herein by reference.
[71] Affinity tags may also be incorporated into protein constructs of the present invention as analytical tools. Affinity tags provide a convenient way of removing the protein construct from a sample at a desired time, or to detect the location of the protein construct in a sample. Many other applications of affinity tagged protein constructs will be readily apparent to one of skill in the art.
His-tag
[72] Protein constructs of the present invention may also contain a string of histidine residues, incorporated at the amino or carboxyl terminal of the novel protein. The polyhistidine tag allows convenient isolation of the protein in a single step by nickel-chelate chromatography. When a protein that has been "his-tagged" is placed on the nickel column, the histidine residues form a chelate complex with the nickel bound to the column, immobilizing the tagged protein. Contaminating components of the solution comprising the tagged protein may be washed away prior to elution of the tagged protein with a suitable competing chelator, typically imidazole.
[73] The polyhistidine tag may be added to the protein through the use of peptide linkers as described in detail below. Alternatively, the tag may be linked to a protein by appending a nucleic acid encoding the tag onto the coding region of recombinant protein, the resulting construct being incorporated into a suitable expression vector that is subsequently used to transform an appropriate host cell. Protein produced in the transformed host cell may then be purified as noted above.
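Appending a tag-encoding nucleic acid to a coding region, as described above, might be sketched as follows. This is a minimal illustration under stated assumptions: the tag is placed at the carboxyl terminus immediately before the stop codon, six histidine codons are used, and the function name and example sequence are hypothetical.

```python
HIS_CODON = "CAC"                      # one of the two histidine codons (CAC, CAT)
STOP_CODONS = {"TAA", "TAG", "TGA"}    # standard genetic code stop codons


def append_c_terminal_his_tag(cds: str, n_his: int = 6) -> str:
    """Insert n_his histidine codons just before the stop codon of a coding sequence."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if cds[-3:] not in STOP_CODONS:
        raise ValueError("coding sequence must end with a stop codon")
    # Keep the reading frame: tag codons go between the last sense codon and the stop.
    return cds[:-3] + HIS_CODON * n_his + cds[-3:]


# Hypothetical toy coding sequence (Met-Ala-stop), not a YS40 ORF.
print(append_c_terminal_his_tag("ATGGCTTAA"))
```

An amino-terminal tag would instead be inserted after the start codon; the choice of terminus is left open by the text.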
Epitope tagging
[74] Epitope tags are another useful sequence that may be included in a protein construct of the present invention. The epitope tag may consist of an amino acid sequence that allows affinity purification of the activated protein (e.g., on immunoaffinity or chelating matrices). Thus, by including an epitope tag on the activation construct, all of the activated proteins from an activation library may be purified. By purifying the activated proteins away from other cellular and media proteins, screening for novel proteins and enzyme activities may be facilitated. In some instances, it may be desirable to remove the epitope tag following purification of the activated protein. This removal may be accomplished by including a protease recognition sequence (e.g., Factor Xa or enterokinase cleavage site) downstream from the epitope tag on the activation construct. Incubation of the purified, activated protein(s) with the appropriate protease will release the epitope tag from the protein(s).
[75] In libraries in which an epitope tag sequence is located in the protein construct, all of the tagged proteins may be purified away from all other cellular and media components using affinity purification. In addition to purifying the tagged protein, this method also concentrates the protein sample.
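Removal of an amino-terminal tag at a protease recognition sequence, as described in this section, can be illustrated with a short sketch. The recognition sequences used (Ile-Glu-Gly-Arg for Factor Xa and Asp-Asp-Asp-Asp-Lys for enterokinase, each cleaved on the carboxyl side) are the commonly cited ones; the function name and example constructs are hypothetical, and the sketch assumes cleavage at the first occurrence of the site only.

```python
# Commonly cited recognition sequences; cleavage occurs after the last residue shown.
RECOGNITION_SITES = {
    "factor_xa": "IEGR",
    "enterokinase": "DDDDK",
}


def remove_n_terminal_tag(construct: str, protease: str) -> str:
    """Return the protein released after cleavage at the first recognition site."""
    site = RECOGNITION_SITES[protease]
    idx = construct.find(site)
    if idx == -1:
        raise ValueError(f"no {protease} recognition site in construct")
    # Everything up to and including the site (tag plus site) is released as the tag.
    return construct[idx + len(site):]


# Hypothetical constructs: FLAG epitope (DYKDDDDK) or His-tag plus site, then a toy protein.
print(remove_n_terminal_tag("DYKDDDDK" + "MKTAYIAK", "enterokinase"))  # MKTAYIAK
print(remove_n_terminal_tag("HHHHHH" + "IEGR" + "MKT", "factor_xa"))   # MKT
```

Note that the FLAG epitope itself ends in the enterokinase recognition sequence, which is why no separate site is added in the first example.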
V. Recombinant expression
[76] A preferred method of producing proteins of the present invention is through recombinant expression of the proteins in a heterologous host system. Such systems are preferably cellular in nature, but may be cell-free. Preferable cell-based systems include bacterial hosts, most preferably E. coli hosts. As described below, nucleic acids encoding proteins of the present invention are typically inserted into an expression vector suitable for the chosen host, with the coding sequence of the nucleic acid aligned in-frame and operably linked to suitable control sequences such as a promoter and a transcriptional terminator. The expression vector is then inserted into the host cell, which is then cultured under conditions that allow for the expression of the protein of the invention. After protein expression, the protein is preferably purified using techniques such as the examples provided below.
[77] Proteins expressed in bacteria may form insoluble aggregates ("inclusion bodies"). Several protocols are suitable for purification of recombinant proteins from inclusion bodies. For example, purification of inclusion bodies typically involves the extraction, separation and/or purification of inclusion bodies by disruption of bacterial cells, e.g., by incubation in a buffer of 50 mM Tris/HCl pH 7.5, 50 mM NaCl, 5 mM MgCl2, 1 mM DTT, 0.1 mM ATP, and 1 mM PMSF. The cell suspension can be lysed using 2-3 passages through a French Press, homogenized using a Polytron (Brinkman Instruments) or sonicated on ice. Alternate methods of lysing bacteria are apparent to those of skill in the art (see, e.g., Sambrook et al., supra; Ausubel et al., supra).
[78] Thus the present invention contemplates a recombinant cell or other expression system including an isolated nucleic acid that contains a YS40 nucleotide sequence having the nucleotide sequence SEQ ID NO: 1-170.
The YS40 nucleotide sequence encodes either a YS40 structural protein that does not take a random coil structure at a permissible temperature of at least 36°C, or a YS40 enzyme that displays decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase or peptidase activity at a permissible temperature of at least 36°C. In certain embodiments the YS40 nucleotide sequence is operably linked to a regulatory element, preferably a promoter, most preferably a constitutive promoter.
[79] Other embodiments of the invention include a recombinant vector comprising an isolated nucleic acid encoding an isolated protein comprising a thermophilic amino acid sequence at least about 75% homologous to a YS40 amino acid sequence encoded by at least about 25 contiguous codons from SEQ ID NO: 1-170. The YS40 amino acid sequence is operably linked to a promoter such that introduction of the vector into an expression system produces a protein having an enzyme activity selected from the group consisting of decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase and peptidase when assayed at a permissible temperature of at least about 36°C, more typically at least about 45°C, 55°C, or 65°C, and most typically at least about 75°C.
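The requirement in this section that a coding sequence be aligned in-frame before insertion into an expression vector can be checked with a simple sketch such as the following. It assumes the standard stop codons and, as a simplification, an ATG start codon only (bacterial genes may also use alternative start codons); the function name and test sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_intact_orf(seq: str) -> bool:
    """True if seq is an uninterrupted reading frame: ATG start, terminal stop,
    length a multiple of three, and no stop codon before the last codon."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    has_internal_stop = any(c in STOP_CODONS for c in codons[1:-1])
    return codons[0] == "ATG" and codons[-1] in STOP_CODONS and not has_internal_stop


# Hypothetical toy sequences, not YS40 ORFs.
print(is_intact_orf("ATGGCTTAA"))     # True
print(is_intact_orf("ATGTAAGCTTAA"))  # False: premature stop codon
print(is_intact_orf("ATGGCTTA"))      # False: frame is broken
```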
VI. Preparation of nucleic acids encoding YS40 proteins
[80] Several embodiments of the present invention utilize nucleic acids encoding proteins of the present invention in the production of the proteins. These nucleic acids may be any coding sequence capable of expressing a protein of the present invention, when operably linked to appropriate control sequences, including a promoter. Thus, so long as a protein expressed from the nucleic acid is a protein of the invention described herein, the nucleic acid may include a partial deletion, substitution or insertion of the nucleotide sequence, or may have other nucleotide sequence ligated therewith at the 5'-terminus and/or 3'-terminus thereof.
[81] In general, nucleic acid sequences encoding proteins of the present invention may be isolated from Thermus thermophilus strains infected with bacteriophage YS40, or may be isolated from phage libraries constructed from the YS40 bacteriophage genome using methods well known by those of skill in the art. Generally, cDNA or genomic libraries are constructed and screened to identify the correct sequence. (For cDNA libraries, see e.g., Gubler & Hoffman, Gene, 25:263-269 (1983); Sambrook et al., 2001, Molecular Cloning: A Laboratory Manual (3rd ed.), Cold Spring Harbor Laboratory, Cold Spring Harbor, N.Y.; Ausubel et al. (eds.), 1993, Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. For genomic libraries, see Benton & Davis, Science, 196:180-182 (1977); Grunstein et al., Proc. Natl. Acad. Sci. USA, 72:3961-3965 (1975); and Gussow, D. and Clackson, T., Nucl. Acids Res., 17:4000 (1989).)
[82] PCR amplification techniques can also be used to identify and isolate nucleic acid sequences encoding proteins of the invention and are discussed generally in PCR Protocols: A Guide to Methods and Applications (Innis et al., eds., 1990).
[83] Nucleic acids encoding proteins of the invention may also be prepared using synthetic techniques.
Chemical synthesis of linear oligonucleotides is well known in the art and can be achieved by solution or solid phase techniques. Moreover, linear oligonucleotides of defined sequence can be purchased commercially or can be made by any of several different synthetic procedures including the phosphoramidite, phosphite triester, H-phosphonate and phosphotriester methods, typically by automated synthesis methods. The synthesis method selected can depend on the length of the desired oligonucleotide and such choice is within the skill of the ordinary artisan. For example, the phosphoramidite and phosphite triester methods produce oligonucleotides having 175 or more nucleotides while the H-phosphonate method works well for oligonucleotides of less than 100 nucleotides. Oligonucleotides of the present invention can be synthesized chemically according to the solid phase phosphoramidite triester method described by Beaucage and Caruthers (1981), Tetrahedron Letts., 22(20):1859-1862, e.g., using an automated synthesizer, as described in Needham-VanDevanter et al. (1984) Nucleic Acids Res., 12:6159-6168. Oligonucleotides can also be custom made and ordered from a variety of commercial sources known to persons of skill in the art. Purification of oligonucleotides, where necessary, is typically performed by either native acrylamide gel electrophoresis or by anion-exchange HPLC as described in Pearson and Regnier (1983) J. Chrom. 255:137-149. See also Sambrook, J. et al., Molecular Cloning, A Laboratory Manual, 2d Ed., Cold Spring Harbor Laboratory Press, New York, 13.7-13.9 and Hunkapiller, M. W. (1991) Curr. Op. Gen. Devel. 1:88-92.
VII. Expression systems
[84] Nucleic acids encoding proteins of the present invention may be expressed in a variety of host organisms once they are operably linked in expression vectors suitable for the selected host organism. Suitable expression vectors typically comprise regulatory sequences operable in the host organism. These regulatory sequences are necessarily operably linked to the nucleic acid to control its expression. The expression vector includes a promoter that is either inducible or constitutively drives transcription, and may optionally comprise other regulatory, replication or manipulation sequences to aid in the expression and incorporation of the nucleic acid into the expression vector, as required by the particular application being pursued.
[85] For example, to obtain a high level expression of a protein in a prokaryotic system, it is essential to construct expression vectors that contain, at a minimum, a strong promoter to direct transcription, a ribosome-binding site for translational initiation, a transcription/translation terminator, and unique restriction sites in nonessential regions of the plasmid to allow insertion of foreign nucleic acids. Other factors may also be carried on the expression vector, such as selectable and/or scorable markers, such as those described below. Suitable expression systems for use with the present invention are well known in the art. See, e.g., Pouwels, et al. (1985 and Supplements) Cloning Vectors: A Laboratory Manual, Elsevier, N.Y.; Rodriguez, et al. (eds.) Vectors: A Survey of Molecular Cloning Vectors and Their Uses, Butterworth, Boston, 1988; Luckow, V. A. and Summers, M. D., Bio/Technology, 6:47-55 (1988); Herskowitz, I. and Hagen, D., Ann. Rev. Genet., 14:399-445 (1980); and Yanofsky, C., J. Bacteriol., 158:1018-1024 (1984).
[86] Exemplary bacterial host organisms suitable for use in the present invention are well known in the art and include gram-positive and gram-negative bacteria such as Escherichia coli (cf. Sambrook et al., supra). E. coli strains are particularly preferred host organisms for expression of proteins of the present invention. Exemplary E. coli strains include BL21 (DE3), BL21-Gold (DE3), BL21 (DE3)-pLysS (Stratagene), MMLV-RT: JM109, DH5αF', XL1-Blue (STRATAGENE®, San Diego, Calif.), JM105, ER 1458, NM 522, INVαF' (Invitrogen, San Diego, Calif.), TOPP™ strains 1-6 (STRATAGENE®), 1200, MRE 600, Q13, and A19. Some of these strains (1200, MRE 600, Q13, and A19) are mutants that have reduced levels of RNase I (referred to as "RNase I deficient") compared to wild type strains (Durwald et al., 1968, J. Mol. Biol. 34:331-346; Clark, 1963, Genetics 48:105-120; Gesteland, 1966, J. Mol. Biol. 16:67; Reiner, 1969, J. Bacteriol. 97:1522), while others are common laboratory strains. Some of these strains contain the lacIq repressor and require use of isopropylthiogalactoside (IPTG) to induce transcription. The level of RT expression of host cells containing the RT gene was estimated by visualizing the resulting proteins on SDS-polyacrylamide gels and also, in most cases, by enzyme activity assays on crude cell lysates. Of the RNase I deficient strains, E. coli 1200 (Strain 4449, available from the E. coli Genetic Stock Center, Yale University) consistently showed high levels of enzyme expression using these assays; unless indicated otherwise, all experiments described herein were conducted using this strain.
[87] Standard transfection methods are used to introduce expression systems for proteins of the present invention to host organisms (see, e.g., Morrison, J. Bact., 132:349-351 (1977); Clark-Curtiss & Curtiss, Methods in Enzymology, 101:347-362 (Wu et al., eds., 1983); Sambrook et al., and Ausubel et al., supra). The proteins can be recovered from the cells or from the culture medium by standard protein purification techniques as described above.
Selectable marker genes
[88] Identifying host organisms that have successfully incorporated nucleic acids encoding a protein of the present invention is preferably accomplished through inclusion of a selectable marker gene into the vector or expression system used for producing the protein. Selectable markers allow a transformed cell, tissue or animal to be identified and isolated by selecting or screening the engineered material for traits encoded by the marker genes present on the transforming DNA. For instance, selection may be performed by growing the engineered cells on media containing inhibitory amounts of an antibiotic to which the transforming marker gene construct confers resistance. Further, transformed cells may also be identified by screening for the activities of any visible marker genes (e.g., the β-glucuronidase, green fluorescent protein, luciferase, B or C1 genes) that may be present on the recombinant nucleic acid constructs of the present invention. Such selection and screening methodologies are well known to those skilled in the art.
[89] Physical and biochemical methods may also be used to identify a cell transformant containing the genetic constructs of the present invention. These methods include but are not limited to: 1) Southern analysis or PCR amplification for detecting and determining the structure of the recombinant DNA insert; 2) Northern blot, S1 RNase protection, primer-extension or reverse transcriptase-PCR amplification for detecting and examining RNA transcripts of the gene constructs; 3) enzymatic assays for detecting enzyme activity, where such gene products are encoded by the gene construct; 4) protein gel electrophoresis, western blot techniques, immunoprecipitation, or enzyme-linked immunoassays, where the gene construct products are proteins; 5) biochemical measurements of compounds produced as a consequence of the expression of the introduced gene constructs. The methods for performing these assays are well known to those skilled in the art.
VIII. Chemical protein synthesis
[90] Proteins of the present invention may also be synthesized chemically. For chemical synthesis, peptides may be synthesized either in solution, solid phase or a combination of these methods following standard protocols. See, for example, Wilken et al. (Curr. Opin. Biotech. (1998) 9(4):412-426), which reviews chemical protein synthesis techniques. The solution and solid phase synthesis methods are readily automated. A variety of peptide synthesizers are commercially available for batchwise and continuous flow operations as well as for the synthesis of multiple peptides within the same run. Briefly, the solid phase method consists of anchoring the growing peptide chain to an insoluble support or resin. This is accomplished through the use of a chemical handle, which links the support to the first amino acid at the carboxyl terminus of the peptide. Subsequent amino acids are then added in a stepwise fashion one at a time until the peptide segment is fully constructed. Solid phase chemistry has the advantage of permitting removal of excess reagents and soluble reaction by-products by filtration and washing. The protecting groups of the fully assembled resin bound peptide chain are removed by standard chemistries suitable for this purpose. Standard chemistries also may be employed to remove the peptide chain from the resin. Cleavable linkers can be employed for this purpose.
[91] Solution phase peptide synthesis generally involves reacting individual protected amino acids in solution to generate protected dipeptide product. After removal of a protection group to expose a reactive group for addition of the next amino acid, a second protected amino acid is reacted to this group to give a protected tripeptide. The process of deprotection/amino acid addition is repeated in a stepwise fashion to yield a protected peptide product. One or more of these protected peptides can be reacted to give the full-length protected peptide.
Most or all of the remaining protecting groups are removed to generate an unprotected synthetic peptide segment. Thus, solid phase or solution phase chemistries may be employed to form synthetic peptides comprising one or more functional protein modules.
[92] In general, the method of chemical synthesis employs a combination of chemical synthesis and chemical ligation techniques. By way of example, chemical synthesis approaches described above may be utilized in combination with various chemoselective chemical ligation techniques for producing the proteins of the invention. Chemoselective chemical ligation chemistries that can be utilized in the methods of the invention include native chemical ligation (Dawson et al., Science (1994) 266:776-779; Kent et al., WO 96/34878), extended general chemical ligation (Kent et al., WO 98/28434), oxime-forming chemical ligation (Rose et al., J. Amer. Chem. Soc. (1994) 116:30-33), thioester forming ligation (Schnolzer et al., Science (1992) 256:221-225), thioether forming ligation (Englebretsen et al., Tet. Letts. (1995) 36(48):8871-8874), hydrazone forming ligation (Gaertner et al., Bioconj. Chem. (1994) 5(4):333-338), thiazolidine forming ligation and oxazolidine forming ligation (Zhang et al., Proc. Natl. Acad. Sci. (1998) 95(16):9184-9189; Tam et al., WO 95/00846). The preferred chemical ligation chemistry for synthesis of cross-over proteins according to the method of the invention is native chemical ligation.
[93] Synthesis of proteins by a combination of chemical ligation and chemical synthesis permits facile incorporation of one or more chemical tags. These include synthesis and purification handles, as well as detectable labels and optionally chemical moieties for attaching the protein to a support matrix for screening and diagnostic assays and the like. As can be appreciated, in some instances it may be advantageous to utilize a given chemical tag for more than one purpose, e.g., both as a handle for attaching to support matrix and as a detectable label. Examples of chemical tags include metal binding tags (e.g., his-tags), carbohydrate/substrate binding tags (e.g., cellulose and chitin binding domains), antibodies and antibody fragment tags, isotopic labels, haptens such as biotin and various unnatural amino acids comprising a chromophore, some of which have been discussed supra. A chemical tag also may include a cleavable linker so as to permit separation of the protein from the chemical tag depending on its intended end use.
IX. Thermophilic Applications
A. Nucleic Acid Amplification Techniques
[94] Proteins of the present invention find application in a variety of processes, including biosynthetic and biodegradative processes, particularly those where performance of the process at a mesophilically or thermophilically compatible temperature is beneficial. For example, proteins of the present invention that catalyze reactions of import in nucleic acid synthesis are particularly suited for nucleic acid amplification processes. Methods of "quantitative" nucleic acid amplification are well known to those of skill in the art. For example, quantitative PCR involves simultaneously co-amplifying a known quantity of a control sequence using the same primers. This type of quantitative amplification provides an internal standard that may be used to calibrate the PCR reaction.
[95] One exemplary internal standard is a synthetic AW106 cRNA. The AW106 cRNA is combined with RNA isolated from the sample according to standard techniques known to those of skill in the art. The RNA is then reverse transcribed using a reverse transcriptase to provide cDNA. The cDNA sequences are then amplified (e.g., by PCR) using labeled primers. The amplification products are separated, typically by electrophoresis, and the amount of radioactivity (proportional to the amount of amplified product) is determined. The amount of mRNA in the sample is then calculated by comparison with the signal produced by the known AW106 RNA standard. Detailed protocols for quantitative PCR are provided in PCR Protocols, A Guide to Methods and Applications, Innis et al., Academic Press, Inc., N.Y., (1990).
[96] Other suitable amplification methods include, but are not limited to, polymerase chain reaction (PCR) (Innis, et al., PCR Protocols. A guide to Methods and Application. Academic Press, Inc., San Diego, (1990)), ligase chain reaction (LCR) (see Wu and Wallace, Genomics, 4:560 (1989), Landegren, et al., Science, 241:1077 (1988) and Barringer, et al., Gene, 89:117 (1990)), transcription amplification (Kwoh, et al., Proc. Natl. Acad. Sci. USA, 86:1173 (1989)), and self-sustained sequence replication (Guatelli, et al., Proc. Nat. Acad. Sci. USA, 87:1874 (1990)).
[97] Methods of in vitro polymerization are well known to those of skill in the art (see, e.g., Sambrook, supra) and this particular method is described in detail by Van Gelder, et al., Proc. Natl. Acad. Sci. USA, 87:1663-1667 (1990), who demonstrate that in vitro amplification according to this method preserves the relative frequencies of the various RNA transcripts. Moreover, Eberwine et al., Proc. Natl. Acad. Sci. USA, 89:3010-3014, provide a protocol that uses two rounds of amplification via in vitro transcription to achieve greater than 10^6-fold amplification of the original starting material, thereby permitting expression monitoring even where biological samples are limited.
[98] It will be appreciated by one of skill in the art that the direct transcription method described above provides an antisense (aRNA) pool. Where antisense RNA is used as the target nucleic acid, the oligonucleotide probes provided in the array are chosen to be complementary to subsequences of the antisense nucleic acids. Conversely, where the target nucleic acid pool is a pool of sense nucleic acids, the oligonucleotide probes are selected to be complementary to subsequences of the sense nucleic acids. Finally, where the nucleic acid pool is double stranded, the probes may be of either sense as the target nucleic acids include both sense and antisense strands.
[99] The protocols cited above include methods of generating pools of either sense or antisense nucleic acids. Indeed, one approach can be used to generate either sense or antisense nucleic acids as desired.
For example, the cDNA can be directionally cloned into a vector (e.g., Stratagene's pBluescript II KS (+) phagemid) such that it is flanked by the T3 and T7 promoters. In vitro transcription with the T3 polymerase will produce RNA of one sense (the sense depending on the orientation of the insert), while in vitro transcription with the T7 polymerase will produce RNA having the opposite sense. Other suitable cloning systems include phage lambda vectors designed for Cre-loxP plasmid subcloning (see e.g., Palazzolo et al., Gene, 88:25-36 (1990)).
[100] Exemplary reagent mixtures for use in amplifying nucleic acids according to the methods of the present invention include a recombinant protein that has a thermophilic amino acid sequence at least about 75% homologous to a YS40 amino acid sequence encoded by at least about 25 contiguous codons of SEQ ID NO: 2, 4-64, 70-149, 151 or 153-170. This thermophilic amino acid sequence confers to the recombinant protein an enzyme activity necessary for DNA amplification when incubated at a permissible temperature of at least about 36°C, more typically at least about 55°C, most typically at least about 65°C. Typically the YS40 amino acid sequence is encoded by at least 25 contiguous codons of SEQ ID NO: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 or 161. Most typically the YS40 amino acid sequence is encoded by at least about 25 contiguous codons of SEQ ID NO: 33, and the enzyme activity is DNA polymerase.
[101] In some embodiments of the present invention, amplification of nucleic acids is contemplated as taking place directly from whole cells containing the nucleic acid to be amplified. Exemplary proteins of the present invention possessing protease, lipase or other enzymatic activities that degrade biomolecules of a cell may be included in the amplification reaction.
These enzymes, together with the elevated temperatures of the reaction, provide a means of breaching the cell membrane and allowing the nucleic acid within the cell to be amplified. Methodology for carrying out such reactions will be obvious to one of skill in the art, and may be adapted to virtually any cell system through routine experimentation.
[102] Methods of the present invention for amplifying nucleic acids from whole cells include subjecting the cell preparation to at least one thermophilic protein that has a recombinant amino acid sequence at least about 75% homologous to a YS40 amino acid sequence, which is encoded by at least 25 contiguous codons from SEQ ID NO: 2, 4-64, 70-149, 151 and 153-170, more typically SEQ ID NOs: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 or 161, most typically SEQ ID NO: 33. In certain instances, the thermophilic protein encoded by SEQ ID NO: 33 is preferred. When incubated at a permissible temperature greater than about 36°C, more preferably greater than about 55°C, and most preferably greater than about 65°C, the cell membrane is breached, allowing the amplification reagents to contact the nucleic acids of the cell, which are subsequently amplified.
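As a worked illustration of the internal-standard calibration described for quantitative PCR earlier in this section, the following sketch estimates the amount of target from the ratio of measured signals. It assumes, as a simplification not stated in the text, that signal is directly proportional to amount and that target and standard amplify with equal efficiency; the function name, units and numbers are hypothetical.

```python
def estimate_target_amount(target_signal: float, standard_signal: float,
                           standard_amount: float) -> float:
    """Estimate the target amount from a co-amplified internal standard.

    Assumes signal is proportional to product and that target and standard
    amplify with the same efficiency (same primers, same reaction).
    """
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return standard_amount * target_signal / standard_signal


# Hypothetical counts: standard spiked at 2.0 units gives 500 counts,
# the target band gives 1500 counts.
print(estimate_target_amount(1500.0, 500.0, 2.0))  # 6.0
```

In practice a dilution series of the standard would normally be used to confirm that the measurement lies in the linear range.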
B. Biodegradation techniques
[103] Another preferred application of the proteins of the present invention is the use of the proteins in commercially important biosynthetic or biodegradative processes. For example, the present invention contemplates using proteins described herein in mesophilic and thermophilic processes for the synthesis or degradation of biomaterials. Using protein enzymes of the present invention, these reactions may be carried out at elevated temperatures that are incompatible with growth of bacteria that may normally interfere with such processes, while providing accelerated enzymatic activity resulting from the higher temperature. By way of example, processes in which protein enzymes of the present invention may be used include, but are not limited to, waste water treatment, fermentation processes, composting, paper manufacture, etc. It will be readily appreciated by one of skill in the art that the proteins of the present invention find use in many processes in addition to those listed here, and may be applied to such processes through routine experimentation.
[104] Methods of the present invention suitable for decomposing a biodegradable material involve contacting the biodegradable material with at least one recombinant protein that has a recombinant amino acid sequence that is at least about 75% homologous to an YS40 amino acid sequence encoded by at least 25 contiguous codons of SEQ ID NO.: 2, 4-64, 70-149, 151 or 153-170 of Table 1. The recombinant amino acid sequence confers to the recombinant protein, at a permissible temperature greater than 36°C, more preferably greater than 55°C and most preferably greater than 65°C, an enzyme activity necessary for decomposing the biodegradable material, which may be a protease, amylase, cellulase, nuclease, lipase, deaminase or a peptidase.
X. Thermus expression system
[105] In addition to the proteins of the present invention and the nucleic acids encoding them, the present invention also contemplates a Thermus thermophilus expression system for expression of foreign proteins at elevated temperatures. Central to Thermus thermophilus expression systems of the present invention is an expression vector based on the YS40 bacteriophage genome. The expression vector includes no more than 99.9% of the nucleotide sequence of SEQ ID NO: 171 or its complement. Inserted into this vector is a non-YS40 nucleotide sequence of at least 20 contiguous nucleotides. This non-YS40 nucleotide sequence is inserted into the vector sequence such that it is flanked on its 3' and 5' ends by at least 10 contiguous nucleotides from the YS40 genome. The non-YS40 nucleotide sequence may include a promoter suitable for expression of the protein encoded by the non-YS40 nucleotide sequence in T. thermophilus, and/or the non-YS40 nucleotide sequence may be operably linked to one or more regulatory sequences of YS40 bacteriophage.
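The structural constraints on the expression vector described above (a bounded fraction of the YS40 genome, a minimum insert length, and minimum flanking lengths) can be sketched as a simple check. This is a hypothetical illustration, not a design tool: it models the construct as a left flank, an insert and a right flank, reads "no more than 99.9%" as a limit on the fraction of the genome length retained in the flanks, and uses the invented name `check_construct`.

```python
from typing import List


def check_construct(genome: str, left_flank: str, insert: str, right_flank: str,
                    max_genome_fraction: float = 0.999,
                    min_insert_len: int = 20,
                    min_flank_len: int = 10) -> List[str]:
    """Return a list of violated constraints (an empty list means none)."""
    problems = []
    # The foreign (non-genome) insert must reach the minimum length.
    if len(insert) < min_insert_len:
        problems.append("insert shorter than minimum length")
    # Each flank must be long enough and be a contiguous genome segment.
    for name, flank in (("left", left_flank), ("right", right_flank)):
        if len(flank) < min_flank_len:
            problems.append(f"{name} flank shorter than minimum length")
        elif flank not in genome:
            problems.append(f"{name} flank is not a contiguous genome segment")
    # The flanks together may not exceed the allowed fraction of the genome.
    retained = len(left_flank) + len(right_flank)
    if retained > max_genome_fraction * len(genome):
        problems.append("vector retains too much of the genome sequence")
    return problems
```

A caller would pass the genome sequence together with the three parts of a proposed construct and inspect the returned list; the default thresholds simply mirror the figures given in the paragraph above.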
[106] The expression vector containing the non-YS40 nucleotide sequence is then introduced into T. thermophilus using any technique known to those of skill in the art, such as those described above. The transformed T. thermophilus is then cultured at a permissible temperature under suitable conditions allowing expression of the protein encoded by the non-YS40 nucleotide sequence.
[107] Although use of T. thermophilus is a preferred embodiment of the present invention, other cellular hosts are also contemplated, as are cell-free expression systems, such as reticulocyte lysates.
XI. Kits
[108] The present invention also contemplates kit embodiments suitable for amplifying nucleic acid samples. These kits include a reagent containing at least one recombinant protein that has a thermophilic amino acid sequence at least about 75% homologous to an YS40 amino acid sequence encoded by at least about 25 contiguous codons taken from one of the sequences SEQ ID NO: 1-170. The reagent has an enzyme activity necessary for DNA amplification or DNA entry into the cell, as described above, at a permissible temperature of at least about 36°C, more typically at least about 55°C, most typically at least about 65°C. Kit embodiments also include a buffer solution for diluting the reagent and may optionally include universal primers and/or calibration nucleic acids known to those of skill in the art.
[109] All publications and patent applications cited in this specification are herein incorporated by reference as if each individual publication or patent application were specifically and individually indicated to be incorporated by reference. [110] Although the foregoing invention has been described in some detail by way of illustration and example for clarity and understanding, it will be readily apparent to one of ordinary skill in the art in light of the teachings of this invention that certain changes and modifications may be made thereto without departing from the spirit and scope of the appended claims. The following examples are included to demonstrate certain embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventor to function well in the practice of the invention, and thus can be considered to constitute preferred modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit and scope of the invention.
EXAMPLES Example 1.
[111] This example describes one method of identifying coding sequences of the present invention starting from the genomic sequence of the YS40 thermophilic phage.
[112] The genome sequence of YS40 was searched for open reading frames (ORFs) using the hidden Markov model approach implemented in the GeneMark program (see Besemer J. and Borodovsky M. (1999), NAR, Vol. 27, No. 19, pp. 3911-3920). Using this technique, 170 open reading frames (ORFs) encoding preferred proteins of the present invention were predicted (Table 1). Regions between the identified ORFs were then screened for additional genes using the Blastx and tBlastx programs (Schafer et al, 1997) to identify regions having similarity with available entries in GenBank, GenPept, and the database of unfinished microbial genomes at NCBI. This latter search did not identify any additional coding sequences.
[113] The predicted YS40 ORFs have lengths between 43 and 1744 codons. As with most other phages, the genome of YS40 is tightly packed, with little space between ORFs and 46 cases of overlaps (from 1 to 40 bases in length) between adjoining ORFs. Ninety-five percent of the YS40 genome is occupied by coding sequence, and on average there are 1.129 genes per 1 kb. The G + C contents are 32.29% and 33.92% for coding and non-coding regions, respectively. The longest non-coding region is 390 bp in length; it lies between ORF 138 and ORF 139 and does not appear to demarcate any functional regions.
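The genome statistics reported above (G + C content, gene density and the fraction of the genome occupied by coding sequence) are the kind of figures that can be computed from a sequence and a list of ORF coordinates. The following is a minimal sketch under stated assumptions: ORF coordinates are taken as 0-based, half-open (start, end) intervals, overlapping ORFs are counted once when computing coverage, and all function names are hypothetical. It does not reproduce the actual analysis performed on YS40.

```python
def gc_percent(seq: str) -> float:
    """G + C content of a nucleotide sequence, as a percentage."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def genes_per_kb(n_genes: int, genome_length_bp: int) -> float:
    """Gene density expressed as genes per 1000 bp."""
    return n_genes / (genome_length_bp / 1000.0)


def coding_fraction(orf_intervals, genome_length_bp: int) -> float:
    """Fraction of the genome covered by at least one ORF.

    `orf_intervals` holds 0-based, half-open (start, end) pairs; bases shared
    by overlapping ORFs are counted only once.
    """
    covered = 0
    current_end = 0
    for start, end in sorted(orf_intervals):
        start = max(start, current_end)  # skip bases already counted
        if end > start:
            covered += end - start
            current_end = end
    return covered / genome_length_bp
```

Separate G + C values for coding and non-coding regions, as reported in the text, would follow by applying `gc_percent` to the concatenated ORF sequences and to the concatenated intergenic sequences, respectively.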
[114] Among the 170 predicted ORFs, most initiate translation at the AUG codon, 22 appear to use GUG and 3 use UUG. Among the stop codons, TAA is found in 90 cases, TGA in 66, and TAG in 16 cases.
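A tally of start and stop codon usage such as the one reported above can be obtained by inspecting the first and last codon of each predicted ORF. The sketch below is illustrative only: it assumes each ORF is supplied as a nucleotide string that begins with its start codon and ends with its stop codon, normalizes RNA notation (U) to DNA notation (T), and uses the hypothetical function name `codon_usage`.

```python
from collections import Counter


def codon_usage(orfs):
    """Count start and stop codons over an iterable of ORF sequences.

    Returns a (starts, stops) pair of Counters keyed by DNA-alphabet codons.
    """
    starts, stops = Counter(), Counter()
    for orf in orfs:
        seq = orf.upper().replace("U", "T")  # treat AUG and ATG alike
        if len(seq) < 6 or len(seq) % 3 != 0:
            raise ValueError("ORF length must be a multiple of 3 and at least 6")
        starts[seq[:3]] += 1
        stops[seq[-3:]] += 1
    return starts, stops
```

For example, calling `codon_usage` on the full set of predicted ORFs would yield one Counter for initiation codons (ATG, GTG, TTG) and one for termination codons (TAA, TGA, TAG).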
[115] Two thirds of the YS40 genes (114 genes) are transcribed leftwards, and 56 genes are transcribed rightwards. The G + C content is approximately the same for both sets of ORFs. The largest cluster of consecutive genes with the same transcriptional orientation contains 35 ORFs that encode mostly proteins of unknown function (SEQ ID NO: 97 through SEQ ID NO: 131).
tRNA genes
[116] Similarly to some large tailed dsDNA bacteriophages, such as coliphage T4 (Miller ES, et al. 2003), vibriophage KVP40 (Miller ES, et al. 2003) and phage phiKZ of P. aeruginosa (Mesyanzhinov VV, et al. 2002), YS40 encodes several tRNAs. Using the tRNAscan-SE program (Lowe, T.M. & Eddy, S.R. 1997), three tRNA genes were identified within the YS40 genome in two intergenic regions, with MetAUG (SEQ ID NO: 172), ArgAGA (SEQ ID NO: 173), and ThrACA (SEQ ID NO: 174) specificities. The first two tRNA genes are located in a non-coding region between ORF71 and ORF72, whereas the tRNA-Thr gene overlaps with ORF 164. Given the significant difference in G + C content between YS40 and its Thermus host, the phage's own tRNAs presumably influence the rate of translation of YS40 proteins.
Sequence analysis of predicted YS40 proteins
[117] There are 170 potential ORFs in the YS40 genome, coding for predicted proteins ranging from 43 to 1744 amino acid residues. These are presented as SEQ ID NO: 1 through SEQ ID NO: 170. Analysis of intrinsic sequence features indicates that at least 7 proteins contain putative transmembrane domains (from one to three), and 4 proteins have coiled-coil regions. There is only one predicted non-globular protein in YS40, gp107. One protein, gp35, is predicted to have an N-terminal secretion signal peptide, while about 10 proteins have a predicted N-terminal signal peptide. All deduced amino acid sequences were compared to proteins in the protein sequence database using the PSI-BLAST program (Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997 Sep 1;25(17):3389-3402) with a slightly relaxed cutoff for profile inclusion (-h parameter) of 0.02, and the output was analyzed for the presence of conserved sequence motifs, with particular attention paid to matches to proteins from other bacteriophages (Table 1). The comparison showed that about 25% of the YS40 proteins that are longer than 100 amino acids display sequence similarity to proteins of known function from a diversity of bacteria and bacteriophages (Table 1).
YS40 proteins involved in nucleotide metabolism
[118] YS40 encodes a number of enzymes that are involved in nucleotide metabolism. They are gp8, a homolog of mammalian/virus dUTPase (EC 3.6.1.23); gp9, related to flavin-dependent thymidylate synthase (EC 2.1.1.148); gp17, a GMP reductase, having sequence similarity to EC 1.7.1.7; gp24, a thymidine kinase, having sequence similarity to EC 2.7.1.21, PF00265; gp38, a deoxycytidylate deaminase, having sequence similarity to PF00383, EC 3.5.4.12; gp60, a dNMP kinase; and the α subunit of ribonucleotide reductase, encoded by two adjoining ORFs, gp41 and gp42.
YS40 proteins involved in DNA replication and recombination
[119] YS40 encodes most of the proteins required for its own replisome formation, namely gp27 and gp79, two helicases with a DEAD signature in the Walker B motif; gp14, replication initiation helicase DnaB; gp23, bacterial DnaG-family DNA primase; gp26, RecB family exonuclease; gp33, type A DNA polymerase; and gp65, a terminal protein that may be covalently attached to the 5' terminus of the YS40 genomic DNA. YS40 also encodes two recombination proteins: gp12, RecA/RadA recombinase; and gp114, recombination protein ERF.
[120] As with most A-type DNA polymerases, gp33 contains a conserved nucleotidyltransferase domain and a 3'-5' exonuclease domain. However, like the Klenow fragment of E. coli DNA polymerase I, gp33 lacks the N-terminal 5'-3' exonuclease domain. Furthermore, the YS40 genome contains no gene products with detectable sequence similarity to single-stranded DNA binding proteins from any known class (Ponomarev VA, et al. Mol Microbiol Biotechnol. 2003), nor to any DNA ligases from other bacteria or bacteriophages.
[121] The protein gp65 is of particular interest in understanding the replication mechanism of YS40. It shows striking sequence similarity to the C-terminal portion of the podovirus phi29 terminal protein (TP) that is essential for the protein-primed DNA replication of the linear phi29 genome. In phi29, the 5'-terminal dAMP is linked via a phosphoester bond to the hydroxyl group of Ser232 of the TP (Hermoso JM, et al. 1985), and this Ser232 is absolutely critical for the priming activity of TP (Garmendia C, et al. 1988; Garmendia C, et al. 1990). As shown in figure #, this serine residue is conserved in all TPs from phi29 family phages and YS40. The sequence similarity between gp65 and phi29 TP strongly suggests that YS40 replicates its genome in a linear form, probably adopting a protein-primed replication mode similar to that of the phi29 family phages.
Identified YS40 structural proteins and DNA packaging proteins
[122] The overall architecture of the YS40 genome is unique, as compared to other sequenced phage genomes. In particular, the tendency towards tight clustering of genes coding for virion components, which is so prevalent in lambdoid phages and T4-like phage groups, is hardly observed in the YS40 genome. For example, gp150, encoding a putative Myovirus-like baseplate assembly protein, is adjacent to gp152, which encodes a putative Myovirus-like wac fibritin neck whisker, but these two structural genes are located far away from other recognized YS40 structural genes, such as the genes coding for gp1 (distal tail fiber protein), gp3 (portal protein), gp62 (terminase large subunit) and gp69 (tail sheath protein).
YS40 protein sequences and thermophily
[123] YS40 is capable of withstanding temperatures as high as 75°C in its Thermus host. Thus its molecular milieu is extremely resistant to elevated temperatures, such as those desirably employed in bioreactors, including PCR processes. However, only one YS40-encoded protein, gp13 (recombination protein ERF), has its best database match in Thermus bacteria. For 6 of the 170 YS40 predicted gene products, the best database match is from thermophilic microorganisms, including Thermotoga maritima, Thermoanaerobacter tengcongensis, and Methanocaldococcus jannaschii. Further investigation of the sequence, function and evolution of these groups of thermophile-affiliated YS40 proteins may give more clues to the survival strategy of this phage under extreme temperature. For instance, the best database match of gp5, S-adenosylmethionine decarboxylase (adoMetDC), a key enzyme in the biosynthesis of spermidine and spermine, is from M. jannaschii. The placement of YS40 adoMetDC on a thermophilic species-specific clade, including both bacterial and archaeal species, such as Aquifex aeolicus, Thermoplasma, Picrophilus and Pyrococcus, in the phylogenetic tree built on the basis of multiple sequence alignment of adoMetDC enzymes (data not shown) suggests that thermophilic adoMetDC enzymes are evolutionarily specialized and therefore may be important for the survival of thermophilic microorganisms at extremely high temperatures.
Example 2.
[124] This example describes the ability of the phage protein Ys18, encoded by SEQ ID NO: 18, to negatively regulate transcription initiation by binding RNA polymerase sigma factors from T. thermophilus (T. th) and E. coli. Binding experiments and transcription experiments have been used to determine the function of Ys18.
[125] The function of Ys18 was analyzed by using a run-off transcription assay to determine if Ys18 was involved in transcription (FIG. 1). Two transcriptionally competent open model promoters from different classes, the -10/-35 class (T7A1) and the extended -10 class (galP1), attached to a rho-independent terminator were used to quantitatively visualize the efficiency of run-off transcription. Reaction 1, in 20 μl of transcription buffer (30 mM Tris-HCl, pH 8.0, 10 mM MgCl2, 40 mM KCl, 1 mM β-mercaptoethanol), contained core enzyme, sigma and Ys18. Reaction 1 was incubated at 65°C (for T. th) and 37°C (for E. coli) for 10 minutes, followed by the addition of promoter DNA fragments. Reaction 2, in 10 μl of transcription buffer, contained core enzyme and sigma, and, in parallel, Ys18 and a promoter DNA fragment. The two parts of Reaction 2 were incubated at 65°C (for T. th) and 37°C (for E. coli) for 10 minutes and then mixed together. For both reactions 1 and 2, after 10 minutes of incubation at the same temperatures, 200 μM ATP, CTP, UTP, 20 μM GTP and 10 μCi of [α-32P] GTP were added, the reactions were incubated for a further 10 minutes, and terminated by the addition of an equal volume of 9 M urea loading buffer.
[126] In the presence of Ys18, transcription by the T. th RNA polymerase was inhibited in a Ys18 dose-dependent manner, to different degrees with each promoter (FIG. 1A). These data suggest that Ys18 inhibited transcription more actively from the extended -10 promoter galP1, likely due to the absence of the stabilizing effect of the -35 box.
[127] Ys18-dependent inhibition of transcription by E. coli RNA polymerase (FIG. 1B) was similar to that of T. thermophilus (FIG. 1A). In the presence of Ys18, transcription by the E. coli RNA polymerase was inhibited in a Ys18 dose-dependent manner, to different degrees with each promoter, in multi- and single-round transcription (FIG. 1B). The results demonstrate that Ys18 only slightly, if at all, inhibited single-round transcription, especially at the -10/-35 promoter T7A1. Ys18 inhibited multi-round transcription from the extended -10 promoter galP1 more actively.
[128] With both E. coli and T. th RNA polymerases, the difference in transcriptional inhibition between the different orders of promoter DNA addition to the transcription reaction (FIG. 1C) indicates that Ys18 interacted not only with RNA polymerase through sigma (σ) but also with promoter DNA. Further, the proportional change in terminator bands to run-off bands suggests that Ys18 does not affect termination. Thus the phage protein seems to negatively regulate transcription initiation but not elongation or termination. Taken together, the results obtained in the transcription (FIG. 1) and the Ni-NTA agarose binding (FIG. 2) experiments suggest that Ys18 inhibited transcription initiation through interaction with RNA polymerase sigma subunits.
[129] Binding experiments using Ni-NTA agarose suggest that Ys18 associates with RNA polymerase sigma subunits to inhibit transcription. Primary sigma factors were preincubated in the presence and absence of His-tagged Ys18 (Ys18His) in 20 μl of binding buffer (20 mM Tris-HCl, pH 8.0, 0.5 M NaCl, 2 mM imidazole, 5% v/v glycerol) for 10 minutes at 65°C (for T. th σA) and 37°C (for E. coli σ70). The binding mixtures were then added to Ni-NTA agarose beads, which bind His tags, equilibrated in the binding buffer. Reactions were incubated for 10 minutes at room temperature. The agarose beads were pelleted by quick centrifugation and the unbound proteins were withdrawn. The beads were washed 3 times with the binding buffer containing 200 mM imidazole. Fractions were then resolved by SDS-PAGE and stained with Coomassie.
[130] Ys18His bound to the RNA polymerase sigma factor from T. th (σA). In the presence of both Ys18His and σA (FIG. 2A), σA was detected in the unbound and the bound fractions. In the absence of Ys18His (FIG. 2A), σA was exclusively observed in the unbound fraction. These results show that σA cannot bind to the Ni-NTA agarose beads without Ys18His, indicating that Ys18His was capable of binding to σA.
[131] Ys18His also bound to the RNA polymerase sigma factor from E. coli (σ70). In the presence of both Ys18His and σ70 (FIG. 2B), σ70 was detected in the unbound and the bound fractions. In the absence of Ys18His (FIG. 2B), σ70 was exclusively observed in the unbound fraction. Further, Ys18His also bound to the primary sigma factor from E. coli lacking region 4 (σ70 1-549). When both Ys18His and σ70 1-549 were present in the sample, σ70 1-549 was detected in the unbound and the bound fractions (FIG. 2C). These data suggest that Ys18His bound to σ70 in a region other than region 4.

Claims

WHAT IS CLAIMED IS:
1. An isolated protein, including conservatively modified variants thereof, comprising a thermophilic amino acid sequence at least 75% identical to a YS40 amino acid sequence encoded by at least 25 contiguous codons of a YS40 coding sequence selected from the group consisting of SEQ ID NO: 1-170.
2. The protein of claim 1, wherein the thermophilic amino acid sequence is identical to the YS40 amino acid sequence.
3. The protein of claim 1, wherein the YS40 amino acid sequence is encoded by at least 50 contiguous codons of the YS40 coding sequence.
4. The protein of claim 1, wherein the YS40 amino acid sequence is encoded by at least 100 contiguous codons of the YS40 coding sequence.
5. The protein of claim 1, wherein the YS40 coding sequence is from a YS40 structural protein.
6. The protein of claim 5, wherein the YS40 coding sequence is selected from the group consisting of SEQ ID NO: 1, 3, 65, 69, 71, 151 and 152.
7. The protein of claim 1, wherein the YS40 coding sequence is selected from the group consisting of SEQ ID NO: 2, 4-64, 70-149, 151 and 153-170.
8. The protein of claim 7, wherein the thermophilic amino acid sequence confers to the protein, at a permissible temperature of about 36°C, an enzymatic activity selected from the group consisting of decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase and peptidase.
9. The protein of claim 8, wherein the YS40 coding sequence is selected from the group consisting of SEQ ID NO: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 and 161.
10. The protein of claim 8, wherein the YS40 coding sequence is SEQ ID NO: 33, and the enzymatic activity is DNA polymerase.
11. The protein of claim 7, wherein the permissible temperature is at least 45°C.
12. The protein of claim 7, wherein the permissible temperature is at least 55°C.
13. The protein of claim 7, wherein the permissible temperature is at least 65°C.
14. The protein of claim 7, wherein the permissible temperature is at least 75°C.
15. An isolated nucleic acid encoding the protein of claim 1.
16. The nucleic acid of claim 15, wherein the thermophilic amino acid sequence of the encoded protein is identical to the YS40 amino acid sequence.
17. The nucleic acid of claim 15, wherein the YS40 amino acid sequence is encoded by at least 50 contiguous codons of the YS40 coding sequence.
18. The nucleic acid of claim 15, wherein the YS40 amino acid sequence is encoded by at least 100 contiguous codons of the YS40 coding sequence.
19. The nucleic acid of claim 15, wherein the YS40 coding sequence is selected from the group consisting of SEQ ID NO: 1, 3, 65, 69, 71, 151 and 152.
20. The nucleic acid of claim 15, wherein the thermophilic amino acid sequence confers an enzymatic activity to the encoded protein at a permissible temperature of 36°C, the enzymatic activity selected from the group consisting of decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase and peptidase.
21. The nucleic acid of claim 20, wherein the YS40 coding sequence is selected from the group consisting of SEQ ID NO: 2, 4-64, 70-149, 151 and 153-170.
22. The nucleic acid of claim 21, wherein the YS40 coding sequence is selected from the group consisting of SEQ ID NO: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 and 161.
23. The nucleic acid of claim 21, wherein the YS40 coding sequence is SEQ ID NO: 33, and the enzymatic activity is DNA polymerase.
24. The nucleic acid of claim 20, wherein the permissible temperature is at least 45°C.
25. The nucleic acid of claim 20, wherein the permissible temperature is at least 55°C.
26. The nucleic acid of claim 20, wherein the permissible temperature is at least 65°C.
27. The nucleic acid of claim 20, wherein the permissible temperature is at least 75°C.
28. A recombinant vector comprising the nucleic acid of claim 15 operably linked to a promoter wherein introduction of the vector into an expression system produces a protein having, at a permissible temperature of 36°C, an enzyme activity selected from the group consisting of decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase and peptidase.
29. The vector of claim 28, wherein the promoter is inducible.
30. The vector of claim 28, wherein the permissible temperature is at least 45°C.
31. The vector of claim 28, wherein the permissible temperature is at least 55°C.
32. The vector of claim 28, wherein the permissible temperature is at least 65°C.
33. The vector of claim 28, wherein the permissible temperature is at least 75°C.
34. The vector of claim 28, wherein the YS40 coding sequence is selected from the group consisting of SEQ ID NO: 2, 4-64, 70-149, 151 and 153-170.
35. The vector of claim 34, wherein the YS40 coding sequence is selected from the group consisting of SEQ ID NO: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 and 161.
36. The vector of claim 34, wherein the YS40 coding sequence is SEQ ID NO: 33, and the enzyme activity is DNA polymerase.
37. A protein expression system comprising the vector of claim 28 wherein incubating the expression system under permissible conditions produces the recombinant protein encoded by the vector.
38. The protein expression system of claim 37, further comprising a cell wherein the vector is within the cell.
39. An isolated nucleic acid comprising an YS40 nucleotide sequence selected from the group consisting of SEQ ID NO: 1-170, wherein the YS40 nucleotide sequence encodes a YS40 structural protein that does not take a random coil structure at a permissible temperature of 36°C, or a YS40 enzyme that displays, at a permissible temperature of 36°C, an enzyme activity selected from the group consisting of decarboxylase, nuclease, synthase, recombinase, helicase, dehydrogenase, reductase, nucleotide primase, kinase, protease, nucleotidyltransferase, nucleic acid polymerase, deaminase, acyltransferase, terminase, helicase, glycosyltransferase and peptidase.
40. The nucleic acid of claim 39, further comprising a regulatory element operably linked to the YS40 nucleotide sequence.
41. The nucleic acid of claim 39, wherein the permissible temperature is at least 45°C.
42. The nucleic acid of claim 39, wherein the permissible temperature is at least 55°C.
43. The nucleic acid of claim 39, wherein the permissible temperature is at least 65°C.
44. The nucleic acid of claim 39, wherein the permissible temperature is at least 75°C.
45. A recombinant cell comprising the nucleic acid of claim 39.
46. An isolated nucleic acid comprising: a) a vector sequence comprising no more than about 99.9% of the nucleotide sequence of SEQ ID NO: 171 and b) a non-YS40 nucleotide sequence of at least 20 contiguous nucleotides having a 3' end and a 5' end, wherein the non-YS40 nucleotide sequence is inserted into the vector sequence whereby the non-YS40 nucleotide sequence is flanked on the 3' end and the 5' end by at least 10 contiguous nucleotides of the vector sequence.
47. The nucleic acid of claim 46, further comprising a regulatory element operably linked to the non-YS40 nucleotide sequence.
48. A recombinant system comprising a cell including the nucleic acid of claim 46.
49. The recombinant system of claim 48, wherein the cell is Thermus thermophilus.
50. A method of amplifying a nucleic acid comprising contacting the nucleic acid with a PCR reagent mixture including a recombinant protein, including conservatively modified variants thereof, comprising a thermophilic amino acid sequence at least 75% identical to an YS40 amino acid sequence encoded by at least 25 contiguous codons of a YS40 coding sequence selected from the group consisting of SEQ ID NO: 2, 4-64, 70-149, 151 and 153-170, wherein the thermophilic amino acid sequence confers to the recombinant protein, at a permissible temperature of 36°C, an enzyme activity necessary for DNA amplification.
51. The method of claim 50, wherein the YS40 coding sequence of the thermophilic amino acid sequence is selected from the group consisting of SEQ ID NO: 5, 8, 9, 12-14, 17, 18, 23-27, 29, 33, 38, 41, 42, 52, 57, 59, 60, 62, 71, 79, 114, 144 and 161.
52. The method of claim 50, wherein the YS40 coding sequence of the YS40 amino acid sequence is SEQ ID NO: 33, and the enzyme activity is DNA polymerase.
53. A method of amplifying a nucleic acid from a whole cell, the method comprising: contacting the cell with at least one recombinant protein comprising a thermophilic amino acid sequence, including conservatively modified variants thereof, that is at least 75% identical to a YS40 amino acid sequence encoded by at least 25 contiguous codons of a YS40 coding sequence selected from the group consisting of SEQ ID NO: 1-170, wherein the thermophilic amino acid sequence confers to the thermophilic protein, at a permissible temperature of 36°C, an enzyme activity necessary for DNA amplification or DNA entry into the cell.
54. The method of claim 53, wherein DNA entry into the cell comprises lysing the cell.
55. A method for decomposing a biodegradable material comprising contacting the biodegradable material with at least one recombinant protein, including conservatively modified variants thereof, comprising a thermophilic amino acid sequence at least 75% identical to an YS40 amino acid sequence encoded by at least 25 contiguous codons of a YS40 coding sequence selected from the group consisting of SEQ ID NO: 2, 4-64, 70-149, 151 and 153-170, wherein the thermophilic amino acid sequence confers to the recombinant protein, at a permissible temperature of 36°C, an enzyme activity necessary for decomposing the biodegradable material selected from the group consisting of protease, amylase, cellulase, nuclease, lipase, deaminase and peptidase.
56. A kit suitable for use in amplifying a nucleic acid, the kit comprising: a) a reagent comprising at least one recombinant protein, including conservatively modified variants thereof, comprising a thermophilic amino acid sequence at least 75% identical to an YS40 amino acid sequence encoded by at least 25 contiguous codons of a YS40 coding sequence selected from the group consisting of SEQ ID NO: 1-170, wherein the thermophilic amino acid sequence confers to the recombinant protein, at a permissible temperature of 36°C, an enzyme activity necessary for DNA amplification or DNA entry into the cell; and, b) a buffer solution.
57. The kit of claim 56, further comprising primers suitable for hybridization in a polymerase chain reaction mixture with the nucleic acid being amplified.
PCT/US2006/006593 2005-02-25 2006-02-24 Novel thermophilic proteins and the nucleic acids encoding them WO2006091813A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US65622905P 2005-02-25 2005-02-25
US60/656,229 2005-02-25

Publications (2)

Publication Number Publication Date
WO2006091813A2 true WO2006091813A2 (en) 2006-08-31
WO2006091813A3 WO2006091813A3 (en) 2009-08-27

Family

ID=36928041

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2006/006593 WO2006091813A2 (en) 2005-02-25 2006-02-24 Novel thermophilic proteins and the nucleic acids encoding them

Country Status (1)

Country Link
WO (1) WO2006091813A2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6368801B1 (en) * 2000-04-12 2002-04-09 Molecular Staging, Inc. Detection and amplification of RNA using target-mediated ligation of DNA by RNA ligase

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
NARYSHKINA ET AL. ACCESSION AOMN 12 December 2006, page 19 *
NARYSHKINA ET AL. J MOL BIOL. vol. 364, no. 4, 08 December 2006, pages 667 - 77 *
SAKAKI ET AL. J VIROL. vol. 15, no. 6, June 1975, pages 1449 - 53 *
SAMBROOK ET AL. MOLECULAR CLONING: A LABORATORY MANUAL 1989, N.Y., pages 8.46 - 8.52 AND *
STANDING ET AL. CURR OPIN STRUCT BIOL. vol. 13, no. 5, October 2003, pages 595 - 601 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102851246A (en) * 2012-09-14 2013-01-02 大地绿源环保科技(北京)有限公司 Thermus thermophilus UTM802 and application thereof
CN102851246B (en) * 2012-09-14 2013-08-21 大地绿源环保科技(北京)有限公司 Thermus thermophilus UTM802 and application thereof

Also Published As

Publication number Publication date
WO2006091813A3 (en) 2009-08-27

Similar Documents

Publication Publication Date Title
KR102084186B1 (en) Method of identifying genome-wide off-target sites of base editors by detecting single strand breaks in genomic DNA
US7335471B2 (en) Polypeptides derived from RNA polymerases and use thereof
US7303901B2 (en) Thermostable RNA ligase from thermus phage
US10883091B2 (en) DNA polymerase variant and application thereof
CN109971834B (en) Normal temperature nucleic acid amplification reaction
CN111534493B (en) Purine nucleoside phosphorylase mutant, gene and application
JP7258361B2 (en) Cell-free protein expression using double-stranded concatemer DNA
JP2024028962A (en) Composition and method for orderly and continuous synthesis of complementary DNA (cDNA) from multiple discontinuous templates
US20040058330A1 (en) Methods of use for thermostable RNA ligases
CN109266628B (en) Fused TaqDNA polymerase and application thereof
CN114645033B (en) Nucleoside triphosphate hydrolase and purification method and application thereof
CN108795900B (en) DNA polymerase and preparation method thereof
US20070202508A1 (en) Novel thermophilic proteins and the nucleic acids encoding them
WO2006091813A2 (en) Novel thermophilic proteins and the nucleic acids encoding them
WO2009113718A1 (en) Rna polymerase mutant having improved functions
WO2021093434A1 (en) Modified klenow fragment and application thereof
CN114807084A (en) Mutant Tn5 transposase and kit
CN114829593B (en) Chimeric DNA polymerase and application thereof
WO2001083696A2 (en) Methods for rapid isolation and sequence determination of gene-specific sequences
WO2023098036A1 (en) Taq enzyme mutant, preparation method, and application thereof
WO2024121022A1 (en) Variants of poly(a) polymerase and uses thereof
WO2022082482A1 (en) Recombinant kod polymerase
WO2022210748A1 (en) Novel polypeptide having ability to form complex with guide rna
EP4036237A1 (en) Pwo-neqssb polymerase, method of its preparation, recombinant plasmid, primers and the use of polymerase
JP2006180886A (en) Method for producing dna polymerase

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (pct application filed from 20040101)
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: pct application non-entry in european phase. Ref document number: 06736023; Country of ref document: EP; Kind code of ref document: A2